Commit Graph

  • e60d430cf5 [Fix] resolve conflicts of rebasing feat/speculative-decoding (#5557) Yuanheng Zhao 2024-04-07 14:53:30 +08:00
  • e1acb58423 [doc] Add inference/speculative-decoding README (#5552) Yuanheng Zhao 2024-04-03 18:06:23 +08:00
  • d85d91435a [Inference/SpecDec] Support GLIDE Drafter Model (#5455) Yuanheng Zhao 2024-04-01 21:54:24 +08:00
  • 912e24b2aa [SpecDec] Fix inputs for speculation and revise past KV trimming (#5449) Yuanheng Zhao 2024-03-12 17:57:01 +08:00
  • a37f82629d [Inference/SpecDec] Add Speculative Decoding Implementation (#5423) Yuanheng Zhao 2024-03-11 09:51:42 +08:00
  • 5a9b05f7b2 [Inference/SpecDec] Add Basic Drafter Model Container (#5405) Yuanheng Zhao 2024-02-28 13:48:17 +08:00
  • d63c469f45 [Infer] Revise and Adapt Triton Kernels for Spec-Dec (#5401) Yuanheng Zhao 2024-02-28 13:47:00 +08:00
  • d56c96334e Sync main to feature/colossal-infer Yuanheng Zhao 2024-04-09 10:09:34 +08:00
  • 7ca1d1c545 remove outdated triton test Yuanheng 2024-04-08 17:00:55 +08:00
  • d78817539e [pre-commit.ci] auto fixes from pre-commit.com hooks pre-commit-ci[bot] 2024-04-08 08:41:07 +00:00
  • ce9401ad52 remove unused triton kernels Yuanheng 2024-04-08 16:25:12 +08:00
  • ed5ebd1735 [Fix] resolve conflicts of merging main Yuanheng 2024-04-08 16:21:47 +08:00
  • 641b1ee71a [devops] remove post commit ci (#5566) Hongxin Liu 2024-04-08 15:09:40 +08:00
  • 7ebdf48ac5 add cast and op_functor for cuda built-in types (#5546) 傅剑寒 2024-04-08 11:38:05 +08:00
  • 341263df48 [hotfix] fix typo s/get_defualt_parser /get_default_parser (#5548) digger yu 2024-04-07 19:04:58 +08:00
  • a799ca343b [fix] fix typo s/muiti-node /multi-node etc. (#5448) digger yu 2024-04-07 18:42:15 +08:00
  • 15055f9a36 [hotfix] quick fixes to make legacy tutorials runnable (#5559) Edenzzzz 2024-04-07 12:06:27 +08:00
  • 8e412a548e [shardformer] Sequence Parallelism Optimization (#5533) Zhongkai Zhao 2024-04-03 17:15:47 +08:00
  • 7e0ec5a85c fix incorrect sharding without zero (#5545) Edenzzzz 2024-04-02 20:11:18 +08:00
  • 4bb5d8923a [Fix/Inference] Remove unused and non-functional functions (#5543) Yuanheng Zhao 2024-04-02 14:16:59 +08:00
  • 61545fcfee feat: add sub_dp_size in plugin Wenhao Chen 2024-04-01 15:58:02 +08:00
  • 6ceaf4f1f8 tests: add sub_dp_group test Wenhao Chen 2024-04-01 14:51:36 +08:00
  • 9291f07964 feat: add sub_dp_group Wenhao Chen 2024-04-01 14:51:06 +08:00
  • 1aaa453706 perf: use async copy to accelerate memcpy Wenhao Chen 2024-03-28 15:02:32 +08:00
  • a53c8c1ade to: remove MoE temporarily Wenhao Chen 2024-03-28 13:36:09 +08:00
  • 93aaa21d4a feat: add DataPrefetcher Wenhao Chen 2024-03-27 18:32:23 +08:00
  • a1ab2d374e misc: add offload warning Wenhao Chen 2024-03-27 18:13:10 +08:00
  • a2878e39f4 [Inference] Add Reduce Utils (#5537) 傅剑寒 2024-04-01 15:34:25 +08:00
  • 04aca9e55b [Inference/Kernel]Add get_cos_and_sin Kernel (#5528) yuehuayingxueluo 2024-04-01 13:47:14 +08:00
  • e614aa34f3 [shardformer, pipeline] add gradient_checkpointing_ratio and heterogenous shard policy for llama (#5508) Wenhao Chen 2024-04-01 11:34:58 +08:00
  • df5e9c53cf [ColossalChat] Update RLHF V2 (#5286) YeAnbang 2024-03-29 14:12:29 +08:00
  • 36c4bb2893 [Fix] Grok-1 use tokenizer from the same pretrained path (#5532) Yuanheng Zhao 2024-03-28 16:30:04 +08:00
  • 934e31afb2 The writing style of tail processing and the logic related to macro definitions have been optimized. (#5519) yuehuayingxueluo 2024-03-28 10:42:51 +08:00
  • 00525f7772 [shardformer] fix pipeline forward error if custom layer distribution is used (#5189) Insu Jang 2024-03-27 01:57:00 -04:00
  • e6707a6e8d [format] applied code formatting on changed files in pull request 5510 (#5517) github-actions[bot] 2024-03-27 11:21:03 +08:00
  • 19e1a5cf16 [shardformer] update colo attention to support custom mask (#5510) Hongxin Liu 2024-03-27 11:19:32 +08:00
  • 9a3321e9f4 Merge pull request #5515 from Edenzzzz/fix_layout_convert Edenzzzz 2024-03-26 19:51:02 +08:00
  • 18edcd5368 Empty-Commit Edenzzzz 2024-03-26 19:50:41 +08:00
  • 61da3fbc52 fixed layout converter caching and updated tester Edenzzzz 2024-03-26 17:22:27 +08:00
  • e6496dd371 [Inference] Optimize request handler of llama (#5512) 傅剑寒 2024-03-26 16:37:14 +08:00
  • cbe34c557c Fix ColoTensorSpec for py11 (#5440) Rocky Duan 2024-03-26 15:56:49 +08:00
  • a7790a92e8 [devops] fix example test ci (#5504) Hongxin Liu 2024-03-26 15:09:05 +08:00
  • 131f32a076 [fix] fix grok-1 example typo (#5506) Yuanheng Zhao 2024-03-26 10:19:42 +08:00
  • 0688d92e2d [shardformer]Fix lm parallel. (#5480) flybird11111 2024-03-25 17:21:51 +08:00
  • 6251d68dc9 [fix] PR #5354 (#5501) Runyu Lu 2024-03-25 15:24:17 +08:00
  • 1d626233ce Merge pull request #5434 from LRY89757/colossal-infer-cuda-graph Runyu Lu 2024-03-25 14:55:59 +08:00
  • 68e9396bc0 [fix] merge conflicts Runyu Lu 2024-03-25 14:48:28 +08:00
  • 34e909256c [release] grok-1 inference benchmark (#5500) binmakeswell 2024-03-25 14:42:51 +08:00
  • 87079cffe8 [Inference]Support FP16/BF16 Flash Attention 2 And Add high_precision Flag To Rotary Embedding (#5461) yuehuayingxueluo 2024-03-25 13:40:34 +08:00
  • bb0a668fee [hotfix] set return_outputs=False in examples and polish code (#5404) Wenhao Chen 2024-03-25 12:31:09 +08:00
  • ff4998c6f3 [fix] remove unused comment Runyu Lu 2024-03-25 12:00:57 +08:00
  • 9fe61b4475 [fix] Runyu Lu 2024-03-25 11:37:58 +08:00
  • 5fcd7795cd [example] update Grok-1 inference (#5495) Yuanheng Zhao 2024-03-24 20:24:11 +08:00
  • 6df844b8c4 [release] grok-1 314b inference (#5490) binmakeswell 2024-03-22 15:48:12 +08:00
  • 848a574c26 [example] add grok-1 inference (#5485) Hongxin Liu 2024-03-21 18:07:22 +08:00
  • 5b017d6324 [fix] Runyu Lu 2024-03-21 15:55:25 +08:00
  • 606603bb88 Merge branch 'feature/colossal-infer' of https://github.com/hpcaitech/ColossalAI into colossal-infer-cuda-graph Runyu Lu 2024-03-21 14:25:22 +08:00
  • 4eafe0c814 [fix] unused option Runyu Lu 2024-03-21 11:28:42 +08:00
  • d158fc0e64 [doc] update open-sora demo (#5479) binmakeswell 2024-03-20 16:08:41 +08:00
  • 7ff42cc06d add vec_type_trait implementation (#5473) 傅剑寒 2024-03-19 18:36:40 +08:00
  • b96557b5e1 Merge pull request #5469 from Courtesy-Xs/add_vec_traits 傅剑寒 2024-03-19 13:53:26 +08:00
  • aabc9fb6aa [feat] add use_cuda_kernel option Runyu Lu 2024-03-19 13:24:25 +08:00
  • 48c4f29b27 refactor vector utils xs_courtesy 2024-03-19 11:32:01 +08:00
  • bd998ced03 [doc] release Open-Sora 1.0 with model weights (#5468) binmakeswell 2024-03-18 18:31:18 +08:00
  • 5e16bf7980 [shardformer] fix gathering output when using tensor parallelism (#5431) flybird11111 2024-03-18 15:55:11 +08:00
  • b6e9785885 Merge pull request #5457 from Courtesy-Xs/ly_add_implementation_for_launch_config 傅剑寒 2024-03-15 11:23:44 +08:00
  • 5724b9e31e add some comments xs_courtesy 2024-03-15 11:18:57 +08:00
  • 6e30248683 [fix] tmp for test Runyu Lu 2024-03-14 16:13:00 +08:00
  • 388e043930 add implementation for GetGPULaunchConfig1D xs_courtesy 2024-03-14 11:13:40 +08:00
  • d02e257abd Merge branch 'feature/colossal-infer' into colossal-infer-cuda-graph Runyu Lu 2024-03-14 10:37:05 +08:00
  • ae24b4f025 diverse tests Runyu Lu 2024-03-14 10:35:08 +08:00
  • 1821a6dab0 [fix] pytest and fix dyn grid bug Runyu Lu 2024-03-13 17:28:32 +08:00
  • f366a5ea1f [Inference/kernel]Add Fused Rotary Embedding and KVCache Memcopy CUDA Kernel (#5418) yuehuayingxueluo 2024-03-13 17:20:03 +08:00
  • ed431de4e4 fix rmsnorm template function invocation problem(template function partial specialization is not allowed in Cpp) and luckily pass e2e precision test (#5454) Steve Luo 2024-03-13 16:00:55 +08:00
  • f2e8b9ef9f [devops] fix compatibility (#5444) Hongxin Liu 2024-03-13 15:24:13 +08:00
  • 6fd355a5a6 Merge pull request #5452 from Courtesy-Xs/fix_include_path 傅剑寒 2024-03-13 11:26:41 +08:00
  • c1c45e9d8e fix include path xs_courtesy 2024-03-13 11:21:06 +08:00
  • b699f54007 optimize rmsnorm: add vectorized elementwise op, feat loop unrolling (#5441) Steve Luo 2024-03-12 17:48:02 +08:00
  • 368a2aa543 Merge pull request #5445 from Courtesy-Xs/refactor_infer_compilation 傅剑寒 2024-03-12 14:14:37 +08:00
  • 385e85afd4 [hotfix] fix typo s/keywrods/keywords etc. (#5429) digger yu 2024-03-12 11:25:16 +08:00
  • 095c070a6e refactor code xs_courtesy 2024-03-11 17:06:57 +08:00
  • da885ed540 fix tensor data update for gemini loss calculation (#5442) Camille Zhong 2024-03-11 13:49:58 +08:00
  • 21e1e3645c Merge pull request #5435 from Courtesy-Xs/add_gpu_launch_config 傅剑寒 2024-03-11 11:15:29 +08:00
  • 633e95b301 [doc] add doc Runyu Lu 2024-03-11 10:56:51 +08:00
  • 9dec66fad6 [fix] multi graphs capture error Runyu Lu 2024-03-11 10:51:16 +08:00
  • b2c0d9ff2b [fix] multi graphs capture error Runyu Lu 2024-03-11 10:49:31 +08:00
  • f7aecc0c6b feat rmsnorm cuda kernel and add unittest, benchmark script (#5417) Steve Luo 2024-03-08 16:21:12 +08:00
  • 5eb5ff1464 refactor code xs_courtesy 2024-03-08 15:41:14 +08:00
  • 01d289d8e5 Merge branch 'feature/colossal-infer' of https://github.com/hpcaitech/ColossalAI into add_gpu_launch_config xs_courtesy 2024-03-08 15:04:55 +08:00
  • a46598ac59 add reusable utils for cuda xs_courtesy 2024-03-08 14:53:29 +08:00
  • 2b28b54ac6 Merge pull request #5433 from Courtesy-Xs/add_silu_and_mul 傅剑寒 2024-03-08 14:44:37 +08:00
  • cefaeb5fdd [feat] cuda graph support and refactor non-functional api Runyu Lu 2024-03-08 14:19:35 +08:00
  • 8020f42630 [release] update version (#5411) v0.3.6 Hongxin Liu 2024-03-07 23:36:07 +08:00
  • 95c21498d4 add silu_and_mul for infer xs_courtesy 2024-03-07 16:57:49 +08:00
  • 743e7fad2f [colossal-llama2] add stream chat example for chat version model (#5428) Camille Zhong 2024-03-07 14:58:56 +08:00
  • 68f55a709c [hotfix] fix stable diffusion inference bug. (#5289) Youngon 2024-03-05 22:03:40 +08:00
  • c8003d463b [doc] Fix typo s/infered/inferred/ (#5288) hugo-syn 2024-03-05 15:02:08 +01:00
  • 5e1c93d732 [hotfix] fix typo change MoECheckpintIO to MoECheckpointIO (#5335) digger yu 2024-03-05 21:52:30 +08:00
  • a7ae2b5b4c [eval-hotfix] set few_shot_data to None when few shot is disabled (#5422) Dongruixuan Li 2024-03-05 08:48:55 -05:00
  • 049121d19d [hotfix] fix typo change enabel to enable under colossalai/shardformer/ (#5317) digger yu 2024-03-05 21:48:46 +08:00