Commit Graph

  • 2011b1356a [misc] Update PyTorch version in docs (#5724) binmakeswell 2024-05-16 13:54:32 +08:00
  • 5bedea6e10 [pre-commit.ci] auto fixes from pre-commit.com hooks pre-commit-ci[bot] 2024-05-16 05:20:00 +00:00
  • 4148ceed9f [gemini] use compute_chunk to find next chunk hxwang 2024-05-16 13:17:26 +08:00
  • b2e9745888 [chore] sync hxwang 2024-05-16 04:45:06 +00:00
  • a8d459f99a [Inference] Delete duplicated package (#5723) 傅剑寒 2024-05-16 10:49:03 +08:00
  • 6e38eafebe [gemini] prefetch chunks hxwang 2024-05-15 16:51:44 +08:00
  • f47f2fbb24 [Inference] Fix API server, test and example (#5712) Jianghai 2024-05-15 15:47:31 +08:00
  • 913c920ecc [Colossal-LLaMA] Fix sft issue for llama2 (#5719) Tong Li 2024-05-15 10:52:11 +08:00
  • 74c47921fa [Fix] Llama3 Load/Omit CheckpointIO Temporarily (#5717) Runyu Lu 2024-05-14 20:17:43 +08:00
  • 5bbab1533a [ci] Fix example tests (#5714) Yuanheng Zhao 2024-05-14 16:08:51 +08:00
  • 121d7ad629 [Inference] Delete duplicated copy_vector (#5716) 傅剑寒 2024-05-14 14:35:33 +08:00
  • 43995ee436 [Feature] Distributed optimizers: Lamb, Galore, CAME and Adafactor (#5694) Edenzzzz 2024-05-14 13:52:45 +08:00
  • 7806842f2d add paged-attention v2: support seq length split across thread block (#5707) Steve Luo 2024-05-14 12:46:54 +08:00
  • 18d67d0e8e [Feat]Inference RPC Server Support (#5705) Runyu Lu 2024-05-14 10:00:55 +08:00
  • 393c8f5b7f [hotfix] fix inference typo (#5438) hugo-syn 2024-05-13 15:06:44 +02:00
  • 785cd9a9c9 [misc] Update PyTorch version in docs (#5711) Edenzzzz 2024-05-13 12:02:52 +08:00
  • de4bf3dedf [Inference]Adapt repetition_penalty and no_repeat_ngram_size (#5708) yuehuayingxueluo 2024-05-11 15:13:25 +08:00
  • 50104ab340 [Inference/Feat] Add convert_fp8 op for fp8 test in the future (#5706) 傅剑寒 2024-05-10 18:39:54 +08:00
  • 537f6a3855 [Shardformer]fix the num_heads assert for llama model and qwen model (#5704) Wang Binluo 2024-05-10 15:33:39 +08:00
  • a3cc68ca93 [Shardformer] Support the Qwen2 model (#5699) Wang Binluo 2024-05-09 20:04:25 +08:00
  • bfad39357b [Inference/Feat] Add quant kvcache interface (#5700) 傅剑寒 2024-05-09 18:03:24 +08:00
  • 492520dbdb Merge pull request #5588 from hpcaitech/feat/online-serving Jianghai 2024-05-09 17:19:45 +08:00
  • 5d9a49483d [Inference] Add example test_ci script feat/online-serving CjhHa1 2024-05-09 05:44:05 +00:00
  • d4c5ef441e [gemini]remove registered gradients hooks (#5696) flybird11111 2024-05-09 10:29:49 +08:00
  • bc9063adf1 resolve rebase conflicts on Branch feat/online-serving CjhHa1 2024-05-08 10:36:42 +00:00
  • 61a1b2e798 [Inference] Fix bugs and docs for feat/online-server (#5598) Jianghai 2024-05-08 15:14:06 +08:00
  • 7bbb28e48b [Inference] resolve rebase conflicts CjhHa1 2024-04-11 10:12:31 +08:00
  • c064032865 [Online Server] Chat Api for streaming and not streaming response (#5470) Jianghai 2024-04-07 14:45:43 +08:00
  • de378cd2ab [Inference] Finish Online Serving Test, add streaming output api, continuous batching test and example (#5432) Jianghai 2024-03-18 17:06:05 +08:00
  • 69cd7e069d [Inference] ADD async and sync Api server using FastAPI (#5396) Jianghai 2024-03-01 14:47:36 +08:00
  • d482922035 [Inference] Support the logic related to ignoring EOS token (#5693) yuehuayingxueluo 2024-05-08 19:59:10 +08:00
  • 9c2fe7935f [Inference]Adapt temperature processing logic (#5689) yuehuayingxueluo 2024-05-08 17:58:29 +08:00
  • 12e7c28d5e [hotfix] fix OpenMOE example import path (#5697) Yuanheng Zhao 2024-05-08 15:48:47 +08:00
  • 22297789ab Merge pull request #5684 from wangbluo/parallel_output Wang Binluo 2024-05-07 22:59:42 -05:00
  • 55cc7f3df7 [Fix] Fix Inference Example, Tests, and Requirements (#5688) Yuanheng Zhao 2024-05-08 11:30:15 +08:00
  • f9afe0addd [hotfix] Fix KV Heads Number Assignment in KVCacheManager (#5695) Yuanheng Zhao 2024-05-07 23:13:14 +08:00
  • 4e50cce26b fix the mistral model wangbluo 2024-05-07 09:17:56 +00:00
  • a8408b4d31 remove commented code wangbluo 2024-05-07 07:08:00 +00:00
  • ca56b93d83 [pre-commit.ci] auto fixes from pre-commit.com hooks pre-commit-ci[bot] 2024-05-07 07:07:07 +00:00
  • 108ddfb795 add parallel_output for the opt model wangbluo 2024-05-03 08:58:00 +00:00
  • 88f057ce7c [pre-commit.ci] auto fixes from pre-commit.com hooks pre-commit-ci[bot] 2024-05-07 07:03:46 +00:00
  • 58954b2986 [misc] Add an existing issue checkbox in bug report (#5691) Edenzzzz 2024-05-07 12:18:50 +08:00
  • 77ec773388 [zero]remove registered gradients hooks (#5687) flybird11111 2024-05-07 12:01:38 +08:00
  • c25f83c85f fix missing pad token (#5690) Edenzzzz 2024-05-06 18:17:26 +08:00
  • 1ace1065e6 [Inference/Feat] Add quant kvcache support for decode_kv_cache_memcpy (#5686) 傅剑寒 2024-05-06 15:35:13 +08:00
  • db7b3051f4 [Sync] Update from main to feature/colossal-infer (Merge pull request #5685) Yuanheng Zhao 2024-05-06 14:43:38 +08:00
  • 725fbd2ed0 [Inference] Remove unnecessary float4_ and rename float8_ to float8 (#5679) Steve Luo 2024-05-06 10:55:34 +08:00
  • 8754abae24 [Fix] Fix & Update Inference Tests (compatibility w/ main) Yuanheng Zhao 2024-05-05 16:28:56 +00:00
  • 56ed09aba5 [sync] resolve conflicts of merging main Yuanheng Zhao 2024-05-05 05:14:00 +00:00
  • 537a3cbc4d [kernel] Support New KCache Layout - Triton Kernel (#5677) Yuanheng Zhao 2024-05-03 17:20:45 +08:00
  • 2632916329 remove useless code wangbluo 2024-05-01 09:23:43 +00:00
  • 9df016fc45 [Inference] Fix quant bits order (#5681) 傅剑寒 2024-04-30 19:38:00 +08:00
  • f79963199c [inference]Add alibi to flash attn function (#5678) yuehuayingxueluo 2024-04-30 19:35:05 +08:00
  • ef8e4ffe31 [Inference/Feat] Add kvcache quant support for fused_rotary_embedding_cache_copy (#5680) 傅剑寒 2024-04-30 18:33:53 +08:00
  • 9efc79ef24 add parallel output for mistral model wangbluo 2024-04-30 08:10:20 +00:00
  • 5cd75ce4c7 [Inference/Kernel] refactor kvcache manager and rotary_embedding and kvcache_memcpy oper… (#5663) Steve Luo 2024-04-30 15:52:23 +08:00
  • 5f00002e43 [Inference] Adapt Baichuan2-13B TP (#5659) yuehuayingxueluo 2024-04-30 15:47:07 +08:00
  • 808ee6e4ad [Inference/Feat] Feat quant kvcache step2 (#5674) 傅剑寒 2024-04-30 11:26:36 +08:00
  • d3f34ee8cc [Shardformer] add assert for num of attention heads divisible by tp_size (#5670) Wang Binluo 2024-04-29 05:47:47 -05:00
  • 6af6d6fc9f [shardformer] support bias_gelu_jit_fused for models (#5647) flybird11111 2024-04-29 15:33:51 +08:00
  • 7f8b16635b [misc] refactor launch API and tensor constructor (#5666) Hongxin Liu 2024-04-29 10:40:11 +08:00
  • 91fa553775 [Feature] qlora support (#5586) linsj20 2024-04-17 15:03:31 +08:00
  • 8954a0c2e2 [LowLevelZero] low level zero support lora (#5153) flybird11111 2023-12-21 17:01:01 +08:00
  • 14b0d4c7e5 [lora] add lora APIs for booster, support lora for TorchDDP (#4981) Baizhou Zhang 2023-10-31 15:19:37 +08:00
  • c1594e4bad [devops] fix release docker ci (#5665) Hongxin Liu 2024-04-27 19:11:57 +08:00
  • 4cfbf30a5e [release] update version (#5654) v0.3.7 Hongxin Liu 2024-04-27 18:59:47 +08:00
  • 68ec99e946 [hotfix] add soft link to support required files (#5661) Tong Li 2024-04-26 21:12:04 +08:00
  • 8ccb6714e7 [Inference/Feat] Add kvcache quantization support for FlashDecoding (#5656) 傅剑寒 2024-04-26 19:40:37 +08:00
  • 5be590b99e [kernel] Support new KCache Layout - Context Attention Triton Kernel (#5658) Yuanheng Zhao 2024-04-26 17:51:49 +08:00
  • b8a711aa2d [news] llama3 and open-sora v1.1 (#5655) binmakeswell 2024-04-26 15:36:37 +08:00
  • 2082852f3f [lazyinit] skip whisper test (#5653) Hongxin Liu 2024-04-26 14:03:12 +08:00
  • 8b7d535977 fix gptj (#5652) flybird11111 2024-04-26 11:52:27 +08:00
  • 3c91e3f176 [Inference]Adapt to baichuan2 13B (#5614) yuehuayingxueluo 2024-04-25 23:11:30 +08:00
  • f342a93871 [Fix] Remove obsolete files - inference (#5650) Yuanheng Zhao 2024-04-25 22:04:59 +08:00
  • 1b387ca9fe [shardformer] refactor pipeline grad ckpt config (#5646) Hongxin Liu 2024-04-25 15:19:30 +08:00
  • 7ef91606e1 [Fix]: implement thread-safety singleton to avoid deadlock for very large-scale training scenarios (#5625) Season 2024-04-25 14:45:52 +08:00
  • bbb2c21f16 [shardformer] fix chatglm implementation (#5644) Hongxin Liu 2024-04-25 14:41:17 +08:00
  • a8fd3b0342 [Inference/Kernel] Optimize paged attention: Refactor key cache layout (#5643) Steve Luo 2024-04-25 14:24:02 +08:00
  • 5d88ef1aaf [shardformer] remove useless code (#5645) flybird11111 2024-04-25 13:46:39 +08:00
  • 148506c828 [coloattention]modify coloattention (#5627) flybird11111 2024-04-25 10:47:14 +08:00
  • 7ee569b05f [hotfix] Fixed fused layernorm bug without apex (#5609) Edenzzzz 2024-04-24 23:04:06 +08:00
  • 0d0a582033 [shardformer] update transformers (#5583) Wang Binluo 2024-04-24 22:51:50 +08:00
  • 90cd5227a3 [Fix/Inference]Fix vllm benchmark (#5630) yuehuayingxueluo 2024-04-24 14:51:36 +08:00
  • 279300dc5f [Inference/Refactor] Refactor compilation mechanism and unified multi hw (#5613) 傅剑寒 2024-04-24 14:17:54 +08:00
  • 04863a9b14 [example] Update Llama Inference example (#5629) Yuanheng Zhao 2024-04-23 22:23:07 +08:00
  • f4c5aafe29 [example] llama3 (#5631) binmakeswell 2024-04-23 18:48:07 +08:00
  • fcf776ff1b [Feature] LoRA rebased to main branch (#5622) feature/lora linsj20 2024-04-23 17:57:44 +08:00
  • 4de4e31818 [example] update llama example (#5626) Hongxin Liu 2024-04-23 14:12:20 +08:00
  • 862fbaaa62 [Feature] Support LLaMA-3 CPT and ST (#5619) Tong Li 2024-04-23 13:54:05 +08:00
  • 12f10d5b0b [Fix/Inference] Fix CUDA Rotary Embedding GQA (#5623) yuehuayingxueluo 2024-04-23 13:44:49 +08:00
  • 5d4c1fe8f5 [Fix/Inference] Fix GQA Triton and Support Llama3 (#5624) Yuanheng Zhao 2024-04-23 13:09:55 +08:00
  • e094933da1 [shardformer] fix pipeline grad ckpt (#5620) Hongxin Liu 2024-04-22 11:25:39 +08:00
  • ccf72797e3 feat baichuan2 rmsnorm whose hidden size equals to 5120 (#5611) Steve Luo 2024-04-19 15:34:53 +08:00
  • d83c633ca6 [hotfix] Fix examples no pad token & auto parallel codegen bug; (#5606) Edenzzzz 2024-04-18 18:15:50 +08:00
  • e37ee2fb65 [Feat]Tensor Model Parallel Support For Inference (#5563) Runyu Lu 2024-04-18 16:56:46 +08:00
  • be396ad6cc [Inference/Kernel] Add Paged Decoding kernel, sequence split within the same thread block (#5531) Steve Luo 2024-04-18 16:45:07 +08:00
  • a0ad587c24 [shardformer] refactor embedding resize (#5603) flybird11111 2024-04-18 16:10:18 +08:00
  • 52a2dded36 [Feature] qlora support (#5586) linsj20 2024-04-17 15:03:31 +08:00
  • 3788fefc7a [zero] support multiple (partial) backward passes (#5596) Hongxin Liu 2024-04-16 17:49:21 +08:00
  • 89049b0d89 [doc] fix ColossalMoE readme (#5599) Camille Zhong 2024-04-15 18:06:18 +08:00