Commit Graph

  • d5661f0f25 [nfc] fix typo change directoty to directory (#5111) digger yu 2023-11-27 18:25:53 +08:00
  • 916459c99a [inference] Add model forward accuracy test (#5102) (refactor/inference) Yuanheng Zhao 2023-11-27 14:14:06 +08:00
  • 2bdf76f1f2 fix typo change lazy_iniy to lazy_init (#5099) digger yu 2023-11-24 19:15:59 +08:00
  • 68fcaa2225 remove duplicate import (#5100) Xuanlei Zhao 2023-11-23 15:15:01 +08:00
  • e53e729d8e [Feature] Add document retrieval QA (#5020) YeAnbang 2023-11-23 10:33:48 +08:00
  • 3acbf6d496 [npu] add npu support for hybrid plugin and llama (#5090) Xuanlei Zhao 2023-11-22 19:23:21 +08:00
  • f196f40a8f [inference] decouple pipeline logci for chatglm (#5098) Hongxin Liu 2023-11-22 18:26:39 +08:00
  • cb450c2861 [hotfix]fix chatglm rmsnorm (#5079) Jianghai 2023-11-22 17:59:22 +08:00
  • 67a07e6f64 [inference] decouple pipeline logci for bloom (#5097) Hongxin Liu 2023-11-22 17:49:25 +08:00
  • afe3c78d9a add lightllm rmsnorm (#5096) Cuiqing Li (李崔卿) 2023-11-22 17:05:34 +08:00
  • aae496631c [shardformer]fix flash attention, when mask is casual, just don't unpad it (#5084) flybird11111 2023-11-22 16:00:07 +08:00
  • 27e62ba0f7 [inference] decouple pp logic for llama (#5092) Hongxin Liu 2023-11-22 13:53:08 +08:00
  • 75af66cd81 [Hotfix] Fix model policy matching strategy in ShardFormer (#5064) Zhongkai Zhao 2023-11-22 11:19:39 +08:00
  • 4ccb9ded7d [gemini]fix gemini optimzer, saving Shardformer in Gemini got list assignment index out of range (#5085) flybird11111 2023-11-22 11:14:25 +08:00
  • 0d482302a1 [nfc] fix typo and author name (#5089) digger yu 2023-11-22 10:39:01 +08:00
  • fd3567e089 [nfc] fix typo in docs/ (#4972) digger yu 2023-11-21 22:06:20 +08:00
  • 79c4bff452 [doc] Update the user guide and the development document in Colossal-Inference (#5086) Zhongkai Zhao 2023-11-21 18:58:04 +08:00
  • 42b2d6f3a5 [example] add vllm inference benchmark (#5080) Hongxin Liu 2023-11-21 13:28:13 +08:00
  • dce05da535 fix thrust-transform-reduce error (#5078) Jun Gao 2023-11-21 15:09:35 +08:00
  • 1cd7efc520 [inference] refactor examples and fix schedule (#5077) Hongxin Liu 2023-11-21 10:46:03 +08:00
  • 4e3959d316 [hotfix/hybridengine] Fix init model with random parameters in benchmark (#5074) Bin Jia 2023-11-20 20:15:25 +08:00
  • 8921a73c90 [format] applied code formatting on changed files in pull request 5067 (#5072) github-actions[bot] 2023-11-20 19:46:43 +08:00
  • fb103cfd6e [inference] update examples and engine (#5073) Xu Kai 2023-11-20 19:44:52 +08:00
  • 0c7d8bebd5 [hotfix/hybridengine] fix bug when tp*pp size = 1 (#5069) Bin Jia 2023-11-20 17:15:37 +08:00
  • e5ce4c8ea6 [npu] add npu support for gemini and zero (#5067) Hongxin Liu 2023-11-20 16:12:41 +08:00
  • 8d56c9c389 [misc] remove outdated submodule (#5070) Hongxin Liu 2023-11-20 15:27:44 +08:00
  • bce919708f [Kernels]added flash-decoidng of triton (#5063) Cuiqing Li (李崔卿) 2023-11-20 13:58:29 +08:00
  • fd6482ad8c [inference] Refactor inference architecture (#5057) Xu Kai 2023-11-19 21:05:05 +08:00
  • bc09b95f50 [exampe] fix llama example' loss error when using gemini plugin (#5060) flybird11111 2023-11-18 18:41:58 +08:00
  • 3c08f17348 [hotfix]: modify create_ep_hierarchical_group and add test (#5032) Wenhao Chen 2023-11-17 10:53:00 +08:00
  • 97cd0cd559 [shardformer] fix llama error when transformers upgraded. (#5055) flybird11111 2023-11-16 21:34:04 +08:00
  • 3e02154710 [gemini] gemini support extra-dp (#5043) flybird11111 2023-11-16 21:03:04 +08:00
  • b2ad0d9e8f [pipeline,shardformer] Fix p2p efficiency in pipeline, allow skipping loading weight not in weight_map when strict=False, fix llama flash attention forward, add flop estimation by megatron in llama benchmark (#5017) Elsa Granger 2023-11-16 20:15:59 +08:00
  • 28052a71fb [Kernels]Update triton kernels into 2.1.0 (#5046) Cuiqing Li (李崔卿) 2023-11-16 16:43:15 +08:00
  • 20332a7a34 [inference] udpate example (#5053) (feature/inference-refactor) Xu Kai 2023-11-16 11:07:43 +08:00
  • 5446fb70c4 [inference] update readme (#5051) Xu Kai 2023-11-16 09:10:57 +08:00
  • 361cf63cb0 [Refactor] refactor policy search and quant type controlling in inference (#5035) Zhongkai Zhao 2023-11-14 17:26:59 +08:00
  • 43ad0d9ef0 fix wrong EOS token in ColossalChat Orion-Zheng 2023-11-14 09:58:00 +08:00
  • c6295c3381 [Refactor] remove useless inference code (#5022) Xu Kai 2023-11-10 14:47:06 +08:00
  • 70885d707d [hotfix] Suport extra_kwargs in ShardConfig (#5031) Zhongkai Zhao 2023-11-10 10:49:50 +08:00
  • 576a2f7b10 [gemini] gemini support tensor parallelism. (#4942) flybird11111 2023-11-10 10:15:16 +08:00
  • a4489384d5 [shardformer] Fix serialization error with Tensor Parallel state saving (#5018) Jun Gao 2023-11-09 17:00:25 +08:00
  • 81b8f5e76a [Inference Refactor] Merge chatglm2 with pp and tp (#5023) Bin Jia 2023-11-09 14:46:19 +08:00
  • 724441279b [moe]: fix ep/tp tests, add hierarchical all2all (#4982) Wenhao Chen 2023-11-09 14:31:00 +08:00
  • 239cd92eff Support mtbench (#5025) Yuanchen 2023-11-09 13:41:50 +08:00
  • 450115bd0f [refactor] refactor gptq and smoothquant llama (#5012) Xu Kai 2023-11-08 09:17:52 +08:00
  • 48d0a58d10 add support for bloom (#5008) Bin Jia 2023-11-06 09:35:33 +08:00
  • f747d13040 [inference] support only TP (#4998) Xu Kai 2023-11-01 16:33:30 +08:00
  • f71e63b0f3 [moe] support optimizer checkpoint (#5015) Xuanlei Zhao 2023-11-08 23:07:03 +08:00
  • 67f5331754 [misc] add code owners (#5024) Hongxin Liu 2023-11-08 15:18:51 +08:00
  • ef4c14a5e2 [Inference] Fix bug in ChatGLM2 Tensor Parallelism (#5014) Jianghai 2023-11-07 15:01:50 +08:00
  • c36e782d80 [format] applied code formatting on changed files in pull request 4926 (#5007) github-actions[bot] 2023-11-06 17:08:12 +08:00
  • 1a3315e336 [hotfix] Add layer norm gradients all-reduce for sequence parallel (#4926) littsk 2023-11-03 13:32:43 +08:00
  • d99b2c961a [hotfix] fix grad accumulation plus clipping for gemini (#5002) Baizhou Zhang 2023-11-02 17:59:10 +08:00
  • dc003c304c [moe] merge moe into main (#4978) Xuanlei Zhao 2023-11-02 10:21:24 +08:00
  • 8993c8a817 [release] update version (#4995) (tag: v0.3.4) Hongxin Liu 2023-11-01 13:41:22 +08:00
  • b6696beb04 [Pipeline Inference] Merge pp with tp (#4993) Bin Jia 2023-11-01 12:46:21 +08:00
  • 335cb105e2 [doc] add supported feature diagram for hybrid parallel plugin (#4996) ppt0011 2023-10-31 19:56:42 +08:00
  • c5fd4aa6e8 [lora] add lora APIs for booster, support lora for TorchDDP (#4981) Baizhou Zhang 2023-10-31 15:19:37 +08:00
  • c040d70aa0 [hotfix] fix the bug of repeatedly storing param group (#4951) Baizhou Zhang 2023-10-31 14:48:01 +08:00
  • be82b5d4ca [hotfix] Fix the bug where process groups were not being properly released. (#4940) littsk 2023-10-31 14:47:30 +08:00
  • 4f0234f236 [doc]Update doc for colossal-inference (#4989) Cuiqing Li (李崔卿) 2023-10-31 10:48:07 +08:00
  • abe071b663 fix ColossalEval (#4992) Yuanchen 2023-10-31 10:30:03 +08:00
  • 459a88c806 [Kernels]Updated Triton kernels into 2.1.0 and adding flash-decoding for llama token attention (#4965) Cuiqing Li 2023-10-30 14:04:37 +08:00
  • cf579ff46d [Inference] Dynamic Batching Inference, online and offline (#4953) Jianghai 2023-10-30 10:52:19 +08:00
  • 4e4a10c97d updated c++17 compiler flags (#4983) アマデウス 2023-10-27 18:19:56 +08:00
  • 1db6727678 [Pipeline inference] Combine kvcache with pipeline inference (#4938) Bin Jia 2023-10-27 16:19:54 +08:00
  • 8b1b237a5f 1st try on restore bert's cache during testing (hotfix/example_test) Orion-Zheng 2023-10-26 10:49:47 +08:00
  • 65f8d8b5bb 1st try on cache bert weight Orion-Zheng 2023-10-26 00:43:37 +08:00
  • c6cd629e7a [Inference]ADD Bench Chatglm2 script (#4963) Jianghai 2023-10-24 13:11:15 +08:00
  • 785802e809 [inference] add reference and fix some bugs (#4937) Xu Kai 2023-10-20 13:39:34 +08:00
  • b8e770c832 [test] merge old components to test to model zoo (#4945) Hongxin Liu 2023-10-20 10:35:08 +08:00
  • 3a41e8304e [Refactor] Integrated some lightllm kernels into token-attention (#4946) Cuiqing Li 2023-10-19 22:22:47 +08:00
  • 11009103be [nfc] fix some typo with colossalai/ docs/ etc. (#4920) digger yu 2023-10-18 15:44:04 +08:00
  • 486d06a2d5 [format] applied code formatting on changed files in pull request 4820 (#4886) github-actions[bot] 2023-10-18 11:46:37 +08:00
  • c7aa319ba0 [test] add no master test for low level zero plugin (#4934) Zhongkai Zhao 2023-10-18 11:41:23 +08:00
  • 1f5d2e8062 [hotfix] fix torch 2.0 compatibility (#4936) Hongxin Liu 2023-10-18 11:05:25 +08:00
  • 21ba89cab6 [gemini] support gradient accumulation (#4869) Baizhou Zhang 2023-10-17 14:07:21 +08:00
  • a41cf88e9b [format] applied code formatting on changed files in pull request 4908 (#4918) github-actions[bot] 2023-10-17 10:48:24 +08:00
  • 4f68b3f10c [kernel] support pure fp16 for cpu adam and update gemini optim tests (#4921) Hongxin Liu 2023-10-16 21:56:53 +08:00
  • 7768afbad0 Update flash_attention_patch.py Zian(Andy) Zheng 2023-10-13 16:46:33 +08:00
  • 611a5a80ca [inference] Add smmoothquant for llama (#4904) Xu Kai 2023-10-16 11:28:44 +08:00
  • a0684e7bd6 [feature] support no master weights option for low level zero plugin (#4816) Zhongkai Zhao 2023-10-13 15:57:45 +08:00
  • 77a9328304 [inference] add llama2 support (#4898) Xu Kai 2023-10-13 13:09:23 +08:00
  • 39f2582e98 [hotfix] fix lr scheduler bug in torch 2.0 (#4864) Baizhou Zhang 2023-10-12 14:04:24 +08:00
  • 83b52c56cd [feature] Add clip_grad_norm for hybrid_parallel_plugin (#4837) littsk 2023-10-12 11:32:37 +08:00
  • df63564184 [gemini] support amp o3 for gemini (#4872) Hongxin Liu 2023-10-12 10:39:08 +08:00
  • c1fab951e7 Merge pull request #4889 from ppt0011/main ppt0011 2023-10-12 10:27:10 +08:00
  • ffd9a3cbc9 [hotfix] fix bug in sequence parallel test (#4887) littsk 2023-10-11 19:30:41 +08:00
  • 1dcaf249bd [doc] add reminder for issue encountered with hybrid adam ppt0011 2023-10-11 17:48:21 +08:00
  • fdec650bb4 fix test llama (#4884) Xu Kai 2023-10-11 17:43:01 +08:00
  • 08a9f76b2f [Pipeline Inference] Sync pipeline inference branch to main (#4820) Bin Jia 2023-10-11 11:40:06 +08:00
  • 652adc2215 Update README.md Camille Zhong 2023-10-10 15:52:18 +08:00
  • afe10a85fd Update README.md Camille Zhong 2023-10-10 15:18:13 +08:00
  • d6c4b9b370 Update main README.md Camille Zhong 2023-10-10 15:13:09 +08:00
  • 3043d5d676 Update modelscope link in README.md Camille Zhong 2023-10-10 15:01:33 +08:00
  • 6a21f96a87 [doc] update advanced tutorials, training gpt with hybrid parallelism (#4866) flybird11111 2023-10-10 16:18:55 +08:00
  • 8aed02b957 [nfc] fix minor typo in README (#4846) Blagoy Simandoff 2023-10-07 10:51:11 +01:00
  • cd6a962e66 [NFC] polish code style (#4799) Camille Zhong 2023-09-27 10:42:11 +08:00
  • 07ed155e86 [NFC] polish colossalai/inference/quant/gptq/cai_gptq/__init__.py code style (#4792) Michelle 2023-09-27 10:33:13 +08:00