Commit Graph

  • 68f55a709c [hotfix] fix stable diffusion inference bug. (#5289) Youngon 2024-03-05 22:03:40 +08:00
  • c8003d463b [doc] Fix typo s/infered/inferred/ (#5288) hugo-syn 2024-03-05 15:02:08 +01:00
  • 5e1c93d732 [hotfix] fix typo change MoECheckpintIO to MoECheckpointIO (#5335) digger yu 2024-03-05 21:52:30 +08:00
  • a7ae2b5b4c [eval-hotfix] set few_shot_data to None when few shot is disabled (#5422) Dongruixuan Li 2024-03-05 08:48:55 -05:00
  • 049121d19d [hotfix] fix typo change enabel to enable under colossalai/shardformer/ (#5317) digger yu 2024-03-05 21:48:46 +08:00
  • 16c96d4d8c [hotfix] fix typo change _descrption to _description (#5331) digger yu 2024-03-05 21:47:48 +08:00
  • 70cce5cbed [doc] update some translations with README-zh-Hans.md (#5382) digger yu 2024-03-05 21:45:55 +08:00
  • e239cf9060 [hotfix] fix typo of openmoe model source (#5403) Luo Yihang 2024-03-05 21:44:38 +08:00
  • e304e4db35 [hotfix] fix sd vit import error (#5420) MickeyCHAN 2024-03-05 21:41:23 +08:00
  • 070df689e6 [devops] fix extention building (#5427) Hongxin Liu 2024-03-05 15:35:54 +08:00
  • 822241a99c [doc] sora release (#5425) binmakeswell 2024-03-05 12:08:58 +08:00
  • 29695cf70c [example]add gpt2 benchmark example script. (#5295) flybird11111 2024-03-04 16:18:13 +08:00
  • 593a72e4d5 Merge pull request #5424 from FrankLeeeee/sync/main Frank Lee 2024-03-04 10:13:59 +08:00
  • 0310b76e9d Merge branch 'main' into sync/main FrankLeeeee 2024-03-04 10:09:36 +08:00
  • 4b8312c08e fix sft single turn inference example (#5416) Camille Zhong 2024-03-01 17:27:50 +08:00
  • a1c6cdb189 [doc] fix blog link binmakeswell 2024-02-29 14:52:30 +08:00
  • 5de940de32 [doc] fix blog link binmakeswell 2024-02-29 14:51:29 +08:00
  • 2461f37886 [workflow] added pypi channel (#5412) Frank Lee 2024-02-29 13:56:55 +08:00
  • a28c971516 update requirements (#5407) Tong Li 2024-02-28 17:46:27 +08:00
  • 0aa27f1961 [Inference]Move benchmark-related code to the example directory. (#5408) yuehuayingxueluo 2024-02-28 16:46:03 +08:00
  • 600881a8ea [Inference]Add CUDA KVCache Kernel (#5406) yuehuayingxueluo 2024-02-28 14:36:50 +08:00
  • 0a25e16e46 [shardformer]gather llama logits (#5398) flybird11111 2024-02-27 22:44:07 +08:00
  • dcdd8a5ef7 [setup] fixed nightly release (#5388) Frank Lee 2024-02-27 15:19:13 +08:00
  • bf34c6fef6 [fsdp] impl save/load shard model/optimizer (#5357) QinLuo 2024-02-27 13:51:14 +08:00
  • d882d18c65 [example] reuse flash attn patch (#5400) Hongxin Liu 2024-02-27 11:22:07 +08:00
  • 95c21e3950 [extension] hotfix jit extension setup (#5402) Hongxin Liu 2024-02-26 19:46:58 +08:00
  • 19061188c3 [Infer/Fix] Fix Dependency in test - RMSNorm kernel (#5399) Yuanheng Zhao 2024-02-26 16:17:47 +08:00
  • bc1da87366 [Fix/Inference] Fix format of input prompts and input model in inference engine (#5395) yuehuayingxueluo 2024-02-23 10:51:35 +08:00
  • 2a718c8be8 Optimized the execution interval time between cuda kernels caused by view and memcopy (#5390) yuehuayingxueluo 2024-02-21 13:23:57 +08:00
  • 730103819d [Inference]Fused kv copy into rotary calculation (#5383) Jianghai 2024-02-21 11:31:48 +08:00
  • 5d380a1a21 [hotfix] Fix wrong import in meta_registry (#5392) Stephan Kölker 2024-02-20 19:24:43 +08:00
  • b833153fd5 [hotfix] fix variable type for top_p (#5313) CZYCW 2024-02-19 18:25:44 +08:00
  • b21aac5bae [Inference] Optimize and Refactor Inference Batching/Scheduling (#5367) Yuanheng Zhao 2024-02-19 17:18:20 +08:00
  • 705a62a565 [doc] updated installation command (#5389) Frank Lee 2024-02-19 16:54:03 +08:00
  • 69e3ad01ed [doc] Fix typo (#5361) yixiaoer 2024-02-19 16:53:28 +08:00
  • 7303801854 [llama] fix training and inference scripts (#5384) Hongxin Liu 2024-02-19 16:41:04 +08:00
  • adae123df3 [release] update version (#5380) v0.3.5 Hongxin Liu 2024-02-08 18:50:09 +08:00
  • efef43b53c Merge pull request #5372 from hpcaitech/exp/mixtral Frank Lee 2024-02-08 16:30:05 +08:00
  • 8c69debdc7 [Inference]Support vllm testing in benchmark scripts (#5379) yuehuayingxueluo 2024-02-08 15:27:26 +08:00
  • 4c03347fc7 Merge pull request #5377 from hpcaitech/example/llama-npu Frank Lee 2024-02-08 14:12:11 +08:00
  • 9afa52061f [inference] refactored config (#5376) Frank Lee 2024-02-08 14:04:14 +08:00
  • 06db94fbc9 [moe] fix tests ver217 2024-02-08 12:46:37 +08:00
  • 65e5d6baa5 [moe] fix mixtral optim checkpoint (#5344) Hongxin Liu 2024-02-01 13:33:09 +08:00
  • 956b561b54 [moe] fix mixtral forward default value (#5329) Hongxin Liu 2024-01-30 13:52:18 +08:00
  • b60be18dcc [moe] fix mixtral checkpoint io (#5314) Hongxin Liu 2024-01-27 16:06:33 +08:00
  • da39d21b71 [moe] support mixtral (#5309) Hongxin Liu 2024-01-25 15:48:46 +08:00
  • c904d2ae99 [moe] update capacity computing (#5253) Hongxin Liu 2024-01-11 16:09:38 +08:00
  • 7d8e0338a4 [moe] init mixtral impl Xuanlei Zhao 2023-12-14 17:52:05 +08:00
  • 1f8c7e7046 [Inference] User Experience: update the logic of default tokenizer and generation config. (#5337) Jianghai 2024-02-07 17:55:48 +08:00
  • 6fb4bcbb24 [Inference/opt] Fused KVCahce Memcopy (#5374) yuehuayingxueluo 2024-02-07 17:15:42 +08:00
  • 58740b5f68 [inference] added inference template (#5375) Frank Lee 2024-02-07 17:11:43 +08:00
  • 8106ede07f Revert "[Inference] Adapt to Fused rotary (#5348)" (#5373) Frank Lee 2024-02-07 14:27:04 +08:00
  • 9f4ab2eb92 [Inference] Adapt to Fused rotary (#5348) Jianghai 2024-02-07 11:36:04 +08:00
  • 35382a7fbf [Inference]Fused the gate and up proj in mlp,and optimized the autograd process. (#5365) yuehuayingxueluo 2024-02-06 19:38:25 +08:00
  • 084c91246c [llama] fix memory issue (#5371) Hongxin Liu 2024-02-06 19:02:37 +08:00
  • 1dedb57747 [Fix/Infer] Remove unused deps and revise requirements (#5341) Yuanheng Zhao 2024-02-06 17:27:45 +08:00
  • c53ddda88f [lr-scheduler] fix load state dict and add test (#5369) Hongxin Liu 2024-02-06 14:23:32 +08:00
  • eb4f2d90f9 [llama] polish training script and fix optim ckpt (#5368) Hongxin Liu 2024-02-06 11:52:17 +08:00
  • a5756a8720 [eval] update llama npu eval (#5366) Camille Zhong 2024-02-06 10:53:03 +08:00
  • 44ca61a22b [llama] fix neftune & pbar with start_step (#5364) Camille Zhong 2024-02-05 18:04:23 +08:00
  • a4cec1715b [llama] add flash attn patch for npu (#5362) Hongxin Liu 2024-02-05 16:48:34 +08:00
  • 73f9f23fc6 [llama] update training script (#5360) Hongxin Liu 2024-02-05 16:33:18 +08:00
  • 6c0fa7b9a8 [llama] fix dataloader for hybrid parallel (#5358) Hongxin Liu 2024-02-05 15:14:56 +08:00
  • 2dd01e3a14 [gemini] fix param op hook when output is tuple (#5355) Hongxin Liu 2024-02-04 11:58:26 +08:00
  • 631862f339 [Inference]Optimize generation process of inference engine (#5356) yuehuayingxueluo 2024-02-02 15:38:21 +08:00
  • 21ad4a27f9 [Inference/opt]Optimize the mid tensor of RMS Norm (#5350) yuehuayingxueluo 2024-02-02 15:06:01 +08:00
  • 1c790c0877 [fix] remove unnecessary dp_size assert (#5351) Wenhao Chen 2024-02-02 14:40:20 +08:00
  • 027aa1043f [doc] updated inference readme (#5343) Frank Lee 2024-02-02 14:31:10 +08:00
  • e76acbb076 [inference] moved ops tests to test_infer (#5354) Frank Lee 2024-02-02 13:51:22 +08:00
  • db1a763307 [inference] removed redundancy init_batch (#5353) Frank Lee 2024-02-02 11:44:15 +08:00
  • ffffc32dc7 [checkpointio] fix gemini and hybrid parallel optim checkpoint (#5347) Hongxin Liu 2024-02-01 16:13:06 +08:00
  • 249644c23b [Inference]Repalce Attention layer and MLP layer by shardformer to optimize the weight transpose operation,add fused_qkv and fused linear_add (#5340) yuehuayingxueluo 2024-02-01 15:49:39 +08:00
  • f8e456d202 [inference] simplified config verification (#5346) Frank Lee 2024-02-01 15:31:01 +08:00
  • c5239840e6 [Chat] fix sft loss nan (#5345) YeAnbang 2024-02-01 14:25:16 +08:00
  • abd8e77ad8 [extension] fixed exception catch (#5342) Frank Lee 2024-01-31 18:09:49 +08:00
  • df0aa49585 [Inference] Kernel Fusion, fused copy kv cache into rotary embedding (#5336) Jianghai 2024-01-31 16:31:29 +08:00
  • 1336838a91 Merge pull request #5339 from FrankLeeeee/sync/merge-main Frank Lee 2024-01-31 16:29:26 +08:00
  • c565519913 merge commit FrankLeeeee 2024-01-31 10:41:47 +08:00
  • 5f98a9d68a [Infer] Optimize Blocked KVCache And Kernels Using It (#5325) Yuanheng Zhao 2024-01-30 16:06:09 +08:00
  • e8f0642f28 [Inference]Add Nopadding Llama Modeling (#5327) yuehuayingxueluo 2024-01-30 10:31:46 +08:00
  • 71321a07cf fix typo change dosen't to doesn't (#5308) digger yu 2024-01-30 09:57:38 +08:00
  • 6a3086a505 fix typo under extensions/ (#5330) digger yu 2024-01-30 09:55:16 +08:00
  • febed23288 [doc] added docs for extensions (#5324) Frank Lee 2024-01-29 17:39:23 +08:00
  • 388179f966 [tests] fix t5 test. (#5322) flybird11111 2024-01-29 17:38:46 +08:00
  • c7c104cb7c [DOC] Update inference readme (#5280) Jianghai 2024-01-29 16:21:06 +08:00
  • a6709afe66 Merge pull request #5321 from FrankLeeeee/hotfix/accelerator-api Frank Lee 2024-01-29 14:29:58 +08:00
  • 087d0cb1fc [accelerator] fixed npu api FrankLeeeee 2024-01-29 14:27:52 +08:00
  • 8823cc4831 Merge pull request #5310 from hpcaitech/feature/npu Frank Lee 2024-01-29 13:49:39 +08:00
  • 73f4dc578e [workflow] updated CI image (#5318) Frank Lee 2024-01-29 11:53:07 +08:00
  • 1f8a75d470 [Inference] Update rms norm kernel, benchmark with vLLM (#5315) Jianghai 2024-01-29 10:22:33 +08:00
  • 7ddd8b37f0 fix (#5311) Jianghai 2024-01-26 15:02:12 +08:00
  • 4f28cb43c0 [inference]Optimize the usage of the mid tensors space in flash attn (#5304) yuehuayingxueluo 2024-01-26 14:00:10 +08:00
  • 7cfed5f076 [feat] refactored extension module (#5298) Frank Lee 2024-01-25 17:01:48 +08:00
  • bce9499ed3 fix some typo (#5307) digger yu 2024-01-25 13:56:27 +08:00
  • ec912b1ba9 [NFC] polish applications/Colossal-LLaMA-2/colossal_llama2/tokenizer/init_tokenizer.py code style (#5228) 李文军 2024-01-25 13:14:48 +08:00
  • af8359c430 [hotfix] fix boundary check in batch (#5306) Yuanheng Zhao 2024-01-25 10:23:12 +08:00
  • c647e00e3c [Inference]Add fused rotary kernel and get cos cache kernel (#5302) Jianghai 2024-01-24 16:20:42 +08:00
  • 3da9993b0d [Kernel/Fix] Revise flash attention triton kernel API and add benchmark (#5301) Yuanheng Zhao 2024-01-23 17:16:02 +08:00
  • 8e606ecc7e [Inference] Benchmarking rotary embedding and add a fetch function (#5277) Jianghai 2024-01-23 12:11:53 +08:00
  • ddf879e2db fix bug for mefture (#5299) Desperado-Jia 2024-01-22 22:17:54 +08:00