Commit Graph

  • 6a56967855 [doc] add llama2-13B disyplay (#5285) Desperado-Jia 2024-01-19 16:04:08 +08:00
  • 6e487e7d3c [kernel/fix] Performance Optimization for Decoding Kernel and Benchmarking (#5274) Yuanheng Zhao 2024-01-19 15:47:16 +08:00
  • 9e2342bde2 [Hotfix] Fix bugs in testing continuous batching (#5270) Jianghai 2024-01-18 16:31:14 +08:00
  • 32cb74493a fix auto loading gpt2 tokenizer (#5279) Michelle 2024-01-18 14:08:29 +08:00
  • d66e6988bc Merge pull request #5278 from ver217/sync/npu Frank Lee 2024-01-18 13:11:45 +08:00
  • 148469348a Merge branch 'main' into sync/npu ver217 2024-01-18 12:05:21 +08:00
  • 5ae9099f92 [kernel] Add RMSLayerNorm triton kernel (#5262) Yaozheng Fang 2024-01-18 10:21:03 +08:00
  • 5d9a0ae75b [hotfix] Fix ShardFormer test execution path when using sequence parallelism (#5230) Zhongkai Zhao 2024-01-17 17:42:29 +08:00
  • 86b63f720c [Inference]Adapted to the triton attn kernels (#5264) yuehuayingxueluo 2024-01-17 16:03:10 +08:00
  • 46e091651b [shardformer] hybridparallelplugin support gradients accumulation. (#5246) flybird11111 2024-01-17 15:22:33 +08:00
  • 2a0558d8ec [ci] fix test_hybrid_parallel_plugin_checkpoint_io.py (#5276) flybird11111 2024-01-17 13:38:55 +08:00
  • d69cd2eb89 [workflow] fixed oom tests (#5275) Frank Lee 2024-01-16 18:55:13 +08:00
  • 0f2b46a41c [kernel] Revise KVCache copy triton kernel API (#5273) Yuanheng Zhao 2024-01-16 14:41:02 +08:00
  • 04244aaaf1 [workflow] fixed incomplete bash command (#5272) Frank Lee 2024-01-16 11:54:44 +08:00
  • d8db500efc [Inference] Fix request handler and add recycle logic (#5260) Jianghai 2024-01-15 17:50:46 +08:00
  • c597678da4 [doc] updated inference readme (#5269) Frank Lee 2024-01-15 17:37:41 +08:00
  • fa85e02b3b [kernel] Add KV cache copy kernel during decoding (#5261) Yuanheng Zhao 2024-01-15 17:37:20 +08:00
  • ef4f0ee854 [hotfix]: add pp sanity check and fix mbs arg (#5268) Wenhao Chen 2024-01-15 15:57:40 +08:00
  • 1ded7e81ef [git] fixed rebased files FrankLeeeee 2024-01-11 13:50:45 +00:00
  • 1513f20f4d [kernel] Add flash decoding triton kernel for blocked kv cache (#5249) Yuanheng Zhao 2024-01-11 18:06:39 +08:00
  • fded91d049 [Inference] Kernel: no pad rotary embedding (#5252) Jianghai 2024-01-11 16:24:54 +08:00
  • d40eb26029 fix bugs in request_handler.py and engine.py yuehuayingxueluo 2024-01-10 10:38:53 +08:00
  • 10e3c9f923 rm torch.cuda.synchronize yuehuayingxueluo 2024-01-09 15:53:04 +08:00
  • fab294c7f4 fix CI bugs yuehuayingxueluo 2024-01-09 15:18:28 +08:00
  • 2a73e828eb fix bugs related to processing padding mask yuehuayingxueluo 2024-01-09 14:29:45 +08:00
  • e545a871b8 [Hotfix] Fix accuracy and align attention method api with Triton kernel (#5229) Jianghai 2024-01-08 15:56:00 +08:00
  • fa4fbdbffb adapted to pad_context_forward yuehuayingxueluo 2024-01-09 13:52:53 +08:00
  • 47e53eaa1c fix bugs in attention.py and request_handler.py yuehuayingxueluo 2024-01-08 12:35:06 +08:00
  • bfd9b1b494 [Inference] Pytorch Attention func, pad&nopad input support (#5219) Jianghai 2024-01-04 16:39:00 +08:00
  • 3ad1f3b78b fix beam_width yuehuayingxueluo 2024-01-04 16:48:53 +08:00
  • b2eb9cd186 Fixed a typo yuehuayingxueluo 2024-01-04 15:09:06 +08:00
  • bbfebfb9fc fix bugs in sampler yuehuayingxueluo 2024-01-04 15:03:18 +08:00
  • 02c1bf8b2a add context_attention_unpadded yuehuayingxueluo 2024-01-03 18:50:26 +08:00
  • 07b5283b6a [kernel] Add triton kernel for context attention (FAv2) without padding (#5192) Yuanheng Zhao 2024-01-03 14:41:35 +08:00
  • 4df8876fca Fixed a writing error yuehuayingxueluo 2024-01-02 18:34:19 +08:00
  • 9489dc64d8 precision alignment yuehuayingxueluo 2024-01-02 18:30:11 +08:00
  • 62968588d1 fix bugs in request_handler yuehuayingxueluo 2024-01-02 13:02:20 +08:00
  • 62fd08ee44 Fixed a bug in the inference frame yuehuayingxueluo 2023-12-26 21:34:27 +08:00
  • 86853a37d5 Add padding llama model yuehuayingxueluo 2023-12-25 14:07:43 +08:00
  • 0e616462a7 [Inference] add logit processor and request handler (#5166) Jianghai 2023-12-25 12:15:15 +08:00
  • 8daee26989 [Inference] Add the logic of the inference engine (#5173) yuehuayingxueluo 2023-12-18 10:40:47 +08:00
  • 93aeacca34 [Inference]Update inference config and fix test (#5178) Jianghai 2023-12-12 17:22:41 +08:00
  • 3de2e62299 [Inference] Add CacheBlock and KV-Cache Manager (#5156) Yuanheng Zhao 2023-12-11 10:56:18 +08:00
  • fab9b931d9 [Inference]Add BatchInferState, Sequence and InferConfig (#5149) yuehuayingxueluo 2023-12-07 14:34:01 +08:00
  • 2bb92243d4 [Inference/NFC] Clean outdated inference tests and deprecated kernels (#5159) Yuanheng Zhao 2023-12-05 15:12:57 +08:00
  • 56e75eeb06 [Inference] Add readme (roadmap) and fulfill request handler (#5147) Jianghai 2023-12-01 17:31:31 +08:00
  • 4cf4682e70 [Inference] First PR for rebuild colossal-infer (#5143) Jianghai 2023-12-01 17:02:44 +08:00
  • c174c4fc5f [doc] fix doc typo (#5256) binmakeswell 2024-01-11 21:01:11 +08:00
  • e830ef917d [ci] fix shardformer tests. (#5255) flybird11111 2024-01-11 19:07:45 +08:00
  • 756c400ad2 fix typo in applications/ColossalEval/README.md (#5250) digger yu 2024-01-11 17:58:38 +08:00
  • 2b83418719 [ci] fixed ddp test (#5254) Frank Lee 2024-01-11 17:16:32 +08:00
  • 3942218618 remove useless platform args and comment hotfix/kernel_build_before_load wangbinluo 2024-01-11 08:21:53 +00:00
  • d5eeeb1416 [ci] fixed booster test (#5251) Frank Lee 2024-01-11 16:04:45 +08:00
  • edf94a35c3 [workflow] fixed build CI (#5240) Frank Lee 2024-01-10 22:34:16 +08:00
  • 41e52c1c6e [doc] fix typo in Colossal-LLaMA-2/README.md (#5247) digger yu 2024-01-10 19:24:56 +08:00
  • a9b5ec8664 fix the build before load bug wangbinluo 2024-01-10 14:50:40 +08:00
  • 9102d655ab [hotfix] removed unused flag (#5242) Frank Lee 2024-01-09 14:57:07 +08:00
  • d202cc28c0 [npu] change device to accelerator api (#5239) Hongxin Liu 2024-01-09 10:20:05 +08:00
  • d565df3821 [pipeline] A more general _communicate in p2p (#5062) Elsa Granger 2024-01-08 15:37:27 +08:00
  • dd2c28a323 [npu] use extension for op builder (#5172) Xuanlei Zhao 2024-01-08 11:39:16 +08:00
  • 7bc6969ce6 [doc] SwiftInfer release (#5236) binmakeswell 2024-01-08 09:55:12 +08:00
  • 4fb4a22a72 [format] applied code formatting on changed files in pull request 5234 (#5235) github-actions[bot] 2024-01-07 20:55:34 +08:00
  • b9b32b15e6 [doc] add Colossal-LLaMA-2-13B (#5234) binmakeswell 2024-01-07 20:53:12 +08:00
  • ce651270f1 [doc] Make leaderboard format more uniform and good-looking (#5231) JIMMY ZHAO 2024-01-06 04:12:29 -05:00
  • 915b4652f3 [doc] Update README.md of Colossal-LLAMA2 (#5233) Camille Zhong 2024-01-06 17:06:41 +08:00
  • d992b55968 [Colossal-LLaMA-2] Release Colossal-LLaMA-2-13b-base model (#5224) Tong Li 2024-01-05 17:24:26 +08:00
  • b0b53a171c [nfc] fix typo colossalai/shardformer/ (#5133) digger yu 2024-01-04 16:21:55 +08:00
  • 451e9142b8 fix flash attn (#5209) flybird11111 2024-01-03 14:39:53 +08:00
  • 365671be10 fix-test (#5210) flybird11111 2024-01-03 14:26:13 +08:00
  • 7f3400b560 [devops] update torch versoin in ci (#5217) Hongxin Liu 2024-01-03 11:46:33 +08:00
  • d799a3088f [pipeline]: add p2p fallback order and fix interleaved pp deadlock (#5214) Wenhao Chen 2024-01-03 11:34:49 +08:00
  • 3c0d82b19b [pipeline]: support arbitrary batch size in forward_only mode (#5201) Wenhao Chen 2024-01-02 23:41:12 +08:00
  • 02d2328a04 support linear accumulation fusion (#5199) flybird11111 2023-12-29 18:22:42 +08:00
  • 64519eb830 [doc] Update required third-party library list for testing and torch comptibility checking (#5207) Zhongkai Zhao 2023-12-27 18:03:45 +08:00
  • eae01b6740 Improve logic for selecting metrics (#5196) Yuanchen 2023-12-22 14:52:50 +08:00
  • 4fa689fca1 [pipeline]: fix p2p comm, add metadata cache and support llama interleaved pp (#5134) Wenhao Chen 2023-12-22 10:44:00 +08:00
  • cabc1286ca [LowLevelZero] low level zero support lora (#5153) flybird11111 2023-12-21 17:01:01 +08:00
  • af952673f7 polish readme in application/chat (#5194) BlueRum 2023-12-20 11:28:39 +08:00
  • 681d9b12ef [doc] update pytorch version in documents. (#5177) flybird11111 2023-12-15 18:16:48 +08:00
  • 3ff60d13b0 Fix ColossalEval (#5186) Yuanchen 2023-12-15 15:06:06 +08:00
  • 79718fae04 [shardformer] llama support DistCrossEntropy (#5176) flybird11111 2023-12-13 01:39:14 +08:00
  • cefdc32615 [ColossalEval] Support GSM, Data Leakage Evaluation and Tensor Parallel (#5169) Yuanchen 2023-12-12 14:47:35 +08:00
  • b07a6f4e27 [colossalqa] fix pangu api (#5170) Michelle 2023-12-11 14:08:11 +08:00
  • 21aa5de00b [gemini] hotfix NaN loss while using Gemini + tensor_parallel (#5150) flybird11111 2023-12-08 11:10:51 +08:00
  • b397104438 [Colossal-Llama-2] Add finetuning Colossal-Llama-2 example (#4878) Yuanchen 2023-12-07 14:02:03 +08:00
  • 3dbbf83f1c fix (#5158) flybird11111 2023-12-05 14:28:36 +08:00
  • 368b5e3d64 [doc] fix colossalqa document (#5146) Michelle 2023-12-01 21:39:53 +08:00
  • c7fd9a5213 [ColossalQA] refactor server and webui & add new feature (#5138) Michelle 2023-11-30 22:55:52 +08:00
  • 2a2ec49aa7 [plugin]fix 3d checkpoint load when booster boost without optimizer. (#5135) flybird11111 2023-11-30 18:37:47 +08:00
  • d6df19bae7 [npu] support triangle attention for llama (#5130) Xuanlei Zhao 2023-11-30 14:21:30 +08:00
  • f4e72c9992 [accelerator] init the accelerator module (#5129) Frank Lee 2023-11-30 13:25:17 +08:00
  • f6731db67c [format] applied code formatting on changed files in pull request 5115 (#5118) github-actions[bot] 2023-11-29 13:39:14 +08:00
  • 9b36640f28 [format] applied code formatting on changed files in pull request 5124 (#5125) github-actions[bot] 2023-11-29 13:39:02 +08:00
  • d10ee42f68 [format] applied code formatting on changed files in pull request 5088 (#5127) github-actions[bot] 2023-11-29 13:38:37 +08:00
  • 9110406a47 fix typo change JOSNL TO JSONL etc. (#5116) digger yu 2023-11-29 11:08:32 +08:00
  • 2899cfdabf [doc] updated paper citation (#5131) Frank Lee 2023-11-29 10:47:51 +08:00
  • 177c79f2d1 [doc] add moe news (#5128) binmakeswell 2023-11-28 17:44:06 +08:00
  • 7172459e74 [shardformer]: support gpt-j, falcon, Mistral and add interleaved pipeline for bert (#5088) Wenhao Chen 2023-11-28 16:54:42 +08:00
  • 126cf180bc [hotfix] fixed memory usage of shardformer module replacement (#5122) アマデウス 2023-11-28 15:38:26 +08:00
  • 7b789f4dd2 [FEATURE] Add Safety Eval Datasets to ColossalEval (#5095) Zian(Andy) Zheng 2023-11-27 18:15:13 +08:00
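The listing above is a condensed one-line-per-commit history view. Assuming it was taken from a checkout of the repository, a similar view (abbreviated hash, subject, author, date with timezone) can be reproduced with standard `git log` format specifiers:

```shell
# One line per commit: 10-char abbreviated hash (%h), subject (%s),
# author name (%an), and author date (%ad) formatted like the entries above.
# Add --graph to include branch/merge topology markers.
git log --abbrev=10 \
    --date=format:'%Y-%m-%d %H:%M:%S %z' \
    --pretty=format:'%h %s %an %ad'
```

The `--date=format:` option hands the author timestamp to strftime-style formatting, which is why the entries can show the committer-local timezone offsets (`+08:00` style offsets in the list correspond to `%z` output such as `+0800`).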