Commit Graph

  • 62c13e7969 [Ring Attention] Improve comments (#6085) Wenxuan Tan 2024-10-15 22:23:35 -05:00
  • 90939b77e0 [fix] debug zbv llama test; duanjunwen 2024-10-15 09:39:11 +00:00
  • dcd41d0973 Merge pull request #6071 from wangbluo/ring_attention Wang Binluo 2024-10-15 15:17:21 +08:00
  • 83cf2f84fb fix wangbluo 2024-10-15 14:50:27 +08:00
  • 52dcc73313 Merge branch 'feature/zerobubble' of github.com:hpcaitech/ColossalAI into dev/zero_bubble duanjunwen 2024-10-15 06:31:45 +00:00
  • 9912cc8c07 [fix] fix bwd b; now bwd w only for Layer replaced by Linear1D_Col/Row; other layer perform a fully bwd; duanjunwen 2024-10-15 06:26:01 +00:00
  • bc7eeade33 fix wangbluo 2024-10-15 13:28:33 +08:00
  • fd92789af2 fix wangbluo 2024-10-15 13:26:44 +08:00
  • 6be9862aaf fix wangbluo 2024-10-15 11:56:49 +08:00
  • 3dc08c8a5a fix wangbluo 2024-10-15 11:01:34 +08:00
  • 8ff7d0c780 fix wangbluo 2024-10-14 18:16:03 +08:00
  • fe9208feac fix wangbluo 2024-10-14 18:07:56 +08:00
  • 3201377e94 fix wangbluo 2024-10-14 18:06:24 +08:00
  • 23199e34cc fix wangbluo 2024-10-14 18:01:53 +08:00
  • 160e9a4175 [feat]EPMixtralSparseMoeBlock (op in MOE) support zbv; duanjunwen 2024-10-14 08:22:51 +00:00
  • abd455189d [fix] fix test case; moe error in second iter duanjunwen 2024-10-14 07:38:02 +00:00
  • a11b4b50a7 [feat] support use_zbv in llama, mixtral modeling; only replace Linear1D_Col/Row policy; duanjunwen 2024-10-14 07:12:14 +00:00
  • cfade4c36d [feat] Linear1D_COL/ROW support zbv WeightGradStore; duanjunwen 2024-10-14 07:02:43 +00:00
  • d891e50617 fix wangbluo 2024-10-14 14:56:05 +08:00
  • e1e86f9f1f fix wangbluo 2024-10-14 11:45:35 +08:00
  • 4c8e85ee0d [Coati] Train DPO using PP (#6054) Tong Li 2024-10-11 19:32:00 +08:00
  • 703bb5c18d fix the test wangbluo 2024-10-11 17:34:20 +08:00
  • 4e0e99bb6a fix the test wangbluo 2024-10-11 17:31:40 +08:00
  • 0ca16d5cbe [fix] fix llama, mixtral benchmark zbv loss none bug; update mixtral & llama policy and modeling; duanjunwen 2024-10-11 07:32:43 +00:00
  • 1507a7528f fix wangbluo 2024-10-11 06:20:34 +00:00
  • 0002ae5956 fix wangbluo 2024-10-11 14:16:21 +08:00
  • dac0e07b13 [zero bubble] support zero (#6080) flybird11111 2024-10-11 14:14:05 +08:00
  • dc2cdaf3e8 [shardformer] optimize seq parallelism (#6086) Hongxin Liu 2024-10-11 13:44:40 +08:00
  • efe3042bb2 fix wangbluo 2024-10-10 18:38:47 +08:00
  • 6b2c506fc5 Update README.md (#6087) 梁爽 2024-10-10 17:02:49 +08:00
  • bcbd311bc3 Update README.md supercooledith-patch-1 梁爽 2024-10-10 16:52:55 +08:00
  • 5ecc27e150 fix wangbluo 2024-10-10 15:35:52 +08:00
  • f98384aef6 fix wangbluo 2024-10-10 15:17:06 +08:00
  • e234dfa236 [feat] support MixtralPipelineForwards--> mixtral_for_causal_lm_forward for zbv duanjunwen 2024-10-10 06:57:35 +00:00
  • 646b3c5a90 [shardformer] fix linear 1d row and support uneven splits for fused qkv linear (#6084) Hongxin Liu 2024-10-10 14:34:45 +08:00
  • 72b507a7be [feat] update MixtralPipelineForwards --> mixtral_model_forward; support zbv; duanjunwen 2024-10-10 06:19:51 +00:00
  • 9ee80fc828 [fix] MixtralForCausalLMPolicy get_held_layer support zbv; duanjunwen 2024-10-10 05:40:22 +00:00
  • b635dd0669 fix wangbluo 2024-10-09 14:05:26 +08:00
  • 3f5bec8dc4 [feat] support zbv in mixtral benchmark; duanjunwen 2024-10-09 03:58:01 +00:00
  • 3532f77b90 fix wangbluo 2024-10-09 10:57:19 +08:00
  • 531773ff54 Merge pull request #6077 from duanjunwen/dev/zero_bubble duanjunwen 2024-10-09 10:22:14 +08:00
  • cc500b3e25 [fix] fix mixtral policy; duanjunwen 2024-10-08 09:34:09 +00:00
  • 292a504bea [fix] fix mixtral policy; duanjunwen 2024-10-08 09:25:11 +00:00
  • f4d023ca6e Merge branch 'feature/zerobubble' of github.com:hpcaitech/ColossalAI into dev/zero_bubble duanjunwen 2024-10-08 08:13:17 +00:00
  • 295dd2d9fe [zerobubble] rebase main (#6075) flybird11111 2024-10-08 15:58:00 +08:00
  • 6975c50f78 [fix] fix build ci; duanjunwen 2024-09-30 02:34:54 +00:00
  • 5c8bbf63a8 [feat] update optimizer bwd; duanjunwen 2024-09-29 09:59:41 +00:00
  • d63479553c [feat] zerobubble support moehybridplugin; duanjunwen 2024-09-29 08:33:55 +00:00
  • af6aa9ed06 [plugin] hybrid support zero bubble pipeline (#6060) flybird11111 2024-09-27 14:48:55 +08:00
  • b804fdc297 Merge pull request #6069 from duanjunwen/dev/zero_bubble duanjunwen 2024-09-27 10:34:04 +08:00
  • 1342a983b1 [fix] rm print & comments; duanjunwen 2024-09-26 11:05:27 +00:00
  • 64ceea746f [fix] remove chunk 0 stage 0 bwd b; u don't have to cal micrbatch's dx; duanjunwen 2024-09-26 10:50:44 +00:00
  • 3fab92166e fix wangbluo 2024-09-26 18:03:09 +08:00
  • bb0390c90d [fix] remove duplicate arg; rm comments; duanjunwen 2024-09-26 09:45:44 +00:00
  • c5503b0d80 [fix] fix test_pipeline_utils ci; duanjunwen 2024-09-26 07:18:16 +00:00
  • 45f17fc6cc [fix] rm comments; duanjunwen 2024-09-26 06:13:56 +00:00
  • a92e16719b [fix] fix zerobubble; support shardformer model type; duanjunwen 2024-09-26 06:11:56 +00:00
  • f4daf04270 add funding news (#6072) binmakeswell 2024-09-26 12:29:27 +08:00
  • 6705dad41b fix wangbluo 2024-09-25 19:02:21 +08:00
  • 91ed32c256 fix wangbluo 2024-09-25 19:00:38 +08:00
  • 6fb1322db1 fix wangbluo 2024-09-25 18:56:18 +08:00
  • 65c8297710 fix the attn wangbluo 2024-09-25 18:51:03 +08:00
  • cfd9eda628 fix the ring attn wangbluo 2024-09-25 18:34:29 +08:00
  • 83163fa70c [fix] fix traverse; traverse dict --> traverse tensor List; duanjunwen 2024-09-25 06:38:11 +00:00
  • fc8b016887 [fix] fix stage_indices; duanjunwen 2024-09-25 06:15:45 +00:00
  • cbaa104216 release FP8 news (#6068) binmakeswell 2024-09-25 11:57:16 +08:00
  • 8501202a35 Merge pull request #6065 from duanjunwen/dev/zero_bubble duanjunwen 2024-09-24 19:17:37 +08:00
  • 7e6f793c51 [fix] fix detach_output_obj clone; duanjunwen 2024-09-24 08:08:32 +00:00
  • 6c1e1550ae [fix] fix dumb clone; duanjunwen 2024-09-23 06:43:49 +00:00
  • a875212a42 [fix] fix ci --> oom in 4096 hidden dim; duanjunwen 2024-09-23 05:55:16 +00:00
  • c114d1429a [fix] fix detach clone release order; duanjunwen 2024-09-23 04:00:24 +00:00
  • da3220f48c [fix] fix pipeline util func deallocate --> release_tensor_data; fix bwd_b loss bwd branch; duanjunwen 2024-09-20 09:48:35 +00:00
  • 1739df423c [fix] fix fwd branch, fwd pass both micro_batch & internal_inputs' duanjunwen 2024-09-20 07:34:43 +00:00
  • b6616f544e [fix] rm comments; duanjunwen 2024-09-20 07:29:41 +00:00
  • c6d6ee39bd [fix] use tree_flatten replace dict traverse; duanjunwen 2024-09-20 07:18:49 +00:00
  • 26783776f1 [fix] fix input_tensors buffer append input_obj(dict) --> Tuple (microbatch, input_obj) , and all bwd b related cal logic; duanjunwen 2024-09-20 06:41:19 +00:00
  • 4753bf7add [fix] fix mem assert; duanjunwen 2024-09-19 08:27:47 +00:00
  • a115106f8d [fix] fix bwd w input; duanjunwen 2024-09-19 08:10:05 +00:00
  • 349272c71f [fix] updatw bwd b&w input; dict --> list[torch.Tensor] duanjunwen 2024-09-19 07:47:01 +00:00
  • 6ee9584b9a [fix] fix require_grad & deallocate call; duanjunwen 2024-09-19 05:53:03 +00:00
  • 1f5c7258aa Merge remote-tracking branch 'upstream/feature/zerobubble' into dev/zero_bubble duanjunwen 2024-09-19 03:52:13 +00:00
  • dabc2e7430 [release] update version (#6062) v0.4.4 Hongxin Liu 2024-09-19 10:45:32 +08:00
  • f9546ba0be [ColossalEval] support for vllm (#6056) Camille Zhong 2024-09-18 17:09:45 +08:00
  • af2c2f8092 [feat] add more test; duanjunwen 2024-09-18 07:51:54 +00:00
  • 3dbad102cf [fix] fix zerobubble pp for shardformer type input; duanjunwen 2024-09-18 07:14:34 +00:00
  • 4fa6b9509c [moe] add parallel strategy for shared_expert && fix test for deepseek (#6063) botbw 2024-09-18 10:09:01 +08:00
  • 63314ce4e4 Merge pull request #6064 from wangbluo/fix_attn Wang Binluo 2024-09-18 10:08:15 +08:00
  • 10e4f7da72 fix wangbluo 2024-09-16 13:45:04 +08:00
  • 37e35230ff Merge pull request #6061 from wangbluo/sp_fix Wang Binluo 2024-09-14 20:54:35 +08:00
  • 827ef3ee9a fix wangbluo 2024-09-14 10:40:35 +00:00
  • bdb125f83f [doc] FP8 training and communication document (#6050) Guangyao Zhang 2024-09-14 11:01:05 +08:00
  • f20b066c59 [fp8] Disable all_gather intranode. Disable Redundant all_gather fp8 (#6059) Guangyao Zhang 2024-09-14 10:40:01 +08:00
  • b582319273 fix wangbluo 2024-09-13 10:24:41 +00:00
  • 0ad3129cb9 fix wangbluo 2024-09-13 09:01:26 +00:00
  • 0b14a5512e fix wangbluo 2024-09-13 07:06:14 +00:00
  • 696fced0d7 [fp8] fix missing fp8_comm flag in mixtral (#6057) botbw 2024-09-13 14:30:05 +08:00
  • dc032172c3 fix wangbluo 2024-09-13 06:00:58 +00:00
  • f393867cff fix wangbluo 2024-09-13 05:24:52 +00:00
  • 6eb8832366 fix wangbluo 2024-09-13 05:06:56 +00:00
  • 683179cefd fix wangbluo 2024-09-13 03:40:56 +00:00