Commit Graph

  • e181318d51 [feat] Support boxed math reward (#6284) YeAnbang 2025-04-29 16:46:47 +08:00
  • fb4e507d00 fix pp+tp, fix dataloader (#6280) YeAnbang 2025-04-28 17:10:00 +08:00
  • 37a8be7651 fix save issue (#6279) Tong Li 2025-04-27 17:54:06 +08:00
  • 673682e716 fix checkpoint naming; add num_epoch parameter (#6277) YeAnbang 2025-04-26 14:00:28 +08:00
  • 5f913e8b77 [feat] Support DAPO (#6263) YeAnbang 2025-04-25 17:39:17 +08:00
  • b34d707cdc [feat] Add final save at the end (#6274) Tong Li 2025-04-23 10:03:46 +08:00
  • befd4f1487 add prompt template (#6273) Tong Li 2025-04-22 10:39:47 +08:00
  • 3bd6fa3c67 [hot-fix] Fix memory leakage bug, support TP+PP (#6258) YeAnbang 2025-04-10 10:52:18 +08:00
  • 5d79b9e692 [Distributed RLHF] Integration of PP (#6257) YeAnbang 2025-04-09 13:23:24 +08:00
  • 12da4d14aa [feat] add microbatch forwarding (#6251) YeAnbang 2025-03-28 10:24:58 +08:00
  • c627b60551 update logging YeAnbang 2025-03-21 16:12:07 +08:00
  • 23aac43dcf simplify vllm preprocessing input ids YeAnbang 2025-03-21 15:03:10 +08:00
  • 16e68a071d fix logprob, add filtering, temperature annealing, lr descent YeAnbang 2025-03-21 10:24:24 +08:00
  • f983071b10 fix vllm YeAnbang 2025-03-19 17:07:20 +08:00
  • 455185345e [Feature] Support Distributed LogProb for GRPO Training (#6247) duanjunwen 2025-03-18 17:47:55 +08:00
  • 35dabd718e fix transformers backend YeAnbang 2025-03-14 18:12:35 +08:00
  • e224673c44 setup update Tong Li 2025-03-13 16:52:15 +08:00
  • bfc45829c3 print results Tong Li 2025-03-13 16:51:22 +08:00
  • 30c7ddd9f1 convert to 8 generation Tong Li 2025-03-13 16:49:02 +08:00
  • a2ae82a417 fix consumer Tong Li 2025-03-13 14:55:26 +08:00
  • b19355f8f0 fix tp bug Tong Li 2025-03-13 14:52:09 +08:00
  • 69a1a325ee detach Tong Li 2025-03-11 16:17:02 +08:00
  • b951d0b224 add response length Tong Li 2025-03-11 13:06:09 +08:00
  • a4862a2349 fix reward score Tong Li 2025-03-11 10:17:32 +08:00
  • a537aa1c20 update reward Tong Li 2025-03-10 14:19:10 +08:00
  • c8db826782 update reward fn Tong Li 2025-03-10 14:18:22 +08:00
  • fe017d34c5 update grpo Tong Li 2025-03-10 14:12:04 +08:00
  • bc538ba049 [pre-commit.ci] auto fixes from pre-commit.com hooks pre-commit-ci[bot] 2025-03-06 08:29:58 +00:00
  • f71d422690 [pre-commit.ci] auto fixes from pre-commit.com hooks pre-commit-ci[bot] 2025-03-06 06:30:26 +00:00
  • 246f16d7bc update select algo Tong Li 2025-03-06 16:27:13 +08:00
  • 88eb6e5f04 add save Tong Li 2025-03-06 16:26:14 +08:00
  • 1f15dc70df add algo selection Tong Li 2025-03-06 14:29:22 +08:00
  • cc4cc78169 update loader Tong Li 2025-03-06 11:44:42 +08:00
  • 5c75d5b07c update example Tong Li 2025-03-06 10:54:23 +08:00
  • f8899dda70 update reward fn Tong Li 2025-03-06 10:53:48 +08:00
  • 9754a11398 update loss Tong Li 2025-03-06 10:53:03 +08:00
  • 5f178a7d24 grpo consumer Tong Li 2025-03-06 10:51:27 +08:00
  • b7842f8a5d modify data loader Tong Li 2025-03-06 10:49:44 +08:00
  • 718c4b76cc polish Tong Li 2025-02-28 10:16:42 +08:00
  • 1f07b716bf update grpo Tong Li 2025-02-25 18:12:04 +08:00
  • 40d601802d add simple grpo Tong Li 2025-02-23 22:54:26 +08:00
  • fa1272f9f2 add reward related function Tong Li 2025-02-23 11:02:54 +08:00
  • 7a2d455136 [feature] fit RL style generation (#6213) Hongxin Liu 2025-02-21 17:28:19 +08:00
  • 162bb42321 [chat] add distributed impl (#6210) Hongxin Liu 2025-02-21 15:24:23 +08:00
  • f067e778e9 merge grpo-latest' grpo-zero-bubble YeAnbang 2025-08-04 11:38:14 +08:00
  • cd32236e53 [Fix] Add L2 Regularization (#6372) grpo-latest YeAnbang 2025-07-29 16:56:52 +08:00
  • 6019434ac9 Merge pull request #6370 from ChosenQC/feature/pdf-rag hpc-ai-cloud Hanks 2025-07-23 14:26:08 +08:00
  • 3fdd4e7733 [pre-commit.ci] auto fixes from pre-commit.com hooks pre-commit-ci[bot] 2025-07-23 06:21:29 +00:00
  • eadcad8749 [pre-commit.ci] auto fixes from pre-commit.com hooks pre-commit-ci[bot] 2025-07-23 05:59:43 +00:00
  • 6070287b34 add pdf-rag example ChosenQC 2025-07-23 04:26:43 +00:00
  • 57e92104a2 hotfix entropy calculation (#6364) YeAnbang 2025-07-22 10:02:02 +08:00
  • 5c5cb1863b hotfix YeAnbang 2025-07-21 18:04:20 +08:00
  • e774edeb80 fix racing condition YeAnbang 2025-07-21 17:21:07 +08:00
  • 4cf5ce20bf add entropy (#6363) YeAnbang 2025-07-17 15:05:10 +08:00
  • f54ae56f12 add entropy YeAnbang 2025-07-16 16:44:23 +08:00
  • 9f8c97d028 add entropy grpo-latest-support-entropy-metric YeAnbang 2025-07-16 16:44:23 +08:00
  • edd65a84dd Merge pull request #6362 from hpcaitech/CI/test_build_on_schedule Hanks 2025-07-15 14:25:10 +08:00
  • f5c155ab48 Merge pull request #6361 from hpcaitech/grpo-latest-fix-code-reward YeAnbang 2025-07-14 18:25:23 +08:00
  • d850475208 fix style YeAnbang 2025-07-14 18:23:39 +08:00
  • a992da9f0f fix code evaluation YeAnbang 2025-07-14 16:25:03 +08:00
  • c5e97f4e25 fix code evaluation YeAnbang 2025-07-14 16:25:03 +08:00
  • 908c634686 [CI] disable timm_regnetv_040 as aten::_unique2 is not supported botbw 2025-07-14 07:50:27 +00:00
  • e285eb6993 [CI] install flash-attn 2.7.4.post1 botbw 2025-07-14 02:38:02 +00:00
  • d097224d90 [feat] support qwen3 in shardformer botbw 2025-07-10 13:57:52 +08:00
  • 509274c47e add code for zero-bubble implementation YeAnbang 2025-07-09 11:21:43 +08:00
  • b1f646c7e7 [feat] Support one-behind to reduce bubble time. Add profiling code (#6353) YeAnbang 2025-06-30 13:21:08 +08:00
  • 973dea21c7 remove assert grpo_optimization Tong Li 2025-06-27 14:16:23 +08:00
  • 90c3b12474 update assert Tong Li 2025-06-27 13:14:35 +08:00
  • a0f8680e85 fix update Tong Li 2025-06-27 09:52:50 +08:00
  • 58cb4fb4f7 add profiling Tong Li 2025-06-26 17:49:53 +08:00
  • 71ef6b32c6 fix loop issue Tong Li 2025-06-26 15:08:27 +08:00
  • 8abf186ce2 fix behind Tong Li 2025-06-26 10:27:00 +08:00
  • 9379a89677 [feat][npu] Merge from grpo-latest (#6346) xysheng-colossal 2025-06-23 11:49:13 +08:00
  • db8baeeaf2 fix visualization YeAnbang 2025-06-20 15:52:09 +08:00
  • c2561f826a fix bugs YeAnbang 2025-06-20 15:44:13 +08:00
  • ff6696a9bb support n_behind, add profiling YeAnbang 2025-06-20 03:14:00 +00:00
  • c7d3d0dc8f remove unused parameter hotfix/tp-issue Tong Li 2025-06-19 07:14:16 +00:00
  • 8880b83791 add dp rank for multi-dp (#6351) Tong Li 2025-06-19 14:02:08 +08:00
  • dd49444dcb Merge pull request #6348 from hpcaitech/grpo_optimization YeAnbang 2025-06-19 13:21:51 +08:00
  • 6b06430ca4 fix small bug YeAnbang 2025-06-19 01:37:52 +00:00
  • e3d56cbd86 implement memory efficient logprob YeAnbang 2025-06-18 10:24:48 +00:00
  • 2db255bf15 add profiling, implement memory efficient logprob calculation grpo-profile YeAnbang 2025-06-18 10:08:22 +00:00
  • 30a6859f77 optimize pp log_softmax OOM YeAnbang 2025-06-13 18:21:54 +08:00
  • ff1689b69a add profiling YeAnbang 2025-06-13 18:00:31 +08:00
  • 0e69b98c28 Merge pull request #6347 from hpcaitech/hotfix/fix_num_update_per_episode YeAnbang 2025-06-12 15:56:49 +08:00
  • 51b7abe9dd fix num_update_per_episode YeAnbang 2025-06-12 15:06:01 +08:00
  • 43a0e99ae1 add profile YeAnbang 2025-06-12 15:03:44 +08:00
  • ac069357a9 Merge pull request #6316 from hpcaitech/grpo-support-multi-machine YeAnbang 2025-06-12 11:23:06 +08:00
  • 2f02a28777 Update README.md grpo-support-multi-machine YeAnbang 2025-06-12 11:21:31 +08:00
  • 8992def757 fix pp memory issue (#6344) Tong Li 2025-06-11 17:54:18 +08:00
  • 1330b5753b add ray timeout handling instruction YeAnbang 2025-06-10 18:21:42 +08:00
  • dc29c74632 update readme YeAnbang 2025-06-10 17:17:41 +08:00
  • 25599246c5 modify readme YeAnbang 2025-06-10 17:00:35 +08:00
  • 21d517d0fa Manually schedule resources and support auto master address assigning YeAnbang 2025-06-10 15:00:48 +08:00
  • bb6f5d98fc move out evaluation func (#6343) Tong Li 2025-06-10 13:53:19 +08:00
  • c308b42f38 Merge pull request #6341 from hpcaitech/grpo-code YeAnbang 2025-06-09 17:02:13 +08:00
  • 9ca920c1af [pre-commit.ci] auto fixes from pre-commit.com hooks grpo-code pre-commit-ci[bot] 2025-06-09 01:48:19 +00:00
  • d0e12c508a remove debug code YeAnbang 2025-06-09 09:42:58 +08:00
  • 3bed6ae9ee fix bug, tested YeAnbang 2025-06-09 09:37:28 +08:00
  • dc3033e68a support code generation tasks YeAnbang 2025-06-05 17:56:42 +08:00