Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
[core] Fix negative RUNNING task metric (#62070)
Issue: The `RUNNING` metric could be reported with a negative value. This happens because `SetMetricStatus(GETTING_AND_PINNING_ARGS)` increments `pending_getting_and_pinning_args_fetch_counter_` BEFORE `MovePendingToRunning()` increments the `kRunning` counter. If a metric flush happens in between, the RUNNING gauge is `running_total(0) - num_getting_pinning_args(1) = -1`. The gauge calculation logic can be found in the constructor of `TaskCounter` in `core_worker.cc`.

This can be reproduced by adding a sufficiently long sleep in `CoreWorker::ExecuteTask` between `SetMetricStatus(GETTING_AND_PINNING_ARGS)` and `MovePendingToRunning()`, so that a metric flush takes place between the two calls.

<img width="1159" height="716" alt="image" src="https://github.com/user-attachments/assets/a29e169b-f166-4e8d-8f69-e26a3de4212f" />

Fix: Since `GETTING_AND_PINNING_ARGS` is treated as a substate of `kRunning`, move the `MovePendingToRunning()` call to before `SetMetricStatus(GETTING_AND_PINNING_ARGS)`.

---------

Signed-off-by: Kartica Modi <karticamodi@gmail.com>
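The ordering problem described above can be illustrated with a minimal sketch. This is a simplified, hypothetical model and not Ray's actual `TaskCounter`: the struct `ToyTaskCounter`, its fields, and the two helper functions are assumptions made for illustration. The only thing taken from the commit message is the gauge formula, running total minus the `GETTING_AND_PINNING_ARGS` substate count.

```cpp
#include <cstdint>

// Hypothetical stand-in for the two counters involved; not Ray's real class.
struct ToyTaskCounter {
  int64_t running_total = 0;             // models the kRunning counter
  int64_t num_getting_pinning_args = 0;  // models the substate counter

  void MovePendingToRunning() { ++running_total; }
  void SetGettingAndPinningArgs() { ++num_getting_pinning_args; }

  // Gauge as described in the commit message:
  // RUNNING = running_total - num_getting_pinning_args.
  int64_t RunningGauge() const {
    return running_total - num_getting_pinning_args;
  }
};

// Old order: the substate counter is incremented first, and a metric
// flush lands before MovePendingToRunning() has run.
int64_t FlushBetweenCallsOldOrder() {
  ToyTaskCounter c;
  c.SetGettingAndPinningArgs();
  return c.RunningGauge();  // flush observes 0 - 1
}

// New order: kRunning is incremented first, so a flush at the same
// point can no longer observe a negative value.
int64_t FlushBetweenCallsNewOrder() {
  ToyTaskCounter c;
  c.MovePendingToRunning();
  return c.RunningGauge();  // flush observes 1 - 0
}
```

In this model, a flush between the two calls sees -1 under the old order and 1 under the new order. Once both calls have completed, the gauge is 0 in either case, which matches treating `GETTING_AND_PINNING_ARGS` as a substate that is subtracted from the running total.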
Kartica Modi committed
40df4994e3ca4c90da8b925a5366583a42f6c298
Parent: 463275d
Committed by GitHub <noreply@github.com>
on 3/26/2026, 4:37:23 PM