ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.


[core] Fix negative RUNNING task metric (#62070)

Issue: The `RUNNING` metric could be reported with a negative value. This
happens because `SetMetricStatus(GETTING_AND_PINNING_ARGS)` increments
`pending_getting_and_pinning_args_fetch_counter_` BEFORE
`MovePendingToRunning()` increments the `kRunning` counter. If a metric
flush happens in between, the RUNNING gauge is `running_total(0) -
num_getting_pinning_args(1) = -1`. The gauge calculation logic can be
found in the constructor of `TaskCounter` in `core_worker.cc`.
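
To make the ordering problem concrete, here is a minimal sketch of the
arithmetic described above. It is a simplified, hypothetical model, not
the actual `TaskCounter` code: `BuggyOrderModel`, its members, and
`FlushBetweenStepsBuggyOrder` are names invented for illustration, and the
gauge formula includes only the two terms mentioned in this description.

```cpp
#include <cstdint>

// Hypothetical stand-in for the counters discussed above (not Ray's real class).
struct BuggyOrderModel {
  int64_t running_total = 0;             // models the kRunning counter
  int64_t num_getting_pinning_args = 0;  // models the GETTING_AND_PINNING_ARGS substate counter

  // Models SetMetricStatus(GETTING_AND_PINNING_ARGS): bumps only the substate counter.
  void SetGettingAndPinningArgs() { ++num_getting_pinning_args; }

  // Models MovePendingToRunning(): bumps only the running counter.
  void MovePendingToRunning() { ++running_total; }

  // Gauge as stated in the issue: running total minus tasks in the substate.
  int64_t RunningGauge() const { return running_total - num_getting_pinning_args; }
};

// Old order: the substate counter is incremented first, so a metric flush
// that lands between the two steps observes 0 - 1.
int64_t FlushBetweenStepsBuggyOrder() {
  BuggyOrderModel c;
  c.SetGettingAndPinningArgs();
  int64_t flushed = c.RunningGauge();  // simulated metric flush
  c.MovePendingToRunning();
  return flushed;
}
```

In this model `FlushBetweenStepsBuggyOrder()` returns -1, matching the
negative value in the issue.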

This can be reproduced by adding a sufficiently long sleep in
`CoreWorker::ExecuteTask` between
`SetMetricStatus(GETTING_AND_PINNING_ARGS)` and `MovePendingToRunning()`
so that a metric flush takes place in between.
<img width="1159" height="716" alt="image"
src="https://github.com/user-attachments/assets/a29e169b-f166-4e8d-8f69-e26a3de4212f"
/>

Fix: Since `GETTING_AND_PINNING_ARGS` is treated as a substate of
`kRunning`, move the `MovePendingToRunning()` call before
`SetMetricStatus(GETTING_AND_PINNING_ARGS)`.
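
The effect of the reordering can be sketched with the same kind of
simplified model. The names below (`FixedOrderModel`,
`FlushBetweenStepsFixedOrder`, `GaugeAfterBothSteps`) are hypothetical and
do not come from `core_worker.cc`. The gauge formula is assumed to be the
two-term subtraction quoted in the issue.

```cpp
#include <cstdint>

// Hypothetical stand-in for the counters (not Ray's real TaskCounter).
struct FixedOrderModel {
  int64_t running_total = 0;             // models the kRunning counter
  int64_t num_getting_pinning_args = 0;  // models the substate counter

  void MovePendingToRunning() { ++running_total; }
  void SetGettingAndPinningArgs() { ++num_getting_pinning_args; }

  // Gauge as stated in the issue: running total minus tasks in the substate.
  int64_t RunningGauge() const { return running_total - num_getting_pinning_args; }
};

// New order: the running counter is incremented first, so a flush between
// the two steps observes 1 - 0 and the gauge cannot dip below zero here.
int64_t FlushBetweenStepsFixedOrder() {
  FixedOrderModel c;
  c.MovePendingToRunning();
  int64_t flushed = c.RunningGauge();  // simulated metric flush
  c.SetGettingAndPinningArgs();
  return flushed;
}

// Once both steps have run, the task is counted under the substate only.
int64_t GaugeAfterBothSteps() {
  FixedOrderModel c;
  c.MovePendingToRunning();
  c.SetGettingAndPinningArgs();
  return c.RunningGauge();
}
```

In this model a flush between the two calls briefly reports the task as
RUNNING (gauge 1) instead of producing a negative value, and the gauge
settles at 0 once the substate counter is incremented.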

---------

Signed-off-by: Kartica Modi <karticamodi@gmail.com>
Kartica Modi committed
40df4994e3ca4c90da8b925a5366583a42f6c298
Parent: 463275d
Committed by GitHub <noreply@github.com> on 3/26/2026, 4:37:23 PM