perf: use load_only() in get_dag_runs eager loading to reduce data fetched per task instance (#62482)
* perf: use load_only() in eager_load_dag_run_for_validation to reduce data fetched

  The get_dag_runs API endpoint was slow on large deployments because eager_load_dag_run_for_validation() used selectinload on task_instances and task_instances_histories without restricting which columns were fetched. This caused SQLAlchemy to load all heavyweight columns (executor_config with pickled data, hostname, rendered fields, etc.) for every task instance across every DAG run in the result page, even though only dag_version_id is needed to traverse the association proxy to DagVersion.

  Add load_only(TaskInstance.dag_version_id) and load_only(TaskInstanceHistory.dag_version_id) to the selectinload chains so the SELECT for task instances fetches only the identity columns and the FK needed to resolve the dag_version relationship, significantly reducing the volume of data transferred from the database on busy deployments.

  Fixes #62025

* Fix static checks

---------

Co-authored-by: pierrejeambrun <pierrejbrun@gmail.com>
Lakshmi Sravya committed
13af96b80868ef91ca623d35afcd76003bfbda90
Parent: f4fd68f
Committed by GitHub <noreply@github.com>
on 3/6/2026, 2:25:56 PM