SIGN IN SIGN UP
apache / airflow UNCLAIMED

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

44812 0 0 Python

perf: use load_only() in get_dag_runs eager loading to reduce data fetched per task instance (#62482)

* perf: use load_only() in eager_load_dag_run_for_validation to reduce data fetched

The get_dag_runs API endpoint was slow on large deployments because
eager_load_dag_run_for_validation() used selectinload on task_instances and
task_instances_histories without restricting which columns were fetched.
This caused SQLAlchemy to load all heavyweight columns (executor_config with
pickled data, hostname, rendered fields, etc.) for every task instance across
every DAG run in the result page — even though only dag_version_id is needed
to traverse the association proxy to DagVersion.

Add load_only(TaskInstance.dag_version_id) and
load_only(TaskInstanceHistory.dag_version_id) to the selectinload chains so
the SELECT for task instances fetches only the identity columns and the FK
needed to resolve the dag_version relationship, significantly reducing the
volume of data transferred from the database on busy deployments.

Fixes #62025

* Fix static checks

---------

Co-authored-by: pierrejeambrun <pierrejbrun@gmail.com>
L
Lakshmi Sravya committed
13af96b80868ef91ca623d35afcd76003bfbda90
Parent: f4fd68f
Committed by GitHub <noreply@github.com> on 3/6/2026, 2:25:56 PM