# The JIT

The [adaptive interpreter](interpreter.md) consists of a main loop that
executes the bytecode instructions generated by the
[bytecode compiler](compiler.md) and their
[specializations](interpreter.md#Specialization). Runtime optimization in
this interpreter can only be done for one instruction at a time. The JIT
is based on a mechanism to replace an entire sequence of bytecode instructions,
and this enables optimizations that span multiple instructions.

Historically, the adaptive interpreter was referred to as `tier 1` and
the JIT as `tier 2`. You will see remnants of this in the code.

## The Trace Recorder and Executors

This section involves two interpreters:
  1. Adaptive interpreter (the default behavior)
  2. Trace recording interpreter (enabled on JIT builds)

The program begins running on the adaptive interpreter, until a `JUMP_BACKWARD` or
`RESUME` instruction determines that it is "hot" because the counter in its
[inline cache](interpreter.md#inline-cache-entries) indicates that it
executed more than some threshold number of times (see
[`backoff_counter_triggers`](../Include/internal/pycore_backoff.h)).
It then calls the function `_PyJit_TryInitializeTracing` in
[`Python/optimizer.c`](../Python/optimizer.c), passing it the current
[frame](frames.md), instruction pointer and state.
The interpreter then switches into "tracing mode" via the macro
`ENTER_TRACING()`. On builds that use computed gotos or the tail-calling
interpreter, this swaps out the dispatch table. Platforms that support
neither use a single flag in the opcode instead.
Execution of the normal interpreter and the tracing interpreter is
interleaved via this dispatch mechanism. This means that while logically
there are two interpreters, the implementation appears to be a single
interpreter.
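The two mechanisms above can be illustrated in isolation: a countdown that marks an instruction as hot, and entering tracing mode by swapping which dispatch table is active. The following is a minimal C sketch under stated assumptions. `toy_counter`, `toy_interp`, and the two handler tables are hypothetical stand-ins, not the real backoff counter from `pycore_backoff.h` or CPython's actual dispatch tables, whose layouts differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical warm-up counter: counts down once per execution and
   reports "hot" when an execution finds it already at zero. */
typedef struct {
    uint16_t remaining;
} toy_counter;

static bool toy_counter_tick(toy_counter *c)
{
    if (c->remaining == 0) {
        return true;           /* executed more than the threshold */
    }
    c->remaining--;
    return false;
}

#define TOY_NUM_OPCODES 1

struct toy_interp;
typedef void (*toy_handler)(struct toy_interp *);

typedef struct toy_interp {
    const toy_handler *table;  /* currently active dispatch table */
    int normal_calls;
    int traced_calls;
} toy_interp;

static void run_normal(toy_interp *in) { in->normal_calls++; }
static void run_traced(toy_interp *in) { in->traced_calls++; }

/* One handler per opcode; both tables share the same opcode numbering. */
static const toy_handler NORMAL_TABLE[TOY_NUM_OPCODES] = { run_normal };
static const toy_handler TRACING_TABLE[TOY_NUM_OPCODES] = { run_traced };

/* Entering and leaving tracing mode is just a pointer swap. */
static void toy_enter_tracing(toy_interp *in) { in->table = TRACING_TABLE; }
static void toy_leave_tracing(toy_interp *in) { in->table = NORMAL_TABLE; }

/* Every dispatch goes through whichever table is currently active. */
static void toy_dispatch(toy_interp *in, int opcode)
{
    assert(opcode >= 0 && opcode < TOY_NUM_OPCODES);
    in->table[opcode](in);
}
```

Because both tables are indexed by the same opcodes, the rest of the interpreter never needs to know which mode it is in. That is why the two logical interpreters look like one in the implementation.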

During tracing mode, after each interpreter instruction's `DISPATCH()`,
the interpreter jumps to the `TRACE_RECORD` instruction. This instruction
records the previous instruction executed and also any live values of the next
operation it may require. It then translates the previous instruction to
a sequence of micro-ops using `_PyJit_translate_single_bytecode_to_trace`.
To ensure that the adaptive interpreter instructions
and cache entries are up-to-date, the trace recording interpreter always resets
the adaptive counters of adaptive instructions it sees.
This forces re-specialization of any instruction that deoptimizes,
so the trace recorder is always fed up-to-date information.
Finally, the `TRACE_RECORD` instruction decides when to stop tracing
using various heuristics.
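The record-then-translate step can be pictured with a toy recorder. It appends each executed bytecode's micro-op expansion to a buffer and stops on one of two example heuristics: the trace returns to its start offset (the loop has closed), or the buffer would overflow. This is a sketch under assumptions. The bytecodes, micro-ops, `toy_translate`, and both stopping rules are hypothetical and are not the actual logic of `TRACE_RECORD` or `_PyJit_translate_single_bytecode_to_trace`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_TRACE 16

/* Hypothetical bytecodes and micro-ops. */
enum { BC_LOAD, BC_ADD, BC_JUMP_BACKWARD };
enum { UOP_LOAD, UOP_GUARD_INT, UOP_ADD_INT, UOP_JUMP_TO_TOP };

typedef struct {
    int uops[TOY_MAX_TRACE];
    size_t len;
    size_t start_offset;   /* bytecode offset where tracing began */
    bool done;
} toy_recorder;

/* Hypothetical expansion of one bytecode into micro-ops; returns the count. */
static size_t toy_translate(int bytecode, int out[3])
{
    switch (bytecode) {
    case BC_LOAD:
        out[0] = UOP_LOAD;
        return 1;
    case BC_ADD:
        out[0] = UOP_GUARD_INT;   /* guard the types the specialization assumed */
        out[1] = UOP_ADD_INT;
        return 2;
    case BC_JUMP_BACKWARD:
        out[0] = UOP_JUMP_TO_TOP;
        return 1;
    }
    return 0;
}

/* Called after each instruction: record it, then decide whether to stop. */
static void toy_record(toy_recorder *r, int bytecode, size_t next_offset)
{
    int tmp[3];
    size_t n = toy_translate(bytecode, tmp);
    assert(!r->done);
    if (r->len + n > TOY_MAX_TRACE) {      /* heuristic: trace too long */
        r->done = true;
        return;
    }
    for (size_t i = 0; i < n; i++) {
        r->uops[r->len++] = tmp[i];
    }
    if (next_offset == r->start_offset) {  /* heuristic: loop closed */
        r->done = true;
    }
}
```

Recording a three-instruction loop (`BC_LOAD`, `BC_ADD`, `BC_JUMP_BACKWARD` back to offset 0) yields four micro-ops and marks the recorder as done.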

Once trace recording concludes, `LEAVE_TRACING()` undoes `ENTER_TRACING()`:
it swaps the dispatch table back, or unsets the opcode flag set earlier.
`stop_tracing_and_jit()` then calls `_PyOptimizer_Optimize()` which optimizes
the trace and constructs an
[`_PyExecutorObject`](../Include/internal/pycore_optimizer.h).

JIT execution is set up
to either return to the adaptive interpreter and resume execution, or
transfer control to another executor (see `_PyExitData` in
[`Include/internal/pycore_optimizer.h`](../Include/internal/pycore_optimizer.h)).
When execution resumes in the adaptive interpreter, a "side exit" generated by
an `EXIT_IF` may trigger recording of another trace, whereas a "deopt"
generated by a `DEOPT_IF` does not trigger recording.
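The difference between the two kinds of exit can be sketched as a small decision function. The names `toy_side_exit`, `toy_take_side_exit`, and the `hits`/`threshold` fields are hypothetical. They model the idea of a side exit warming up and eventually getting its own executor, not the real layout of `_PyExitData`.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    TOY_RESUME_TIER1,      /* continue in the adaptive interpreter */
    TOY_RESUME_AND_TRACE,  /* continue, and start recording a new trace */
    TOY_JUMP_TO_EXECUTOR   /* transfer control to another executor */
} toy_exit_action;

/* Hypothetical per-exit bookkeeping (not the real _PyExitData layout). */
typedef struct {
    unsigned hits;         /* how often this side exit has been taken */
    unsigned threshold;    /* hits needed before a new trace is recorded */
    const void *target;    /* executor attached to this exit, if any */
} toy_side_exit;

/* EXIT_IF-style exit: jump to an attached executor, or count the exit
   and ask for a new trace once it has become hot. */
static toy_exit_action toy_take_side_exit(toy_side_exit *se)
{
    assert(se->threshold > 0);
    if (se->target != NULL) {
        return TOY_JUMP_TO_EXECUTOR;
    }
    se->hits++;
    return se->hits >= se->threshold ? TOY_RESUME_AND_TRACE
                                     : TOY_RESUME_TIER1;
}

/* DEOPT_IF-style exit: always resume in tier 1, never record. */
static toy_exit_action toy_take_deopt(void)
{
    return TOY_RESUME_TIER1;
}
```

In this sketch, once a hot side exit has been traced and `target` is set, later exits chain directly to the new executor instead of returning to the adaptive interpreter.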

The executor is stored on the [`code object`](code_objects.md) of the frame,
in the `co_executors` field which is an array of executors. The start
instruction of the trace (the `JUMP_BACKWARD`) is replaced by an
`ENTER_EXECUTOR` instruction whose `oparg` is equal to the index of the
executor in `co_executors`.
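The bookkeeping described above amounts to appending to an array and rewriting one instruction. The C sketch below assumes a fixed-size executor array, a one-byte opcode and oparg, and the opcode number `TOY_ENTER_EXECUTOR`; all of these are hypothetical. It also assumes the executor remembers the instruction it replaced, so that the original could be restored later.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ENTER_EXECUTOR 255   /* hypothetical opcode number */
#define TOY_MAX_EXECUTORS 8

typedef struct { uint8_t opcode; uint8_t oparg; } toy_instr;

/* Hypothetical executor: remembers the instruction it replaced. */
typedef struct { uint8_t orig_opcode; uint8_t orig_oparg; } toy_executor;

typedef struct {
    toy_instr *code;
    size_t ncode;
    toy_executor *executors[TOY_MAX_EXECUTORS];  /* stands in for co_executors */
    size_t nexecutors;
} toy_code;

/* Store ex in the code object's executor array and overwrite the start
   instruction with ENTER_EXECUTOR. Returns the index, or -1 if full. */
static int toy_install_executor(toy_code *co, size_t offset, toy_executor *ex)
{
    assert(offset < co->ncode);
    if (co->nexecutors == TOY_MAX_EXECUTORS) {
        return -1;
    }
    size_t index = co->nexecutors++;
    co->executors[index] = ex;
    ex->orig_opcode = co->code[offset].opcode;
    ex->orig_oparg = co->code[offset].oparg;
    co->code[offset].opcode = TOY_ENTER_EXECUTOR;
    co->code[offset].oparg = (uint8_t)index;   /* oparg indexes executors[] */
    return (int)index;
}
```

When the interpreter later reaches that instruction, it only needs `executors[oparg]` to find the executor to run.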

## The micro-op optimizer

The micro-op (abbreviated `uop` to approximate `μop`) optimizer is defined in
[`Python/optimizer.c`](../Python/optimizer.c) as `_PyOptimizer_Optimize`.
It takes a micro-op sequence from the trace recorder, optimizes it with
`_Py_uop_analyze_and_optimize` in
[`Python/optimizer_analysis.c`](../Python/optimizer_analysis.c),
and creates an instance of `_PyUOpExecutor_Type` to contain it.
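Working on a whole trace makes optimizations possible that one instruction at a time cannot achieve. A simple example is removing a guard that repeats a check already made earlier in the trace. The C sketch below is a hypothetical peephole pass over made-up micro-ops that tracks a single fact, for simplicity. It is not how `_Py_uop_analyze_and_optimize` works internally; it only illustrates the kind of cross-instruction rewrite a trace enables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical micro-ops. */
enum { UOP_GUARD_INT, UOP_ADD_INT, UOP_CALL, UOP_NOP };

/* Hypothetical: uops that may run arbitrary code and invalidate guards. */
static bool toy_may_escape(int uop)
{
    return uop == UOP_CALL;
}

/* Replace a guard with a NOP when an identical guard already ran and
   nothing in between could have invalidated it. Returns number removed. */
static size_t toy_remove_redundant_guards(int *uops, size_t len)
{
    assert(uops != NULL || len == 0);
    bool int_known = false;    /* the single fact this sketch tracks */
    size_t removed = 0;
    for (size_t i = 0; i < len; i++) {
        if (uops[i] == UOP_GUARD_INT) {
            if (int_known) {
                uops[i] = UOP_NOP;
                removed++;
            }
            int_known = true;
        } else if (toy_may_escape(uops[i])) {
            int_known = false;
        }
    }
    return removed;
}
```

For the trace `GUARD_INT, ADD_INT, GUARD_INT, ADD_INT, CALL, GUARD_INT`, only the second guard is removed, because the call after it could invalidate what the guard established.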

## The JIT interpreter

After a `JUMP_BACKWARD` instruction invokes the uop optimizer to create a uop
executor, it transfers control to this executor via the `TIER1_TO_TIER2` macro.

CPython implements two executors. Here we describe the JIT interpreter,
which is the simpler of the two and is therefore useful for debugging and
analyzing the uop generation and optimization stages. To use it, configure
Python with
[`--enable-experimental-jit=interpreter`](https://docs.python.org/dev/using/configure.html#cmdoption-enable-experimental-jit).

When invoked, the executor jumps to the `tier2_dispatch:` label in
[`Python/ceval.c`](../Python/ceval.c), where there is a loop that
executes the micro-ops. The body of this loop is a switch statement over
the uop IDs, resembling the one used in the adaptive interpreter.

The switch implementing the uops is in [`Python/executor_cases.c.h`](../Python/executor_cases.c.h),
which is generated by the build script
[`Tools/cases_generator/tier2_generator.py`](../Tools/cases_generator/tier2_generator.py)
from the bytecode definitions in
[`Python/bytecodes.c`](../Python/bytecodes.c).

When an `_EXIT_TRACE` or `_DEOPT` uop is reached, the uop interpreter exits
and execution returns to the adaptive interpreter.
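The shape of that loop can be shown with a tiny stack machine: a `for` loop over the trace, a `switch` on the uop ID, and `return` as the only way out. Everything here is hypothetical (`toy_uop_id`, `toy_run_tier2`, and the guard semantics). The real loop in `ceval.c` uses the generated cases from `executor_cases.c.h` and tracks considerably more state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical micro-ops for a tiny stack machine. */
typedef enum { U_PUSH, U_ADD, U_GUARD_SMALL, U_EXIT_TRACE, U_DEOPT } toy_uop_id;

typedef struct { toy_uop_id id; long arg; } toy_uop;

typedef enum { TOY_EXITED, TOY_DEOPTED } toy_result;

/* Run uops until an exit; the result tells the caller why it stopped. */
static toy_result toy_run_tier2(const toy_uop *trace, long *stack, size_t *sp)
{
    for (const toy_uop *next = trace;; next++) {
        switch (next->id) {
        case U_PUSH:
            stack[(*sp)++] = next->arg;
            break;
        case U_ADD:
            assert(*sp >= 2);
            stack[*sp - 2] += stack[*sp - 1];
            (*sp)--;
            break;
        case U_GUARD_SMALL:
            /* Like a DEOPT_IF: bail out if the top value is too large. */
            assert(*sp >= 1);
            if (stack[*sp - 1] > next->arg) {
                return TOY_DEOPTED;
            }
            break;
        case U_EXIT_TRACE:
            return TOY_EXITED;
        case U_DEOPT:
            return TOY_DEOPTED;
        }
    }
}
```

In both exit paths the caller would resume the adaptive interpreter. The return value only records which kind of exit was taken.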

## Invalidating Executors

In addition to being stored on the code object, each executor is also
inserted into contiguous arrays (`executor_blooms` and `executor_ptrs`)
stored in the interpreter state. These arrays are used when it is necessary
to invalidate executors because values they used in their construction may
have changed.
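The name `executor_blooms` suggests a Bloom filter per executor: a compact summary of what the executor depends on that can report false positives but never false negatives. The sketch below assumes that reading. The 256-bit filter size, the address-based hash, and all `toy_` names are hypothetical. The `valid` array plays the role of `executor_ptrs` in this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 256-bit Bloom filter summarising an executor's dependencies. */
typedef struct { uint64_t bits[4]; } toy_bloom;

/* Derive two bit positions (0..255) from an object's address. */
static void toy_bloom_hashes(const void *obj, unsigned idx[2])
{
    uint64_t h = (uint64_t)(uintptr_t)obj;
    h ^= h >> 17;
    h *= UINT64_C(0x9E3779B97F4A7C15);
    idx[0] = (unsigned)(h >> 56);
    idx[1] = (unsigned)((h >> 48) & 0xFF);
}

static void toy_bloom_add(toy_bloom *b, const void *obj)
{
    unsigned idx[2];
    toy_bloom_hashes(obj, idx);
    for (int i = 0; i < 2; i++) {
        b->bits[idx[i] / 64] |= UINT64_C(1) << (idx[i] % 64);
    }
}

/* False positives are possible; false negatives are not. */
static bool toy_bloom_may_contain(const toy_bloom *b, const void *obj)
{
    unsigned idx[2];
    toy_bloom_hashes(obj, idx);
    for (int i = 0; i < 2; i++) {
        if ((b->bits[idx[i] / 64] & (UINT64_C(1) << (idx[i] % 64))) == 0) {
            return false;
        }
    }
    return true;
}

/* Invalidate every live executor that might depend on the changed
   object. Returns how many were invalidated. */
static size_t toy_invalidate(const toy_bloom *blooms, bool *valid, size_t n,
                             const void *changed)
{
    assert(n == 0 || (blooms != NULL && valid != NULL));
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (valid[i] && toy_bloom_may_contain(&blooms[i], changed)) {
            valid[i] = false;
            count++;
        }
    }
    return count;
}
```

Keeping the filters in one contiguous array makes the invalidation scan a cheap linear pass. An occasional false positive only costs an unnecessary invalidation, never a stale executor.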

## The JIT

When the full JIT is enabled (Python was configured with
[`--enable-experimental-jit`](https://docs.python.org/dev/using/configure.html#cmdoption-enable-experimental-jit)),
the uop executor's `jit_code` field is populated with a pointer to a compiled
C function that implements the executor logic. This function's signature is
defined by `jit_func` in [`pycore_jit.h`](../Include/internal/pycore_jit.h).
When the executor is invoked by `ENTER_EXECUTOR`, instead of jumping to
the uop interpreter at `tier2_dispatch`, the executor runs the function
that `jit_code` points to. This function returns the instruction pointer
of the next Tier 1 instruction that needs to execute.
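The choice between compiled code and the uop interpreter reduces to a null check on a function pointer. The following C sketch uses a deliberately simplified, hypothetical signature (`toy_jit_func` takes and returns only an instruction pointer). The real `jit_func` in `pycore_jit.h` takes more parameters. The two stand-in functions only advance the pointer so the dispatch can be observed.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned short toy_codeunit;

/* Hypothetical signature: returns the next tier 1 instruction to run. */
typedef const toy_codeunit *(*toy_jit_func)(const toy_codeunit *ip);

typedef struct {
    toy_jit_func jit_code;   /* NULL when only the uop interpreter is used */
} toy_executor;

/* Stand-in for running the executor's uops in the tier 2 interpreter. */
static const toy_codeunit *toy_interpret_uops(const toy_codeunit *ip)
{
    return ip + 1;
}

/* Stand-in for machine code produced by the JIT. */
static const toy_codeunit *toy_compiled_trace(const toy_codeunit *ip)
{
    return ip + 2;
}

/* ENTER_EXECUTOR-style entry: prefer compiled code when it exists. */
static const toy_codeunit *toy_enter_executor(const toy_executor *ex,
                                              const toy_codeunit *ip)
{
    assert(ex != NULL);
    if (ex->jit_code != NULL) {
        return ex->jit_code(ip);      /* full JIT: call the compiled function */
    }
    return toy_interpret_uops(ip);    /* otherwise: tier2_dispatch-style loop */
}
```

Either way, the caller gets back a tier 1 instruction pointer and continues in the adaptive interpreter from there.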

The generation of the jitted functions uses the copy-and-patch technique
which is described in
[Haoran Xu's article](https://sillycross.github.io/2023/05/12/2023-05-12/).
At its core are statically generated `stencils` for the implementation
of the micro-ops, which are completed with runtime information while
the jitted code is constructed for an executor by
[`_PyJIT_Compile`](../Python/jit.c).
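The core copy-and-patch operation is to copy a precompiled code template into a buffer, then overwrite a "hole" in it with a value known only at runtime. The following C sketch assumes a single 8-byte absolute hole. Real stencils can have several holes with different relocation kinds. `toy_stencil` and `toy_copy_and_patch` are hypothetical names. The sample bytes encode x86-64 `movabs rax, imm64; ret`, and they are only copied and patched here, never executed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stencil: machine code with one 8-byte hole. */
typedef struct {
    const unsigned char *body;
    size_t size;
    size_t hole_offset;   /* byte offset of the value to patch in */
} toy_stencil;

/* x86-64 "movabs rax, imm64; ret" with the immediate left as zeros. */
static const unsigned char TOY_LOAD_VALUE_BODY[11] = {
    0x48, 0xB8, 0, 0, 0, 0, 0, 0, 0, 0, 0xC3
};
static const toy_stencil TOY_LOAD_VALUE = {
    TOY_LOAD_VALUE_BODY, sizeof TOY_LOAD_VALUE_BODY, 2
};

/* Copy the stencil into buf, then write value into its hole in host
   byte order. Returns the number of bytes emitted. */
static size_t toy_copy_and_patch(unsigned char *buf, const toy_stencil *s,
                                 uint64_t value)
{
    assert(s->hole_offset + sizeof value <= s->size);
    memcpy(buf, s->body, s->size);
    memcpy(buf + s->hole_offset, &value, sizeof value);
    return s->size;
}
```

Because the expensive work (compiling the template) happens at build time, emitting code at runtime is little more than two `memcpy` calls per stencil.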

The stencils are generated at build time under the Makefile target `regen-jit`
by the scripts in [`Tools/jit`](../Tools/jit). These scripts read
[`Python/executor_cases.c.h`](../Python/executor_cases.c.h) (which is
generated from [`Python/bytecodes.c`](../Python/bytecodes.c)). For
each micro-op, they construct a `.c` file that contains a function
implementing that micro-op, with some runtime information injected.
This is done by replacing `CASE` with the corresponding definition in the
template file [`Tools/jit/template.c`](../Tools/jit/template.c).

Each of the `.c` files is compiled by LLVM to produce an object file
that contains a function that executes the micro-op. These compiled
functions are used to generate the file
[`jit_stencils.h`](../jit_stencils.h), which contains the functions
that the JIT can use to emit code for each of the micro-ops.
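Conceptually, the generated output gives the JIT one code template per micro-op, so emitting an executor means walking its trace and concatenating the matching templates. The C sketch below models this as a table indexed by micro-op. The layout is hypothetical and is not the actual format of `jit_stencils.h`, which contains emit functions and hole information. Patching is omitted here to keep the focus on the per-micro-op lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical generated table: one code template per micro-op. */
typedef struct {
    const unsigned char *body;
    size_t size;
} toy_stencil;

enum { TOY_UOP_A, TOY_UOP_B, TOY_NUM_UOPS };

static const unsigned char BODY_A[] = { 0x90 };        /* x86 nop */
static const unsigned char BODY_B[] = { 0x90, 0x90 };  /* nop; nop */

static const toy_stencil TOY_STENCILS[TOY_NUM_UOPS] = {
    [TOY_UOP_A] = { BODY_A, sizeof BODY_A },
    [TOY_UOP_B] = { BODY_B, sizeof BODY_B },
};

/* Emit a trace by concatenating each uop's template; returns the bytes
   written, or 0 if buf is too small. */
static size_t toy_emit_trace(const int *uops, size_t n,
                             unsigned char *buf, size_t cap)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        assert(uops[i] >= 0 && uops[i] < TOY_NUM_UOPS);
        const toy_stencil *s = &TOY_STENCILS[uops[i]];
        if (total + s->size > cap) {
            return 0;
        }
        memcpy(buf + total, s->body, s->size);
        total += s->size;
    }
    return total;
}
```

Since the table is generated from the same definitions as the interpreter, adding or changing a micro-op only changes generated data, not this emission loop.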

For Python maintainers, this means that changes to the bytecodes and
their implementations do not require changes related to the stencils,
because everything is automatically generated from
[`Python/bytecodes.c`](../Python/bytecodes.c) at build time.

See Also:

* [Copy-and-Patch Compilation: A fast compilation algorithm for high-level languages and bytecode](https://arxiv.org/abs/2011.13127)

* [PyCon 2024: Building a JIT compiler for CPython](https://www.youtube.com/watch?v=kMO3Ju0QCDo)