gh-130861: Add clarification to the perf docs on optimization levels (#131098)

Pablo Galindo Salgado 2025-04-18 14:42:20 +01:00 committed by GitHub
parent b9f0943c1e
commit d134bd272f


@@ -254,13 +254,28 @@ files in the current directory which are ELF images for all the JIT trampolines
that were created by Python.
.. warning::
-Notice that when using ``--call-graph dwarf`` the ``perf`` tool will take
+When using ``--call-graph dwarf``, the ``perf`` tool will take
snapshots of the stack of the process being profiled and save the
-information in the ``perf.data`` file. By default the size of the stack dump
-is 8192 bytes but the user can change the size by passing the size after
-comma like ``--call-graph dwarf,4096``. The size of the stack dump is
-important because if the size is too small ``perf`` will not be able to
-unwind the stack and the output will be incomplete. On the other hand, if
-the size is too big, then ``perf`` won't be able to sample the process as
-frequently as it would like as the overhead will be higher.
+information in the ``perf.data`` file. By default, the size of the stack dump
+is 8192 bytes, but you can change the size by passing it after
+a comma like ``--call-graph dwarf,16384``.
+The size of the stack dump is important because if the size is too small
+``perf`` will not be able to unwind the stack and the output will be
+incomplete. On the other hand, if the size is too big, then ``perf`` won't
+be able to sample the process as frequently as it would like as the overhead
+will be higher.
+The stack size is particularly important when profiling Python code compiled
+with low optimization levels (like ``-O0``), as these builds tend to have
+larger stack frames. If you are compiling Python with ``-O0`` and not seeing
+Python functions in your profiling output, try increasing the stack dump
+size to 65528 bytes (the maximum)::
+$ perf record -F 9999 -g -k 1 --call-graph dwarf,65528 -o perf.data python -Xperf_jit my_script.py
+Different compilation flags can significantly impact stack sizes:
+- Builds with ``-O0`` typically have much larger stack frames than those with ``-O1`` or higher
+- Adding optimizations (``-O1``, ``-O2``, etc.) typically reduces stack size
+- Frame pointers (``-fno-omit-frame-pointer``) generally provide more reliable stack unwinding
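
The guidance on compilation flags can be checked against a given interpreter. The following is a minimal sketch, not part of the commit: it assumes a POSIX build where ``sysconfig.get_config_var("PY_CFLAGS")`` returns the compiler flags (it may be ``None`` on other platforms), and it assumes that a missing ``-O`` flag means an unoptimized build, as is the default for GCC and Clang. The helper names ``last_opt_level``, ``has_frame_pointers`` and ``suggested_dump_size`` are hypothetical, and the size suggestion is only the heuristic described in the added text (maximum dump size for ``-O0``, the default otherwise).

```python
import sysconfig

DEFAULT_DUMP_SIZE = 8192   # default stack dump size stated in the docs text
MAX_DUMP_SIZE = 65528      # maximum stack dump size stated in the docs text


def last_opt_level(cflags):
    """Return the last -O<level> flag in a CFLAGS string, or None if absent.

    Compilers honour the last optimization flag on the command line,
    so later flags override earlier ones.
    """
    level = None
    for flag in cflags.split():
        if flag.startswith("-O"):
            level = flag
    return level


def has_frame_pointers(cflags):
    """Return True if frame pointers are requested and not later disabled."""
    enabled = False
    for flag in cflags.split():
        if flag == "-fno-omit-frame-pointer":
            enabled = True
        elif flag == "-fomit-frame-pointer":
            enabled = False
    return enabled


def suggested_dump_size(cflags):
    """Heuristic only: suggest the maximum dump size for unoptimized builds.

    Assumption: no -O flag at all is treated the same as -O0.
    """
    if last_opt_level(cflags) in (None, "-O0"):
        return MAX_DUMP_SIZE
    return DEFAULT_DUMP_SIZE


if __name__ == "__main__":
    # PY_CFLAGS may be None on platforms without a Makefile-based build.
    flags = sysconfig.get_config_var("PY_CFLAGS") or ""
    print("optimization flag:", last_opt_level(flags))
    print("frame pointers:   ", has_frame_pointers(flags))
    print("suggested option:  --call-graph dwarf,%d" % suggested_dump_size(flags))
```

If the script reports ``-O0`` and Python functions are missing from the ``perf`` output, the suggested option corresponds to the ``dwarf,65528`` invocation shown in the diff above.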