Python Code Parsing and pyc Files
This guide describes how Python files are parsed by GraalPy.
Creating and Managing pyc Files
GraalPy automatically creates a .pyc file when there is an invalid or absent .pyc file that matches the corresponding .py file.
When a Python source file (module) is imported during an execution for the first time, the appropriate .pyc file is created automatically. If the same module is imported again, then the existing .pyc file is used. That means that there are no .pyc files for source files that were not executed (imported) yet. The creation of .pyc files is achieved entirely through the FileSystem API, so that embedders can manage file system access.
GraalPy never deletes a .pyc file.
Every subsequent execution of a script will reuse existing .pyc files, or will generate new ones.
A .pyc file is regenerated if the timestamp or hashcode of the original source file is changed.
The hashcode is generated based only on the Python source file by calling source.hashCode(), which is the JDK hash code over the array of source file bytes.
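To make the hashing step concrete, here is a minimal sketch of a JDK-style hash over a byte array. It assumes the hash in question follows the semantics of java.util.Arrays.hashCode(byte[]) (start at 1, multiply by 31, add each signed byte, wrap to 32 bits); the function name jdk_bytes_hash is hypothetical and is not part of GraalPy.

```python
def jdk_bytes_hash(data: bytes) -> int:
    """Hash a byte string the way java.util.Arrays.hashCode(byte[]) would.

    Assumption: GraalPy's source hash uses these semantics. This is an
    illustration only, not GraalPy's actual code.
    """
    result = 1
    for byte in data:
        # Java bytes are signed (-128..127); Python bytes are 0..255.
        signed = byte - 256 if byte > 127 else byte
        # Keep the running value within 32 bits, as Java int arithmetic does.
        result = (31 * result + signed) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as a signed Java int.
    return result - 0x100000000 if result >= 0x80000000 else result


if __name__ == "__main__":
    print(jdk_bytes_hash(b"print('hello')\n"))
```

Because the hash depends only on the file's bytes, any edit to the source, including whitespace changes, can produce a different value.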
The .pyc files are also regenerated if a magic number in the Python parser is changed. The magic number is hard-coded in the Python source and cannot be changed by the user (unless, of course, that user has access to the bytecode of Python).
The developers of GraalPy change the magic number when the bytecode format changes.
This is an implementation detail, so the magic number does not have to correspond to the version of GraalPy (as in CPython).
The magic number of a .pyc file is a function of the actual Python runtime Java code that is running. Magic number changes will be communicated in the release notes so that embedders or system administrators can delete old .pyc files when upgrading.
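The regeneration rules described so far (absent file, changed timestamp, changed hashcode, changed magic number) can be summarized in a small decision function. This is a sketch under stated assumptions: the CacheStamp record and the needs_regeneration function are hypothetical names introduced for illustration, and the real on-disk layout of GraalPy's .pyc header is not described in this guide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CacheStamp:
    """Hypothetical summary of what a cached .pyc file was built from."""
    magic: int          # bytecode format identifier of the runtime
    source_mtime: int   # timestamp of the .py file
    source_hash: int    # hashcode of the .py file's bytes


def needs_regeneration(cached: Optional[CacheStamp], current: CacheStamp) -> bool:
    """Return True if the .pyc file is absent or no longer matches its source."""
    if cached is None:
        return True  # no .pyc file yet: first import of this module
    return (
        cached.magic != current.magic
        or cached.source_mtime != current.source_mtime
        or cached.source_hash != current.source_hash
    )


if __name__ == "__main__":
    now = CacheStamp(magic=1, source_mtime=1700000000, source_hash=4066)
    print(needs_regeneration(None, now))
    print(needs_regeneration(now, now))
```

Note that nothing in this decision deletes a stale file; the stale file is simply overwritten when the new one is written.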
Note that if you use .pyc files, you must allow GraalPy write access at least when switching versions or modifying the original source code file. Otherwise, the regeneration of .pyc files will fail and every import will have the overhead of accessing each old .pyc file, parsing the code, serializing it, and trying (and failing) to write out a new .pyc file.
The directory structure created for .pyc files is as follows:
top_directory
    __pycache__
        sourceA.graalpy.pyc
        sourceB.graalpy.pyc
    sourceA.py
    sourceB.py
    sub_directory
        __pycache__
            sourceX.graalpy.pyc
        sourceX.py
By default, the __pycache__ directory is created on the same directory level as a source code file, and all .pyc files for sources in that directory are stored in it. This directory may also hold .pyc files created by other Python implementations or versions (including, for example, CPython), so the user may see files ending in .cpython-36.pyc, for example.
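The default layout can be expressed as a simple path mapping. The helper below is a hypothetical illustration that reproduces the naming shown in the directory listing; the "graalpy" tag is taken from that listing, and the real tag used by a given GraalPy release may differ. In a running interpreter, the standard importlib.util.cache_from_source function answers the same question using that interpreter's own tag.

```python
from pathlib import Path


def default_pyc_path(source: Path, tag: str = "graalpy") -> Path:
    """Map a .py file to its cache file in the sibling __pycache__ directory.

    Hypothetical helper: `tag` mirrors the file names shown in the listing
    above and is an assumption, not a value read from GraalPy.
    """
    return source.parent / "__pycache__" / f"{source.stem}.{tag}.pyc"


if __name__ == "__main__":
    for name in ("top_directory/sourceA.py", "top_directory/sub_directory/sourceX.py"):
        print(default_pyc_path(Path(name)))
```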
.pyc files are largely managed automatically by GraalPy in a manner compatible with CPython. GraalPy provides options similar to CPython's to specify the location of the .pyc files and whether they should be written at all, and both of these options can be changed by guest code.
The creation of .pyc files can be controlled in the same way as in CPython (see https://docs.python.org/3/using/cmdline.html):
- The GraalPy launcher (graalpy) reads the PYTHONDONTWRITEBYTECODE environment variable. If this is set to a non-empty string, Python will not try to write .pyc files when importing modules.
- The launcher command line option -B, if given, has the same effect as the above.
- Guest language code can change the corresponding attribute of the sys built-in module at runtime to change the behavior for subsequent imports.
- The launcher reads the PYTHONPYCACHEPREFIX environment variable. If set, the __pycache__ directory will be created at the path specified by the prefix, and a mirror of the directory structure of the source tree will be created on-demand to store the .pyc files.
- Guest language code can change the corresponding attribute of the sys module at runtime to change the location for subsequent imports.
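The runtime controls in the list above can be tried from guest code. The sketch below assumes that the two sys attributes in question are the ones CPython documents as sys.dont_write_bytecode and sys.pycache_prefix, and that GraalPy honors them the same way; the function name import_with_cache_prefix and the throwaway module are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path
from typing import List


def import_with_cache_prefix(module_name: str, source: str) -> List[Path]:
    """Import a throwaway module and list the .pyc files written under a prefix."""
    src_dir = Path(tempfile.mkdtemp())
    cache_dir = Path(tempfile.mkdtemp())
    (src_dir / f"{module_name}.py").write_text(source)

    # Remember the current settings so the change only affects this import.
    old_flag, old_prefix, old_path = sys.dont_write_bytecode, sys.pycache_prefix, list(sys.path)
    try:
        sys.dont_write_bytecode = False        # allow writing .pyc files
        sys.pycache_prefix = str(cache_dir)    # redirect them away from the sources
        sys.path.insert(0, str(src_dir))
        importlib.invalidate_caches()
        importlib.import_module(module_name)
    finally:
        sys.dont_write_bytecode = old_flag
        sys.pycache_prefix = old_prefix
        sys.path[:] = old_path
    # The prefix directory mirrors the source tree, so search it recursively.
    return sorted(cache_dir.rglob("*.pyc"))


if __name__ == "__main__":
    for path in import_with_cache_prefix("pyc_prefix_demo", "VALUE = 1\n"):
        print(path)
```

Both settings only affect imports that happen after the assignment; modules that are already imported are not touched.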
Since the embedder cannot use environment variables or CPython options to communicate these options to GraalPy, these options are made available as language options:
- python.DontWriteBytecodeFlag - equivalent to -B or PYTHONDONTWRITEBYTECODE
- python.PyCachePrefix - equivalent to PYTHONPYCACHEPREFIX
Note that a Python context will not enable writing .pyc files by default. The graalpy launcher enables it by default, but if this is desired in the embedding use case, care should be taken to ensure that the __pycache__ location is properly managed and the files in that location are secured against manipulation in the same way as the source code files (.py) from which they were derived.
Note also that to upgrade the application sources to Oracle GraalPy, old .pyc files must be removed by the embedder as required.
Security Considerations
All file operations (obtaining the data, timestamps, and writing .pyc files) are achieved through the FileSystem API. Embedders can modify all of these operations by means of custom (for example, read-only) FileSystem implementations.
The embedder can also effectively disable the creation of .pyc files by disabling I/O permissions for GraalPy.
If the .pyc files are not readable or their location is not writable, or if the .pyc files’ serialization data or magic numbers are corrupted so that deserialization fails, GraalPy parses the .py source code file again. This comes with a minor performance hit only for the parsing of modules, which should not be significant for most applications (provided the application performs actual work in addition to loading Python code).
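The fallback behavior can be sketched as "try the cache, and on any failure parse the source again". The code below is an illustration only: the MAGIC value, the file layout, and the load_code function are hypothetical, and it uses Python's marshal module in place of GraalPy's own serialization format.

```python
import marshal
import types
from pathlib import Path

MAGIC = b"DEMO"  # hypothetical magic number, not GraalPy's real value


def load_code(source_path: Path, pyc_path: Path):
    """Return (code, origin), where origin is "cache" or "source"."""
    try:
        data = pyc_path.read_bytes()            # fails if absent or unreadable
        if data[:len(MAGIC)] != MAGIC:
            raise ValueError("magic number mismatch")
        code = marshal.loads(data[len(MAGIC):])  # fails on corrupted data
        if not isinstance(code, types.CodeType):
            raise ValueError("cache does not contain a code object")
        return code, "cache"
    except (OSError, ValueError, EOFError, TypeError):
        # Fall back to parsing the source file again.
        code = compile(source_path.read_text(), str(source_path), "exec")
        try:
            pyc_path.parent.mkdir(parents=True, exist_ok=True)
            pyc_path.write_bytes(MAGIC + marshal.dumps(code))
        except OSError:
            pass  # location not writable: continue without a cache
        return code, "source"
```

In this sketch a failed cache read never stops the import; the cost is the repeated parse, which matches the minor performance hit described above.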