Exploring Python Byte Code - Disassembler

Discover Python Byte Code and how the Python Virtual Machine (PVM) interprets it. Learn to use the dis module to examine byte code and optimize your Python programs. This article offers a comprehensive introduction to byte code, with practical examples and insights into Python's execution model.

Exploring the Core of Python: From Code to Byte Code - Discover the journey of Python programming from its readable form into the underlying byte code in Part 2 of our series.

Introduction

Welcome back to our deep dive into Python internals. Our previous article explored the differences between compilers and interpreters and how Python uses a hybrid approach. Today, we'll focus on Python Byte Code, a critical intermediate representation of your code, and the role of the Python Virtual Machine (PVM) in interpreting it.

We'll discuss what byte code is and how it's generated, and give a quick overview of how the PVM interprets it. By the end, you'll understand the basics of Python's execution model and be ready to explore the disassembler (dis) module in detail.


What is Python Byte Code?

Before we explore the intricacies of Python byte code, let's first understand what byte code is and why it matters. Like many high-level programming languages, Python is designed to be easy for people to read and write. Computers, however, operate at a much lower level: a CPU understands only machine code, a series of binary instructions.

When we write a Python script, we create human-readable high-level code. But for our computer to execute this script, it needs to translate it into a form it can understand.

This is where Python byte code comes into play. Byte code is an intermediate representation of our Python code: a low-level set of instructions that is more abstract than machine code but much closer to what a machine executes than the original Python script. The CPU doesn't run byte code directly; the Python Virtual Machine does.

When we run a Python program, Python parses it, converts it to byte code, and starts the Python Virtual Machine to execute that byte code.
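To see that byte code really is just a sequence of bytes, we can look inside a compiled function. The sketch below uses a hypothetical add function: its code object exposes the raw byte code as __code__.co_code, and dis.get_instructions decodes those bytes into named instructions. The exact opcode names and numbers vary between Python versions.

```python
import dis

def add(a, b):
    return a + b

# Every compiled function carries a code object; co_code is its raw byte code.
raw = add.__code__.co_code
print(type(raw), raw)

# dis.get_instructions decodes the raw bytes into Instruction objects,
# each with a numeric opcode, a readable name, and a readable argument.
for instruction in dis.get_instructions(add):
    print(instruction.opcode, instruction.opname, instruction.argrepr)
```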


How Python Converts Code to Byte Code

The following is a simplified view of the process:

  1. We write our Python script in a .py file.
  2. We run it with python ourprogram.py.
  3. Parsing: Python reads the program and parses it into an Abstract Syntax Tree (AST).
  4. Compilation: Python converts the AST into byte code. For imported modules, this byte code is also cached in .pyc files in the __pycache__ folder.
  5. Execution: Python starts the Python Virtual Machine (PVM), which reads and executes the byte code.
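The parsing and compilation steps can be reproduced by hand with the standard library. A minimal sketch, assuming a tiny hypothetical source string: ast.parse builds the AST, the built-in compile() turns it into a code object holding the byte code, and exec() hands that code object to the interpreter. Everything here happens in memory; no .pyc file is written.

```python
import ast

# A tiny, hypothetical program used only for this demonstration.
source = "total = 2 + 3\n"

# Parsing: source text -> Abstract Syntax Tree.
tree = ast.parse(source)
print(ast.dump(tree))

# Compilation: AST -> code object; its co_code attribute holds the raw byte code.
code = compile(tree, filename="<example>", mode="exec")
print(type(code).__name__, len(code.co_code), "bytes of byte code")

# Execution: the interpreter runs the code object in the given namespace.
namespace = {}
exec(code, namespace)
print(namespace["total"])
```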

Think of this process as translating a book from English to a simplified language like Pig Latin before making a final translation to Spanish.

This intermediate language, byte code, helps bridge the gap between human- and machine-readable code.

The Role of __pycache__

When Python compiles an imported module into byte code, it stores the result in a folder called __pycache__, located in the same directory as the module's source file. The top-level script we run directly is compiled in memory each time and isn't cached this way.

The .pyc files within __pycache__ contain the compiled byte code and are named in a way that includes the version of Python used for the compilation. For example, if we import a module named example.py using Python 3.8, the compiled byte code file might be named example.cpython-38.pyc.
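The import system derives this file name from the running interpreter's cache tag. A quick way to check it, assuming CPython (where sys.implementation.cache_tag is set); example.py is a hypothetical file name that doesn't need to exist:

```python
import importlib.util
import sys

# The interpreter's cache tag, e.g. "cpython-38" on CPython 3.8.
print(sys.implementation.cache_tag)

# Where the import system would cache byte code for a module named example.py.
pyc_path = importlib.util.cache_from_source("example.py")
print(pyc_path)
```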

Python checks if it can reuse the existing .pyc files in __pycache__ by comparing the metadata of the source file and the compiled file. Specifically, it looks at two things:

  1. Modification Time: By default, each .pyc file records the modification time and size of the .py source file it was compiled from. If the current source file's modification time or size no longer matches those recorded values, Python regenerates the byte code.
  2. Python Version: Each .pyc file starts with a "magic number" that identifies the byte code format of the Python version that produced it. If it doesn't match the current interpreter, Python regenerates the byte code. The version tag in the file name (such as cpython-38) also lets different Python versions keep separate caches side by side.
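These checks rely on a small header at the start of every .pyc file. Since Python 3.7, the header holds a 4-byte magic number, a 4-byte flags field and, for the default timestamp-based validation, the source file's modification time and size (each stored as a 4-byte little-endian integer). The sketch below compiles a throwaway, hypothetical module with py_compile and inspects that header. Timestamp mode is requested explicitly because setting the SOURCE_DATE_EPOCH environment variable changes py_compile's default.

```python
import importlib.util
import os
import py_compile
import struct
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical module written only for this demonstration.
    src = os.path.join(tmp, "example.py")
    with open(src, "w") as f:
        f.write("answer = 42\n")

    # Compile with timestamp-based invalidation; returns the .pyc path.
    pyc = py_compile.compile(
        src,
        doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
    )
    src_stat = os.stat(src)

    with open(pyc, "rb") as f:
        header = f.read(16)

# Header layout (Python 3.7+): magic number, flags, source mtime, source size.
magic = header[:4]
flags, mtime, size = struct.unpack("<3I", header[4:16])

print("magic matches this interpreter:", magic == importlib.util.MAGIC_NUMBER)
print("flags:", flags)  # 0 means timestamp-based validation
print("mtime matches source:", mtime == (int(src_stat.st_mtime) & 0xFFFFFFFF))
print("size matches source:", size == src_stat.st_size)
```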

By checking these factors, Python ensures that it executes the most up-to-date version of our code without unnecessary recompilation. This mechanism helps speed up program startup time by reusing compiled byte code when possible.

🧠
When working with Python projects and using Git for version control, it's essential to ignore .pyc files and __pycache__ directories. Python regenerates these files automatically whenever a changed module is imported, and they are tied to the local interpreter version, so committing them only adds clutter and invites needless merge conflicts.

Why Byte Code and the PVM?

We might wonder why we need this intermediate step and the PVM. The byte code and PVM serve several crucial purposes:

  • Portability: Byte code can run on any machine with a compatible Python interpreter, because a .pyc file depends on the Python version but not on the operating system or CPU. This makes our Python programs cross-platform. Without the PVM, the Python team would need to maintain separate versions of Python for each operating system, each interpreting the Abstract Syntax Tree (AST) directly. This would significantly increase the complexity and maintenance overhead.
  • Efficiency: It's faster to execute byte code than to interpret source code directly, as some high-level parsing and analysis are already done. The .pyc files help with this speed advantage. When we run a Python script, Python checks if the corresponding .pyc file is available and up-to-date. If so, it can skip the compilation step and directly execute the byte code, saving time. If the source code has changed since the last compilation, Python will regenerate the .pyc file to ensure it reflects the latest changes.
  • Optimization: Python performs various optimizations at the AST and byte code levels. For example, constant folding (evaluating constant expressions at compile time) happens at the AST level. Byte code optimizations include peephole optimizations, such as eliminating unnecessary operations. These optimizations improve the performance of our Python code without requiring changes to the source code.
  • Simplified Maintenance: By using byte code and a virtual machine, the Python development team can maintain a single implementation of Python's execution environment (the PVM) that works across all supported operating systems. This approach significantly reduces the complexity of language maintenance and allows for more consistent behavior across different platforms.
  • Parallel Development: Using byte code allows the language and the VM to evolve independently. For example, new features can be added to the Python language (such as new syntax or standard library modules) without necessarily requiring changes to the PVM. This separation of concerns makes it easier to develop and maintain Python.
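The constant-folding point under Optimization is easy to check. In this sketch, seconds_per_day is a hypothetical function; current CPython versions fold 24 * 60 * 60 at compile time, so its byte code contains the precomputed constant 86400 and no multiplication instructions.

```python
import dis

def seconds_per_day():
    # Written as a product for readability; the compiler folds it.
    return 24 * 60 * 60

# The folded result is stored as a constant in the function's code object.
print(86400 in seconds_per_day.__code__.co_consts)

# The disassembly loads 86400 directly instead of multiplying at run time.
dis.dis(seconds_per_day)
```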

Byte code and the PVM are critical components of Python's efficiency, portability, and maintainability. By converting our script into byte code and executing it through the PVM, Python lets our programs run consistently across different systems with little performance cost, while sparing the core team from maintaining a separate interpreter for each supported platform.



Introducing the dis Module

The dis module in Python is a powerful tool for disassembling Python byte code into a human-readable format. It is part of the Python Standard Library, so there's nothing extra to install. It helps us understand how Python translates our code into byte code instructions, which is particularly useful for debugging and optimizing our programs.

To use the dis module, we import it and call dis.dis() on any function or block of code we would like to disassemble:


import dis

def greet(name):
    return f"Hello, {name}!"

dis.dis(greet)  # print the disassembled byte code of greet

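The output of the dis module shows the byte code instructions for the function. The exact listing varies between Python versions, but for this function it typically includes instructions like these:

  • RESUME: An internal no-op at the start of the function, used for tracing and optimization checks (Python 3.11 and later).
  • LOAD_CONST: Loads a constant, here the string pieces "Hello, " and "!".
  • LOAD_FAST: Loads a local variable, here name.
  • FORMAT_VALUE: Converts a value to a string for the f-string (FORMAT_SIMPLE in Python 3.13 and later).
  • BUILD_STRING: Joins the pieces on the stack into a single string.
  • RETURN_VALUE: Returns a value from the function.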
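An addition instruction (BINARY_ADD, or BINARY_OP on Python 3.11 and later) appears when code uses the + operator, not when it uses an f-string. A minimal sketch comparing the two, where greet_concat is a hypothetical variant and the helper opnames simply lists instruction names via dis.get_instructions instead of printing the full table:

```python
import dis

def greet(name):
    return f"Hello, {name}!"

def greet_concat(name):
    # Hypothetical variant that builds the same string with the + operator.
    return "Hello, " + name + "!"

def opnames(func):
    """Return the instruction names dis reports for a function."""
    return [instruction.opname for instruction in dis.get_instructions(func)]

print(opnames(greet))         # f-string: formatting plus BUILD_STRING
print(opnames(greet_concat))  # concatenation: addition instructions
```

Both functions return the same string; only the instructions the PVM executes differ.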

Conclusion

Understanding Python byte code and how the Python Virtual Machine (PVM) interprets it provides valuable insights into Python's execution model. The dis module is an essential tool that allows us to disassemble Python byte code, revealing the low-level instructions that the PVM executes. This knowledge can help us optimize our code, debug more effectively, and deepen our understanding of Python internals.

This article covered Python byte code, the PVM's role, and how to use the dis module to examine byte code. We can write more efficient and effective Python programs by leveraging these tools.

Stay tuned for the next part of this series, where we will delve deeper into interpreting disassembled code and discuss how some common Python constructs translate into byte code.

