Whether you are a Python developer or someone who works with Python for any other reason, it is important to understand what the Python interpreter is. To drive the point home, we will first examine how your own PC runs its programs. Then we can delve into how Python runs your code and see where the concept of the Python interpreter comes in.
How does your computer run programs
Have you ever stopped and asked yourself how your computer manages to run its programs? If you have, we are about to answer your questions. If you have not, join us on this adventure to explore how programs run on your PC.
The two vital components of any personal computer are the processor, which many people refer to as the central processing unit (CPU), and the memory.
Most people know the processor as the brain of the computer, and that is not far from the truth. However, in this adventure, we would like to give you a new perspective on both the processor and the memory.
First, the memory is the component where code is stored as instructions. It also stores data.
The instructions depend on the processor. The set of instructions that a processor understands is known in the technical domain as its Instruction Set Architecture (ISA). Thus, the instruction set understood by an Intel-based processor is different from that of an ARM processor.
The processor fetches instructions from memory and executes them. And that is what it keeps doing all the time. Machine code is the collection of instructions that are stored in the computer memory.
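The fetch-and-execute cycle described above can be sketched in a few lines of Python. This is only a toy model: the instruction names (LOAD, ADD, HALT) and the run function are invented for illustration and do not correspond to any real instruction set.

```python
# A toy "processor": memory holds instructions, and a loop fetches and
# executes them one at a time. LOAD, ADD and HALT are made-up instructions.
def run(memory):
    accumulator = 0
    program_counter = 0
    while True:
        opcode, operand = memory[program_counter]  # fetch the next instruction
        program_counter += 1
        if opcode == "LOAD":      # execute: put a value in the accumulator
            accumulator = operand
        elif opcode == "ADD":     # execute: add a value to the accumulator
            accumulator += operand
        elif opcode == "HALT":    # execute: stop and hand back the result
            return accumulator
        else:
            raise ValueError(f"unknown instruction: {opcode}")


program = [("LOAD", 2), ("ADD", 3), ("HALT", None)]
result = run(program)
print(result)  # 5
```

A real processor does the same thing in hardware, with instructions encoded as binary numbers rather than Python tuples.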
When you write code in your editor, it is stored on your hard drive as source code. The processor does not understand it. Source code may express instructions for the processor, but it cannot be fed to the processor directly, because the processor only understands machine code, which is a binary encoding made of 0s and 1s.
Therefore, there has to be a translator that converts your source code from your programming language into machine code that the processor can understand. This translator is called a compiler. Its job is to translate the instructions you have written into machine code for a specific processor and operating system. A compiler’s input is source code in a particular programming language, and its output is a binary: a set of 1s and 0s representing instructions from the instruction set that the processor understands.
Let us say you are on a Windows operating system. When you double click on a binary file, the operating system will load this binary file into the memory and then instruct the processor to start fetching and executing the instructions that are translated from the source code that you wrote. This is the flow sequence when working with programming languages such as C, C++, and Go. However, this is different when dealing with Python.
Consider the following scenario.
You have a test.py file saved on your PC, and you run the following command in the directory that contains the file:
python test.py
The Python Interpreter Concept
The python in the above command is the Python interpreter, which is itself a binary. The Python interpreter is the program that you are actually running.
This means that the machine code stored in memory consists of the instructions that represent the Python interpreter, not the instructions that represent your test.py program.
So, the Python interpreter is stored in memory as a set of instructions that the processor can understand.
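One way to see that the interpreter is itself a binary file is to ask Python where its own executable lives and peek at the first few bytes. The sketch below assumes sys.executable points at a readable file, which is the usual case but not guaranteed in every embedded setup; interpreter_header is a hypothetical helper name.

```python
import sys


def interpreter_header(num_bytes=4):
    """Return the first bytes of the interpreter's executable file."""
    with open(sys.executable, "rb") as binary_file:
        return binary_file.read(num_bytes)


# Path of the interpreter binary that is running this script.
print(sys.executable)

# The leading bytes identify the binary format; for example, Linux
# executables start with b'\x7fELF' and Windows executables with b'MZ'.
print(interpreter_header())
```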
You can think of the Python interpreter as having two components: the compiler and the Python virtual machine.
Now, how does your computer eventually run your source code if the machine instructions in memory are only the ones that represent the Python interpreter?
Well, if you look at the command that you have written, you are passing test.py, the name of the file, as an argument to the Python interpreter.
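To make the "file name as an argument" idea concrete, here is a small sketch that launches the interpreter on a script the same way the command line does. The temporary directory and the one-line script are assumptions made for the demonstration; sys.argv[0] inside the script holds the path that was handed to the interpreter.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Create a throwaway test.py that reports the script name it was given.
    script = Path(tmp) / "test.py"
    script.write_text("import sys\nprint(sys.argv[0])\n")

    # Equivalent to typing: python test.py
    completed = subprocess.run(
        [sys.executable, str(script)],
        capture_output=True,
        text=True,
        check=True,
    )
    script_name = str(script)
    reported_name = completed.stdout.strip()

print(reported_name)  # the path that was passed to the interpreter
```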
The compiler then reads the source code and does what any other compiler does: it translates your source code into another form. Here, however, the output is not machine code but an intermediate language called byte code. Note that the processor cannot understand byte code.
Instead, byte code is an intermediate code that does not target a specific processor. Its intended target is the Python virtual machine. Therefore, the Python virtual machine is the component that can understand byte code.
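You can watch the compilation step from inside Python with the built-in compile function, which returns a code object holding the byte code. This is a minimal sketch; the source string stands in for the contents of test.py, and the raw bytes printed will differ between Python versions.

```python
# Compile a one-line program the way the interpreter's compiler would.
source = 'print("Hello World")'
code_object = compile(source, "test.py", "exec")

print(type(code_object))      # <class 'code'>
print(code_object.co_code)    # the raw byte code, as a bytes object
print(code_object.co_consts)  # constants the byte code refers to
print(code_object.co_names)   # names the byte code refers to
```

Nothing has been executed at this point: the code object only describes what the virtual machine should do.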
Seeing byte code as instructions for a virtual machine also explains why we call it the Python virtual machine in the first place: it essentially does the same job as the hardware processor. The Python virtual machine reads the byte code instructions and executes them on the hardware.
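The two stages can be separated by hand: compile produces the byte code, and exec hands it to the virtual machine for execution. The sketch below captures the printed output in a buffer only so that it can be inspected afterwards; the source string is an assumed stand-in for test.py.

```python
import contextlib
import io

# Stage 1: the compiler turns source code into a code object (byte code).
code_object = compile('print("Hello World")', "test.py", "exec")

# Stage 2: the virtual machine executes the byte code.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    exec(code_object, {"__name__": "__main__"})

output = buffer.getvalue()
print(output, end="")  # Hello World
```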
This is how the Python interpreter works and how Python works under the hood. Python is not alone in this; Java also works this way.
It is critical to understand that Python is a programming language; thus, it is a specification of what the language looks like. What we have described, however, is one specific implementation of the language, called CPython.
CPython is the most widely used implementation, and it is what you get when you install Python from python.org, the official distribution.
In short, this is most likely how Python works on your machine too.
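If you are unsure which implementation you are running, the standard library can tell you. This short check uses the platform and sys modules; the values in the comments are examples, not guaranteed output.

```python
import platform
import sys

implementation = platform.python_implementation()

print(implementation)             # for example CPython or PyPy
print(sys.implementation.name)    # lowercase identifier, for example cpython
print(platform.python_version())  # version string of the running interpreter
```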
Using examples to understand the concepts
Let’s use our hello world example, stored in a file called test.py.
Running the program simply involves typing the following on the command line:
python3 test.py
As before, python3 is the Python interpreter, and test.py is the file that contains our source code. Running the command prints the following on the screen:
Hello World
What if you just want to compile test.py and look at the byte code that results from the compilation stage?
To do so, run the following command in the directory where the .py file is located:
python3 -m py_compile test.py
The compiler inside the Python interpreter reads test.py and compiles it into byte code, which is stored in a folder called __pycache__.
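The same compilation can be triggered from Python code through the py_compile module. The sketch below works on a throwaway copy of test.py in a temporary directory (an assumption made so the example does not touch your own files) and reports where the byte code file ends up.

```python
import py_compile
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # A stand-in for the test.py used in this article.
    source_path = Path(tmp) / "test.py"
    source_path.write_text('print("Hello World")\n')

    # Compile to byte code; the return value is the path of the .pyc file.
    pyc_path = Path(py_compile.compile(str(source_path), doraise=True))

    pyc_exists = pyc_path.exists()
    pyc_suffix = pyc_path.suffix
    print(pyc_path.parent.name)  # normally __pycache__
    print(pyc_path.name)         # for example test.cpython-38.pyc
```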
To list the files under __pycache__, enter the following command.
ls __pycache__
You get:
test.cpython-38.pyc
The .pyc extension, in CPython specifically, means that the file contains byte code.
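The middle part of the file name, such as cpython-38, identifies the implementation and version that produced the byte code. As a rough illustration, importlib can compute the expected cache path for a source file without compiling anything; the printed values depend on the interpreter you run it with.

```python
import importlib.util
import sys

# Where would the byte code for test.py be cached?
cache_path = importlib.util.cache_from_source("test.py")
cache_tag = sys.implementation.cache_tag

print(cache_path)  # for example __pycache__/test.cpython-38.pyc
print(cache_tag)   # for example cpython-38
```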
To check the contents, enter the following command:
cat __pycache__/test.cpython-38.pyc
The Python virtual machine can understand this gibberish-looking content, most of which we cannot make out because it is binary data made of 0s and 1s. The virtual machine executes these byte code instructions, which in turn results in actual instructions running on your processor.
Remember that cat is the command on Mac or Linux. If you are using Windows, use type instead; it does the same thing.
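The binary contents are not random. As a sketch, assuming CPython 3.7 or later (where a .pyc file begins with a 16-byte header followed by a marshalled code object), the file can be read back into a code object like this. The temporary test.py is again a stand-in for your own file.

```python
import importlib.util
import marshal
import py_compile
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    source_path = Path(tmp) / "test.py"
    source_path.write_text('print("Hello World")\n')
    pyc_path = py_compile.compile(str(source_path), doraise=True)
    raw = Path(pyc_path).read_bytes()

# The first 4 bytes are a magic number tied to the interpreter version.
magic = raw[:4]
print(magic == importlib.util.MAGIC_NUMBER)  # True

# Assumption: the header is 16 bytes long (CPython 3.7+); the rest is the
# marshalled code object that the virtual machine executes.
code_object = marshal.loads(raw[16:])
print(code_object.co_consts)
```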
What if you want to look at a human-readable version of the respective byte code?
Because the byte code is in binary format, you cannot get anything meaningful out of it. The Python virtual machine can understand it, but humans cannot.
To get a human-readable version of the byte code, enter the following command:
python3 -m dis test.py
Here, dis means disassemble.
After pressing enter, you get instructions like the following (this listing is from Python 3.8; the exact instructions vary between Python versions):
0 LOAD_NAME 0 (print)
2 LOAD_CONST 0 ('Hello World')
4 CALL_FUNCTION 1
6 POP_TOP
8 LOAD_CONST 1 (None)
10 RETURN_VALUE
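The same disassembly is available programmatically through the dis module. The sketch below compiles a one-line stand-in for test.py and lists the instruction names; the exact names depend on your Python version, so no fixed output is shown.

```python
import dis

code_object = compile('print("Hello World")', "test.py", "exec")

# Full human-readable listing, like the command-line output.
dis.dis(code_object)

# Or iterate over the instructions one by one.
opnames = [instruction.opname for instruction in dis.get_instructions(code_object)]
print(opnames)
```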
Summary
The compiler is the component in your Python interpreter that reads the source code that you write and translates it into a binary format called byte code.
The byte code is binary, but it is not the 1s and 0s of machine code that your processor can understand.
So the processor cannot execute byte code instructions directly. What can understand byte code is the Python virtual machine, which is the other component of the Python interpreter.
The Python virtual machine reads the byte code and is the component that executes it on your hardware.
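To tie the summary together, here is a toy stack machine that executes instructions shaped like the Python 3.8 listing shown earlier. It is only a sketch: run_bytecode is a hypothetical function, it handles just these five instruction names, and it is not how CPython's virtual machine is actually implemented.

```python
import builtins


def run_bytecode(instructions, names, consts):
    """Execute (opname, argument) pairs on a simple value stack."""
    stack = []
    for opname, arg in instructions:
        if opname == "LOAD_NAME":        # push a built-in looked up by name
            stack.append(getattr(builtins, names[arg]))
        elif opname == "LOAD_CONST":     # push a constant
            stack.append(consts[arg])
        elif opname == "CALL_FUNCTION":  # pop arguments and function, push result
            args = [stack.pop() for _ in range(arg)][::-1]
            function = stack.pop()
            stack.append(function(*args))
        elif opname == "POP_TOP":        # discard the top of the stack
            stack.pop()
        elif opname == "RETURN_VALUE":   # finish with the top of the stack
            return stack.pop()
        else:
            raise ValueError(f"unsupported instruction: {opname}")


program = [
    ("LOAD_NAME", 0),
    ("LOAD_CONST", 0),
    ("CALL_FUNCTION", 1),
    ("POP_TOP", None),
    ("LOAD_CONST", 1),
    ("RETURN_VALUE", None),
]
result = run_bytecode(program, names=("print",), consts=("Hello World", None))
```

Running it prints Hello World, and result is None, just as the module-level code in test.py returns None after calling print.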