
What is PyPy? Faster Python without pain

Python has earned a reputation for being powerful, flexible, and easy to work with. These virtues have led to its use in a huge and growing variety of applications, workflows, and fields. But the design of the language—its interpreted nature, its runtime dynamism—means that Python has always been an order of magnitude slower than machine-native languages like C or C++.

Over the years, developers have come up with a variety of workarounds for Python’s speed limitations. For instance, you could write performance-intensive tasks in C and wrap them with Python; many machine learning libraries do exactly this. Or you could use Cython, a project that lets you sprinkle Python code with static type declarations that allow it to be compiled to C.

But workarounds are never ideal. Wouldn’t it be great if we could just take an existing Python program as is, and run it dramatically faster? That’s exactly what PyPy allows you to do.

PyPy vs. CPython

PyPy is a drop-in replacement for the stock Python interpreter, CPython. Whereas CPython compiles Python to intermediate bytecode that is then interpreted by a virtual machine, PyPy uses just-in-time (JIT) compilation to translate Python code into machine-native assembly language.
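To see the bytecode side of that comparison, the standard library's dis module can list the instructions a function compiles to. This is only a minimal sketch: the function add is a made-up example, and the exact instruction names vary between Python versions and interpreters.

```python
import dis


def add(a, b):
    # Hypothetical example function; any small function would do.
    return a + b


# Collect the names of the bytecode instructions the function compiles to.
opnames = [instruction.opname for instruction in dis.Bytecode(add)]

for name in opnames:
    print(name)
```

The printed list is what the interpreter's virtual machine steps through one instruction at a time when add() is called.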

Depending on the task being performed, the performance gains can be dramatic. On average, PyPy speeds up Python by about 7.6 times, with some tasks accelerated 50 times or more. The CPython interpreter simply doesn’t perform the same kinds of optimizations as PyPy, and probably never will, since that is not one of its design goals.

The best part is that little to no effort is required on the part of the developer to unlock the gains PyPy provides. Simply swap out CPython for PyPy, and for the most part you’re done. There are a few exceptions, discussed below, but PyPy’s stated goal is to run existing, unmodified Python code and provide it with an automatic speed boost.

PyPy currently supports both Python 2 and Python 3, by way of different incarnations of the project. In other words, you need to download different versions of PyPy depending on the version of Python you will be running. The Python 2 branch of PyPy has been around much longer, but the Python 3 version has been brought up to speed as of late. It currently supports both Python 3.5 (production quality) and Python 3.6 (beta quality).

In addition to supporting all of the core Python language, PyPy works with the vast majority of the tools in the Python ecosystem, such as pip for packaging or virtualenv for virtual environments. Most Python packages, even those with C modules, should work as-is, although there are limitations we’ll go into below.

How PyPy works

PyPy uses optimization techniques found in other just-in-time compilers for dynamic languages. It analyzes running Python programs to determine the type information of objects as they’re created and used in programs, then uses that type information as a guide to speed things up. For instance, if a Python function works with only one or two different object types, PyPy generates machine code to handle those specific cases.
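As a rough illustration of the kind of code this favors, the sketch below contrasts a loop that only ever sees one type with one that sees two. The function and variable names are hypothetical, and the code produces the same results on any interpreter; the difference would only be in how much a JIT could specialize the loop.

```python
def sum_squares(values):
    # Hypothetical hot loop: when every element is an int, the loop is
    # type-stable, which is the simple case for a JIT to specialize.
    total = 0
    for value in values:
        total += value * value
    return total


# Type-stable input: only ints flow through the loop.
ints_only = list(range(1000))

# Mixed input: ints and floats alternate, so the same operations see
# more than one type.
mixed = [n if n % 2 == 0 else float(n) for n in range(1000)]

print(sum_squares(ints_only))
print(sum_squares(mixed))
```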

PyPy’s optimizations are handled automatically at runtime, so you generally don’t need to tweak its performance. An advanced user might experiment with PyPy’s command-line options to generate faster code for special cases, but only rarely is this necessary.

PyPy also departs from the way CPython handles some internal functions, but tries to preserve compatible behaviors. For instance, PyPy handles garbage collection differently than CPython. Not all objects are immediately collected once they go out of scope, so a Python program running under PyPy may show a larger memory footprint than when running under CPython. But you can still use Python’s high-level garbage collection controls exposed through the gc module, such as gc.enable(), gc.disable(), and gc.collect().
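The gc controls mentioned above can be used in the same way on either interpreter. A minimal sketch, assuming a workload that creates many reference cycles; the Node class and make_cycle() helper are invented for illustration.

```python
import gc


class Node:
    """Hypothetical object that takes part in a reference cycle."""

    def __init__(self):
        self.partner = None


def make_cycle():
    # Two objects that refer to each other; the garbage collector has to
    # find the cycle before their memory can be reclaimed.
    a, b = Node(), Node()
    a.partner, b.partner = b, a


gc.disable()   # pause automatic collection during the busy section
for _ in range(1000):
    make_cycle()
gc.enable()    # resume automatic collection
gc.collect()   # explicitly request a full collection now
```

Calling gc.collect() at a known quiet point is one way to keep the memory footprint predictable when objects are not freed the moment they go out of scope.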

If you want information about PyPy’s JIT behavior at runtime, PyPy includes a module, pypyjit, that exposes many JIT hooks to your Python application. If you have a function or module that seems to be performing poorly with the JIT, pypyjit allows you to obtain detailed statistics about it.

Another PyPy-specific module, __pypy__, exposes other features specific to PyPy, so it can be useful for writing apps that leverage those features. Because of Python’s runtime dynamism, it is possible to construct Python apps that use these features when PyPy is present and ignore them when it is not.
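One common way to do this is to attempt the import and fall back when it fails. The sketch below only detects the interpreter and does not call any specific __pypy__ function; RUNNING_ON_PYPY and describe_runtime() are hypothetical names chosen for the example.

```python
import platform

try:
    import __pypy__  # only importable when running under PyPy
except ImportError:
    __pypy__ = None

RUNNING_ON_PYPY = __pypy__ is not None


def describe_runtime():
    # platform.python_implementation() returns a name such as
    # "CPython" or "PyPy".
    name = platform.python_implementation()
    if RUNNING_ON_PYPY:
        return name + " (PyPy-specific features available)"
    return name + " (using portable code paths only)"


print(describe_runtime())
```

Code that wants to use a PyPy-only feature can check RUNNING_ON_PYPY first and take the portable path otherwise.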

PyPy limitations

Magical as PyPy might seem, it isn’t magic. PyPy has certain limitations that reduce or obviate its effectiveness for certain kinds of programs. Alas, PyPy is not a completely universal replacement for the stock CPython runtime.

PyPy works best with pure Python apps

PyPy has always performed best with “pure” Python applications, i.e., applications written in Python and nothing else. Python packages that interface with C libraries, such as NumPy, have not fared as well due to the way PyPy emulates CPython’s native binary interfaces.

PyPy’s developers have whittled away at this issue and made PyPy more compatible with the majority of Python packages that depend on C extensions. NumPy, for instance, works very well with PyPy now. But if you want maximum compatibility with C extensions, use CPython.

PyPy works best with longer-running programs

One of the side effects of how PyPy optimizes Python programs is that longer-running programs benefit most from its optimizations. The longer the program runs, the more run-time type information PyPy can gather, and the more optimizations it can make. One-and-done Python scripts won’t benefit from this sort of thing. The applications that do benefit typically have loops that run for long periods of time, or run continuously in the background—web frameworks, for instance.
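A simple way to observe this warm-up effect is to time the same workload several times within one process. The following is a sketch with an invented stand-in workload (count_primes) and arbitrary batch sizes, not a rigorous benchmark.

```python
import time


def count_primes(limit):
    # Deliberately naive pure-Python loop used as a stand-in workload.
    count = 0
    for n in range(2, limit):
        for d in range(2, int(n ** 0.5) + 1):
            if n % d == 0:
                break
        else:
            count += 1
    return count


def time_batches(batches=5, limit=20000):
    # Run the same workload repeatedly and record each batch's duration.
    timings = []
    result = None
    for _ in range(batches):
        start = time.perf_counter()
        result = count_primes(limit)
        timings.append(time.perf_counter() - start)
    return result, timings


result, timings = time_batches()
for i, seconds in enumerate(timings, 1):
    print("batch {}: {:.4f}s".format(i, seconds))
```

Under CPython the batches should take roughly the same time. Under a JIT-compiling interpreter, later batches may run faster than the first once the hot loops have been compiled, although the exact pattern depends on the workload and the interpreter version.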

PyPy doesn’t do ahead-of-time compilation

PyPy compiles Python code, but it isn’t a compiler for Python code. Because of the way PyPy performs its optimizations and the inherent dynamism of Python, there’s no way to emit the resulting JITted code as a standalone binary and re-use it. Each program has to be compiled for each run. If you want to compile Python into faster code that can run as a standalone app, use Cython, Numba, or the currently experimental Nuitka project.
