PyPy is both:
- a reimplementation of Python in Python, and
- a framework for implementing interpreters and virtual machines for programming languages, especially dynamic languages.
PyPy tries to find new answers about ease of creation, flexibility, maintainability and speed trade-offs for language implementations. For further details see our goal and architecture document.
The most likely stumbling block for any given project is support for extension modules. PyPy supports a continually growing number of extension modules, but so far mostly only those found in the standard library.
The language features (including builtin types and functions) are very complete and well tested, so if your project does not use many extension modules there is a good chance that it will work with PyPy.
We list the differences we know about in cpython differences.
A module installed for CPython is not automatically available for PyPy — just like a module installed for CPython 2.6 is not automatically available for CPython 2.7 if you installed both. In other words, you need to install the module xyz specifically for PyPy.
On Linux, this means that you cannot use apt-get or some similar package manager: these tools are only meant for the version of CPython provided by the same package manager. So forget about them for now and read on.
It is quite common nowadays that xyz is available on PyPI and installable with pip install xyz. The simplest solution is to use virtualenv (as documented here). Then enter (activate) the virtualenv and type: pip install xyz.
If you get errors from the C compiler, the module is a CPython C Extension module using unsupported features. See below.
Alternatively, if either the module xyz is not available on PyPI or you don’t want to use virtualenv, then download the source code of xyz, decompress the zip/tarball, and run the standard command: pypy setup.py install. (Note: pypy here instead of python.) As usual you may need to run the command with sudo for a global installation. The other commands of setup.py are available too, like build.
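Since each interpreter keeps its own set of installed modules, it can help to confirm which interpreter is actually running and whether it can see a given module. The following is a minimal sketch using only the standard library; the helper names `interpreter_summary` and `is_installed` are made up for this illustration.

```python
import platform
import sys


def interpreter_summary():
    """Return (implementation name, 'major.minor' language version)."""
    impl = platform.python_implementation()  # e.g. 'CPython' or 'PyPy'
    version = "%d.%d" % sys.version_info[:2]
    return impl, version


def is_installed(module_name):
    """Return True if module_name can be imported by *this* interpreter."""
    try:
        __import__(module_name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    impl, version = interpreter_summary()
    print("%s %s at %s" % (impl, version, sys.executable))
    print("json importable: %s" % is_installed("json"))
```

Saving this to a file and running it once with python and once with pypy should show two different executables, each with its own view of what is installed.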
We have experimental support for CPython extension modules, so they run with minor changes. This has been a part of PyPy since the 1.4 release, but support is still in beta phase. CPython extension modules in PyPy are often much slower than in CPython due to the need to emulate refcounting. It is often faster to take out your CPython extension and replace it with a pure Python version that the JIT can see. If you are trying to install module xyz, and the module has both a C and a Python version of the same code, first try to disable the C version; this is usually easily done by changing some line in setup.py.
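Packages that ship both a C and a Python version of the same code often choose between them at import time, roughly as sketched below. This is only an assumption about how a given module xyz might be organised: the accelerator name `_xyz_speedups` and the function `checksum` are hypothetical. When the C version is not built, the pure Python fallback is the one the JIT gets to see.

```python
try:
    # Hypothetical C accelerator module; the name is invented for this sketch.
    from _xyz_speedups import checksum
    USING_C_VERSION = True
except ImportError:
    USING_C_VERSION = False

    def checksum(data):
        """Pure Python fallback: sum of the byte values modulo 65521."""
        total = 0
        for byte in bytearray(data):
            total = (total + byte) % 65521
        return total
```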
We fully support ctypes-based extensions. But for best performance, we recommend that you use the cffi module to interface with C code.
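As a small illustration of a ctypes-based interface, the sketch below calls the C library's strlen. It assumes a POSIX system, where `ctypes.CDLL(None)` gives access to the symbols already loaded into the process; on other platforms the library would have to be located differently. The wrapper name `c_strlen` is made up for the example.

```python
import ctypes

# Assumption: POSIX. CDLL(None) exposes symbols already loaded into the
# process, which includes the C library.
libc = ctypes.CDLL(None)

strlen = libc.strlen
strlen.argtypes = [ctypes.c_char_p]   # one NUL-terminated byte string
strlen.restype = ctypes.c_size_t      # result is an unsigned size


def c_strlen(text):
    """Return the length of a byte string as computed by C's strlen."""
    return strlen(text)
```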
For information on which third party extensions work (or do not work) with PyPy see the compatibility wiki.
PyPy is regularly and extensively tested on Linux machines. It mostly works on Mac and Windows: it is tested there, but most of us are running Linux so fixes may depend on 3rd-party contributions. PyPy’s JIT works on x86 (32-bit or 64-bit) and on ARM (ARMv6 or ARMv7). Support for POWER (64-bit) is stalled at the moment.
To bootstrap from sources, PyPy can use either CPython (2.6 or 2.7) or another (e.g. older) PyPy. Cross-translation is not really supported: e.g. to build a 32-bit PyPy, you need to have a 32-bit environment. Cross-translation is only explicitly supported between a 32-bit Intel Linux and ARM Linux (see here).
PyPy currently aims to be fully compatible with Python 2.7. That means that it contains the standard library of Python 2.7 and that it supports 2.7 features (such as set comprehensions).
Yes, PyPy has a GIL. Removing the GIL is very hard. The problems are essentially the same as with CPython (including the fact that our garbage collectors are not thread-safe so far). Fixing it is possible, as shown by Jython and IronPython, but difficult. It would require adapting the whole source code of PyPy, including subtle decisions about whether some effects are ok or not for the user (i.e. the Python programmer).
Instead, since 2012, work has been under way on a still very experimental Software Transactional Memory (STM) version of PyPy. This should give an alternative PyPy which internally has no GIL, while at the same time continuing to give the Python programmer the complete illusion of having one. It would in fact push forward more GIL-ish behavior, like declaring that some sections of the code should run without releasing the GIL in the middle (these are called atomic sections in STM).
This really depends on your code. For pure Python algorithmic code, it is very fast. For more typical Python programs we generally are 3 times the speed of CPython 2.7. You might be interested in our benchmarking site and our jit documentation.
Note that the JIT has a very high warm-up cost, meaning that programs are slow at the beginning. If you want to compare the timings with CPython, even relatively simple programs need to run for at least one second, preferably at least a few seconds. Large, complicated programs need even more time to warm up the JIT.
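One way to make the warm-up visible is to time the same workload over several rounds and look at each round separately instead of only the total. The sketch below does that with the standard timeit clock; the workload `work` and the helper `time_rounds` are invented for this illustration, and the actual numbers depend entirely on the machine and interpreter.

```python
from timeit import default_timer


def work(n):
    """A small CPU-bound loop standing in for real program logic."""
    total = 0
    for i in range(n):
        total += i * i % 7
    return total


def time_rounds(func, arg, rounds):
    """Call func(arg) `rounds` times and return the duration of each call."""
    timings = []
    for _ in range(rounds):
        start = default_timer()
        func(arg)
        timings.append(default_timer() - start)
    return timings


if __name__ == "__main__":
    for number, seconds in enumerate(time_rounds(work, 200000, 10), 1):
        print("round %2d: %.4f s" % (number, seconds))
```

When comparing interpreters, it is the later rounds, after the JIT has compiled the hot loop, that reflect steady-state speed.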
No, we found no way of doing that. The JIT generates machine code containing a large number of constant addresses — constant at the time the machine code is written. The vast majority is probably not at all constants that you find in the executable, with a nice link name. E.g. the addresses of Python classes are used all the time, but Python classes don’t come statically from the executable; they are created anew every time you restart your program. This makes saving and reloading machine code completely impossible without some very advanced way of mapping addresses in the old (now-dead) process to addresses in the new process, including checking that all the previous assumptions about the (now-dead) object are still true about the new object.
Yes. The toolsuite that translates the PyPy interpreter is quite general and can be used to create optimized versions of interpreters for any language, not just Python. Of course, these interpreters can make use of the same features that PyPy brings to Python: translation to various languages, stackless features, garbage collection, implementation of various things like arbitrarily long integers, etc.
Certainly you can come to sprints! We always welcome newcomers and try to help them as much as possible to get started with the project. We provide tutorials and pair them with experienced PyPy developers. Newcomers should have some Python experience and read some of the PyPy documentation before coming to a sprint.
Coming to a sprint is usually the best way to get into PyPy development. If you get stuck or need advice, contact us. IRC is the most immediate way to get feedback (at least during some parts of the day; most PyPy developers are in Europe) and the mailing list is better for long discussions.
On Linux, if SELinux is enabled, you may get errors along the lines of “OSError: externmod.so: cannot restore segment prot after reloc: Permission denied.” This is caused by a slight abuse of the C compiler during configuration, and can be worked around by running the following command with root privileges:
# setenforce 0
This will disable SELinux’s protection and allow PyPy to configure correctly. Be sure to enable it again if you need it!
No, RPython is not a Python compiler.
In Python, it is mostly impossible to prove anything about the types that a program will manipulate by doing a static analysis. It should be clear if you are familiar with Python, but if in doubt see [BRETT].
If you want a fast Python program, please use the PyPy JIT instead.
[BRETT] Brett Cannon, Localized Type Inference of Atomic Types in Python, http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.90.3231
RPython is a restricted subset of the Python language. It is used for implementing dynamic language interpreters within the PyPy toolchain. The restrictions ensure that type inference (and so, ultimately, translation to other languages) of RPython programs is possible.
The property of “being RPython” always applies to a full program, not to single functions or modules (the translation toolchain does a full program analysis). The translation toolchain follows all calls recursively and discovers what belongs to the program and what does not.
RPython program restrictions mostly limit the ability to mix types in arbitrary ways. RPython does not allow the binding of two different types in the same variable. In this respect (and in some others) it feels a bit like Java. Other features not allowed in RPython are the use of special methods (__xxx__) except __init__ and __del__, and the use of reflection capabilities (e.g. __dict__).
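The rule about not binding two different types to the same variable can be illustrated with two small functions. Both are valid Python and run on any interpreter; the comments describe how they relate to the restriction as stated above. The function names are made up, and whether the translation toolchain accepts a function always depends on the whole program it is part of.

```python
def mixed_types(flag):
    # Valid Python, but not RPython: 'result' is bound to an int on one
    # path and to a str on the other, so no single type can be inferred.
    if flag:
        result = 42
    else:
        result = "forty-two"
    return result


def single_type(flag):
    # Same shape, but 'result' is a str on every path, which is the kind
    # of code the restriction is meant to allow.
    if flag:
        result = "42"
    else:
        result = "forty-two"
    return result
```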
You cannot use most existing standard library modules from RPython. The exceptions are some functions in os, math and time that have native support. We have our own “RPython standard library” in rpython.rlib.*.
To read more about the RPython limitations read the RPython description.
No. Zope’s RestrictedPython aims to provide a sandboxed execution environment for CPython. PyPy’s RPython is the implementation language for dynamic language interpreters. However, PyPy also provides a robust sandboxed Python Interpreter.
If you put “NOT_RPYTHON” into the docstring of a function and that function is found while trying to translate an RPython program, the translation process stops and reports this as an error. You can therefore mark functions as “NOT_RPYTHON” to make sure that they are never analyzed.
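A function marked this way might look like the sketch below. The function `debug_dump` is hypothetical; it uses reflection, which is one of the features not allowed in RPython, so marking it makes an accidental use from translated code show up as an error.

```python
def debug_dump(obj):
    """NOT_RPYTHON: uses reflection, intended for debugging only."""
    # vars() reads the instance __dict__, which RPython does not allow.
    return sorted(vars(obj).keys())
```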
It’s not necessarily nonsense, but it’s not really The PyPy Way. It’s pretty hard, without some kind of type inference, to translate this Python:
a + b
into anything significantly more efficient than this Common Lisp:
(py:add a b)
And making type inference possible is what RPython is all about.
You could make #'py:add a generic function and see if a given CLOS implementation is fast enough to give a useful speed (but I think the coercion rules would probably drive you insane first). – mwh
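To see why `a + b` is hard to translate without type information, consider how many different operations the same expression can stand for. In the sketch below, the `Money` class is invented for the example; the point is only that the meaning of `+` is decided at run time by the operand types.

```python
def add(a, b):
    # The same source text, with a different operation for each operand type.
    return a + b


class Money(object):
    """Hypothetical user-defined type with its own meaning for '+'."""

    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        return Money(self.cents + other.cents)


if __name__ == "__main__":
    print(add(1, 2))                          # integer addition
    print(add("a", "b"))                      # string concatenation
    print(add([1], [2]))                      # list concatenation
    print(add(Money(150), Money(250)).cents)  # user-defined __add__
```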
No, and you shouldn’t try. First and foremost, RPython is a language designed for writing interpreters. It is a restricted subset of Python. If your program is not an interpreter but tries to do “real things”, like use any part of the standard Python library or any 3rd-party library, then it is not RPython to start with. You should only look at RPython if you try to write your own interpreter.
If your goal is to speed up Python code, then look at the regular PyPy, which is a full and compliant Python 2.7 interpreter (which happens to be written in RPython). Not only is it not necessary for you to rewrite your code in RPython, it might not give you any speed improvements even if you manage to.
Yes, it is possible with enough effort to compile small self-contained pieces of RPython code doing a few performance-sensitive things. But this case is not interesting for us. If you needed to rewrite the code in RPython, you could as well have rewritten it in C or C++ or Java for example. These are much more supported, much more documented languages :-)
The above paragraphs are not the whole truth. It is true that there are cases where writing a program as RPython gives you substantially better speed than running it on top of PyPy. However, the attitude of the core group of people behind PyPy is to answer: “then report it as a performance bug against PyPy!”.
Here is a more diluted way to put it. The “No, don’t!” above is a general warning we give to new people. They are likely to need a lot of help from some source, because RPython is neither simple nor extensively documented; but at the same time, we, the PyPy core group of people, are not willing to invest time in supporting 3rd-party projects that do very different things than interpreters for dynamic languages, simply because we have other interests and there are only so many hours in a day. So in summary, I believe it is only fair to attempt to point newcomers at existing alternatives, which are more mainstream and where they will get help from many people.
If anybody seriously wants to promote RPython anyway, they are welcome to: we won’t actively resist such a plan. There are a lot of things that could be done to make RPython a better Java-ish language for example, starting with supporting non-GIL-based multithreading, but we don’t implement them because they have little relevance to us. This is open source, which means that anybody is free to promote and develop anything; but it also means that you must let us choose not to go into that direction ourselves.
In theory yes. But we tried to use it 5 or 6 times already, as a translation backend or as a JIT backend — and failed each time.
In more detail: using LLVM as a (static) translation backend is pointless nowadays because you can generate C code and compile it with clang. (Note that compiling PyPy with clang gives a result that is not faster than compiling it with gcc.) We might in theory get extra benefits from LLVM’s GC integration, but this requires more work on the LLVM side before it would be remotely useful. Anyway, it could be interfaced via a custom primitive in the C code.
On the other hand, using LLVM as our JIT backend looks interesting as well — but again we made an attempt, and it failed: LLVM has no way to patch the generated machine code.
So the position of the core PyPy developers is that if anyone wants to make an N+1’th attempt with LLVM, they are welcome to, and we will be happy to provide help in the IRC channel, but they carry the burden of proof that (a) it works and (b) it gives important benefits.
No, you have to rebuild the entire interpreter. This means two things:
In this context it is not that important to be able to translate RPython modules independently of translating the complete interpreter. (It could be done given enough effort, but it’s a really serious undertaking. Consider it quite unlikely for now.)
Because it’s fun.