This is a list of projects that are interesting for potential contributors who are seriously interested in the PyPy project. They mostly share common patterns: they’re mid-to-large in size, they’re usually well defined as standalone projects, and they’re not being actively worked on. For small projects that you might want to work on, it’s much better to look at the issue tracker, pop up on #pypy on irc.freenode.net, or write to the mailing list, simply because small possible projects tend to change very rapidly.
This list is mostly meant to give an overview of potential projects. It is by definition not exhaustive, and we’re pleased if people come up with their own improvement ideas. In any case, if you feel like working on some of those projects, or anything else in PyPy, pop up on IRC or write to us on the mailing list.
PyPy’s bytearray type is very inefficient. It would be an interesting task to look into possible optimizations on this.
The idea is to have a special implementation of list objects which is used when doing myslice = mylist[a:b]: the new list is not constructed immediately, but only when (and if) myslice or mylist are mutated.
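The lazy-slicing idea above can be illustrated at application level. The following is only a minimal sketch, not how PyPy would implement it internally: the class names `CowList` and `LazySlice` are hypothetical, only step-1 slices are handled lazily, and the parent keeps strong references to its slices for simplicity.

```python
class LazySlice:
    """Hypothetical view over parent[start:stop] that copies only on mutation."""

    def __init__(self, parent, start, stop):
        self._parent = parent
        self._start = start
        self._stop = stop
        self._own = None  # private copy, built on first mutation

    @property
    def materialized(self):
        return self._own is not None

    def _materialize(self):
        # Copy the borrowed range once; afterwards the view is independent.
        if self._own is None:
            self._own = self._parent._items[self._start:self._stop]
            self._parent = None

    def __len__(self):
        if self._own is not None:
            return len(self._own)
        return self._stop - self._start

    def __getitem__(self, i):
        if self._own is not None:
            return self._own[i]
        n = self._stop - self._start
        if not -n <= i < n:
            raise IndexError("list index out of range")
        return self._parent._items[self._start + i % n]

    def __setitem__(self, i, value):
        self._materialize()  # mutating the slice forces the copy
        self._own[i] = value


class CowList:
    """Hypothetical list wrapper whose step-1 slices are copied lazily."""

    def __init__(self, items):
        self._items = list(items)
        self._views = []  # slices that may still borrow self._items

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        if isinstance(index, slice):
            start, stop, step = index.indices(len(self._items))
            if step == 1:
                view = LazySlice(self, start, max(start, stop))
                self._views.append(view)
                return view
            return CowList(self._items[index])  # eager copy for other steps
        return self._items[index]

    def __setitem__(self, index, value):
        self._detach_views()  # mutating the parent forces pending copies
        self._items[index] = value

    def _detach_views(self):
        for view in self._views:
            view._materialize()
        self._views = []
```

For example, `s = CowList(range(10))[2:5]` reads through to the parent's storage, and a copy is made only once either `s` or the parent is assigned to. A real implementation would live in the interpreter's list strategies and would need to cover every mutating operation.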
NumPy support is progressing rapidly in PyPy, so feel free to come to IRC and ask for a proposed topic. A not necessarily up-to-date list of topics is also available.
Analyzing performance of applications is always tricky. We have various tools, for example the jitviewer, that help us analyze performance.
The jitviewer shows the code generated by the PyPy JIT in a hierarchical way, as shown by the screenshot below:
- at the bottom level, it shows the Python source code of the compiled loops
- for each source code line, it shows the corresponding Python bytecode
- for each opcode, it shows the corresponding jit operations, which are the ones actually sent to the backend for compiling (such as i15 = i10 < 2000 in the example)
The jitviewer is a web application based on flask and jinja2 (and jQuery on the client): if you have great web developing skills and want to help PyPy, this is an ideal task to get started, because it does not require any deep knowledge of the internals.
CPython 3.3 will use an optimized unicode representation which switches between different ways to represent a unicode string, depending on whether the string fits into ASCII, has only two-byte characters or needs four-byte characters.
The actual details would be rather different in PyPy, but we would like to have the same optimization implemented.
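To make the representation switch concrete, here is a minimal sketch of choosing a storage width from the widest character in a string. The function names `storage_width` and `pack` are hypothetical, and the one-byte case is assumed to cover all code points below 256 (which includes ASCII); it illustrates the selection logic only, not CPython's or PyPy's actual data layout.

```python
def storage_width(text):
    """Return the bytes per character needed to store `text` at fixed width."""
    highest = max(map(ord, text), default=0)
    if highest < 0x100:
        return 1  # assumption: one byte covers ASCII and the rest of Latin-1
    if highest < 0x10000:
        return 2  # every character fits in two bytes
    return 4      # at least one character needs four bytes


def pack(text):
    """Encode `text` using the narrowest fixed width that fits all characters."""
    width = storage_width(text)
    data = b"".join(ord(ch).to_bytes(width, "little") for ch in text)
    return width, data
```

With this scheme `pack("abc")` uses three bytes in total, while a single character outside the two-byte range forces the whole string to four bytes per character. Indexing stays constant-time in every case, because all characters of a given string share one width.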
PyPy has pluggable garbage collection policy. This means that various garbage collectors can be written for specialized purposes, or even various experiments can be done for the general purpose. Examples:
This is work in progress. Besides the main development path, whose goal is to make a (relatively fast) version of pypy which includes STM, there are independent topics that can already be experimented with on the existing, JIT-less pypy-stm version:
We’re usually happy to introduce new benchmarks. Please consult us beforehand, but in general something that’s real-world Python code and is not already represented is welcome. We need at least a standalone script that can run without parameters. Example ideas (the benchmarks would need to be extracted from them!):
We already tried working with LLVM, and at the time LLVM was not mature enough for our needs. It’s possible that this has changed, so reviving the LLVM backend (or writing a new one from scratch) for static compilation would be a good project.
(On the other hand, just generating C code and using clang might be enough. The issue with that is the so-called “asmgcc GC root finder”, which has tons of issues of its own. In my opinion (arigo), it would definitely be a better project to try to optimize the alternative, the “shadowstack” GC root finder, which is nicely portable. So far it gives a pypy that is around 7% slower.)
Note: there is a basic proof-of-concept for that as a uwsgi pypy plugin
Being able to embed PyPy, say with its own limited C API, would be useful. But here is the most interesting variant, straight from EuroPython live discussion :-) We can have a generic “libpypy.so” that can be used as a placeholder dynamic library, and when it gets loaded, it runs a .py module that installs (via ctypes) the interface it wants exported. This would give us a one-size-fits-all generic .so file to be imported by any application that wants to load .so files :-)
A lot of work has gone into PyPy’s implementation of CPython’s C-API over the last years to let it reach a practical level of compatibility, so that C extensions for CPython work on PyPy without major rewrites. However, there are still many edges and corner cases where it misbehaves, and it has not received any substantial optimisation so far.
The objective of this project is to fix bugs in cpyext and to optimise several performance-critical parts of it, such as the reference counting support and other heavily used C-API functions. The net result would be to have CPython extensions run much faster on PyPy than they currently do, or to make them work at all if they currently don’t. A part of this work would be to get cpyext into a shape where it supports running Cython-generated extensions.