A JIT-aware profiler
====================

Goal: have a profiler which is aware of the PyPy JIT and which shows what
percentage of the time is spent in each loop.

Long term goal: integrate the data collected by the profiler with the
jitviewer.

The idea is to record an event in the PYPYLOG every time we enter or exit a
loop or a bridge.

Expected output
---------------

::

    [100] {jit-profile-enter
    loop1      # e.g. an entry bridge
    [101] jit-profile-enter}
    ...
    [200] {jit-profile-enter
    loop0      # JUMP from loop1 to loop0
    [201] jit-profile-enter}
    ...
    [500] {jit-profile-exit
    loop0      # e.g. because of a failing guard
    [501] jit-profile-exit}

In this example, the exit from loop1 is implicit because we are entering
loop0.  So, we spent 200-100=100 ticks in the entry bridge, and 500-200=300
ticks in the actual loop.

What to do about "inner" bridges?
---------------------------------

"Inner bridges" are those bridges which jump back to the loop from which they
originate.  There are two possible ways of dealing with them:

1. we ignore them: we record when we enter the loop, but not when we jump to
   a compiled inner bridge.  The exit event will be recorded only in case of
   a non-compiled guard failure or a JUMP to another loop

2. we record the enter/exit of each inner bridge

The disadvantage of solution (2) is that some loops take a bridge at every
single iteration.  In that case we would record a huge number of events,
possibly adding a lot of overhead and thus making the profiled data useless.

Detecting entry to and exit from a loop
---------------------------------------

Ways to enter:

- just after the tracing/compilation

- from the interpreter, if the loop has already been compiled

- from another loop, via a JUMP operation

- from a hot guard failure (which we ignore, in case we choose solution
  (1) above)

- XXX: am I missing anything?

Ways to exit:

- guard failure (entering blackhole)

- guard failure (jumping to a bridge) (ignored in case of solution (1))

- jump to another loop

- XXX: am I missing anything?

About call_assembler: I think that, at the beginning, we should just ignore
it: the time spent inside the call will be attributed to the loop calling it.
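
The accounting described under "Expected output" can be sketched as a small
log post-processor.  This is only a sketch under assumptions: the
jit-profile-enter/jit-profile-exit sections are the ones proposed here, not an
existing log category; the layout is copied from the example; timestamps are
read as decimal integers, as in that example; and the names ``parse_events``
and ``ticks_per_loop`` are made up for illustration.  An enter event
implicitly closes whichever loop was running, as in the JUMP from loop1 to
loop0.

```python
import re
from collections import defaultdict

# Assumed section layout, copied from the "Expected output" example:
#   [<timestamp>] {jit-profile-enter
#   <loop name>      # optional comment
#   [<timestamp>] jit-profile-enter}
OPEN_RE = re.compile(r"^\[(\d+)\]\s+\{jit-profile-(enter|exit)\s*$")
CLOSE_RE = re.compile(r"^\[(\d+)\]\s+jit-profile-(enter|exit)\}\s*$")

EXAMPLE_LOG = """\
[100] {jit-profile-enter
loop1      # e.g. an entry bridge
[101] jit-profile-enter}
...
[200] {jit-profile-enter
loop0      # JUMP from loop1 to loop0
[201] jit-profile-enter}
...
[500] {jit-profile-exit
loop0      # e.g. because of a failing guard
[501] jit-profile-exit}
"""


def parse_events(lines):
    """Yield (timestamp, kind, loop_name) for each profile section.

    The timestamp is the one on the opening line of the section.
    """
    pending = None  # (timestamp, kind) of the section currently being read
    for raw in lines:
        line = raw.strip()
        opened = OPEN_RE.match(line)
        if opened:
            pending = (int(opened.group(1)), opened.group(2))
            continue
        if CLOSE_RE.match(line):
            pending = None
            continue
        if pending is not None and line:
            # First non-empty line of the section: loop name, comment stripped.
            name = line.split("#", 1)[0].strip()
            if name:
                yield pending[0], pending[1], name
                pending = None


def ticks_per_loop(events):
    """Sum the ticks spent in each loop.

    Both an exit event and the next enter event (implicit exit) close the
    loop that is currently running.
    """
    totals = defaultdict(int)
    current = None  # (loop_name, enter_timestamp)
    for timestamp, kind, name in events:
        if current is not None:
            totals[current[0]] += timestamp - current[1]
        current = (name, timestamp) if kind == "enter" else None
    return dict(totals)


if __name__ == "__main__":
    totals = ticks_per_loop(parse_events(EXAMPLE_LOG.splitlines()))
    grand_total = sum(totals.values())
    for name, ticks in sorted(totals.items()):
        print("%s: %d ticks (%.0f%%)" % (name, ticks, 100.0 * ticks / grand_total))
```

On the example log this attributes 100 ticks to loop1 and 300 ticks to loop0,
matching the arithmetic above.  Under solution (1) no events are emitted for
inner bridges, so the time spent in them is simply counted as part of the
enclosing loop.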