Fig 1. A chap performing the Detroit JIT
Unless you have been living under a rock, or are from the past (in which case, welcome), you will be aware that a JIT is coming to PHP 8. The vote ended, quietly, today, with a vast majority in favour of merging into PHP 8, so it’s official.
Throw some crazy shapes in celebration (a suggestion is given in Fig 1); it’s even called “The (Detroit) JIT” …
Now sit down and read the following myth-busting article: we’re going to clear up some confusion around what the JIT is, what it will benefit, and delve into how it works (but only a little, because I don’t want you to be bored).
Since I don’t know who I’m talking to, I’m going to start at the beginning with the simple questions and work up to the complex ones. If you are already sure you know the answer to the question in a heading, you can skip that part …
What is JIT ?
PHP implements a virtual machine, a kind of virtual processor – we call it the Zend VM. PHP compiles your human readable script into instructions that the virtual machine understands (we call them opcodes); this stage of execution is what we refer to as “Compile Time”. At the “Runtime” stage of execution, the virtual machine (Zend VM) executes your code’s instructions (opcodes).
This all works very well, and tools like APC (in the past) and OPCache (today) cache your code’s instructions (opcodes) so that “Compile Time” only happens when it must.
First, one line to explain what JIT is in general: Just-in-time is a compiler strategy that takes an intermediate representation of code and turns it into architecture dependent machine code at runtime – just-in-time for execution.
In PHP, this means that the JIT treats the instructions generated for the Zend VM as the intermediate representation and emits architecture dependent machine code, so that the host of your code is no longer the Zend VM, but your CPU directly.
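To make the compile-then-execute split concrete, here is a deliberately tiny sketch. Everything in it is made up for illustration: the opcode names (LOAD, ADD, MUL, RETURN), the functions compile_example() and run_vm(), and the stack-based design are assumptions of mine, not the real Zend VM instruction set or its internals.

```php
<?php
// Toy illustration only: hypothetical opcodes and a hypothetical VM,
// NOT the real Zend VM instruction set or design.

// "Compile Time": the expression (a + b) * c as a flat list of opcodes.
function compile_example(): array
{
    return [
        ['LOAD', 'a'],
        ['LOAD', 'b'],
        ['ADD'],
        ['LOAD', 'c'],
        ['MUL'],
        ['RETURN'],
    ];
}

// "Runtime": the virtual machine looks at each opcode and dispatches on it.
function run_vm(array $opcodes, array $vars)
{
    $stack = [];
    foreach ($opcodes as $op) {
        switch ($op[0]) {
            case 'LOAD':
                $stack[] = $vars[$op[1]];
                break;
            case 'ADD':
                $right = array_pop($stack);
                $left  = array_pop($stack);
                $stack[] = $left + $right;
                break;
            case 'MUL':
                $right = array_pop($stack);
                $left  = array_pop($stack);
                $stack[] = $left * $right;
                break;
            case 'RETURN':
                return array_pop($stack);
        }
    }
    return null;
}

$opcodes = compile_example();
$result  = run_vm($opcodes, ['a' => 2, 'b' => 3, 'c' => 4]);
echo $result, PHP_EOL; // prints 20, i.e. (2 + 3) * 4
```

In this picture, a cache like OPCache keeps the result of compile_example() around so it need not be rebuilt, while a JIT goes one step further and emits machine code for that opcode sequence, so the dispatch loop in run_vm() is no longer needed for it.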
Why does PHP need a JIT ?
The focus of the PHP internals community since slightly before PHP 7.0 has been performance, brought about by healthy competition from Facebook’s HHVM project. The majority of the core changes in PHP 7.0 were contained in the PHPNG patch, which significantly improved the way in which PHP utilizes memory and CPU at its core. Since then, every one of us has been forced to keep one eye on performance.
Since PHP 7.0 some performance improvements have been made: optimizations for the HashTable (a core data structure for PHP), specializations in the Zend VM for certain opcodes, specializations in the compiler for certain sequences, and a constant stream of improvements to the Optimizer component of OPCache … and many others besides, too boring to list.
It’s a brute fact that these optimizations can only take us so far, and we are rapidly approaching, or maybe have already met, a brick wall in our ability to improve performance any further.
Caveat: When we say things like “we can’t improve it any further”, what we really mean is, “the trade-offs we would have to make to improve it any further no longer look appealing” … whenever we talk about performance optimizations, we’re talking about trade-offs. Often, trade-offs in simplicity for performance. We would all like to think that the simplest code is the fastest code, but that simply is not the case in the modern world of C programming. The fastest code is often that code which is prepared to take advantage of architecture dependent intrinsics or platform (compiler) dependent builtins. Simplicity just is not a guarantee of the best performance …
At this time, the ability for PHP to JIT would appear to be the best way to squeeze more performance from PHP.
Will the JIT make my website faster ?
In all probability, not significantly.
Maybe not the answer you were expecting: In the general case, applications written in PHP are I/O bound, and JIT works best on CPU bound code.
What on earth does “I/O and CPU bound” mean ?
When we want to describe the general performance characteristics of a piece of code, or an application, we use the terms I/O bound and CPU bound.
In the simplest possible terms:
- An I/O bound piece of code would go faster if we could improve (reduce, optimize) the I/O it is doing.
- A CPU bound piece of code would go faster if we could improve (reduce, optimize) the instructions the CPU is executing – or (magically) increase the clock speed of the CPU 🙂
A piece of code, or an application, may be I/O bound, CPU bound, or bound equally to CPU and I/O.
In general, PHP applications tend to be I/O bound – the thing that is slowing them down is the I/O which they are performing – connecting, reading, and writing to databases, caches, files, sockets and so on.
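A rough way to see the difference for yourself is to time the waiting and the computing separately. The sketch below is an assumption-laden stand-in, not a real application: timed(), fake_io() and light_lifting() are hypothetical helpers, and usleep() merely pretends to be a database or network round trip. It relies on hrtime(), which needs PHP 7.3 or later.

```php
<?php
// Hypothetical helpers for illustration; usleep() stands in for real I/O
// such as a database query or a network round trip.

// Run $fn once and return the elapsed wall time in milliseconds.
function timed(callable $fn): float
{
    $start = hrtime(true);                 // monotonic clock, nanoseconds
    $fn();
    return (hrtime(true) - $start) / 1e6;  // nanoseconds to milliseconds
}

// Pretend a query takes about 20 ms to come back.
function fake_io(): void
{
    usleep(20000);
}

// The "light lifting" a typical request does with the data it fetched.
function light_lifting(): int
{
    $sum = 0;
    for ($i = 0; $i < 1000; $i++) {
        $sum += $i;
    }
    return $sum;
}

$ioMs  = timed('fake_io');
$cpuMs = timed('light_lifting');

printf("waiting: %.2f ms, computing: %.2f ms\n", $ioMs, $cpuMs);
```

If the waiting dominates, as it does here by construction, then making the computing part faster barely moves the total. That is the sense in which such code is I/O bound.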
What does CPU bound PHP look like ?
CPU bound code is not something a lot of PHP programmers will be familiar with, because of the nature of most PHP applications – their job tends to be to connect to some database, and/or possibly a cache, do some light lifting, and spit out an HTML/JSON/XML response.
You may look around your codebase and find lots of code that has nothing whatever to do with I/O, code that is calling functions completely disconnected from I/O even, and be confused that I seem to be implying that this doesn’t make your application CPU bound, even though there may be many more lines of code that deal with non I/O than I/O.
PHP is actually quite fast; it’s one of the fastest interpreted languages in the world. There is no remarkable difference between the Zend VM calling a function that has nothing to do with I/O, and making the same call in machine code. There is clearly a difference, but the fact is that machine code has a calling convention and the Zend VM has a calling convention, machine code has a prologue and the Zend VM has a prologue. Whether you call some_c_level_function() in Zend opcodes or machine code doesn’t make a significant difference to the performance of the application making the call, although it may seem to make a significant difference to that call.
Note: A calling convention is (roughly) a sequence of instructions executed *before* entering into another function, a prologue is a sequence of instructions executed *at entry* into another function: The calling convention in both cases pushes arguments onto the stack, and the prologue pops them off the stack.
“What about loops, and tail calls, and X?” I hear you ask. PHP is actually quite smart: with the Optimizer component of OPCache enabled, your code is transformed as if by magic into the most efficient form you could have written.
It’s important to note now that JIT doesn’t change the calling convention of Zend Functions from the convention established by the VM – Zend must be able to switch between JIT and VM modes at any time and so the decision was taken to retain the calling convention established by the VM. As a result those calls that you see everywhere aren’t remarkably faster when JIT’d.
If you want to see what CPU bound PHP code looks like, look in Zend/bench.php … This is obviously an extreme example of CPU bound code, but it should drive home the point that where the JIT really shines is in the area of mathematics.
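For a flavour of that kind of code without opening the file, here is a small sketch of my own. It is not copied from Zend/bench.php, and the names escape_time() and count_inside() are hypothetical. It is nothing but arithmetic in tight loops, with no I/O in sight until the final echo.

```php
<?php
// A sketch of CPU bound PHP: pure floating point arithmetic in nested loops.
// Written for illustration; NOT taken from Zend/bench.php.

// Number of iterations before the point (cr, ci) escapes the Mandelbrot
// set, capped at $maxIter for points that never escape.
function escape_time(float $cr, float $ci, int $maxIter): int
{
    $zr = 0.0;
    $zi = 0.0;
    for ($i = 0; $i < $maxIter; $i++) {
        $zr2 = $zr * $zr;
        $zi2 = $zi * $zi;
        if ($zr2 + $zi2 > 4.0) {
            return $i;
        }
        $zi = 2.0 * $zr * $zi + $ci;
        $zr = $zr2 - $zi2 + $cr;
    }
    return $maxIter;
}

// Walk a $size x $size grid over [-2, 2) x [-2, 2) and count the points
// that did not escape within $maxIter iterations.
function count_inside(int $size, int $maxIter): int
{
    $inside = 0;
    for ($y = 0; $y < $size; $y++) {
        for ($x = 0; $x < $size; $x++) {
            $cr = -2.0 + 4.0 * $x / $size;
            $ci = -2.0 + 4.0 * $y / $size;
            if (escape_time($cr, $ci, $maxIter) === $maxIter) {
                $inside++;
            }
        }
    }
    return $inside;
}

echo count_inside(64, 100), PHP_EOL;
```

Every cycle spent here is spent executing opcodes for multiplications, additions and comparisons, which is exactly the kind of work where turning opcodes into machine code can pay off.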
Did PHP make the ultimate trade-off to make math faster ?
No. We did it to widen the scope of PHP, and considerably so.
Without wanting to toot our own horn, we have the web covered: if you are a web programmer in 2019 and you haven’t considered using PHP for your next project, then you are doing the web wrong, in this very biased PHP developer’s opinion.
To improve the ability to execute math faster in PHP seems, at a glance, to be a very narrow scope.
However, this in fact opens the door on things such as machine learning, 3d rendering, 2d (gui) rendering, and data analysis, to name just a few.
Why can’t we have this in PHP 7.4 ?
I just called the JIT “the ultimate trade-off”, and I think it is: It’s arguably one of the most complex compiler strategies ever invented, maybe the most complex. To introduce a JIT is to introduce considerable complexity.
If you ask Dmitry (the author of the JIT) if he made PHP complex, he would say “No, I hate complexity” (that’s a direct quote).
At bottom, complex is anything we do not understand, and at the moment there are very few internals developers (fewer than a handful) who truly understand the implementation of JIT that we have.
PHP 7.4 is coming up fast; merging into PHP 7.4 would leave us with a version of PHP that fewer than a handful of people could debug, fix, or improve (in any real sense). This is just not an acceptable situation for those people who voted no on merging into PHP 7.4.
In the time between now and PHP 8, many of us will be working in our spare time to understand the JIT: We still have features we want to implement and tools we need to rewrite for PHP 8, and first we must understand the JIT. We need this time, and are very grateful that a majority of voters saw fit to give it to us.
Complex is not synonymous with horrible: Complex can be beautiful, like a nebula, and the JIT is that kind of complex. You can, in principle, fully understand something complex and only make a marginal reduction in the apparent complexity of that thing. In other words, even when there are 20 internals developers who are as familiar with the JIT as Dmitry is, it doesn’t really change the complex nature of JIT.
Will development of PHP slow down ?
There’s no reason to think that it will. We have enough time that we can say with confidence that by the time PHP 8 is generally available, there will be enough of us familiar with the JIT to function at least as well as we do today when it comes to fixing bugs and pushing PHP forward.
When trying to square this with the view that a JIT is inherently complex, consider that the majority of our time spent on introducing new features is actually spent discussing that feature. For the majority of features, and even fixes, code may take in the order of minutes or hours to write, and discussions take in the order of weeks or months. In rare cases, the code for a feature may take in the order of hours or days to write, but the discussion will always take longer in those rare cases.
That’s all I have to say about that …
Enjoy your weekend.