Since its inception, PHP's slow running speed has been widely publicised, and over the years there have been a number of improvements. The first Zend Engine arrived with PHP 4 and delivered various performance enhancements (among other features). Each release since then has improved efficiency in one way or another.
Things have become more interesting recently, however, with three projects seeking improvements in different ways. The core team has adopted the Zend OPcache for future versions of PHP, Facebook has been working on a just-in-time compiler called HipHop VM, and the team that brought us the Phalcon framework has created Zephir.
All of these projects have chosen to tackle the issue of PHP's speed from different angles, which leaves one simple question: who's making the biggest improvements? Who's the fastest?
With this question in mind I decided to do something quite ridiculous and write a simple benchmarking setup to test the various ways these projects can be employed. Yes, there is one outright winner in this particular benchmark, but it is important not to get hung up on that.
As I mentioned, all of these techniques are different, so each is likely to be a better fit for a different situation. The winner on outright speed in this particular test may not work for you from another perspective; each carries certain side effects or caveats that you'll need to take into consideration.
You should weigh up all of these factors, and additionally bear in mind that all benchmarks are flawed. The only way to truly test these options is to run real algorithms in a production environment; this benchmark code focuses on one particular, simplified problem.
Of course, as in this case, it is not always reasonably possible to port a sufficiently complex and realistic problem to all benchmark targets. So it is typical to pick a trivial, but computationally intense problem that can easily be implemented in all of the benchmark subjects.
Now that I have addressed some of the fundamental assumptions of benchmarks, let's meet the contenders!
HipHop Virtual Machine

We'll begin with Facebook's PHP runtime, which has been receiving a lot of attention recently. The project is known interchangeably as HHVM, HipHop VM and, less frequently, the HipHop Virtual Machine.
Facebook originally created HipHop VM to replace HPHPc, their PHP-to-C++ compiler, and to further speed up their application infrastructure through just-in-time compilation. More recently they have put a lot of effort into improving compatibility with existing PHP libraries (including Idiorm and Paris).
This means it is now possible to run many PHP applications directly on HHVM and take advantage of its JIT (just-in-time) compiler to speed up code. There are still some rough edges and some aspects that will not work, but the core team commits to the project regularly. It would seem that HHVM will eventually become a fully compatible parallel runtime for PHP code.
PHP 5.5 and OPcache
PHP and Zend have also been busy trying to make PHP faster, and with the advent of OPcache they have shaved between 10% and 20% off PHP processing times. OPcache is a modern replacement for the bytecode-caching features of the APC PECL extension. Unlike APC it doesn't include a userland in-memory key/value store; it is entirely focused on the caching and optimisation of PHP code.
I am not au fait with the techniques that these caches use, but reading through the available documentation I found the following high-level explanation. Zend OPcache offers increased "performance by storing precompiled bytecode in shared memory", reducing reads from disk. Additionally it applies a number of "bytecode optimization patterns" to the code, decreasing execution times.
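For reference, enabling the cache is just a matter of a few php.ini directives. The directive names below are OPcache's real ones, but the values and extension path are only common starting points, not settings taken from this benchmark:

```ini
; Load the extension (the path may differ per distribution)
zend_extension=opcache.so
; Turn the cache on for the web SAPIs...
opcache.enable=1
; ...and for the command line, where it is off by default
opcache.enable_cli=1
; Shared memory available for cached bytecode, in megabytes
opcache.memory_consumption=64
```

As discussed later in this article, the CLI setting matters a great deal for command-line benchmarks like mine.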
Zephir and Phalcon

In a separate effort, and taking a different direction, the team behind the Phalcon framework for PHP have been working on Zephir. Phalcon is a web application framework written in C and shipped as a PHP extension, with the aim of being the fastest framework for PHP. It is worth mentioning that Phalcon is certainly not the first to take this approach; Yaf existed before its inception.
With the pursuit of speed, however, there are usually trade-offs, and in Phalcon's case it is difficult for PHP developers to understand the framework's C source code. This hampers adoption and makes it difficult to encourage community contributions to the project. So the team are in the process of rewriting Phalcon on top of their own language, called Zephir.
Zephir is a fairly simple language that mixes PHP with some aspects of C. When compiled, Zephir code is converted into its lower-level C underpinnings and the resulting library is installed as a PHP extension. In some ways it is a little like a cross between a PECL extension and Facebook's old HPHPc project.
The benchmark

This benchmark uses the Mandelbrot Set fractal as its unit of work, for no particular reason other than that fractals look pretty when rendered. It is reasonably intensive computationally and takes a noticeable amount of time for all test subjects to complete.
I did, however, make a number of changes: for example, the code now writes to a stream rather than directly to STDOUT, and it can produce either ASCII-art interpretations or Portable Bitmap binary files. In these benchmarks all the programs are set to create ASCII and print the result to STDOUT.
To test out the various ways of creating faster PHP I have implemented the algorithm in the following ways:
- Plain C
- Plain PHP
- PHP Extension (just like PECL)
- HHVM’s HACK/PHP++/PHQ
- HHVM Extension
- Zephir CBLOCK (C code dumped inside Zephir)
- Zephir Optimiser (C code accessible to Zephir - kinda like an extension)
- Plain Zephir Lang
If you're interested in finding out more about the code used to produce these statistics, you can check out the repository on GitHub.
The command-line problem
Sara Golemon has graciously got in touch to help me cover an important caveat in the benchmarking in this article. If you didn't already know, Sara is on the HHVM team at Facebook; she has worked on PHP internals, written extensions and articles, and authored the Extending and Embedding PHP book.
She has prepared the following section, which describes some of the possible shortcomings in my method of benchmarking these PHP runtimes.
tl;dr: most PHP runs on a server, where start-up and shutdown times do not directly affect run times in the way they do in CLI tests like the ones I have used.
Every test run in this suite was executed on the command-line with a fresh process environment for each. This inherently biases the results in favor of pre-compiled solutions first, and strongly against a multi-threaded JIT based approach. In the real world, using a long-running webserver, these startup costs would disappear in the noise and we could focus solely on the per-request time.
In the case of PHP, the script must be recompiled to bytecode on every invocation since the bytecodes are not saved to disk. Worse, in fact, with an OpCache forcibly enabled (which it’s not normally for CLI), we must then make a second copy of those bytecodes into shared memory (shared only with ourselves) before execution can begin.
Similarly, in the case of HHVM, forcibly turning on the JIT incurs extra startup cost (though with a runtime benefit) since the script must be compiled from PHP to bytecode*, then from bytecode into machine code. For short-running scripts, the extra compilation time is often worse than the JIT savings, so it’s disabled from the command line by default.
A proper comparison of these technologies would require a warmed up multi-request environment, probably with each running as a FastCGI server using a basic fcgi client over unix socket to reduce the overhead of making the request.
Bottom line, these results are like most benchmarks: Only as good as the methodology.
* Normally HHVM caches the bytecode compilation to disk, however this test suite may be negating that due to the cache being inadvertently deleted or overwritten.
If anyone wants to help improve the tests, please submit pull requests on the GitHub project. This project was a way for me to play with all the elements, and I always knew there would be room for improvement.
As I tried to make clear in this article, you shouldn't take these results as empirical proof or even as givens. Benchmarks test isolated things and will not be directly applicable to your situation. Unless you're calculating the Mandelbrot set, these results are at best a possible indication of a trend.
So how does PHP 5.5 with OPcache compare with Facebook’s Hip Hop VM or Phalcon’s Zephir project?
From here on out I will address this question purely in terms of outright speed, measured in seconds elapsed. During the benchmarking I did gather other statistics from the processes, but I will leave those for later discussion.
When actually performing the benchmarks I used a very simple system to account for variance between runs of the same code. Instead of running each implementation just once, I ran each for 1, 20 and 40 iterations. This allowed me to take an average of the results and hopefully arrive at a fairer figure.
In addition to the tests already mentioned, I also ran each HHVM item with the JIT both on and off, and likewise PHP with and without its OPcache.
The machine I ran the benchmarks on is an Intel® Core™ i3 530 CPU @ 2.93GHz x 4 with 8GB of RAM, running Linux Mint. The versions of the relevant software are PHP 5.5.8, Zephir (fc08fab1e - Feb 3 2014) and HHVM (55212b92 - Jan 21 2014).
To time each run I simply used the Linux utility time, which can also gather other information such as processor load and memory usage for the task.
The results are in
To be completely honest, the results are not really shocking, nor, I imagine, very different from what you would expect. Going into this testing I already had an ordering in my mind, along with an idea of the orders of magnitude that might separate the various techniques. Needless to say, I was close enough to right to make this benchmark almost entirely pointless.
With this particular benchmark Zephir is the outright fastest, followed by HHVM and then PHP, where all are set for optimum speed. If you would like to see all the statistics up close, you can check out the graphs I prepared using D3 or read the raw CSV data dumps.
Based on this I would make the following loose recommendations:
- You need outright speed = standard PHP C extension
- You need non-C programmers to help maintain it = Zephir Lang
- You cannot port away from PHP but can install other runtimes = HHVM
- Your only deployment path is PHP = setup and enable the OPcache
As I mentioned previously, none of this is actually much of a shock or a departure from the recommendations I already had in mind. Additionally, you should of course do your own testing/benchmarking using your specific domain model and not just take my word for it!
I should point out here that Zephir is not as simple as it may seem, and at times the syntax can make problems harder to express. I also found myself regularly having to read the compiled C output to debug the operations in my code; this would be difficult for someone with no prior knowledge of C or the PHP extension architecture.
Zephir is a nice project and it does bring with it a number of performance advantages, but I would agree with the project maintainers that it is not a general purpose language (at least not yet). For the time being it is more focused on the issues it was built to solve in the Phalcon project.
One of the unexpected outcomes came from HHVM, with its JIT-free run coming dead last. According to Facebook, HHVM without JIT compilation should run at about the same speed as PHP; in my testing it was much, much slower.
It is therefore worth noting that, by default, HHVM does not JIT scripts when run from the command line. So if you do find yourself regularly running HHVM CLI scripts, don't forget to set the appropriate flag: -vEval.Jit=1.
Running your own benchmarks
If you want to gather your own statistics in the same way that I have done above then you can use my code. It is all up on GitHub under a liberal 3-clause BSD licence.
In the repository you’ll find a handy readme that goes some way to explaining how it all works and how to compile and run all the code yourself. In some of the subdirectories there are also readmes that are related solely to that technique.
It would be pretty easy to add your own benchmarking algorithm or test out various other techniques using the loose benching "framework" I have thrown together here.
Should you find any bugs or have any suggestions for improvement then please report them on the repository issue tracker.