Once in a while you come across situations where someone wants to know what a server can do or how many requests it can handle under a realistic load scenario. It could simply be that you want to hit a large selection of sites or even that you want to simultaneously hit a number of different pages on the same site.

In my case I was testing the performance of a Drupal multisite installation, where one core set of code is shared by many sites on different URLs. I wanted to find out how many simultaneous requests the server could handle when key URLs on each of the sites were hit. In production the load on each site was estimated to be approximately the same, which made things easier as I could just replicate the same scenario on each site/URL.

This can be difficult to achieve because many benchmarking tools, including ApacheBench, do not support simulating traffic across a number of website URLs. Whilst ApacheBench can perform concurrent requests to one URL, it cannot do the same across a number of URLs or domains.

A way to work around this limitation is to make use of Ole Tange’s GNU parallel utility. When piped a list of arguments, parallel will execute a command against them concurrently (generally limited by the number of CPU cores, where one job maps to one core). This means you can take any crusty Linux utility that runs serially and turn it into a concurrently executed task with minimal effort. (If you are using parallel with gzip though you might prefer pigz instead.) On top of that it is also possible to farm out the parallel processing to other machines if you have them.

If you are paying attention then you will probably have noticed that I just described a rudimentary botnet. You could take a server of limited resources offline in a DDoS (Distributed Denial of Service) style attack using the method I am going to describe here. Please be responsible and only use this knowledge against networks and hardware you own. Of course this is not the most efficient or practical means of performing such an attack anyway, so you would be wasting your own time as well as your target's time.


You could easily bring your own site offline or piss off your hosting provider so do not actually execute this against a URL without being sure of the consequences first. Basically, do not run it against your production server!

So now that I have the obligatory warning out of the way we can get on with the good stuff.

If you are running Ubuntu (like me) or Debian then GNU parallel is really easy to install (other distros may be easy too - I have not tested).

sudo apt-get install parallel

In addition, if you have not already installed it, you will have to install ApacheBench (often known simply as ab).

sudo apt-get install apache2-utils
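Before going any further it can be worth confirming that both tools actually ended up on your PATH. The following is only a small sketch; check_tool is a helper name I have made up for illustration and is not part of either package.

```shell
# Print whether a command is available on PATH.
# check_tool is a hypothetical helper, not something parallel or ab provides.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

check_tool parallel
check_tool ab
```

Some distributions also ship an unrelated parallel command in the moreutils package, so if the examples later on behave oddly, check that parallel --version reports GNU parallel.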

Should you have a number of servers you want to farm the jobs out to, then perform the same installation steps on them too. If you do not install parallel on the other hosts then the process will only have access to one CPU core. It is also important to note here that parallel uses SSH to communicate with the other machines, so you will want to set up password-less login on those machines - I previously wrote about this in Securing SSH with Key Based Authentication.

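A simple ApacheBench test might be to make 100 requests with up to 10 of those occurring concurrently at any one time. This simple test should be easy for most webservers to shrug off. Note that ab expects the URL to include a path, so keep the trailing slash when you are targeting the site root.

ab -n 100 -c 10 "http://www.example.org/"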

You will be given a report back from ApacheBench containing all the vital stats of the benchmark run. As you can see this gives you concurrent tests on one URL or domain, so this is where parallel will step in to parallelize the benchmarking process.

(echo "http://www.example.org/"; echo "http://www.example.com/") | parallel 'ab -n 100 -c 10 {}'

This command pipes two URLs into the parallel utility, which then fires up a separate process running ab -n 100 -c 10 for each of the URLs. The results will be printed to screen in the order that the jobs are completed and not necessarily in the order the URLs are specified. This is normal and to be expected when working with parallel implementations, but it might seem strange at first.

You might be wondering what the weird {} near the end of the command means and why it might be needed. parallel uses the same syntax as xargs for handling the argument substitution in the command. In this case when the ab command is run parallel will substitute the token {} for the URL it has been passed. There are a number of other options and things you can do such as substitution without a file extension {.}, which are described on the manual page.

It is not strictly necessary in the commands we are doing here as “…[where] the command line contains no replacement strings then {} will be appended to the command…”, but I like to be explicit should I later want to add any other arguments or options to the command.
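If you want to see what the substitution produces before sending any traffic, you can preview the generated command lines. The sketch below uses plain xargs with echo placed in front of ab, so nothing is executed; the two example.* URLs are just placeholders.

```shell
# Preview the commands that would be run, without sending any requests.
# "echo" in front of ab means each command line is printed, not executed.
preview=$(printf '%s\n' "http://www.example.org/" "http://www.example.com/" |
    xargs -I {} echo ab -n 100 -c 10 {})
echo "$preview"
```

Each line of output is the ab command with {} swapped for one of the URLs. GNU parallel has a --dry-run option that prints the jobs instead of running them, which gives a similar preview for the real command.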

If you want to maintain the order then you can simply pass parallel the -k switch as an argument.

(echo "http://www.example.org/"; echo "http://www.example.com/") | parallel -k 'ab -n 100 -c 10 {}'

As this is not really important though further examples will omit this parameter for brevity.
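When several reports scroll past it can be hard to tell which one belongs to which URL. One option is parallel's --tag switch, which prefixes every output line with the argument and a tab. The sketch below pulls just the "Requests per second" figure out of output in that shape; summarise_rps is a name I have invented for illustration, and the sample input is made up rather than measured.

```shell
# Pull "URL requests-per-second" pairs out of tab-tagged ab reports.
# Input lines are expected to look like: <URL><TAB><line of ab output>
# summarise_rps is a hypothetical helper name.
summarise_rps() {
    awk -F '\t' '$2 ~ /^Requests per second:/ {
        split($2, field, " ")      # field[4] holds the numeric value
        print $1, field[4]
    }'
}

# Illustrative input only; these figures are made up, not measurements.
printf '%s\t%s\n' \
    "http://www.example.com/" "Requests per second:    250.00 [#/sec] (mean)" \
    "http://www.example.org/" "Time taken for tests:   0.812 seconds" \
    "http://www.example.org/" "Requests per second:    123.15 [#/sec] (mean)" |
    summarise_rps
```

In real use you would replace the printf with the benchmark itself, along the lines of piping the URLs into parallel --tag 'ab -n 100 -c 10 {}' and then into summarise_rps.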

Another option you may wish to tweak is the number of jobs you want parallel to run concurrently using the -j option. Normally this would be mapped to the number of cores you have available on your machine (-j+0), but you can change it as you see fit. The following code would be limited to just two concurrent processes.

(echo "http://www.example.org/"; echo "http://www.example.com/") | parallel -j2 'ab -n 100 -c 10 {}'
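It is worth keeping in mind that -j and -c multiply: each ab process keeps up to -c requests in flight, and parallel runs up to -j of those processes at once. A rough way to work out the ceiling on a single machine is sketched below; peak_requests is a made-up helper name.

```shell
# Upper bound on simultaneous requests from one machine:
# (parallel jobs) x (ab concurrency per job).
# peak_requests is a hypothetical helper name.
peak_requests() {
    echo $(( $1 * $2 ))
}

parallel_jobs=2    # value passed to parallel -j
ab_concurrency=10  # value passed to ab -c
echo "Peak simultaneous requests: $(peak_requests "$parallel_jobs" "$ab_concurrency")"
```

With the values from the command above that works out to 20 requests at most at any one time, assuming everything runs from the local machine.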

What if you have more than two URLs to test though? You could keep chaining calls to echo for each one, but that would be a pain. The easiest method is to put all your target URLs into a text file with one URL per line; my URLs.txt file contains four.
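A file in that shape can be put together and sanity-checked along these lines. The four example.* addresses are placeholders I have picked for illustration, not the sites I tested, and the pattern is only a loose check that each line looks like an http or https URL.

```shell
# Placeholder targets; swap in hosts you own and are allowed to load test.
cat > URLs.txt <<'EOF'
http://www.example.org/
http://www.example.com/
http://www.example.net/
http://www.example.edu/
EOF

# Count lines that do not look like an http(s) URL; 0 means the file is clean.
# grep exits non-zero when it selects no lines, hence the "|| true".
bad_lines=$(grep -c -v -E '^https?://[^[:space:]]+$' URLs.txt || true)
echo "URLs: $(wc -l < URLs.txt | tr -d ' '), malformed: $bad_lines"
```

A stray blank line or a typo in the file would otherwise turn into a failed ab job, so a quick check like this can save a confusing run.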


This can then easily be passed into parallel through the use of the cat utility.

cat URLs.txt | parallel 'ab -c 10 -n 100 {}'

Results for all four URLs will then be printed to screen. To speed the whole thing up we can employ other machines to handle some of the processing, as I mentioned previously. This can be done by providing parallel with a simple comma-separated list of IP addresses or hostnames via the -S option. Do not forget that parallel communicates via SSH, so you will need to set up password-less access to each of these servers before continuing - I previously wrote about this in Securing SSH with Key Based Authentication.

cat URLs.txt | parallel -S,,: 'ab -c 10 -n 100 {}'

The colon (:) at the end of the list specifies that I also want the job to run on the local machine too.
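If the list of machines grows, it may be tidier to keep it in a file and hand that to parallel with --sshloginfile rather than spelling the hosts out after -S. A minimal sketch follows; workers.txt and both worker hostnames are hypothetical names chosen for illustration.

```shell
# One SSH login per line; ":" on its own stands for the local machine.
# workers.txt and the worker hostnames are hypothetical placeholders.
cat > workers.txt <<'EOF'
worker1.example.org
worker2.example.org
:
EOF

echo "Hosts listed: $(wc -l < workers.txt | tr -d ' ')"
```

The benchmark would then be started with something like cat URLs.txt | parallel --sshloginfile workers.txt 'ab -c 10 -n 100 {}', with the same password-less SSH requirement as before.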

To make the process more transparent it is also possible to get parallel to generate an ETA and some server usage statistics with the --eta switch.

cat URLs.txt | parallel --eta -S,,: 'ab -c 10 -n 100 {}'

This will cause parallel to output some additional information breaking down the percentage of jobs each server processed and the time it took.

ETA: 33s 8left 4.23avg  local:4/9/28%/6.2s

To increase the load on the server you can either add extra URLs to the text file or you can adjust the options passed to ab. In the following example -c and -n are increased tenfold.

cat URLs.txt | parallel --eta -S,,: 'ab -c 100 -n 1000 {}'

Now that you have seen how parallel can help you perform many benchmarking requests against many URLs using ApacheBench, I will throw in a little bonus. parallel can be used to run all the jobs on all the available computers, so it could actually be used to roll out a change to all machines as part of system orchestration. It is not designed to do this, but it can! Another use might be running the same benchmark on all machines to determine which performs best.

To make our earlier example with ApacheBench run all jobs on all available machines it is as simple as:

cat URLs.txt | parallel --onall --eta -S,,: 'ab -c 100 -n 1000 {}'

There is one design decision that Ole made here though: if you use --onall and also specify -j, the value passed to the latter will be used to determine the number of machines to log in to in parallel and not the number of jobs to run in parallel. This is an important distinction that the manual describes thus:

Run all the jobs on all computers given with -S. GNU parallel will log into -j number of computers in parallel and run one job at a time on the computer. The order of the jobs will not be changed, but some computers may finish before others.

Ole has also written about it on StackOverflow:

You are hitting a design decision: What does -j mean when you run --onall? The decision is that -j is the number of hosts to run on simultaneously (in your case 2). This was done so that it would be easy to run commands serially on a number of hosts in parallel.

To work round it he suggests wrapping the call to parallel with another call to parallel, which in our example would look like:

cat URLs.txt | parallel parallel -j2 --onall --eta -S,,: 'ab -c 100 -n 1000 {}'

There is so much more you can do with both ApacheBench and GNU parallel so you should have a quick look over their respective manuals and resources too.