Not wanting to repeat myself, I have written a small bash script to handle the parallel processing of the post images for this site. This involves resizing, cropping and then compressing the images ready for the web. Currently the script supports both JPEG and PNG images for all these operations.

On top of this I wanted to ensure that only recently added or modified images would be processed rather than processing the entire folder again. There is a handy option for touch that we’ll see later that makes this process much easier.

So let’s work through the bash script to slowly build it up into a working example. The first item on the agenda is to declare the hashbang for the script.

#! /usr/bin/env bash

Here we are using env to locate the bash executable - this should help to make the script more portable between systems rather than hard referencing /usr/bin/bash directly. Some systems might have bash in /bin/bash for example and using env will prevent this from breaking our script.

Now the script can begin in earnest by declaring a few variables to store the widths and heights we want the final images to be. A path for a touch file is also required to store the last run timestamp, which prevents the same image from being processed twice.

TH_WIDTH=720
TH_HEIGHT=70

LG_WIDTH=720
LG_HEIGHT=480

TOUCHFILE="last.run.time"

Across the article I will refer to thumbnail, TH and list image interchangeably - same goes for large, LG and post image.

If the touch file doesn’t exist then we need to create it and specify the timestamp to use as its default. As I am tracking the entire project in git, the last git commit date will do for the default date. This will prevent any already committed images from being run again.

if [ ! -f "$TOUCHFILE" ]; then
    # http://stackoverflow.com/a/19812608/461813
    LAST_COMMIT_TIMESTAMP=$(git show -s --format=%ct)
    # http://unix.stackexchange.com/a/36765/10219
    touch -d "@$LAST_COMMIT_TIMESTAMP" "$TOUCHFILE"
fi

There is one slight caveat here - if you clone the git repository then all the files will have a modification time of the clone date and not their original resize date. Therefore the resizing will be run against all images on initial clone. This is not an issue for me as I will rarely clone the repo - if it is for you then you could get the latest modification time across all the files and use that instead.
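As a rough sketch of that alternative, the helper below (newest_file is a made-up name, not part of the script) prints the most recently modified file under a directory. It assumes GNU find for -printf, as found on Ubuntu/Debian, and file names without newlines.

```shell
# Print the path of the most recently modified file under a directory.
# Assumes GNU find (-printf) and file names that contain no newlines.
newest_file() {
    # $1: directory to search
    find "$1" -type f -printf '%T@ %p\n' | sort -n | tail -n 1 | cut -d' ' -f2-
}

# Hypothetical usage: copy that file's exact timestamp onto the touch file
# instead of using the last commit date.
# touch -r "$(newest_file src)" "$TOUCHFILE"
```

Using touch -r copies the full timestamp from the reference file, so nothing in src ends up newer than the touch file straight after a clone.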

All of the images we wish to resize are stored in a directory called src so we need to find all the files in there that have a more recent modification time than the touch file. find has a handy switch -newer that will allow us to easily locate them.

FILES=$(find src -newer "$TOUCHFILE" -iname '*.jpg' -or -newer "$TOUCHFILE" -iname '*.png')
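If repeating -newer for each extension feels clumsy, the same search can be written with a grouped expression. This is only a sketch of an equivalent form; find_new_images is a hypothetical helper name, and the added -type f is my own assumption that only regular files are wanted.

```shell
# List images under a directory that are newer than a reference file.
# The escaped parentheses group the two -iname tests so that -newer
# applies to both of them.
find_new_images() {
    # $1: directory to search, $2: reference (touch) file
    find "$1" -type f -newer "$2" \( -iname '*.jpg' -o -iname '*.png' \)
}

# Hypothetical usage:
# FILES=$(find_new_images src "$TOUCHFILE")
```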

This will find all files that are newer than the touch file and that have either .jpg or .png extensions. If there are any then we want to resize and crop them to the correct dimensions using ImageMagick’s convert utility. To complicate this we’re also going to use GNU parallel to spread the work across processors.

If you haven’t used parallel before it is probably worth checking out my other post to get an idea of the syntax and opportunities it provides.

To check that there are some files to process we can simply test the variable with the -n switch, which is true when the string is not empty.

if [ -n "$FILES" ]; then
    # process the large images
    parallel -j8 convert "{}" -strip -resize "${LG_WIDTH}x${LG_HEIGHT}^" -gravity center -crop "${LG_WIDTH}x${LG_HEIGHT}+0+0" -filter catrom "t_post/{/}" ::: $FILES

    # process the image slices
    parallel -j8 convert "t_post/{/}" -gravity center -crop "${TH_WIDTH}x${TH_HEIGHT}+0+0" -filter catrom -extent "${TH_WIDTH}x${TH_HEIGHT}" +repage "t_list/{/}" ::: $FILES
fi
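One thing to be aware of: passing an unquoted $FILES relies on the shell splitting the list on whitespace, so a file name containing a space would be handed to convert in pieces. A possible workaround, sketched here with the hypothetical names collect_images and IMAGES, is to read null-delimited find output into a bash array. It assumes bash 4.4 or newer for mapfile -d.

```shell
# Collect matching images into the IMAGES array, one element per file,
# so names containing spaces survive intact. Requires bash 4.4+.
collect_images() {
    # $1: directory to search, $2: reference (touch) file
    mapfile -d '' -t IMAGES < <(
        find "$1" -type f -newer "$2" \( -iname '*.jpg' -o -iname '*.png' \) -print0
    )
}

# Hypothetical usage:
# collect_images src "$TOUCHFILE"
# [ "${#IMAGES[@]}" -gt 0 ] && echo "${#IMAGES[@]} images to process"
# then pass "${IMAGES[@]}" after ::: in place of $FILES
```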

The cropping and resizing particulars can be researched in the ImageMagick manual so I won’t spend too much time covering them here. Note that the parallel utility uses the same syntax (pretty much) as xargs where the file names are passed into convert - as detailed in my previous post. Also note how $FILES is passed into the parallel command as arguments after the special ::: separator.

So in the first call to parallel you can see {} being used - that is the file name/path as it is passed back from find without modification. You’ll see it used elsewhere as {/}, which is the same as {} except that it strips the preceding path from the argument (eg. /var/www/index.html becomes index.html). You can also strip the extension from the argument with {.}, giving /var/www/index when fed /var/www/index.html. Finally you can combine the two: {/.} produces index when given the same path.
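If you want to double check what a replacement string will expand to without running parallel, the same transformations can be approximated with plain shell parameter expansion. This is only an illustration of the idea; the variable names are mine and parallel itself is not involved.

```shell
# Plain-shell equivalents of GNU parallel's replacement strings.
path="/var/www/index.html"

full="$path"              # {}   -> /var/www/index.html
base="${path##*/}"        # {/}  -> index.html
noext="${path%.*}"        # {.}  -> /var/www/index
base_noext="${base%.*}"   # {/.} -> index

printf '%s\n' "$full" "$base" "$noext" "$base_noext"
```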

As the thumbnail quality is less important than the actual large image I have cheated a little and performed the second crop and resize on the large image rather than re-cutting from the src. This has two benefits: it is quicker to process a smaller image, and the image is already at the correct width.

So now we have resized and cropped both our large and thumbnail images - it is time to compress them. Before we get into that, however, now is a good time to go over the required dependencies and how to install them. I have wrapped them all up into an installation bash script you can use at the end of this article too.

Handily some of the requirements can be obtained from Ubuntu/Debian’s repositories.

sudo apt-get install imagemagick optipng advancecomp parallel

This gives you the ImageMagick package to do the resizing and cropping, two PNG optimisation tools and GNU parallel to handle the multi-processor usage.

Compressing JPEGs nicely takes a little more work as we must manually compile the dependencies here - not at all hard I promise! To facilitate compilation we need to install some build tools from the repositories.

sudo apt-get install build-essential autoconf pkg-config nasm libtool git

With these in place we can turn our attention to mozjpeg which sits under our final library jpeg-archive.

git clone https://github.com/mozilla/mozjpeg.git
cd mozjpeg
autoreconf -fiv
./configure --with-jpeg8
make
sudo make install
cd -

Now that has been built and installed it is possible to get jpeg-archive up and running with another simple build script.

git clone https://github.com/danielgtaylor/jpeg-archive.git
cd jpeg-archive
git checkout 2.1.1
make
sudo make install
cd -
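With this many external tools involved it can be worth checking they are all on the PATH before doing any work. The snippet below is a small sketch of one way to do that; missing_tools is a hypothetical helper and is not part of the scripts in this post.

```shell
# Print the name of every listed command that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Hypothetical usage at the top of the resize script
# (jpeg-recompress is the jpeg-archive binary used later on):
# missing=$(missing_tools convert parallel optipng advdef jpeg-recompress)
# [ -z "$missing" ] || { echo "Missing tools: $missing" >&2; exit 1; }
```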

After the dependencies are available we can get on with the process of compressing the resized and cropped image files. It is essential that different file types are handled differently here. You cannot compress a PNG with the same tools as a JPEG and vice versa. Additionally I want to compress the thumbnail/list images more than the large/post images.

Let’s begin with handling the JPEG results first.

JPOST_FILES=$(find t_post -newer "$TOUCHFILE" -iname '*.jpg')
JLIST_FILES=$(find t_list -newer "$TOUCHFILE" -iname '*.jpg')

The next step is to loop over these results in parallel and apply the compression tools we installed earlier.

if [ -n "$JPOST_FILES" ]; then
    parallel -j8 jpeg-recompress --method smallfry --quality medium --min 60 "{}" "{}" ::: $JPOST_FILES
fi
if [ -n "$JLIST_FILES" ]; then
    parallel -j8 jpeg-recompress --method smallfry --quality low --min 50 "{}" "{}" ::: $JLIST_FILES
fi

From the jpeg-archive suite the above code uses jpeg-recompress to perform the compression with the so-called smallfry algorithm/technique. As you can see the thumbnail/list and large/post images are handled separately, and the options passed to jpeg-recompress for the list images are far more severe.

PNGs are simpler, because they don’t have the same range of compression options. We’re going to use a PNG optimiser followed by a compressor/reducer (DEFLATE, the algorithm behind gzip, underneath essentially).

PNG_FILES=$(find t_post t_list -newer "$TOUCHFILE" -iname '*.png')
if [ -n "$PNG_FILES" ]; then
    parallel -j8 optipng -o 3 -fix "{}" -out "{}" ::: $PNG_FILES
    parallel -j8 advdef --shrink-extra -z "{}" ::: $PNG_FILES
fi

Together these two utilities shave roughly 10% off a PNG image, in my limited experience with 10 or so images.
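If you would like to measure the saving on your own images rather than take my word for it, something along these lines would do. It is a minimal sketch: percent_saved is a hypothetical helper, it uses integer arithmetic so the result is rounded down, and it assumes the original size is not zero.

```shell
# Integer percentage saved when going from one byte count to another.
percent_saved() {
    # $1: size before (bytes), $2: size after (bytes)
    echo $(( ($1 - $2) * 100 / $1 ))
}

# Hypothetical usage with GNU stat (example.png is a placeholder name):
# before=$(stat -c %s t_post/example.png)
# ...run optipng and advdef on the file...
# after=$(stat -c %s t_post/example.png)
# echo "Saved $(percent_saved "$before" "$after")%"
```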

With all the actual operations now complete it just remains to update the last.run.time file to prevent the same images being run over twice.

touch "$TOUCHFILE"

Simple! So, yes, it took some work to get here, but you’ve now got repeatable and efficient image manipulation with a small and easily modified bash script.

To make it easier to copy and paste and verify your final result the full installation and resize scripts are included below.

resize.sh

#! /usr/bin/env bash
LG_WIDTH=720
LG_HEIGHT=480

TH_WIDTH=720
TH_HEIGHT=70

TOUCHFILE="last.run.time"

if [ ! -f "$TOUCHFILE" ]; then
    # http://stackoverflow.com/a/19812608/461813
    LAST_COMMIT_TIMESTAMP=$(git show -s --format=%ct)
    # http://unix.stackexchange.com/a/36765/10219
    touch -d "@$LAST_COMMIT_TIMESTAMP" "$TOUCHFILE"
fi

echo "Resizing post images"
FILES=$(find src -newer "$TOUCHFILE" -iname '*.jpg' -or -newer "$TOUCHFILE" -iname '*.png')

if [ -n "$FILES" ]; then
    # process the large images
    parallel -j8 convert "{}" -strip -resize "${LG_WIDTH}x${LG_HEIGHT}^" -gravity center -crop "${LG_WIDTH}x${LG_HEIGHT}+0+0" -filter catrom "t_post/{/}" ::: $FILES

    # process the image slices
    parallel -j8 convert "t_post/{/}" -gravity center -crop "${TH_WIDTH}x${TH_HEIGHT}+0+0" -filter catrom -extent "${TH_WIDTH}x${TH_HEIGHT}" +repage "t_list/{/}" ::: $FILES
fi

# compress jpg images
JPOST_FILES=$(find t_post -newer "$TOUCHFILE" -iname '*.jpg')
JLIST_FILES=$(find t_list -newer "$TOUCHFILE" -iname '*.jpg')
if [ -n "$JPOST_FILES" ]; then
    parallel -j8 jpeg-recompress --method smallfry --quality medium --min 60 "{}" "{}" ::: $JPOST_FILES
fi
if [ -n "$JLIST_FILES" ]; then
    parallel -j8 jpeg-recompress --method smallfry --quality low --min 50 "{}" "{}" ::: $JLIST_FILES
fi

# compress png images
PNG_FILES=$(find t_post t_list -newer "$TOUCHFILE" -iname '*.png')
if [ -n "$PNG_FILES" ]; then
    parallel -j8 optipng -o 3 -fix "{}" -out "{}" ::: $PNG_FILES
    parallel -j8 advdef --shrink-extra -z "{}" ::: $PNG_FILES
fi

echo " "
echo "Completed resize operation"
touch "$TOUCHFILE"

install.sh

#! /usr/bin/env bash
echo "Installing imagemagick"
sudo apt-get install imagemagick

echo " "
echo "Installing optipng and advdef"
sudo apt-get install optipng advancecomp

echo " "
echo "Installing gnu parallel"
sudo apt-get install parallel

echo " "
echo "Installing mozjpeg"
sudo apt-get install build-essential autoconf pkg-config nasm libtool git
git clone https://github.com/mozilla/mozjpeg.git
cd mozjpeg
autoreconf -fiv
./configure --with-jpeg8
make
sudo make install

cd -

echo " "
echo "Installing jpeg-archive"
git clone https://github.com/danielgtaylor/jpeg-archive.git
cd jpeg-archive
git checkout 2.1.1
make
sudo make install