In the last article we learnt how to create modified copies of an immutable in PHP. This one tackles an issue I have so far skirted around: objects in immutable data structures.
This article is part of a series I have written on the topic of immutability in PHP code:
- Part one - a discussion of caveats and a simple scalar handling immutable
- Part two - improve the process of creating modified copies of the immutable
- Part three - objects in immutable data structures and a generalised immutable implementation
Also available in Русский (Russian):
What’s the problem with objects?
Objects (instances of classes) effectively behave as if they are passed by reference in PHP: what gets passed is a handle to the same instance, so any change made to the object will be reflected in every place it has been passed to. This is different to scalar values like strings, which are passed by value instead.
$class = new stdClass();
function addItem($x, $item) {
$x->$item = $item;
}
var_dump($class); // object(stdClass)#1 (0) {}
addItem($class, 'test');
var_dump($class);
/*
object(stdClass)#1 (1) {
["test"]=> string(4) "test"
}
*/
Here you can see a function called addItem() that adds a property to a stdClass instance - this produces a side effect. The original $class is also updated as it references the same value, so if we dump the variable we can see its value has changed.
Now consider the same example with a simple scalar string where pass by value takes effect.
$string = 'begin';
function addItem($x, $item) {
$x .= $item;
}
var_dump($string); // string(5) "begin"
addItem($string, 'end');
var_dump($string); // string(5) "begin"
Here the original value remains intact because, unlike an object, there is no reference to it from within the addItem() function.
These side effects make putting an object into an immutable data structure difficult. Someone with access to the reference could simply change the object after the fact - thus breaking immutability.
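To make the distinction concrete, here is a minimal sketch. The function names mutateName() and reassign() are made up for this illustration. Strictly speaking PHP copies the object handle rather than passing a true reference, which is why changing the object is visible to the caller while pointing the parameter at a new object is not.

```php
<?php
declare(strict_types=1);

// Changes the object behind the handle: the caller sees this.
function mutateName(stdClass $x): void
{
    $x->name = 'changed';
}

// Only rebinds the local variable to a new object: the caller is unaffected.
function reassign(stdClass $x): void
{
    $x = new stdClass();
    $x->name = 'replaced';
}

$object = new stdClass();
$object->name = 'original';

reassign($object);
$afterReassign = $object->name; // original

mutateName($object);
$afterMutate = $object->name; // changed

echo $afterReassign, PHP_EOL, $afterMutate, PHP_EOL;
```

Either way the conclusion for immutables is the same: anyone holding the handle can change the object.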
What about resources?
Turns out the same issues plague resources as well. They are just references to a resource ID so any change to one will affect all those that also reference it. Simply moving the pointer in a file resource would break immutability.
$f = fopen('/tmp/test.txt', 'r'); // contains "123456789"
$out = fread($f, 3);
var_dump($out); // string(3) "123"
$out2 = fread($f, 3);
var_dump($out2); // string(3) "456"
This happens because fread() advances the file pointer as it reads. Even if we rewind() the pointer, there is no guarantee of getting the same value back.
An additional issue with resources is that they are, by their nature, not a finite thing so even if you did prevent changes within your program you could still end up having mutations - someone updating a file on disk for example.
$f = fopen('/dev/urandom', 'r');
$out = bin2hex(fread($f, 3));
var_dump($out); // string(6) "82e42b"
rewind($f); // reset pointer to beginning
$out2 = bin2hex(fread($f, 3));
var_dump($out2); // string(6) "e20c78"
In between the two calls to fread() the data in the resource has changed through outside intervention. A new random value has effectively been written to /dev/urandom, meaning the dumped value changes too, even though we have rewound the pointer and read the same number of bytes (3).
Note that bin2hex() converts the binary bytes that /dev/urandom produces into a hexadecimal representation, making them more legible to humans. This conversion also doubles the length of the value, as each raw byte becomes two hexadecimal characters: the seven bytes \xdd\x40\x44\xf5\xf8\x4e\xd6 become dd4044f5f84ed6. This is why we may ask fread() for 3 bytes but the string that comes back is actually 6 characters long.
However, if your data source is not binary then you do not need to use bin2hex()
in your code.
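The doubling in length can be checked with a fixed byte string instead of random data. This is only a small sketch, and the variable names are chosen for the illustration.

```php
<?php
// Seven raw bytes, written with hexadecimal escape sequences.
$raw = "\xdd\x40\x44\xf5\xf8\x4e\xd6";

$hex = bin2hex($raw);               // dd4044f5f84ed6
$rawLength = strlen($raw);          // 7
$hexLength = strlen($hex);          // 14
$roundTrip = hex2bin($hex) === $raw; // hex2bin() reverses the conversion

echo $hex, PHP_EOL;
```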
What can we do to fix it?
In the case of resources, it is too hard to protect them from unauthorised changes so we won’t bother. If you need an immutable resource you’ll have to fetch it as a scalar first and then put that into your immutable data structure.
$f = fopen('/dev/urandom', 'r');
$randomStr = bin2hex(fread($f, 7));
var_dump($randomStr); // string(14) "d102c7ca28b6f1"
var_dump(substr($randomStr, 0, 3)); // string(3) "d10"
var_dump(substr($randomStr, 0, 3)); // string(3) "d10"
As you can clearly see, the value does not change between prints in this example because we are accessing a scalar string instead of a resource directly. You could just as easily feed the $randomStr into the Immutable definitions that are described further on.
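The same idea applies to ordinary files. The sketch below uses tmpfile() so that it does not depend on any particular path, and the variable names are assumptions for the example. It takes a scalar snapshot, changes the underlying resource, and shows that the snapshot is unaffected.

```php
<?php
$f = tmpfile();
fwrite($f, '123456789');
rewind($f);

// Copy the contents out of the resource into a plain string.
$snapshot = stream_get_contents($f);

// Now change the underlying file through the same handle.
ftruncate($f, 0);
rewind($f);
fwrite($f, 'abc');
rewind($f);
$current = stream_get_contents($f);
fclose($f);

echo $snapshot, PHP_EOL; // 123456789
echo $current, PHP_EOL;  // abc
```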
In the case of objects though there is something we can do to protect the immutable from their pass by reference nature. For simple objects you can simply clone the incoming object value when setting it in an immutable data structure. This will create a new copy of the object with its own reference and, therefore, break the dependency on the previous reference - the two objects are not linked by reference. Any change in one will not be reproduced in the other.
declare(strict_types=1);
final class Immutable {
private $data;
private $mutable = true;
public function __construct(stdClass $value) {
if (false === $this->mutable) {
throw new \BadMethodCallException('Constructor called twice.');
}
$this->data = clone $value;
$this->mutable = false;
}
public function get() {
return $this->data;
}
}
$test = new stdClass();
$test->data = 'test';
echo $test->data; // test
$imm = new Immutable($test);
echo $imm->get()->data; // test
$test->data = 'simon';
echo $test->data; // simon
echo $imm->get()->data; // test
By cloning the object we have created a duplicate instance and referenced that instead from within the Immutable. This means that when $test is later updated it does not affect the value inside $imm, as it does not have the same reference as $test.
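One thing worth noting is that a getter which returns the stored object hands out the internal handle, so a caller could still change the stored copy through it. A possible guard, sketched here with a hypothetical CloningImmutable class that is not part of the original code, is to clone on the way out as well as on the way in.

```php
<?php
declare(strict_types=1);

// Hypothetical variant: clones on write and on read.
final class CloningImmutable
{
    private $data;

    public function __construct(stdClass $value)
    {
        $this->data = clone $value;
    }

    public function get(): stdClass
    {
        // Hand out a copy so callers never hold the internal object.
        return clone $this->data;
    }
}

$source = new stdClass();
$source->data = 'test';

$imm = new CloningImmutable($source);

$leaked = $imm->get();
$leaked->data = 'tampered';

echo $imm->get()->data, PHP_EOL; // test
```

The cost is an extra copy on every read, and like the constructor clone it is only a shallow copy.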
So, there it is, we are done.
Deep nesting though
Yeah, right, not so fast! The previous example can easily be broken with one small change; provide an object for storage inside $test
.
$value = new stdClass();
$value->data = 'value';
$test = new stdClass();
$test->data = $value;
var_dump($test->data);
/*
object(stdClass)#1 (1) {
["data"]=> string(5) "value"
}
*/
$imm = new Immutable($test);
var_dump($imm->get()->data);
/*
object(stdClass)#1 (1) {
["data"]=> string(5) "value"
}
*/
// change the nested object's value to see if the immutable changes too
$value->data = 'changed value!';
var_dump($imm->get()->data);
/*
object(stdClass)#1 (1) {
["data"]=> string(14) "changed value!"
}
*/
As you would expect, just because we cloned $test when it is set inside Immutable does not mean its contents are cloned too. Unfortunately, $value is still referenced directly, so any subsequent updates get reflected across all referring locations - including inside our Immutable.
The same would be true of any immutable containing an array too. You could just set one of the
array elements to be an object and change it later just like $value
in this object example.
Long story short, this immutable is in fact mutable.
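The array case mentioned above can be shown without any class at all. In this small sketch (variable names are illustrative) the array itself is copied by value, but the object stored inside it is still shared.

```php
<?php
$item = new stdClass();
$item->data = 'value';

$list = ['object' => $item, 'scalar' => 'value'];
$copy = $list; // arrays are copied by value...

$list['scalar'] = 'edited';      // ...so this does not touch $copy
$item->data = 'changed value!';  // ...but the object inside is shared

var_dump($copy['scalar']);       // string(5) "value"
var_dump($copy['object']->data); // string(14) "changed value!"
```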
Immutable deep nesting with __clone()
You could work around the lack of protection by implementing the __clone()
magic method in all
classes that might be put inside an immutable. You could then clone all objects stored in the class
when it, itself, is cloned. A simplified demonstration of how this could work is below.
declare(strict_types=1);
final class Immutable {
private $data;
private $mutable = true;
public function __construct(MySimpleClass $value) {
if (false === $this->mutable) {
throw new \BadMethodCallException('Constructor called twice.');
}
$this->data = clone $value;
$this->mutable = false;
}
public function get() {
return $this->data;
}
}
class MySimpleClass {
private $data;
public function __construct(stdClass $value) {
$this->data = $value;
}
public function get() {
return $this->data;
}
public function __clone() {
$this->data = clone $this->data;
}
}
$stdClass = new stdClass();
$stdClass->value = 'Hello';
var_dump($stdClass);
/*
object(stdClass)#1 (1) {
["value"]=> string(5) "Hello"
}
*/
$toBeStored = new MySimpleClass($stdClass);
var_dump($toBeStored->get());
/*
object(stdClass)#1 (1) {
["value"]=> string(5) "Hello"
}
*/
$imm = new Immutable($toBeStored);
var_dump($imm->get()->get());
/*
object(stdClass)#5 (1) {
["value"]=> string(5) "Hello"
}
*/
As you can see MySimpleClass
is very naive to make the demonstration easier to grasp. You will also
note that the object ID jumps to 5 when the final var_dump()
is applied - this is because __clone()
in MySimpleClass
was triggered.
If we step through the implementation again and attempt to make a change to $stdClass
then it might
be clearer.
$stdClass = new stdClass();
$stdClass->value = 'Hello';
var_dump($stdClass);
/*
object(stdClass)#1 (1) {
["value"]=> string(5) "Hello"
}
*/
$toBeStored = new MySimpleClass($stdClass);
// we can still modify this object as the clone has not yet happened
$stdClass->data = 'World';
var_dump($toBeStored->get());
/*
object(stdClass)#1 (2) {
["value"]=> string(5) "Hello"
["data"]=> string(5) "World"
}
*/
// the clone will be triggered by the constructor in Immutable right here
$imm = new Immutable($toBeStored);
// Note that the following line returns a different object (#5 instead of #1)
// due to the clone operation in the Immutable constructor
var_dump($imm->get()->get());
/*
object(stdClass)#5 (2) {
["value"]=> string(5) "Hello"
["data"]=> string(5) "World"
}
*/
// the following line will not affect the Immutable wrapped data as $stdClass references
// the original #1 object
$stdClass->combined = $stdClass->value . $stdClass->data;
var_dump($imm->get()->get());
/*
object(stdClass)#5 (2) {
["value"]=> string(5) "Hello"
["data"]=> string(5) "World"
}
*/
Unfortunately, this would require you to trust developers to actually implement this correctly and
there would be no way of accurately verifying that a __clone()
method has been specified properly.
To solve this issue we must eschew quite a bit of flexibility and only allow known immutable
objects to be set inside the Immutable
. This means that we have to recursively step down through
any arrays looking for mutable classes and rejecting them too.
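To illustrate why relying on __clone() requires trust, consider a class where the method has simply been left out. ForgetfulClass is a made-up name for this sketch. Cloning it succeeds without complaint, yet the clone still shares its nested object with the original.

```php
<?php
declare(strict_types=1);

final class ForgetfulClass
{
    private $data;

    public function __construct(stdClass $value)
    {
        $this->data = $value;
    }

    public function get(): stdClass
    {
        return $this->data;
    }

    // No __clone() here, so clones share $data with the original.
}

$inner = new stdClass();
$inner->value = 'Hello';

$original = new ForgetfulClass($inner);
$copy = clone $original;

$inner->value = 'tampered';

$sharesInner = $copy->get() === $original->get();
var_dump($sharesInner);         // bool(true)
echo $copy->get()->value, PHP_EOL; // tampered
```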
Generalised immutable deep nesting
For those of us who want a more stringently protected immutable we can generalise the problem by making an immutable class that can sanitise itself. It will only allow known immutables to be set as data inside it thereby preventing nested object state changes, which would break its immutable property.
declare(strict_types=1);
final class Immutable {
private $data;
private $mutable = true;
public function __construct(array $args) {
if (false === $this->mutable) {
throw new \BadMethodCallException('Constructor called twice.');
}
$this->data = $this->sanitiseInput($args);
$this->mutable = false;
}
public function getData(): array {
return $this->data;
}
public function sanitiseInput(array $args): array {
return array_map(function($x) {
if (is_scalar($x)) return $x;
else if (is_object($x)) return $this->sanitiseObject($x);
else if (is_array($x)) return $this->sanitiseInput($x);
else throw new \InvalidArgumentException(gettype($x) . ' cannot be stored in an Immutable.');
}, $args);
}
// This method prevents untrusted objects from being set using a class type hint.
// PHP enforces class type hints whether or not declare(strict_types=1) is in effect.
// Note that it also clones the supplied object.
private static function sanitiseObject(Immutable $object): Immutable {
return clone $object;
}
public function __clone() {
$this->data = $this->sanitiseInput($this->data);
}
public function __unset(string $id): void {}
public function __set(string $id, $val): void {}
}
This class can then be used to create immutable lists of things.
$immA = new Immutable([1, 'unjani wena']);
var_dump($immA);
/*
object(Immutable)#1 (2) {
["data":"Immutable":private]=> array(2) {
[0]=> int(1)
[1]=> string(11) "unjani wena"
}
["mutable":"Immutable":private]=> bool(false)
}
*/
$immB = new Immutable([2, $immA]);
var_dump($immB);
/*
object(Immutable)#2 (2) {
["data":"Immutable":private]=> array(2) {
[0]=> int(2)
[1]=> object(Immutable)#4 (2) {
["data":"Immutable":private]=> array(2) {
[0]=> int(1)
[1]=> string(11) "unjani wena"
}
["mutable":"Immutable":private]=> bool(false)
}
}
["mutable":"Immutable":private]=> bool(false)
}
*/
$immC = new Immutable([2, new stdClass]);
// Error: Argument 1 passed to Immutable::sanitiseObject() must be an instance of Immutable,
// instance of stdClass given
The main new concept here is the sanitiseInput() method, which recursively steps through the data array cloning any objects it finds. The cloning itself is done in sanitiseObject() which, you will also note, uses a type hint to ensure only instances of Immutable can be set as values. This is how we ensure that only known immutable objects are being set inside an Immutable.
If you need to check for more than one known immutable class then you could check in a number of ways:
- extend a base or abstract class when implementing them all,
- use an interface that they all implement or
- a simple set of instanceof checks.
Something like this might do it.
/**
* @param Immutable|MyOtherImmutable|SomeOtherImmutable $object
*/
protected function sanitiseObject($object) {
if (array_filter(
['Immutable', 'MyOtherImmutable', 'SomeOtherImmutable'],
function($x) use ($object) { return $object instanceOf $x; }
)) {
return clone $object;
}
throw new \InvalidArgumentException((is_object($object) ? get_class($object) : gettype($object)) . ' cannot be stored in an Immutable.');
}
Whichever way you choose or prefer is up to you of course.
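As an illustration of the interface option, here is a rough sketch written as a standalone function rather than a method. KnownImmutable, Label and sanitiseValue() are all hypothetical names invented for this example. It also shows a side effect of using is_scalar(): null is not a scalar in PHP, so it is rejected along with untrusted objects.

```php
<?php
declare(strict_types=1);

// Marker interface: classes opt in to being treated as immutable.
interface KnownImmutable {}

final class Label implements KnownImmutable
{
    private $text;

    public function __construct(string $text)
    {
        $this->text = $text;
    }

    public function text(): string
    {
        return $this->text;
    }
}

function sanitiseValue($x)
{
    if (is_scalar($x)) {
        return $x;
    }
    if (is_array($x)) {
        return array_map('sanitiseValue', $x); // recurse into nested arrays
    }
    if ($x instanceof KnownImmutable) {
        return clone $x;
    }
    $type = is_object($x) ? get_class($x) : gettype($x);
    throw new InvalidArgumentException($type . ' cannot be stored in an Immutable.');
}

$label = new Label('ok');
$clean = sanitiseValue([1, ['nested', $label]]);

$rejected = [];
foreach ([new stdClass(), null] as $bad) {
    try {
        sanitiseValue([$bad]);
    } catch (InvalidArgumentException $e) {
        $rejected[] = $e->getMessage();
    }
}

print_r($rejected);
```

A marker interface is still only a promise made by whoever writes the class, so it is no stronger a guarantee than the type hint, just more flexible.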
So, that finally gives us a simple immutable structure that can store objects, scalars and arrays. You can use the techniques discussed in the previous article (part two) to easily create modified copies of your new immutable.
Using a generator to make generalisation easier
The same functionality can also be written using a generator class (a builder object, not to be confused with PHP's yield-based generators) to create the immutable data structure. In this section, though, we are going to extend the idea a little further by adding some convenience methods.
The data structure
Turning to the structure itself, we are going to add a few methods that will make data access more robust in the
generalised class. To this end, it is useful to know if a value exists so we are going to add a has($key)
method.
This will also be used by a getOrElse($key, $default)
function to allow a default value to be provided where a
key does not already exist.
declare(strict_types=1);
final class ImmutableData {
private $data = [];
private function __construct() {}
public static function create(array $args): ImmutableData {
$immutable = new self;
$immutable->data = static::sanitiseInput($args);
return $immutable;
}
public function has($key) {
return array_key_exists($key, $this->data);
}
public function get($key) {
return $this->data[$key];
}
public function getOrElse($key, $default) {
if($this->has($key)) {
return $this->get($key);
}
return $default;
}
public function getAsArray(): array {
return $this->data;
}
protected static function sanitiseInput(array $arr): array {
return array_map(function($x) {
if (is_scalar($x)) return $x;
else if (is_object($x)) return static::sanitiseObject($x);
else if (is_array($x)) return static::sanitiseInput($x);
else throw new \InvalidArgumentException(gettype($x) . ' cannot be stored in an Immutable.');
}, $arr);
}
protected static function sanitiseObject(ImmutableData $object): ImmutableData {
return clone $object;
}
// return a parsable text representation of the class
public function __toString(): string {
return var_export($this->getAsArray(), true);
}
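// called when a var_export'd class is parsed; it must be static and receives
// the exported properties keyed by name, so the stored array is under 'data'
public static function __set_state(array $args): ImmutableData {
return static::create($args['data']);
}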
public function __unset($a): void {}
public function __set($a, $b): void {}
private function __clone() {
$this->data = static::sanitiseInput($this->data);
}
}
This is the complete immutable structure that our generator will populate for us.
Unlike the last Immutable
this one makes use of static methods and prevents access to the class constructor
by making it a private method. This skips the $mutable
true/false dance we have been doing elsewhere. I prefer the
dance, but this serves as a nice example of another method to achieve a similar result.
You will notice that there are actually a few other methods in there that we have not discussed yet. There is a get($key) that allows us to access a value by its key easily and getAsArray() has taken over the duties of returning the complete $this->data array. Finally, there is a __toString() method, which produces a PHP parsable string representation of the stored data.
A generator in detail
Now onto the generator that will produce the populated instances of the ImmutableData
class.
The main aim of this generator is to make it as generalised as possible - allowing a consumer to store the widest possible selection of types and values whilst ensuring immutability is not broken. In tandem with this we will also add some methods to make modifying a copy of the immutable easier.
All the data will be stored in an array internally to easily facilitate different data shapes that may be thrown
at the Immutable
class.
All data will need to be stored against a key so that it can be accessed again easily.
class Immutable {
private $data = [];
private function __construct() {}
public static function create(): self {
return new self;
}
public static function with(ImmutableData $old): self {
$new = static::create();
$new->data = $old->getAsArray();
return $new;
}
public function set(string $key, $value): self {
return $this->setData($key, $value);
}
public function unset($key): self {
unset($this->data[$key]);
return $this;
}
public function setIntKey(int $key, $value): self {
return $this->setData($key, $value);
}
private function setData($key, $value): self {
$this->data[$key] = $value;
return $this;
}
public function arr(array $arr): self {
foreach($arr as $key => $value) {
if (is_string($key)) {
$this->set($key, $value);
} else if (is_int($key)) {
$this->setIntKey($key, $value);
}
}
return $this;
}
public function unsetArr(array $arr): self {
foreach($arr as $key) {
$this->unset($key);
}
return $this;
}
public function build(): ImmutableData {
return ImmutableData::create($this->data);
}
public function getAsArray(): array {
return $this->data;
}
}
Again this class uses a private constructor and static method to prevent calls to the constructor. You could
use the $mutable
true/false setup here, very easily, if you wanted to though.
Simple usage
These two classes can now be used to generate an immutable data structure like so.
$immX = Immutable::create()
->set('test', 'a string goes here')
->set('another', 100)
->arr([1,2,3,4,5,6])
->arr(['a' => 1, 'b' => 2])
->build();
echo (string) $immX;
This uses the __toString()
method to print a simple and parsable text representation.
array (
'test' => 'a string goes here',
'another' => 100,
0 => 1,
1 => 2,
2 => 3,
3 => 4,
4 => 5,
5 => 6,
'a' => 1,
'b' => 2,
)
You can also put a trusted object into the immutable as well - in this case we will just use the immutable
we created earlier, $immX
.
$immY = Immutable::create()
->set('x', $immX)
->build();
echo (string) $immY;
Again, the output is parsable by the PHP engine so you will notice the slightly weird __set_state() magic method call in there - you can safely ignore this and concentrate on the data itself. This magic method is implemented in the ImmutableData class that we defined earlier and it merely serves to populate a class with a set of data/state when a var_export() output is parsed by PHP.
array (
'x' =>
ImmutableData::__set_state(array(
'data' =>
array (
'test' => 'a string goes here',
'another' => 100,
0 => 1,
1 => 2,
2 => 3,
3 => 4,
4 => 5,
5 => 6,
'a' => 1,
'b' => 2,
),
)),
)
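For readers unfamiliar with __set_state(), the round trip can be sketched with a much smaller class. Box is a hypothetical stand-in for ImmutableData, and eval() is used purely to demonstrate that the exported text is parsable - it is not something to do with untrusted input.

```php
<?php
declare(strict_types=1);

final class Box
{
    private $data;

    private function __construct(array $data)
    {
        $this->data = $data;
    }

    public static function create(array $data): self
    {
        return new self($data);
    }

    public function getAsArray(): array
    {
        return $this->data;
    }

    // var_export() emits a call to this method; $state is keyed by property name.
    public static function __set_state(array $state): self
    {
        return self::create($state['data']);
    }
}

$box = Box::create(['a' => 1]);

$code = var_export($box, true);         // PHP source text describing $box
$copy = eval('return ' . $code . ';');  // parse it back into a new object

var_dump($copy->getAsArray() === $box->getAsArray());
```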
So, what is the point if we cannot get our data out? Well, remember those get(), has() and getOrElse() methods?
They can be used to quickly and relatively easily access the stored data by key. The methods are fairly self-explanatory so here are a few examples just to demonstrate their usage against $immX, which is where the 'test' key was set.
echo $immX->get('test'); // a string goes here
var_dump($immX->has('test')); // bool(true)
var_dump($immX->has('non-existent')); // bool(false)
echo $immX->getOrElse('test', 'some default text'); // a string goes here
echo $immX->getOrElse('non-existent', 'some default text'); // some default text
This should give you enough of a foundation to build additional functions like map, reduce, etc. upon, should you choose to do so. You could also write methods to fetch items by their value rather than their key as well.
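As a rough idea of what map and reduce might look like, here is a sketch on a deliberately tiny class. ScalarList and its methods are assumptions for this example and are not part of the classes above. The point is that map() returns a new instance rather than changing the existing one. A fuller version would also pass the callback's results through sanitiseInput().

```php
<?php
declare(strict_types=1);

final class ScalarList
{
    private $data;

    private function __construct(array $data)
    {
        $this->data = $data;
    }

    public static function create(array $data): self
    {
        return new self($data);
    }

    public function getAsArray(): array
    {
        return $this->data;
    }

    // Returns a new instance; keys are preserved by array_map() with one array.
    public function map(callable $fn): self
    {
        return new self(array_map($fn, $this->data));
    }

    public function reduce(callable $fn, $initial)
    {
        return array_reduce($this->data, $fn, $initial);
    }
}

$numbers = ScalarList::create(['a' => 1, 'b' => 2, 'c' => 3]);

$doubled = $numbers->map(function (int $n): int {
    return $n * 2;
});

$total = $doubled->reduce(function (int $carry, int $n): int {
    return $carry + $n;
}, 0);

echo $total, PHP_EOL; // 12
```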
Modifying copies of the immutable structure using the generator
The key to making immutables useful is allowing consumers to easily and quickly create modified copies of the
underlying data. This has been written into the generator we defined earlier and can be best described with
a few examples. Note that the with()
static method can accept an ImmutableData
object as its first
parameter and modification is exactly what this is for. You can then use set()
to add or modify values.
$immZ = Immutable::with($immY)
->set('a story', 'This is where someone should write a story')
->setIntKey(300, 'My int indexed value')
->arr(['arr: int indexed', 'arr' => 'arr: assoc key becomes immutable key'])
->build();
echo (string) $immZ;
In the result we should see our new properties added to the stored array from $immY
.
array (
'x' =>
ImmutableData::__set_state(array(
'data' =>
array (
'test' => 'a string goes here',
'another' => 100,
0 => 1,
1 => 2,
2 => 3,
3 => 4,
4 => 5,
5 => 6,
'a' => 1,
'b' => 2,
),
)),
'a story' => 'This is where someone should write a story',
300 => 'My int indexed value',
0 => 'arr: int indexed',
'arr' => 'arr: assoc key becomes immutable key',
)
Of course, you can also use arr() or setIntKey() here in the same way when setting new values or overwriting existing ones. Just set a value with a key that already exists in the structure and you will overwrite it.
$throwAway = Immutable::with($immZ)
->set('a story', 'My story begins by the slow moving waters of the meandering river.')
->build();
echo (string) $throwAway;
This would result in a data structure like the following.
array (
'x' =>
ImmutableData::__set_state(array(
'data' =>
array (
'test' => 'a string goes here',
'another' => 100,
0 => 1,
1 => 2,
2 => 3,
3 => 4,
4 => 5,
5 => 6,
'a' => 1,
'b' => 2,
),
)),
'a story' => 'My story begins by the slow moving waters of the meandering river.',
300 => 'My int indexed value',
0 => 'arr: int indexed',
'arr' => 'arr: assoc key becomes immutable key',
)
The generator can also be used to remove items from the data list. You can either remove them one at a time with unset($key) or remove many by supplying a list to unsetArr().
$immAA = Immutable::with($immZ)
->unset('x')
->unsetArr(['a story', 300])
->build();
echo (string) $immAA;
The execution of this results in the following modified output where a number of keys have been removed.
array (
0 => 'arr: int indexed',
'arr' => 'arr: assoc key becomes immutable key',
)
You can unset(), unsetArr(), set(), setIntKey() and arr() as much as you like before calling build(), all in the one building chain.
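It is worth confirming that building a modified copy leaves the source untouched. The sketch below uses cut-down, hypothetical stand-ins (Frozen and FrozenBuilder) for the two classes so that it runs on its own. Sanitisation is left out for brevity, so it should only be fed scalars.

```php
<?php
declare(strict_types=1);

final class Frozen
{
    private $data = [];

    private function __construct() {}

    public static function create(array $data): self
    {
        $frozen = new self();
        $frozen->data = $data;
        return $frozen;
    }

    public function getAsArray(): array
    {
        return $this->data;
    }
}

final class FrozenBuilder
{
    private $data = [];

    private function __construct() {}

    public static function with(Frozen $old): self
    {
        $builder = new self();
        $builder->data = $old->getAsArray(); // the array is copied by value
        return $builder;
    }

    public function set(string $key, $value): self
    {
        $this->data[$key] = $value;
        return $this;
    }

    public function remove(string $key): self
    {
        unset($this->data[$key]);
        return $this;
    }

    public function build(): Frozen
    {
        return Frozen::create($this->data);
    }
}

$first = Frozen::create(['a' => 1, 'b' => 2]);

$second = FrozenBuilder::with($first)
    ->set('c', 3)
    ->remove('a')
    ->build();

var_export($first->getAsArray());
var_export($second->getAsArray());
```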
Conclusion
Now you have a generalised immutable data structure that can store scalars, arrays and known immutable objects. If you have an untrusted object you will need to store it as a string using either serialize() or var_export(). The same goes for resources like file handles, where you will need to extract the value as text before storing it.
Apart from these two caveats though, you are relatively free to use the immutable as you see fit.
If you like this article then you might get a kick out of writing functional php code as taught in the Functional Programming in PHP book that I wrote.