Python Pickle data serialisation

The pickle module serialises a Python object into a byte stream and reconstructs it later. Its main functions take the object to be stored and, when writing to disk, the open file which will contain the object.

Python Pickle data serialisation: Parameters

Parameter    Details
object       The object which is to be stored
file         The open file which will contain the object
protocol     The protocol used for pickling the object (optional parameter)
buffer       A bytes object that contains a serialized object
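As a rough illustration of the optional protocol parameter, the sketch below pickles one object under every protocol the running interpreter supports and checks that each result loads back correctly. The `sample` value is a made-up example, not something from the original text.

```python
import pickle

# Hypothetical sample object, used only for illustration.
sample = {"name": "example", "values": [1, 2, 3]}

# Pickle the same object with each protocol from 0 up to the highest available.
sizes = {}
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    payload = pickle.dumps(sample, protocol=protocol)
    sizes[protocol] = len(payload)
    # Every protocol should round-trip to an equal object.
    assert pickle.loads(payload) == sample

# Omitting the protocol falls back to pickle.DEFAULT_PROTOCOL.
default_payload = pickle.dumps(sample)
print(sizes, pickle.DEFAULT_PROTOCOL)
```

The payload sizes differ between protocols, but the reader never has to be told which one was used.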

Python Pickle data serialisation: Using Pickle to serialize and deserialize an object

The pickle module implements an algorithm for turning an arbitrary Python object into a series of bytes. This process is also called serializing the object. The byte stream representing the object can then be transmitted or stored, and later reconstructed to create a new object with the same characteristics.

For the simplest code, we use the dump() and load() functions.

To serialize the object

import pickle

# An arbitrary collection of objects supported by pickle.
data = {
    'a': [1, 2.0, 3, 4+6j],
    'b': ("character string", b"byte string"),
    'c': {None, True, False}
}

with open('data.pickle', 'wb') as f:
    # Pickle the 'data' dictionary using the highest protocol available.
    pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)

To deserialize the object

import pickle

with open('data.pickle', 'rb') as f:
    # The protocol version used is detected automatically, so we do not
    # have to specify it.
    data = pickle.load(f)
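The two steps can be combined into a single round trip. The following is a minimal sketch that writes to a temporary directory (an assumption made here so that no file is left behind) and confirms that the loaded object is an equal but separate copy.

```python
import os
import pickle
import tempfile

data = {"a": [1, 2.0, 3, 4 + 6j], "b": ("character string", b"byte string")}

# Use a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.pickle")

    with open(path, "wb") as f:  # pickle needs the file opened in binary mode
        pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)

    with open(path, "rb") as f:  # binary mode again for reading
        restored = pickle.load(f)

print(restored == data)  # True: same contents
print(restored is data)  # False: a newly created object
```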

Using pickle and byte objects

It is also possible to serialize into and deserialize out of bytes objects, using the dumps() and loads() functions, which are the in-memory equivalents of dump() and load().

serialized_data = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)
type(serialized_data) is bytes  # True
deserialized_data = pickle.loads(serialized_data)
deserialized_data == data  # True
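To see how the file-based and bytes-based functions relate, the sketch below uses an in-memory `io.BytesIO` object as a stand-in for an open file (an assumption for illustration) and compares what dump() writes with what dumps() returns for the same small object.

```python
import io
import pickle

data = {"a": [1, 2.0, 3], "b": ("character string", b"byte string")}

# dump() writes into any binary file-like object; BytesIO stands in for a file.
buffer = io.BytesIO()
pickle.dump(data, buffer, pickle.HIGHEST_PROTOCOL)
from_dump = buffer.getvalue()

# dumps() returns the serialized bytes directly.
from_dumps = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)

# load() reads from the file-like object, loads() from the bytes.
buffer.seek(0)
restored_from_file = pickle.load(buffer)
restored_from_bytes = pickle.loads(from_dumps)

print(restored_from_file == restored_from_bytes == data)
```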

Customize Pickled Data

Some data cannot be pickled. Other data should not be pickled for other reasons.
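To make the first point concrete, here is a small sketch that probes a few objects. The helper `is_picklable` is a hypothetical name introduced for this example; it catches the exception types that pickle commonly raises for unsupported objects.

```python
import pickle
import threading


def is_picklable(obj):
    """Return True if pickle.dumps accepts obj, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


results = {
    "list of ints": is_picklable([1, 2, 3]),
    "lambda": is_picklable(lambda: 7),              # no importable name
    "thread lock": is_picklable(threading.Lock()),  # tied to runtime state
}
print(results)
```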

What gets pickled can be defined in the __getstate__ method. This method must return something that is picklable.

On the opposite side is __setstate__: it receives what __getstate__ returned and has to initialize the object.

class A(object):
    def __init__(self, important_data):
        self.important_data = important_data
        # Add data which cannot be pickled:
        self.func = lambda: 7
        # Add data which should never be pickled, because it expires quickly:
        self.is_up_to_date = False

    def __getstate__(self):
        return [self.important_data]  # only this is needed

    def __setstate__(self, state):
        self.important_data = state[0]
        self.func = lambda: 7  # just some hard-coded unpicklable function
        self.is_up_to_date = False  # even if it was before pickling

Now, this can be done:

>>> a1 = A('very important')
>>> s = pickle.dumps(a1)  # calls a1.__getstate__()
>>> a2 = pickle.loads(s)  # calls a2.__setstate__(['very important'])
>>> a2
<__main__.A object at 0x0000000002742470>
>>> a2.important_data
'very important'
>>> a2.func()
7

The implementation here pickles a list with one value: [self.important_data]. That was just an example; __getstate__ could have returned anything that is picklable, as long as __setstate__ knows how to do the opposite. A good alternative is a dictionary of all values: {'important_data': self.important_data}.
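The dictionary alternative might look like the sketch below. The class name `Record` is hypothetical, and updating `self.__dict__` from the saved state is one possible way to restore attributes, not the only one.

```python
import pickle


class Record:
    """Hypothetical class used only to illustrate dictionary-based state."""

    def __init__(self, important_data):
        self.important_data = important_data
        self.func = lambda: 7      # cannot be pickled
        self.is_up_to_date = True  # should not be pickled

    def __getstate__(self):
        # Return only the picklable, long-lived attributes, keyed by name.
        return {"important_data": self.important_data}

    def __setstate__(self, state):
        # Restore the saved attributes, then rebuild the transient ones.
        self.__dict__.update(state)
        self.func = lambda: 7
        self.is_up_to_date = False


original = Record("very important")
restored = pickle.loads(pickle.dumps(original))
print(restored.important_data, restored.func(), restored.is_up_to_date)
```

Keying the state by attribute name means a new attribute can be added later without changing the position of existing values, which a list-based state would not tolerate as easily.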

The constructor is not called! Note that in the previous example the instance a2 was created in pickle.loads without ever calling A.__init__, so A.__setstate__ had to initialize everything that __init__ would have initialized if it had been called.
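One way to observe this is to count constructor calls. The sketch below uses a hypothetical class `Tracked` with a class-level counter, introduced purely for illustration.

```python
import pickle


class Tracked:
    """Hypothetical class that counts how often __init__ runs."""

    init_calls = 0

    def __init__(self, value):
        Tracked.init_calls += 1
        self.value = value

    def __getstate__(self):
        return {"value": self.value}

    def __setstate__(self, state):
        self.value = state["value"]


first = Tracked("payload")  # __init__ runs once here
second = pickle.loads(pickle.dumps(first))  # a new instance, no __init__ call

print(Tracked.init_calls)  # 1
print(second.value)        # payload
```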
