Some Notes on Serializing Objects in Python

I was playing with .NET serialization at work the other day and got curious about how Python does it. Serialization is a little confusing in the .NET world, but it is not hard to grasp. For one, the .NET base class library contains more than one implementation of serialization, namely System.Xml.Serialization and System.Runtime.Serialization, which implement XML and binary serialization respectively. The two also work differently: binary serialization makes heavy use of class attributes, while the XML implementation uses a method call to XmlSerializer.Serialize.

The Python implementation of serialization is much simpler, more concise and easier to understand. It is implemented as a Standard Library module called pickle. Serializing and deserializing objects takes nothing more than simple function calls, and there is no need to put attributes on classes. Let's see how it works.

First import the pickle module and then declare a class called Person as in the code below:


  import pickle

  class Person(object):
    
      def __init__(self, first_name=None, last_name=None, age=None):
          self.first_name = first_name
          self.last_name = last_name
          self.age = age

Now create two instances of the Person class above and place them in a list.


  p1 = Person('Jane', 'Doe', 26)
  p2 = Person('John', 'Hancock', 33)
  people = []
  people.append(p1)
  people.append(p2)

Next serialize the list to a file and then read it back into a new list. First serialize the list:


  fname = 'peoplelist.dat'
  f1 = open(fname, 'wb')
  pickle.dump(people, f1)
  f1.close()

Finally, read the contents of the serialized file back into a new list and print out the name and age of each person:


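  f2 = open(fname, 'rb')  # pickle data is binary, so read it in binary mode
  new_people = pickle.load(f2)
  f2.close()
  for person in new_people:
      # the parenthesized form works as a statement in Python 2 and a call in Python 3
      print('%s %s is %d years old.' % (person.first_name, person.last_name, person.age))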
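
As a minimal sketch of the same round trip, the version below uses with blocks so the files are closed automatically, and also shows pickle.dumps and pickle.loads, which work on an in-memory byte string instead of a file. The temporary directory and the names data and copies are my own additions for illustration and are not part of the walkthrough above.

```python
import os
import pickle
import tempfile


class Person(object):
    def __init__(self, first_name=None, last_name=None, age=None):
        self.first_name = first_name
        self.last_name = last_name
        self.age = age


people = [Person('Jane', 'Doe', 26), Person('John', 'Hancock', 33)]

# In-memory round trip: dumps() returns a byte string and loads()
# rebuilds new objects from it.
data = pickle.dumps(people)
copies = pickle.loads(data)

# File round trip. The with blocks close the files even if an error occurs.
# The temporary directory (an assumption) keeps this sketch self-contained.
fname = os.path.join(tempfile.mkdtemp(), 'peoplelist.dat')
with open(fname, 'wb') as f:
    pickle.dump(people, f)
with open(fname, 'rb') as f:
    new_people = pickle.load(f)

for person in new_people:
    print('%s %s is %d years old.' % (person.first_name, person.last_name, person.age))
```

Note that the loaded objects are copies, not the original instances, and that pickle should only be used to load data you trust.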

That's it... Serialization in Python is just too easy!

Some thoughts on Python and Unladen Swallow

Between Twitter and the blogosphere, I have been hearing a lot about Unladen Swallow lately. For those who don't know, Unladen Swallow is an experimental branch of Python that aims at improving performance of the language. In their own words, Unladen Swallow is "An optimization branch of CPython, intended to be fully compatible and significantly faster."

I wanted to find out more and started reading their Project Plan page on Google Code. I think their goals are commendable, as you can see for yourself, and number 5 below explains why I want to stress the word "branch" above:

  1. Produce a version of Python at least 5x faster than CPython.
  2. Python application performance should be stable.
  3. Maintain source-level compatibility with CPython applications.
  4. Maintain source-level compatibility with CPython extension modules.
  5. We do not want to maintain a Python implementation forever; we view our work as a branch, not a fork.

This is all fine and dandy, and the list above has made the rounds on the blogs. But what does it all mean? What follows are my impressions of the most important points that the Unladen Swallow branch is addressing.

A New Virtual Machine

The goal is to eventually replace the Python 2.6.1 virtual machine with a just-in-time compiler built on LLVM. The rest of the Python runtime would be left untouched. The key benefit of this approach is that LLVM is a register-based machine, and register machines generally perform better than stack machines, which is what the current Python VM is.
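
To get a feel for what "stack machine" means here, the standard dis module can print the bytecode that CPython compiles a function into. This is only a rough sketch: the add function is a made-up example, dis.get_instructions requires Python 3.4 or later, and the exact opcode names vary from one Python version to the next.

```python
import dis


def add(a, b):
    return a + b


# Print the bytecode for add(). The instructions push the operands onto
# CPython's evaluation stack, combine them, and return the result from
# the top of the stack. There are no named registers.
dis.dis(add)

# Collect just the opcode names for a more compact view.
opnames = [ins.opname for ins in dis.get_instructions(add)]
print(opnames)
```

A register-based design would instead name the source and destination of each operation explicitly, which tends to mean fewer instructions for the same work.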

The internals of the implementation will assume at the outset that the machine has multiple cores. For instance, very aggressive optimization of code can be assigned to secondary cores while compilation occurs on other cores. The garbage collector for Unladen Swallow will also be implemented to utilize multiple cores.

The Global Interpreter Lock

While Python has had threading for a while, it is not a true multi-threading implementation. This is because of the existence of the GIL, which lets only one thread execute Python bytecode at a time. Dave Beazley has written several times about the GIL and how it works, and you should read his "The Python GIL Visualized" article to find out more about why the GIL keeps Python from having a real multi-threaded runtime.
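
A quick way to see the effect for yourself is to time a CPU-bound function run twice in a row and then in two threads. This is a minimal sketch: the count_down function and the value of N are arbitrary choices of mine, and the timings depend entirely on your machine and Python build.

```python
import threading
import time


def count_down(n):
    # Pure CPU-bound work: no I/O, so the thread never waits on anything.
    while n > 0:
        n -= 1


N = 2000000  # arbitrary workload size

# Run the work twice, one call after the other.
start = time.time()
count_down(N)
count_down(N)
sequential = time.time() - start

# Run the same two calls in two threads.
threads = [threading.Thread(target=count_down, args=(N,)) for _ in range(2)]
start = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.time() - start

print('sequential: %.2fs, two threads: %.2fs' % (sequential, threaded))
```

On a standard CPython build with the GIL, the threaded run is typically no faster than the sequential one, even on a multi-core machine, because only one thread can execute bytecode at any moment.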

I bring up the GIL here because the folks working on Unladen Swallow plan on removing it from Python, although they are not very optimistic about it. Even if they are not able to remove the GIL completely, other optimizations in the garbage collector's reference-counting mechanism may yield some improvements in the threading area.

Anyway, these are the two major points I take away from the Unladen Swallow plan of record. These changes seem pretty big to me, and a major risk of doing this kind of work is that your changes are rejected by the community. However, the Unladen Swallow team is sponsored by Google, which also employs Guido, so I'm sure that those guys are talking amongst themselves.

Thanks for reading. Go read the project plan and let me know what you think.