Monday, January 19, 2009

A small metaclass for strongly typed symbols in Python

Numerous tutorials on Python metaclasses contain this famous quote by Python master Tim Peters:
“Metaclasses are deeper magic than 99% of users should ever worry about. If you wonder whether you need them, you don't (the people who actually need them know with certainty that they need them, and don't need an explanation about why).”

I suppose the quantified part of the statement has long invalidated itself, simply by making a significant portion of already inquisitive Python users aware of the elitist feature... Once you've learned about the existence of this powerful magic weapon it is hard not to stay on the lookout for the 'need' to apply it. :-)

My use case arrived when I was looking for a way to express sentences of first order predicate logic in Python. Just for the fun of the puzzle I set out to build my own logic inference engine based on some text book algorithms. I've certainly enjoyed exploiting Python's expressiveness and got the engine working to some extent, but not far enough to be completely comfortable publishing all of its code... So, in case you're looking for working code I suggest you take a look at some pre-existing projects like AIMA-Python, PyLog and Pyke.

As predicate logic expressions clearly don't share much with a dynamic OO language in the way of syntax and semantics, it makes most sense to define a separate grammar for these expressions and then let Python parse them into an internal representation. Apart from importing text files full of knowledge at once, this should allow you to feed and query a knowledge base as follows:
>>> kb = KnowledgeBase()
>>> kb.tell("Human(x) => Mortal(x)")
>>> kb.tell("Human(Socrates)")
>>> kb.ask("Mortal(Socrates)")
True

With a number of excellent Python parser tools readily available it should be straightforward enough to go this 'external domain specific language' route. Even so, I didn't want to start off being side tracked by solving a secondary puzzle that would involve studying parser tools, selecting one (or none?) and then learning its specifics. I totally look forward to doing that in the future :-), but decided to examine whether I could kick-start things by facilitating an 'internal domain specific language' first. (Actually, AIMA-Python's logic.py reveals that it doesn't need to take much code at all to parse an 'external DSL' for this purpose, but when I started I didn't want to spoil the fun by studying existing solutions).

Ideally, the code to build up the internal representation of a predicate logic sentence should closely resemble the conventional way of expressing such a sentence. Python is not considered as well suited for internal DSLs as for instance Ruby. (That language doesn't require braces in method calls, allows for more flexible whitespace, has 'symbols' and supports blocks). Python fortunately supports operator overloading and that helps (as is demonstrated by logic.py and the 'Pythologic' Python cookbook recipe that I'll come back to below).

An interesting hurdle to being able to express the one paradigm inside the other comes from a fundamental difference regarding the use and meaning of 'names'. A predicate logic name (either a variable, a constant, a predicate or a function) doesn't need a meaning other than its respective type and the context in which it is used. Python raises a NameError when a name (variable, class or function/method) is referenced before it was assigned a value. (Of course it's easy to side step this difference by just sticking to Python's model and require all names used in logic expressions to be assigned values first).

The 'Pythologic' Python cookbook recipe however provides a solution that is both very clever and rather hard to understand (I'd say). It involves a @logic decorator that 'defines' undefined names within the function to which it is applied on the fly, by inspecting that function's func_code and func_global attributes and following a naming convention to distinguish between constants and variables. (The idea is incorporated in a symbols module by Leonardo Maffi).

I could have embraced that trick, but felt there should be a more explicit (i.e. more pythonic??) way to introduce new predicate logic names within an expression and not rely on a convention to distinguish between different types of 'symbols'. The natural OO representation for a predicate named 'Human' would of course be an object of type Predicate with an attribute called 'name' that contains the value 'Human'. In Python such an object is normally 'introduced' by invoking Predicate('Human'). It is clear however that this construct wouldn't help making our DSL expressions any more readable, if only because the number of required braces and quotes gets out of hand quickly:
>>> kb.tell(Predicate('Human')(Var('x')) >> Predicate('Mortal')(Var('x')))

Also, having to re-instantiate the same symbol every time it is used doesn't come across as very efficient. We should not optimize prematurely, but it is likely that the logic inference algorithm's performance would benefit greatly if symbol (in)equality could be established 'instantaneously' through the is operator (by verifying whether two symbol references imply the same memory address) as is 'incidentally' the case with Ruby's symbols. It would therefore be great if a construct could be engineered that not only takes away the need to instantiate the same symbol (of a given type) more than once, but that also makes it (virtually) impossible to do so.

Realizing that Python's __getattr__ hook allows you to control what happens when an attribute is dereferenced on a an object for which that attribute is not yet defined, I started to wonder whether it would be possible to make the syntax look like:
>>> kb.tell(Predicate.Human(Var.X) >> Predicate.Mortal(Var.X))

Basically, this would turn these Predicate and Var classes into infinitely large namespaces that contain any instance of those classes anybody can think of naming. And I believe there's definitely a pythonic feel to that... (which doesn't say much, as anybody can say that about anything).

Because it makes sense for Predicate and Var (along with Const and Function) to inherit from a Symbol base class, I will express the sketched design goals as a unittest TestCase for that Symbol class:
class SymbolTest(unittest.TestCase):
def testSymbols(self):
class TestSymbolSubType(Symbol):
pass

self.assertTrue(isinstance(TestSymbolSubType.A, TestSymbolSubType))
self.assertTrue(TestSymbolSubType.B is TestSymbolSubType.B)
self.assertTrue(TestSymbolSubType.C != TestSymbolSubType.D)

# unconventional, though desirable.
self.assertTrue(TestSymbolSubType('E') is TestSymbolSubType('E'))
self.assertTrue(TestSymbolSubType.F is TestSymbolSubType('F'))

# symbols should not be inherited or otherwise 'shared' between namespaces
self.assertTrue(Symbol.G != TestSymbolSubType.G)

# almost by definition, but can't assume anything when expecting
# such unconventional behaviour...
self.assertTrue(isinstance(TestSymbolSubType('H'), TestSymbolSubType))

# should return a proper representation for itself
self.assertEquals(repr(TestSymbolSubType.I), "TestSymbolSubType.I")

# should not be possible to corrupt symbol's name attribute
def attemptNameCorruption():
Symbol.J.name = 'K'
self.assertRaises(AttributeError, attemptNameCorruption)
def attemptNameDeletion():
del Symbol.L.name
self.assertRaises(AttributeError, attemptNameDeletion)

As is often the case I'm afraid, I first developed a solution and then went looking for some 'prior art' (somebody else having solved the same problem already)... In this case I didn't find anything that would satisfy all of the above test assertions, but you might still want to have a look at the SymbolType package by esteemed Python contributor Phillip J. Eby that he mentions in a blog post on Symbols and DSLs in Python. It appears to me that the symbol concept has gone somewhat mainstream, but the idea of subclassing a Symbol type certainly hasn't... :-) Still, that's exactly what I'm looking for because for instance a Predicate object's behaviour and semantics differ fundamentally from a Const object's.

So, here's my Python 3.0 (or: Python 3000, Py3k) code that satisfies the test assertions above (syntax differences with Python 2.5 pointed out in comments):
class SymbolMetaclass(type):
""" Metaclass that facilitates typed Symbol classes with a literal pool """
def __init__(cls, *args):
cls.__super = super() # For Python 2.5 this must be: super(SymbolMetaclass, cls)
cls.__super.__init__(cls, *args)
cls.__sympool = {}
def __call__(cls, name):
if not name in cls.__sympool:
cls.__sympool[name] = cls.__super.__call__(name)
return cls.__sympool[name]
__getattr__ = __call__

class Symbol(metaclass=SymbolMetaclass):
# For Python 2.5 the above 'metaclass' parameter must be removed
""" Class designed for inheritance that facilitates Symbol literals
of a specific type """
# For Python 2.5 include:
# __metaclass__ = SymbolMetaclass
def __init__(self, name):
self.__name = name
def __repr__(self):
return self.__class__.__name__ + '.' + self.__name
name = property(lambda self : self.__name) # forcing immutability of 'name' attribute

It took me some time to figure out that I had to implement __call__ instead of __new__ on the metaclass, but in retrospect this is obvious. Largely due to the seeming complexity of using super in Python 2.5 it was also a bit of a challenge to find out how I was going to actually construct a new instance, now that __call__ was overriding the standard behaviour. But the resulting code shows clearly just how easy it really is to get what you want in this expressive language.

With a number of other interesting classes added, I have indeed succeeded in making the following work:
>>> kb = KnowledgeBase()
>>> kb.tell(Predicate.Human(Var.x) >> Predicate.Mortal(Var.x))
>>> kb.tell(Predicate.Human(Const.Socrates))
>>> kb.ask(Predicate.Mortal(Const.Socrates))
TRUE

This might still look slightly verbose and even somewhat repetitive, but that should be considered a small pain compared to the enormous gain in explicitness ;-). You could of course cut on the verbosity by first 'importing' a name into the local or global namespace: Human = Predicate.Human, but once you start going down that path, the use case for my metaclass quickly loses all legitimacy...