Jeremy Satterfield
Coding, Making and Tulsa Life

Python Catalog Data Structures

I recently realized I have several open-source projects I'm a maintainer or owner of which I've never really discussed on this site or on social media. Some of these projects have been pretty specific and tied to the other projects I was working on at the time, but several do have the potential for wider general use. I figured now was a good time to start talking about some of these projects and the problems they are intended to solve.

Python Catalogs

This is one of the more general use case projects I maintain. And while the examples and original impetus are for choices in Django, Catalogs could be used for just about any Python project that needs complex mappings. It's a data structure that's very simple to define and very flexible to use.

You can install Catalog yourself from PyPI with pip install pycatalog. As of the latest release it supports Python 2.7 and 3.3+. And you can always submit issues, feature requests and PRs over on GitHub.

The Choice Definition Problem

One of the niceties that Django brings to the table for both models and forms, introduced in some of the earliest tutorials, is allowing you to define choices. It's a simple pattern that's not limited to Django, Python or web development, but is woven through much of Django's out-of-the-box magic. It makes entry easier for users by giving them options to choose from, and makes it easier for your code to validate the input.

It starts out easy enough: provide the database value and label for each choice item.

class SomeModel(models.Model):
    status = models.IntegerField(choices=((1, 'Open'), (2, 'Closed')), default=1)

Simple, but right off the bat this example exposes one of the first problems. Anytime you want to actually reference the value, default=1 in this case, you have to reference the value that's going into the database. Hard-coding these values can make it difficult to refactor as your project grows, and it also makes any code that checks these values cryptic and unclear. Take, for example, the simplest of checks: if sm.status == 1.

A better, more common pattern is to define these values as constants.

class SomeModel(models.Model):
    OPEN = 1
    CLOSED = 2
    STATUS_CHOICES = (
        (OPEN, 'Open'),
        (CLOSED, 'Closed'),
    )
    status = models.IntegerField(choices=STATUS_CHOICES, default=OPEN)

That's much cleaner and the code references are clearer too: if sm.status == SomeModel.OPEN.

In most cases this is going to be exactly what your project needs and you're done. Good work.

More Complex Cases

But what if data from this model syncs with a legacy remote API which has its own archaic idea of how these values should be presented? Not too hard, we can add a mapping for that. Well, actually, two mappings, because we need to communicate in both directions. No problem.

class SomeModel(models.Model):
    OPEN = 1
    CLOSED = 2
    STATUS_CHOICES = (
        (OPEN, 'Open'),
        (CLOSED, 'Closed'),
    )
    API_STATUS_MAP = {
        'op': OPEN,
        'cl': CLOSED,
    }
    REVERSE_API_STATUS_MAP = {v: k for k, v in API_STATUS_MAP.items()}
    status = models.IntegerField(choices=STATUS_CHOICES, default=OPEN)

And sometimes we'll want to go directly from the API value to the user-readable label. Maybe use a property or method for that? That might obscure the fact that the value is based on the status field. Sometimes that's good, but maybe not this time.
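To make the two-hop lookup concrete, here is a framework-free sketch. The constants and API_STATUS_MAP mirror the model above as plain module-level names so the snippet runs without Django; STATUS_LABELS and label_for_api_value are hypothetical names introduced only for this illustration.

```python
# Plain-Python stand-ins for the class attributes defined on SomeModel above.
OPEN = 1
CLOSED = 2
STATUS_CHOICES = (
    (OPEN, 'Open'),
    (CLOSED, 'Closed'),
)
API_STATUS_MAP = {
    'op': OPEN,
    'cl': CLOSED,
}
REVERSE_API_STATUS_MAP = {v: k for k, v in API_STATUS_MAP.items()}

# Hypothetical extra mapping: database value -> label, derived from the choices.
STATUS_LABELS = dict(STATUS_CHOICES)


def label_for_api_value(api_value):
    """Translate a remote API value to the user-readable label in two hops."""
    return STATUS_LABELS[API_STATUS_MAP[api_value]]


print(label_for_api_value('op'))       # Open
print(REVERSE_API_STATUS_MAP[CLOSED])  # cl
```

It works, but that is already three dictionaries and a helper for a single field with two choices.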

Also, we want to use a specific template path for a re-used block that changes based on the choices. Add another mapping.

Oh, and maybe a different parser class to abstract the different parsing details based on the choice. Add another mapping.

This is where things are starting to go haywire. More mappings, more properties and methods, more mess.

Enter Python Catalog

This is the snowball I was trying to stop when I started working on Catalog a couple years ago. We had a complex project that required a complicated web of mappings and lookup patterns. So I set out to find a data structure that could handle this case with a couple of goals in mind:

  • Simple definition signature
  • Developers could define their own attributes for the choices, to fit their use case
  • Attributes could store values of arbitrary types
  • Simple process for mapping items down to an iterable, containing just the attribute values needed at that point
  • Easily look up items by any attribute and access any attribute
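These goals can be sketched with nothing but the standard library. To be clear, the following is not how Catalog is implemented; StatusItem, ITEMS, zip_attrs and lookup are all hypothetical names used only to show the shape of the idea.

```python
from collections import namedtuple

# Hypothetical stand-in for a catalog item: one record per choice,
# with developer-defined attributes of arbitrary types.
StatusItem = namedtuple('StatusItem', ['name', 'value', 'label', 'api_value'])

ITEMS = (
    StatusItem('open', 1, 'Open', 'op'),
    StatusItem('closed', 2, 'Closed', 'cl'),
)


def zip_attrs(items, *attrs):
    """Map items down to tuples holding only the requested attributes."""
    return tuple(tuple(getattr(item, attr) for attr in attrs) for item in items)


def lookup(items, value, attr='value'):
    """Return the first item whose given attribute equals value."""
    for item in items:
        if getattr(item, attr) == value:
            return item
    raise LookupError('no item with %s == %r' % (attr, value))


print(zip_attrs(ITEMS, 'value', 'label'))      # ((1, 'Open'), (2, 'Closed'))
print(lookup(ITEMS, 'op', 'api_value').label)  # Open
```

One record per choice replaces the separate forward, reverse and label mappings, and any attribute can serve as the lookup key.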

Starting with the simple signature, Catalog works similarly to Python 3's Enum.

class STATUSES(Catalog):
    open = 1
    closed = 2

STATUSES.open.value  # => 1
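For comparison, this is the equivalent definition using the standard library's enum module (Python 3.4+). The class name Status is just an illustrative choice.

```python
from enum import Enum


class Status(Enum):
    open = 1
    closed = 2


print(Status.open.value)  # 1
print(Status(2).name)     # closed
```

In both forms each item carries a single value.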

But that doesn't even handle our simplest case of a database value and a label. We can do better. Let's define more attributes.

class SomeModel(models.Model):
    class STATUSES(Catalog):
        _attrs = 'value', 'label', 'api_value'
        open = 1, 'Open', 'op'
        closed = 2, 'Closed', 'cl'

Oh, then there's the template paths and parser classes.

class SomeModel(models.Model):
    class STATUSES(Catalog):
        _attrs = 'value', 'label', 'api_value', 'template', 'parser'
        open = 1, 'Open', 'op', 'some_path/open.html', OpenParser
        closed = 2, 'Closed', 'cl', 'other_path/closed.html', ClosedParser

Now let's add those choices to the field. There's an easy method for that.

    status = models.IntegerField(choices=STATUSES._zip('value', 'label'), default=STATUSES.open.value)

Now I want to get the user-readable label, but I've only got the remote API value. Or I want the parser, based on the database value.

SomeModel.STATUSES('op', 'api_value').label  # => 'Open'
SomeModel.STATUSES(1, 'value').parser  # => OpenParser

Conclusion

So there you have it. It's by no means perfect, but after using it in production environments for over a year now, we still find it to be much cleaner and easier to use than some of the other patterns out there.

While the access API always stays just as simple, the definitions do get clunkier as you add more attributes to the items. It was never meant to handle dozens of attributes per item, and you have bigger problems if that's your use case. But since the attribute values for each item are really just written as tuples, you can easily throw in parentheses and wrap across lines to help with readability.
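Here is a sketch of that wrapping. To keep it runnable without pycatalog installed, it uses a plain class as a stand-in; with the library, the class would subclass Catalog and declare _attrs as in the examples above. OpenParser and ClosedParser are empty placeholder classes.

```python
class OpenParser(object):
    """Placeholder for the parser class used in the examples above."""


class ClosedParser(object):
    """Placeholder for the parser class used in the examples above."""


class STATUSES(object):  # stand-in; with pycatalog this would subclass Catalog
    # Parentheses let each item's values wrap across lines.
    open = (
        1, 'Open', 'op',
        'some_path/open.html',
        OpenParser,
    )
    closed = (
        2, 'Closed', 'cl',
        'other_path/closed.html',
        ClosedParser,
    )


# The wrapped form is the same tuple as the one-line form.
assert STATUSES.open == (1, 'Open', 'op', 'some_path/open.html', OpenParser)
```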