
Deferred foreign keys with django-dfk

by Dan Fairs last modified Dec 30, 2011 07:41 PM
django-dfk allows deferred foreign keys to be declared on models, so that a concrete target can be set later.

django-dfk is a package I developed for a recent project, allowing foreign keys to be declared on models without an explicit target ('dfk' stands for 'deferred foreign key'). It provides an API to 'point' these foreign keys at a concrete target at a later date, and also allows you to forcefully 'repoint' foreign keys that have already been set up. This last facility should be used with caution - it's essentially akin to monkey-patching.

You can use GenericForeignKeys for this, and these are slightly more flexible in that each model instance's foreign key may point to a different model. However, there is a performance cost associated with them, and joining can be problematic.

(The project actually rarely uses django-dfk directly - instead, it uses it as a basis for abstract foreign keys, which have a greater awareness of the application environment - however, that's a topic for another day.)

Before we go much further though - rather than using this package, a better long-term investment would be to look at the app-loading branch that Arthur Koziel and Jannis Leidel have been working on - testing it, and helping them to get it into a state to be merged into trunk. I think that should provide a more holistic approach to solving this kind of problem. Until then...

Deferred foreign keys are useful in applications where you know that a model will require a foreign key, but don't know what the target will be at the time you're writing the application. Taking an example from the project that django-dfk was created for, let's say you have a Django app that is an online game. Users are entered into the game by way of an 'Entry' model, and the Entry contains a foreign key back to a model which contains user information.

However, you will have several instances of this game deployed, and each game may have its own source of player data - hence, its own Player model.

Let's say that there are two applications involved: one called 'core', which contains the core game logic, and one called 'mygame1', which contains models and logic specific to an individual game deployment.

core/models.py:
import datetime

from django.db import models

class Entry(models.Model):
    created = models.DateTimeField(default=datetime.datetime.now)
    player = models.ForeignKey( …. uh, what do I type here?

Remember - the core application is going to be deployed in several places, and there are several possible places that FK might need to point.

One approach to solving this would be to introduce a model which relates an entry to the game-specific Player model, resulting in models that look something like this:

core/models.py:
import datetime

from django.db import models

class Entry(models.Model):
    created = models.DateTimeField(default=datetime.datetime.now)


mygame1/models.py:

from django.db import models

from core.models import Entry

class MyPlayer(models.Model):
    name = models.CharField(max_length=50)
    entry = models.OneToOneField(Entry)

This works fine - however, there are some games which share player data with each other. This means that the key needs to live on the Entry model - but, we don't know which player model to point the FK at, as this model might be deployed in multiple places.

We can solve this using django-dfk.

core/models.py:
import datetime

from django.db import models
from dfk.models import DeferredForeignKey

class Entry(models.Model):
    created = models.DateTimeField(default=datetime.datetime.now)
    player = DeferredForeignKey(unique=True)


mygame1/models.py:

from django.db import models
from dfk import point

from core.models import Entry

class MyPlayer(models.Model):
    name = models.CharField(max_length=50)

point(Entry, 'player', MyPlayer)

The first thing to notice is that our Entry model now sports a DeferredForeignKey instance. It's important to realise that this isn't a real field, it's just a placeholder. Any arguments (except for the special 'name' argument, more on this below) are simply stored.

The action happens during the call to 'point', in mygame1's models.py. As the name implies, this points the DFK called 'player' on Entry to the MyPlayer class. (Actually, under the hood, it simply replaces the DeferredForeignKey instance with a real ForeignKey instance complete with the arguments which were originally passed to the DFK). Note that we do this at the module level in models.py - all your pointing (and repointing) needs to be done before the application is ready to use to ensure that syncdb outputs the correct SQL.

After all this is done, the definition of Entry will effectively look like this (although of course your code won't have changed!):

class Entry(models.Model):
    created = models.DateTimeField(default=datetime.datetime.now)
    player = models.ForeignKey('mygame1.MyPlayer', unique=True)

Other game applications (say, mygame2 and mygame3) would point the foreign key to the Player model that is appropriate for their game.

It's quite common to need to point a number of these keys at once - there might be several models which refer to a player. Rather than writing lots of 'point' statements (and having to add to them if a new key is added), django-dfk allows deferred foreign keys to be named:

core/models.py:
class Entry(models.Model):
    created = models.DateTimeField(default=datetime.datetime.now)
    player = DeferredForeignKey(name='Player', unique=True)

class StatusUpdate(models.Model):
    text = models.CharField(max_length=140)
    player = DeferredForeignKey(name='Player')


mygame1/models.py:

from django.db import models
from dfk import point_named

class MyPlayer(models.Model):
    name = models.CharField(max_length=50)

point_named('core', 'Player', MyPlayer)

Roughly translated, this means 'point all the deferred foreign keys in all models in the core app which have the name 'Player' to the MyPlayer model'. This will affect both the 'Entry' and 'StatusUpdate' models above.

Finally, django-dfk also provides a 'repoint' function. This is a big hammer, and is not to be used lightly. 'point' only works on DeferredForeignKey instances, by design - it's meant to prevent you making mistakes. 'repoint' works on regular foreign keys too. It's useful if you've used 'point' to change the destination of a DFK, but later need to change it. (However, if you find yourself in this position, you should probably refactor your code to just use 'point'). Both 'point' and 'repoint' take care of cleaning up various internal Django caches to ensure things like filtering on related fields work properly after a point operation.
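For illustration, repointing might look something like this, assuming repoint is importable alongside point and mirrors its (model, attribute, target) signature - the mygame2 app and OtherPlayer model here are hypothetical:

from dfk import repoint

from core.models import Entry
from mygame2.models import OtherPlayer  # hypothetical second game app

# Forcefully swap the already-pointed 'player' key over to OtherPlayer.
# As with point(), this must run at module level, before syncdb.
repoint(Entry, 'player', OtherPlayer)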

django-dfk can be found on PyPI, and the source is on github - forks, bug reports and patches with docs and tests are welcome. django-dfk is in production on several high-volume sites.

Mobile API Design - Thinking Beyond REST

by Dan Fairs last modified Jun 19, 2011 11:13 AM
This article explores the problems of optimising REST APIs for mobile device performance, and suggests a way of allowing clients to request alternate representations.

Nate Aune and Anna Callahan gave a great talk at this year's EuroDjangoCon about a service that they'd built in 24 hours, valentun.es. Along with a great story, the meat of the talk was about the concessions you have to make with a mobile API with respect to data transfer rates and connectivity. Some of the things they said struck a chord with my own experiences of designing mobile APIs, and inspired me to write this post about those experiences, principles, problems, solutions - and an idea for the future.

What's a REST API?

There are probably as many definitions of what a REST API is as there are implementations (so naturally I'll add my own), but they all share some common characteristics:

  • They recognise that the web is composed of resources, and are structured around that
  • They use the underlying HTTP methods (GET, POST, PUT, DELETE) to interact with resources
  • Representations of those resources are what actually flow back and forth between REST API servers and clients
  • URIs (usually URLs, which I'll use for the remainder of the article) are used to identify application state - particularly information on what the client can do next - and are contained within the representations received from the server.

Take a look at the Wikipedia article on REST for one description of what it means to be RESTful.

Resources

When you design a REST API, the first task is usually to consider what resources you want to expose to web clients. This is often pretty straightforward: a reasonable approach is often to use the same arrangement as your underlying data model (be that tables in a relational database, document types in a document management system, and so on). This doesn't have to be the case, of course - in particular, you'd be unlikely to expose a link table that facilitates a many-to-many relation in a relational database as an individual resource, for example.

Let's say we're building an API for a pizza shop. We're probably going to have a Pizza resource, a Topping resource, and an Order resource.

Resources tend to be arranged hierarchically, probably due to the nature of the URLs used to address them. It's therefore common to have resources in an API that work as collections of other resources. The URL for a topping resource might be `http://pizza.com/toppings/cheese`. The URL of the collection resource for all toppings would be `http://pizza.com/toppings/`. (The presence or absence of a trailing slash is meaningless, but for the purpose of this article I'll use the convention of collection resources having a trailing slash.)

Representations

Representations are how the client and server talk about a resource. Prior to the advent of machine-usable APIs, HTML was the most common form of representation. These days, XML and JSON representations are commonly used for machine-readable representations. PDF, JPEG, and MP3 are also all perfectly good representations of a resource as is, of course, good old HTML.

Keep in mind that a resource may have more than one representation. You might be able to fetch a resource as HTML (useful for humans) and JSON (useful for machines). It's also important to realise that representations are not simply content types: there's no reason a resource can't have multiple JSON representations, for example. This idea becomes important a little later on.

Once you've defined your resources, therefore, thinking of representations is usually the next step. A simple JSON representation, with a key per resource attribute, is a common default. That will do for now. A representation of our Topping resource might therefore look like this:

{
    "calories": 100,
    "name": "Cheese"
}

Resources may need to refer to other resources. This should be done in a self-describing way: the client should not need to have any knowledge of the server application to build its own URLs. Seeing keys like 'foo_id' in a JSON representation is usually a sign of this design error. For example, this is a reasonable representation:

{
    "favourite_topping": "/toppings/cheese"
}

This isn't so good:

{
    "favourite_topping_id": "36"
}

That 'favourite_topping_id' is meaningless to the client - it has to know how to construct topping resource URLs to be able to use that data.

Incidentally, note that the examples in this article only include URL paths, rather than full, absolute URLs. Either is fine, as long as the client can resolve them.

Interactions

The next piece of the usual REST API design process is to consider the operations available for each resource, and what they mean. There are four key operations provided by HTTP - GET, POST, PUT and DELETE. (Actually, HTTP does provide more, but this quartet is what is most normally used in RESTful API design.)

These are often compared to the core SQL-based relational database operations of SELECT, INSERT, UPDATE and DELETE. This is slightly misleading. SELECT and GET are fairly similar, as are the two DELETEs; POST and PUT are different beasts though. POST is used for a write operation on a resource that has side-effects. PUT writes to a resource, but has no side-effects.

Put another way, PUT is idempotent - if you do the same PUT twice (and there's no other state changes in between) then the system state will be the same. POST carries no such guarantee.

A good example might be to compare the creation of a new Topping resource with the creation of an Order for a pizza. Creating a Topping would probably consist of something like the following:

PUT http://pizza.com/toppings/jalapenos
{
    "calories": 25,
    "name": "Jalapeños"
}

This would create the Topping resource at http://pizza.com/toppings/jalapenos. Re-running the request would not make any difference. Changing the 'calories' field in the JSON to a new value would replace the existing resource with the new one. So - PUT has a 'create-or-update' semantic.

The response for this request would probably simply be an HTTP 200 response, with an empty body (or more strictly, a 204, which tells the client to maintain its view of the representation.)

Contrast this with what creating an order might look like:

POST http://pizza.com/orders/
{
    "toppings": [
        "/toppings/cheese",
        "/toppings/jalapenos"
    ],
    "card": "1234567890"
}

We've made this a POST request because it has side-effects: it bills your credit card, and sets off the process of making a delicious pizza. Making that same request twice would bill your card twice, and get you two pizzas. POST is not idempotent.

The HTTP response here would probably be 201 Created, with the representation of some Confirmation resource in the body, perhaps looking like this:

201 Created
{
    "order": "/orders/432544"
}

Note once more how the response contains a self-describing URI, rather than some opaque order ID, which is not meaningful out of context.

Reality Bites

So we've identified our resources, the representations of them, and what operations the HTTP verbs actually correspond to. We're good to go, right?

Well, yes, basically. It'll work. But you'll almost certainly run into some problems. Before we get to the meat of selecting resource representation though, let's take a minute to consider a couple of real-world implementation problems you're likely to encounter.

Aside 1: Bad HTTP clients

There are some broken HTTP clients out there. I've run into one: Flash (circa 2009). Flash gets upset if it doesn't receive a 200 response from the server. In particular, if your server returns a 4xx HTTP code in an API response, Flash will not even pass the response to the Flash application.

On the Django project on which we ran into this, we ended up writing a custom middleware that looked for the presence of a magic query string parameter on the URL and, if it was found, replaced the status code with a 200 and put the real status code on the first line of the body. The Flash app then parsed out the response code from the response body. Ugly, but workable.

Flash also (at the time, it may have changed) was unable to perform PUT or DELETE requests. Our solution was similar: the Flash application would always perform a POST when it actually wanted to do a PUT or DELETE, and the real intended method went into another magic query string parameter. The aforementioned middleware would then rewrite the HTTP method on incoming API requests that carried this flag.
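Purely as illustration, here's a minimal sketch of what such a middleware might look like - the query string parameter names (_method, _flatten_status) are hypothetical, and this is not the code we actually shipped:

class FlashWorkaroundMiddleware(object):
    """Sketch of the two Flash workarounds described above."""

    def process_request(self, request):
        # Flash could only POST, so the real intended method travels in
        # a magic query string parameter.
        real_method = request.GET.get('_method')
        if request.method == 'POST' and real_method in ('PUT', 'DELETE'):
            request.method = real_method

    def process_response(self, request, response):
        # Flash dropped non-200 responses: move the real status code to
        # the first line of the body, and return a 200 instead.
        if '_flatten_status' in request.GET and response.status_code != 200:
            response.content = '%s\n%s' % (response.status_code,
                                           response.content)
            response.status_code = 200
        return response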

Aside 2: Rich Error Handling

The standard approach to expressing errors in a REST API is to use HTTP status codes. As is often the case with REST, this works fine for simple systems, but is simply too limiting for more sophisticated systems, particularly those which might submit a JSON or XML document to describe a POST request. If a client does submit such a rich request, which perhaps does not validate on the server, it is useful to be able to provide more than just a 400 Bad Request error.

Since I primarily use Django these days, and Django uses forms and formsets for validation, I have found that providing a standardised JSON representation of form and formset errors works well. The format can be specified in advance, and allows the server to inform the client of rich validation errors (down to field-level validation, with decent error messages) even over an API. I hope to write more on this in a future post.
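As a rough sketch of the kind of thing I mean (the exact wire format is up for grabs, and the function name is mine), one might serialise a Django form's errors like this:

import json

from django.http import HttpResponse

def form_error_response(form):
    # Flatten field-level and form-level errors into plain strings,
    # since Django's error messages may be lazy translation objects.
    payload = {
        'errors': dict((field, [unicode(e) for e in errors])
                       for field, errors in form.errors.items()),
        'non_field_errors': [unicode(e) for e in form.non_field_errors()],
    }
    return HttpResponse(json.dumps(payload), status=400,
                        content_type='application/json')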

Normal service resumes...

Normalised resource representations

OK, let's say we've got an iPhone app which uses our REST API to place orders and display the calorific content of the toppings we added. Let's look at the response that it might receive from a GET request to our Order resource:

GET /orders/432544

200 OK
{
    "toppings": [
        "/toppings/cheese",
        "/toppings/jalapenos"
    ]
}

That's cool - the client can see that there are a couple of toppings there, so it fetches each one to get the calorific content:

GET /toppings/cheese

200 OK
{
    "calories": 100,
    "name": "Cheese"
}

And then:

GET /toppings/jalapenos

200 OK
{
    "calories": 25,
    "name": "Jalapeños"
}

That's great. Our iPhone app can tell our user that the pizza they ordered has 125 calories.

What's not so great is that our iPhone app has had to make three separate requests. This works fine in development, across the office wifi network to the dev server. It doesn't work so well when a user's ordering a pizza on the train home from work, and the train goes into a tunnel halfway through this multi-request conversation (and the user was outside a 3G signal anyway).

Nesting resource representations

The natural response (and probably the right response) is simply to extend the representation of an Order resource to include the required representations of our toppings. This means that our Order resource representation now looks like this:

GET /orders/432544

200 OK
{
    "toppings": [
        {
            "calories": 100,
            "name": "Cheese"
        },
        {
            "calories": 25,
            "name": "Jalapeños"
        }
    ]
}

That's cool. Our iPhone app now only needs to make one request, and it gets all the information on the toppings as well. We've traded the brevity of the original representation of the Order resource for not having to make multiple requests. The individual Topping representations are still available, of course.

To (ab)use database parlance, we've denormalised our Order representation, trading size for performance (that is, a smaller representation will be quicker to download).

Now, wind the clock forward a few months. We've extended our iPhone app and the supporting REST API to cover table reservations, meal pre-ordering, and so on. We've run into the problem described above, so we've heavily optimised our API responses to minimise HTTP round trips. Life is good. Right?

We're approached by a company who have developed an Android app that can use our API, wondering if we'd be happy to make it the official Android app. It's an awesome piece of software. They've thought about the user experience in a totally different way, and it works well on Android (though the current iPhone app approach works best on iPhone). The only real problem is that some of the API requests they make download a ton of data that they simply don't need; and they have to make lots of other smaller requests to make other parts of their app work as they want.

In other words, they want a different set of optimisation choices for the API.

What do we do? Add a new API version with different optimisations (even though it's still dealing with the same set of resources) in the representations? That doesn't sound so great, as we'd be maintaining two APIs. It doesn't scale, either - what happens when someone writes an app for the TV, which needs another set of tradeoffs?

A Possible Solution: Choosing Representations

(Note that this section outlines a possible solution - I don't have an implementation for this yet.)

The key insight is that the applications do not require different resources. They merely need different representations of those resources. Some frameworks allow this to be done by specifying an extension on the end of the URL. If we did that, we'd end up with:

http://pizza.com/toppings/cheese.xml
http://pizza.com/toppings/cheese.json

While not strictly wrong, this isn't ideal when thinking in resources. Different URLs mean different resources. Just because both XML and JSON representations are available doesn't double the number of resources - it's all the same cheese topping. We need some way of expressing what representation we want for a given resource. Ideally, we shouldn't change the URL.

Fortunately, HTTP gives us the tools to do this: the HTTP Accept header. It's usually used for specifying content types: text/html, application/json, and so on. However, the specification allows for extra parameters. Let's take a look (from http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html):

Accept           = "Accept" ":"
                   #( media-range [ accept-params ] )

media-range      = ( "*/*"
                   | ( type "/" "*" )
                   | ( type "/" subtype )
                   ) *( ";" parameter )
accept-params    = ";" "q" "=" qvalue *( accept-extension )
accept-extension = ";" token [ "=" ( token | quoted-string ) ]

That means that a client could send a request like this:

GET /orders/432544
Accept: application/json;q=1;depth=1,application/json

Here, we've added a depth parameter to the acceptable types. This depth parameter works a little like the numeric depth parameter to a Django queryset's `select_related` method call - it asks the server to traverse one level deep into related resources. The standard also specifies that more specific content types take precedence over less specific types - which means that we have a mechanism for the client to allow fallback to a normal JSON representation of the resource. If the server cannot provide a representation that fulfils any of the values in the Accept header, it would return a 406 Not Acceptable.

Aside: q?

What's that 'q' parameter? It's what appears to be a historical quirk of the spec, where q is used to specify a 'quality' - used to allow the choice of a variety of representations based on degrading quality, eg. sample rate for sound. It's baked into the accept-params definition, so we include it here.
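To make that concrete, here's a hedged sketch of how a server might pull the proposed depth extension out of an incoming Accept header (the function is hypothetical, and real-world parsing should also honour q-values):

def accepted_json_depths(accept_header):
    """Return the 'depth' values the client will accept for
    application/json; 0 means the normal, flat representation."""
    depths = []
    for media_range in accept_header.split(','):
        parts = [p.strip() for p in media_range.split(';')]
        if parts[0] != 'application/json':
            continue
        params = dict(p.split('=', 1) for p in parts[1:] if '=' in p)
        depths.append(int(params.get('depth', 0)))
    return depths

# accepted_json_depths('application/json,application/json;q=1;depth=1')
# -> [0, 1]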

Example Requests

Let's compare two requests, one before using the proposed Accept convention, and one after:

GET /orders/432544
Accept: application/json

200 OK
{
    "toppings": [
        "/toppings/cheese",
        "/toppings/jalapenos"
    ]
}

This is exactly the same request as we saw before, but with an explicit Accept header, specifying that the client is prepared to accept the normal form of the response. Let's see what might happen if a depth is specified:

GET /orders/432544
Accept: application/json,application/json;q=1;depth=1

200 OK
{
    "toppings": [
        {
            "calories": 100,
            "name": "Cheese"
        },
        {
            "calories": 25,
            "name": "Jalapeños"
        }
    ]
}

We can now see the client expressing that it can accept resource representations nested to a single level - and the server responds in kind. Note that the client also specified that it could accept the normal form - so it would be valid for the server to respond with the first, normal form even to this second request.

What if the server could not fulfil the request? Perhaps the backend is unable to perform the join required to provide the nested data, and the client specified that it could only accept that nested representation:

GET /orders/432544
Accept: application/json;q=1;depth=1

406 Not Acceptable

It might be sensible for the server to provide the normal representation in the body anyway, in case the client was able to process it.

In particular, this response might be used to prevent (or mitigate) DoS attacks; depending on the application, the server might impose a depth limit of 2 or 3 levels.

Thoughts: Advantages, Limitations and Questions

The approach outlined above would afford clients some flexibility in 'denormalising' the data that they receive on request, avoiding both the need for developers to create custom code to nest related data and the tendency for APIs to become over-specialised to the needs of one particular API client over time.

It's also cacheable - the Accept header should form part of the cache key - and it degrades gracefully for servers which cannot fulfil the nesting request.
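In practical terms, that means the server should emit a Vary header so that shared caches store the flat and nested representations separately. In Django, for instance:

from django.utils.cache import patch_vary_headers

# Cache flat and nested representations under separate keys.
patch_vary_headers(response, ('Accept',))  # or: response['Vary'] = 'Accept'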

However, it's not a silver bullet: in particular, it's not clear how one might implement an analogue to Django's .select_related('foo', 'bar') form, where the API consumer could specify which resources it wished to be nested in the main resource requested. Instead, all the client can specify is a simple depth, and may therefore receive far more nested information than it might actually need.

As far as I can see, there are two problems to solve before being able to implement this more specific second form:

  • How to provide a self-describing way for a server to indicate that this facility is available for linked resources
  • How a client can tell the server which individual resource links to expand
    • Perhaps including an XPath-like parameter in the Accept header might work here (tweaked suitably to apply to JSON documents too); but how would the client know which paths were expandable without requesting the normal form to start with?

 

Implementation details aside, I do think that setting a convention (even of a limited form, such as the depth approach described above) would increase the usefulness of APIs. Perhaps I'll even get around to a Tastypie/Piston extension to implement it!

Comments on the approach, improvements and especially criticisms, are welcome in the comments.

Introducing django-lazysignup

by Dan Fairs last modified Apr 24, 2011 05:00 PM
django-lazysignup is a package designed to allow users to interact with a site as if they were authenticated users, but without signing up. At any time, they can convert their temporary user account to a real user account. Read more about it below.

django-lazysignup is a Django application that was partly inspired by a talk that Simon Willison gave at EuroPython a few years back (perhaps 2008, or 2009?) and partly to scratch an itch I had with an application I was building at the time. The problem it tries to solve is that making users sign up with a web site just to try out your app is quite a high barrier - potential users just bounce right off that registration form.

I'd seen some efforts to solve this problem before. Most seemed to involve stashing the data for some predetermined part of the website somewhere (often in the session) and then reconstituting it into real application data when the user eventually bit the bullet and signed up. This worked OK, but you had to write it anew for every web site, as clearly the data you'd want to store would change from site to site. You also ended up effectively developing a miniature version of your site that would work with some limited data set.

This didn't really seem good enough.

So I started wondering - what if we just created a real user for every person who visited the site? Django already has support for creating users with unusable passwords, so we can create one of those every time a new person comes along and log them in; then, at some future point (presumably once they've fallen in love with your site), they can set themselves up with a real username and password. And as a bonus, all the data they created while messing about with the site sticks around, and carries over into their 'real' user.
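The idea in miniature, assuming plain contrib.auth (this illustrates the concept only - it is not lazysignup's actual implementation):

import uuid

from django.contrib.auth import login
from django.contrib.auth.models import User

def create_lazy_user(request):
    # Random username, capped at contrib.auth's 30-character limit.
    user = User.objects.create(username=uuid.uuid4().hex[:30])
    user.set_unusable_password()
    user.save()
    # Normally set by authenticate(); required before calling login().
    user.backend = 'django.contrib.auth.backends.ModelBackend'
    login(request, user)
    return user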

This is, in essence, how django-lazysignup works.

Let's take a look in a bit more detail. You can grab an official release from PyPI, or clone it on GitHub.

What's in the package?

Once you've installed django-lazysignup (I'll let you read the docs to see how to do that), you've got a few tools to play with:

  • An authentication backend
  • A user conversion view
  • The allow_lazy_user view decorator
  • The is_lazy_user template filter
  • The user agent blacklist
  • Custom user models

 

Authentication backend

The authentication backend needs to be installed for django-lazysignup to work. This backend is required so that we can authenticate the temporary user accounts without a password. We refer to these temporary users as 'lazy' users - they haven't bothered to sign up yet.
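Installation is a settings change; it looks something like the following, though the exact backend path is from memory, so check the package docs:

AUTHENTICATION_BACKENDS = (
    'django.contrib.auth.backends.ModelBackend',
    'lazysignup.backends.LazySignupBackend',
)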

The user conversion view

In many cases, you're going to want your users to sign up eventually. The package includes a view to allow you to do this, converting a lazy user to a real user. In practice, this simply involves the user setting a username and password for their temporary user account. This approach means they get to keep all the data that they've already created in your application.

The allow_lazy_user decorator

The temporary user creation process is potentially an expensive one (and on a high-traffic site, may cause contention on the user table). django-lazysignup therefore provides a decorator that allows the developer to specify which views can trigger this process. Sites I've developed that use this package tend to apply the decorator to the views that do interesting things, but exclude the homepage and any static pages on the site.
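Usage looks like this (the view itself is hypothetical):

from lazysignup.decorators import allow_lazy_user

@allow_lazy_user
def add_topping(request):
    # request.user is a real (if lazy) user here, except for
    # blacklisted user agents - see below.
    ...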

The is_lazy_user template filter

It's often useful - particularly in templates - to find out whether the current user is a lazy (ie. temporary) user or not. In particular, you may want to show a link to the convert view if they're a lazy user. This template filter is provided for this purpose. Note that lazy users will appear as authenticated (ie. is_authenticated() returns True). For now, has_usable_password also returns False for lazy users, though this should not be relied on. The canonical way of detecting a lazy user is through the is_lazy_user filter (and its associated function in the utils module).
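In Python code, the equivalent check is the function of the same name in the utils module:

from lazysignup.utils import is_lazy_user

if is_lazy_user(request.user):
    # e.g. prompt them to convert their account
    ...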

The user agent blacklist

You probably don't want every request to a view that's opted in to lazy user creation to actually create a user. Principally, you probably don't want search engines to do this. The user agent blacklist is a crude way of filtering out such robots.

Note that this means that views that have the allow_lazy_user decorator aren't guaranteed to always have an authenticated user. You still need to make sure that your views work with unauthenticated users (or mark them with the login_required decorator or similar).

Custom user models

As of version 0.7, django-lazysignup also has limited support for custom user models. Just set the LAZYSIGNUP_USER_MODEL setting appropriately (by default, it's auth.User to support contrib.auth out of the box). As alluded to before, support for custom users is basically predicated on the user model looking pretty much like a normal Django user - especially in that it's stored in the database.
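For example, with a hypothetical custom model in an app called accounts:

# settings.py
LAZYSIGNUP_USER_MODEL = 'accounts.MyUser'  # default: 'auth.User'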

I anticipate that this mechanism will change in the distant future, if Django were to adopt some form of pluggable model (or at least, a pluggable user model).

Restrictions

django-lazysignup was built with Django's own contrib.auth application in mind. If you're not using this for authentication, then integrating django-lazysignup will be more of a challenge. (If you do, and there are some changes to the package that would help you, please let me know!). In particular, it expects that you will have a user model stored somewhere in the database.

The Future

Honestly, I'd expected to have a 1.0 release out the door by now, but people keep suggesting great features that I'd like to get in before committing to API stability. Currently on the list are:

  • Global view opt in, with opt-out decorator (suggested by Rob Hudson)
  • Support for deferred user validation - for example, to support email address validation (suggested by Alex Ehlke)


In addition to Rob and Alex, thanks also to Luke Zapart for suggesting and providing an initial implementation and tests for the custom user model feature.

If you have a play with the app, and there's something you'd like to see, then do get in touch - or just fork it on GitHub, add the feature (with docs and tests, preferably!) and send me a pull request.

And that application that django-lazysignup was originally built for? Well, I still want to do it. Someday.

Speeding up Django unit test runs with MySQL

by Dan Fairs last modified Dec 07, 2010 02:22 PM
Here are a couple of tips to speed up unit test runs on Mac OS X and Linux when running MySQL.

When I'm developing Django sites, my database of choice is usually PostgreSQL. However, lots of clients use MySQL. And there lies a problem: table creation on MySQL seems to be an order of magnitude slower on Mac OS X than on Linux. This makes repeated unit test runs extremely painful.

I researched this a little bit a while ago, and noticed that it had been reported as a bug in the MySQL tracker. At the time, there were no fixes or workarounds.

A recent update, however, has revealed the use of the skip-sync-frm option. Put it in your MySQL config file in the [mysqld] section for a quick speedup:

[mysqld]
default-table-type=innodb
transaction-isolation=READ-COMMITTED
default-character-set=utf8
skip-sync-frm

Of course, nothing in this life is free, as Daniel Fischer explains in a comment:

The reason why it's slower on Mac OS X than on Linux is that on Mac OS X, fcntl(F_FULLFSYNC) is available, and mysqld prefers this call to fsync(). The difference is that fsync() only flushes data to the disk - both on Linux and Mac OS X -, while fcntl(F_FULLFSYNC) also asks the disk to flush its own buffers and blocks until the data is physically written to the disk.

In a nutshell, it's slower because it's safer.

So, we're trading data integrity for performance - but this is a development machine, so trashing and recreating databases (or the MySQL installation for that matter) is fine, if necessary.

Et tu, Linux?

My colleague was having similar problems on the latest Ubuntu, 10.10. The tweak above helped him too, but his test runs were also painfully slow. He'd already added the 'noatime' option to fstab.

It turns out that the newest Ubuntu ships with ext4 as the default file system. By default, ext4 makes absolutely sure that all data has been written out to the filesystem journal before writing the journal commit record. This is done through the use of filesystem barriers. Again - this is done to prefer data integrity over performance. Since this is a dev machine, it's disposable, and performance is more important. So, we can turn this off in /etc/fstab:

/dev/sda3  /  ext4  noatime,rw,errors=remount-ro,barrier=0  0  1

Read more about the barrier setting on Kernelnewbies.org.

Just to reiterate - it's probably best not to do this on a machine that's important without thinking about it carefully. Those settings have conservative defaults for a reason!

Filtering Dropdown Lists in the Django Admin

by Dan Fairs last modified Oct 04, 2010 08:19 AM
It's not immediately obvious how to filter dropdown lists in the Django admin interface. This article will talk about how ForeignKeys can be filtered in Django ModelForms and then the Django admin.

Automatically-generated dropdown lists can seem a little mysterious at first - particularly when you first want to customise what they contain in the Django admin. I'm going to go through a number of examples of increasing complexity of customising the content of dropdowns in various contexts: ModelForms, and then into the Django admin.

Here are the models that the examples will work with. They're abbreviated and slightly modified versions of some models from the Swoop project I'm currently working on:

from django.contrib.gis.db import models

class Area(models.Model):
    title = models.CharField(max_length=100)
    area = models.MultiPolygonField(blank=True, null=True)

class Trip(models.Model):
    title = models.CharField(max_length=100)
    area = models.ForeignKey(Area)

class Landmark(models.Model):
    title = models.CharField(max_length=100)
    point = models.PointField()

class MountaineeringInfo(models.Model):
    trip = models.ForeignKey(Trip)
    area = models.ForeignKey(Area, blank=True, null=True)
    base_camp = models.ForeignKey(Landmark, blank=True, null=True)

As you can see, we're using GeoDjango here - I'm not going to talk much about that, but it should be obvious what's going on when we get to it. Note that these examples assume Django 1.2.

Here are the cases that this article will cover:
  • Filtering a forms.ModelForm's ModelChoiceField
  • Filtering a Django admin dropdown
  • Filtering a Django admin dropdown in an inline, based on a value on the main instance (phew!)

Filtering a form's ModelChoiceField

Consider this form:

class MountaineeringForm(forms.ModelForm):
    class Meta:
        model = MountaineeringInfo

This'll generate a simple form for us, including dropdowns with options for every Landmark, Trip and Area we have defined. Let's look at the area foreign key first. Note how the area attribute of the Area model is nullable. Let's say we only wanted to be able to select Areas from our MountaineeringForm which had a valid area attribute set - put another way, we want to filter out those records which are null.

This is pretty straightforward, and is in fact covered in the docs. And there are, in fact, two ways to do it:

class MountaineeringForm(forms.ModelForm):
    area = forms.ModelChoiceField(queryset=Area.objects.exclude(area=None))

    class Meta:
        model = MountaineeringInfo

This is probably the simplest way, and works well whenever the filtering you need to do does not depend on any request-specific or context-specific information.

The other option we have is to allow the ModelForm base class to do its usual thing, and then modify the fields that were generated directly.

class MountaineeringForm(forms.ModelForm):
    class Meta:
        model = MountaineeringInfo

    def __init__(self, *args, **kwargs):
        super(MountaineeringForm, self).__init__(*args, **kwargs)
        self.fields['area'].queryset = Area.objects.exclude(area=None)

Now, this is slightly more verbose, and to my eye, not so clear as the first version. However, this approach of modifying the form after the fields have been constructed is a pattern we'll see in the coming examples.

Filtering a Django Admin Dropdown

Now, let's say that we want to edit MountaineeringInfo instances in the Django admin. At the moment, we just have this in our admin.py:

admin.site.register(MountaineeringInfo)

This generates a form much as we had previously with our simple ModelForm definition. However, we still want to filter out those Area instances which don't have an area set. We do this by providing a custom ModelAdmin subclass, and overriding the formfield_for_foreignkey method:

class MountaineeringInfoAdmin(admin.ModelAdmin):
    def formfield_for_foreignkey(self, db_field, request, **kwargs):
        if db_field.name == 'area':
            kwargs['queryset'] = Area.objects.exclude(area=None)
        return super(MountaineeringInfoAdmin, self).formfield_for_foreignkey(
            db_field, request, **kwargs)

admin.site.register(MountaineeringInfo, MountaineeringInfoAdmin)

This is pretty straightforward, and is indeed documented in the Django docs. Note that the request is passed into this method, so it's easy to perform some filtering based on some aspect of the request - the currently logged-in user, for example.
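For instance, here's a sketch restricting the queryset by the logged-in user - note that the created_by field is hypothetical, not part of the models above:

class MountaineeringInfoAdmin(admin.ModelAdmin):
    def formfield_for_foreignkey(self, db_field, request, **kwargs):
        if db_field.name == 'area':
            # Only offer areas with geometry, created by this user.
            kwargs['queryset'] = Area.objects.exclude(area=None).filter(
                created_by=request.user)
        return super(MountaineeringInfoAdmin, self).formfield_for_foreignkey(
            db_field, request, **kwargs)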

Filtering an inline's dropdown based on the inline instance

This is slightly more complicated. Let's say we're editing a Trip, and we're editing MountaineeringInfo instances by way of an inline. In code terms, we've got something like this:

class MountaineeringInfoInline(admin.TabularInline):
    model = MountaineeringInfo

class TripAdmin(admin.ModelAdmin):
    inlines = [MountaineeringInfoInline]

Now, referring back to our models, let's say we want to filter the available Landmarks for an inline depending on what Area is selected. There are two cases we have to consider: what happens when the inline is displayed but blank (ie. it's not bound to a MountaineeringInfo instance); and then, once we've got a MountaineeringInfo instance to bind to.

This can be solved using custom inline formsets. Let's extend the MountaineeringInfoInline class:

class MountaineeringInfoInline(admin.TabularInline):
    model = MountaineeringInfo
    formset = MountaineeringInfoInlineFormset

Note we've defined an extra attribute, specifying a custom formset. Let's go ahead and define that formset:

from django.forms.models import BaseInlineFormSet

class MountaineeringInfoInlineFormset(BaseInlineFormSet):
    def add_fields(self, form, index):
        super(MountaineeringInfoInlineFormset, self).add_fields(form, index)
        landmarks = Landmark.objects.none()
        if form.instance:
            try:
                area = form.instance.area
            except Area.DoesNotExist:
                pass
            else:
                landmarks = Landmark.objects.filter(point__within=area.area)
        form.fields['base_camp'].queryset = landmarks

Here, we override the inline formset's add_fields() method which - unsurprisingly - is called to generate all the fields that will appear in the inline formset. Note that the form is passed in as an argument. Since this is a ModelForm, the underlying instance (which will be a MountaineeringInfo instance, remember) is available using the instance attribute on the form. Now, if Django is generating a new, blank inline formset, then form.instance will be None. In this case, we don't want any landmarks to display - we want the user to have chosen an area first. Hence, we assign an empty QuerySet to the base camp field on the form.

On the other hand, if instance is set, then we have an existing MountaineeringInfo instance to work with. In this case, we get the area associated with it (note that area is nullable, so we have to wrap the access in a try/except to guard against the possibility that no area has been set) and create a QuerySet of all landmarks whose point lies within the area. So, when a user selects an area from the dropdown and presses Save, the landmarks contained in the base camp dropdown filter themselves to only those within the specified area.

Depending on your app, you might want the default value for landmarks to be Landmark.objects.all() rather than none() as per the example above - if so, remember that all() is the default, so you could eliminate some of that code.

The only thing to be aware of with the above code is if a user selects an area and a landmark, then changes the area so that the selected landmark is no longer valid for the area, the old landmark will remain set in the database. Of course, the base_camp dropdown would be blank, and would be reset to None when Save was pressed. If this were a problem, it would be possible to set the queryset to be Landmark.objects.filter(pk=form.instance.base_camp.pk).

Filtering an inline's dropdown based on the parent

OK, confession first - this feels like a hack. But I haven't found a cleaner way to do it - let me know if you know how to!

Note the Trip model has an Area foreign key as well. How might we go about filtering the 'area' dropdown in new MountaineeringInfo instances to only contain areas that are within the parent Trip's area?

Well, there are two parts to this:

  1. Figuring out what the area of the parent Trip is
  2. Filtering our own area dropdown depending on that value

We've actually done most of the work necessary to understand how to do the second part already. So let's do that part first. The key thing to know is that InlineModelAdmin subclasses (such as admin.TabularInline), like ModelAdmin classes, have a formfield_for_dbfield method. We can therefore override this to restrict the queryset used in fields contained within the inline. Let's extend our existing definition:

class MountaineeringInfoInline(admin.TabularInline):
    model = MountaineeringInfo
    formset = MountaineeringInfoInlineFormset
    
    def formfield_for_dbfield(self, field, **kwargs):
        if field.name == 'area':
            # Note - get_object hasn't been defined yet
            parent_trip = self.get_object(kwargs['request'], Trip)
            contained_areas = Area.objects.filter(area__contains=parent_trip.area.area)
            return forms.ModelChoiceField(queryset=contained_areas)
        return super(MountaineeringInfoInline, self).formfield_for_dbfield(field, **kwargs)

As you can see - very similar to what we've seen before. We use the get_object call to extract the Trip that this MountaineeringInfo instance is (or will be) related to, and find all Area instances which are contained by that parent Trip's area.

So, what does that get_object() method look like?

    def get_object(self, request, model):
        object_id = request.META['PATH_INFO'].strip('/').split('/')[-1]
        try:
            object_id = int(object_id)
        except ValueError:
            return None
        return model.objects.get(pk=object_id)

This clearly isn't ideal, as it depends on the URL structure used by the Django admin: it extracts the object ID by stripping off slashes, splitting on slashes, and taking the last element. It then looks up the appropriate object using the model class passed on.

So that class in full:
class MountaineeringInfoInline(admin.TabularInline):
    model = MountaineeringInfo
    formset = MountaineeringInfoInlineFormset

    def formfield_for_dbfield(self, field, **kwargs):
        if field.name == 'area':
            # Note - get_object hasn't been defined yet
            parent_trip = self.get_object(kwargs['request'], Trip)
            contained_areas = Area.objects.filter(area__contains=parent_trip.area.area)
            return forms.ModelChoiceField(queryset=contained_areas)
        return super(MountaineeringInfoInline, self).formfield_for_dbfield(field, **kwargs)

    def get_object(self, request, model):
        object_id = request.META['PATH_INFO'].strip('/').split('/')[-1]
        try:
            object_id = int(object_id)
        except ValueError:
            return None
        return model.objects.get(pk=object_id)

(In the real app code, that get_object() is in a base class, for easier reuse - hence the parameterisation of the model.)

Can you do better?

This kind of filtering is often required in non-trivial applications: be it filtering on security (which is relatively easy, as the request is usually present in most ModelAdmin APIs) or filtering on other data values - which seems a lot trickier than you might want. However, once you understand how the various ModelAdmin classes, inlines and formsets fit together, it's not too bad.

I'm keen to hear how you're tackling this in your Django apps - and whether this can be simplified!
