Jul 30

Swoop Travel Live!

Foundry's first GeoDjango site, Swoop Travel, has gone live.

I'm pleased to announce that Foundry's first web site has gone live: Swoop Travel.

Swoop Travel is a GeoDjango site using PostgreSQL and PostGIS, although very little GIS functionality is in the public-facing site at the moment - it's mainly in the backend.

We built this from scratch in about a month (while juggling other projects too!) and are pretty pleased with how it's turned out. This is just the first iteration and we're looking forward to expanding the site.

GeoDjango is pretty awesome. As we work on the site and get more experience, I'll post some more about some of the innards: there's quite a lot of interesting stuff in the backend, particularly admin customisations like filtering dropdowns based on landmarks within geographic regions, and so on. I'll also try to talk about the production server configuration, as GeoDjango needs a slightly different WSGI configuration to the standard run-of-the-mill Django site.

Jan 23

Changing a user's home directory on Mac OS X Snow Leopard

After a botched PostgreSQL upgrade, I managed to leave my postgres user with a non-existent home directory. Here's how to fix this on Snow Leopard.

I managed to mess up a PostgreSQL install on my Mac, which (somehow) left the postgres user account with an invalid home directory. Fixing this took quite a bit of digging.

First, launch the 'dscl' command as an administrator (probably with sudo). This drops you at a prompt, where familiar commands like cd, ls, cat and so on work. So, to change the user's home directory, I did:

 
bash $ sudo dscl
> cd /Local/Default/Users
/Local/Default/Users > ls
... user list ...
nobody
postgres
/Local/Default/Users > cat postgres
AppleMetaNodeLocation: /Local/Default
NFSHomeDirectory: /Users/dan/opt/pgsql
Password: *
PrimaryGroupID: 1
RealName: PostgreSQL
RecordName: postgres
RecordType: dsRecTypeStandard:Users
UniqueID: 504
UserShell: /bin/bash
/Local/Default/Users > change postgres NFSHomeDirectory /Users/dan/opt/pgsql /usr/local/pgsql
/Local/Default/Users > q
Goodbye
bash $

After that, the postgres user had a valid home directory, and all was well with the world.

Jan 18

OSError while installing Django with buildout and djangorecipe

A corrupted Django tarball can cause mysterious errors from djangorecipe.

Sometimes, I see the following error when trying to run a Django buildout:

File "/Users/dan/.eggs/djangorecipe-0.20-py2.6.egg/djangorecipe/recipe.py", line 271, in install_release
    os.listdir(extraction_dir)[0]
OSError: [Errno 2] No such file or directory: '/Users/dan/.downloads/django-archive'

After a bit of poking around, I found that this is to do with a corrupted Django tarball. In my case, this is usually because I've interrupted a download with Ctrl-C. Unfortunately it seems that the tarfile module in the Python standard library (at least as invoked by setuptools) treats a broken tar.gz file as an empty archive, without throwing an exception. Since there's no exception, djangorecipe assumes everything was uncompressed without problems, and is therefore rather surprised when the unpacked Django package isn't where it expected it to be.

The short term solution is to delete the bad Django archive from your download cache. This will likely be a 'downloads' directory in your buildout, or you may have a global one (as I do). When you next run buildout, the tarball will be freshly downloaded.
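To spot a bad archive in the download cache before buildout trips over it, a check along these lines might help. This is only a sketch in Python 3, not part of djangorecipe or setuptools: the name `tarball_is_usable` is made up for illustration, and it assumes the cached download is a gzipped tarball.

```python
import tarfile
import zlib


def tarball_is_usable(path):
    """Return True only if the archive reads to the end and has members.

    Hypothetical helper for illustration; not part of djangorecipe.
    """
    try:
        with tarfile.open(path, "r:gz") as archive:
            members = archive.getmembers()
            for member in members:
                if member.isfile():
                    # Read every file body so that truncation inside the
                    # compressed data is noticed, not just bad headers.
                    extracted = archive.extractfile(member)
                    while extracted.read(64 * 1024):
                        pass
    except (tarfile.TarError, EOFError, OSError, zlib.error):
        return False
    # An archive that opens cleanly but contains nothing is no use either.
    return len(members) > 0
```

Any archive for which this returns False is a candidate for deletion from the download cache.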

When I get a moment I'll see if I can modify djangorecipe to notice this condition and not proceed with the build.

Jan 14

Migrating to Django Mingus

I've migrated my blog from a creaking Plone 2.5 to a fork of Django Mingus. This is the process I went through.

I've been running a blog since 2007 (check the archives!). At the time, it made most sense for me to go for a Plone-based blog. Plone was what I was most familiar with, and there was a simple blog product out there that I could use called, sensibly enough, SimpleBlog. In fact, you can still grab it - it's where you'd expect on the Plone Products section. And as you can see, the release there was the one that I used at the time - SimpleBlog 2.0, for Plone 2.5.

Fast-forward to the start of 2010, and things have moved on. Plone's moved on, for sure. Plone 4 is just around the corner, and there's some really, really cool stuff in there: Dexterity, a new content types framework, finally looks like it'll make content type creation as easy as it should be, and simple types and behaviours can be created through the web. Deco, slated (last time I heard) for Plone 5, is quite literally going to make publishers wet themselves. And Deliverance has got great potential in helping to unify the many disparate systems which make the typical corporate user's daily life such a grind. Thing is, I'm not a big corporate user; a lot of what makes Plone great for that environment is overhead for me and my little blog.

The big change for me personally, though, is that more and more of my work is now Django. About two thirds of 2009 was Zope 2/Five, and a third Django; 2010 is looking like it'll be the other way around. So, having decided that my blog was looking a little tired, and seemed to be a bit of a spam magnet, I decided to bite the bullet and do a complete rebuild in Django.

I didn't want to write yet-another-blog from scratch, so I picked what seemed to have the most buzz around it at the time - Django Mingus - and started from there. And this is how I did it.

Oh, before I get stuck in, all the code for this site is open source. Fortunately, there's not much of it. There's no documentation as such (this is open source, after all) because the target audience is, well, me. I've stayed true to my Zope roots and used buildout; the buildout and Django project can be checked out from GitHub:

Stereoplex buildout and project

Onwards.

Planning

There's more to moving a blog than just installing your blog software and bashing out new articles. I had to consider a number of migration issues, specifically:

  • Obviously, I need to move all the content over from my old blog. That's not just articles: that's images, comments, the whole shebang.
  • I needed to either keep all the old article URLs from my previous blog working, or have them redirect to the new URLs.
  • Similarly, there are lots of RSS URLs out in the wild; I've handed a couple of them out to the Django community aggregator and the Planet Plone aggregator. There's also the main site feed.
  • I decided not to move user accounts over, as I'd already decided that I wasn't going to require a login to comment. I was going to go with a ReCaptcha integration.

So, the first step was obviously going to be to extract all the content from my old blog.

Getting the content out of the ZODB

Plone stores data in the ZODB. The ZODB is amazing. It was years ahead of its time, and provides a really natural way to store and interact with data in document-oriented systems. It's only really in the last year or so that we've seen the rise of broadly similar data stores, most of which talk HTTP and don't have the fine-grained transaction control that ZODB provides. The only thing to remember with the ZODB is that you can only access it directly with Python. That meant that I had to write a Python script to dump out all my content into a platform-neutral form. (I could have written a script which read the ZODB directly and created Django models, I guess. Using an XML intermediate format seemed easier though, and was probably quicker to work with - loading up the Plone 2.5 codebase is pretty slow.)

Most Plone applications, SimpleBlog included, store their data using the Archetypes content type framework. Archetypes (often just AT) is a schema-driven approach to creating content. This is a pretty handy approach (even if AT's implementation was a bit awkward in places) as it's really simple to write code that can introspect that content. This was invaluable for writing an export script.

The result is on github: Simple-AT-XML-Dump. I should note that AT does have support for native XML dump and load. Thing is, I've got an ancient version of AT, and SimpleBlog used CMF (a layer underneath Plone) comments. I had no idea if XML marshalling would work properly, so I just did my own custom XML format. It's an instance script. It should work pretty well on any folderish Archetypes types. Invoke it like this:

bin/instance run Simple-AT-XML-Dump/run.py -p /zodb/path/to/plone/content -o out.xml

/zodb/path/to/plone/content is the physical path in the ZODB to dump. This needs to be the root folderish AT item. out.xml is the output file.

I won't go into the specifics of how the script works (if there's anything you're interested in especially, mail me or post a comment below). In a nutshell though, it'll dump out AT types by schema (including ImageFields with base64-encoded data) and any CMF discussions that are associated with them.

Starting with Django Mingus

Many years of pain have taught me to automate my build. The XML dumper instance script above is pretty much the smallest thing I'll do without a build (and even now, I feel a twinge of guilt at not having packaged it properly.) For me, therefore, the first thing to do was to get a build up and running.

The system du jour seems to be to use pip and virtualenv. Pip (a replacement for easy_install) lets you specify requirements files, which define which Python packages your application uses, what versions of them, and lets you install directly from the major source control systems. (This ability seems to have caused lots of requirements files with github links in them to spring up in 'released' packages, especially in the Django world. I regard this as a Bad Thing; but that's an opinion piece for another time). However, grizzled Zope veterans tend to reach for buildout in such a circumstance so, wanting to restrict New Fangled stuff to learning how Mingus was put together, I stuck with that. Buildout predates pip and virtualenv, and so does stuff that you'd normally just do with virtualenv, like creating an isolated environment. Buildout config files are also a touch more verbose than requirements files. That said, I'm still pretty sure that buildout is the more extensible system, with a vast array of custom recipes; and, in common with a lot of software from the Zope world, the narrative documentation is atrocious.

But I know buildout, and I'm not giving up these scars so easily, so that's what I went with, basing my buildout config on the requirements file that comes with Mingus. This process was actually pretty simple:

  1. Set up a standard Django buildout using djangorecipe
  2. Pop mr.developer in as an extension, add [sources] for all of the editable (-e) eggs in the pip requirements file, and add them all as auto-checkout
  3. Put all the non-editable eggs from the requirements file into the eggs section of the buildout config
  4. Use the versions supplied in the requirements file to create a [versions] section in the buildout config

This gives me a buildout.cfg file that mirrors the Mingus requirements file, but will also:

  • Create the django management script and django WSGI file automatically
  • Create a project and select an appropriate settings file for me

The end result of this is the GitHub project I linked to above, stereoplex-buildout.

Stereoplex

I created another egg, called stereoplex, to contain all my site-specific customisations and scripts. I used another package of mine, fez.djangoskel, to create the basic layout. Specifically, it has:

  • A Django management command to import the XML file created with Simple-AT-XML-Dump
  • A Django ModelAdmin subclass (actually a basic.blog.admin.PostAdmin subclass) to let me use TinyMCE as my editor
  • A ReCaptcha Django form field, widget, and custom comment form
  • A single extra view, which returns all items posted on the blog
  • A URLConf which brought together Mingus' URLs plus those for the extra view, and for TinyMCE
  • And of course, all the template overrides and CSS, JavaScript and images required for the new Stereoplex look and feel

Importing the Data

The next step was to write a data importer. This had to do a number of things (data migrations are never simple!):

  • Import all the images in the XML data file, creating basic.media.models.Photo instances for each of them
  • Rewrite all image links in the body text of posts to contain <inline> elements used by Mingus
  • Create basic.blog.models.Post instances for each blog post in the file
  • Create django.contrib.comments.models.Comment instances for every comment, and associate them with the appropriate post
  • Create django.contrib.redirects.models.Redirect objects for each imported post, to allow existing inbound links to be redirected to the new location.

Automated content import is one of those things that you tell clients is usually impossible. And when they're migrating from a legacy CMS platform (or indeed, hand-maintained HTML), then that's usually right. The inevitable gigabytes of hand-rolled HTML are at best poorly formed, and at worst represent content which needs throwing away anyway.

I was more fortunate. I didn't have that much content to migrate - 60-odd posts, and a few images - and Plone 2.5's default editor Kupu is actually pretty good at producing good HTML. I was able to directly use the existing HTML, and only needed to replace the <img> tags with the appropriate <inline> expected by Mingus.
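The tag replacement can be sketched roughly as follows. This is a Python 3 illustration rather than the importer's actual code: the function name `rewrite_images`, the `photo_ids` mapping from old image URLs to imported Photo ids, and the exact attributes of the `<inline>` element are all assumptions made for the example.

```python
import re

# Matches an <img> tag and captures the value of its src attribute.
IMG_TAG = re.compile(r'<img\b[^>]*?\bsrc="([^"]+)"[^>]*>', re.IGNORECASE)


def rewrite_images(html, photo_ids):
    """Swap <img> tags for <inline> placeholders.

    photo_ids maps an image's old src to the id of the imported Photo
    (a hypothetical mapping built up while importing the images).
    Images with no known Photo are left untouched.
    """
    def replace(match):
        photo_id = photo_ids.get(match.group(1))
        if photo_id is None:
            return match.group(0)
        # The inline attributes used here are an assumption.
        return '<inline type="media.photo" id="%d" />' % photo_id

    return IMG_TAG.sub(replace, html)
```

A regular expression is good enough here only because the source HTML is well formed; for messier markup a real HTML parser would be the safer choice.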

Changes to Mingus packages

I did make some modifications to Mingus' packages. These were as follows:

django-mingus

django-basic-apps

django-sugar

All really very minor.

Server Configuration

Apache

The Plone instance used a standard small setup: a single ZEO client talking to a ZEO server, fronted by Apache with the magic RewriteRule. The Zope configuration had been tweaked slightly to be usable on a small (by Plone's standard!) 512MB RAM host, but I hadn't had time to do any other optimisations (for example, serving static files from Apache) that I would normally do.

Django forces you to do at least some of these. Static files are always served by an external web server (unless you are really determined to sail through all the warnings in the documentation). I also finally got around to turning on gzip compression.

One remaining task that I planned to do in the Apache configuration was to provide redirects for my RSS feeds. This wasn't as easy as I might have liked, since the old feed URLs had query string elements; and indeed, it was some of those values that I needed to formulate a correct redirect. I ended up with the following:

RewriteEngine On
RewriteCond %{query_string} ^(.*)?EntryCategory=(.*)?&(.*)
RewriteRule ^/search_rss /feeds/categories/%2/ [R=permanent,L,NC]

This simply declares three match groups in the RewriteCond regex, the second of which (hence %2) is the category slug. The RewriteRule then issues a permanent redirect to the new URL. We have to use RewriteCond, because RewriteRule regexes won't match a query string, only the URL path.
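To see what the match groups capture, the same pattern can be tried out in Python 3. This is only an approximation of what mod_rewrite does (the function name `feed_redirect` is made up, and Python's re module stands in for Apache's regex engine), but for a pattern this simple the two should agree:

```python
import re

# The RewriteCond pattern, applied to the query string only.
QUERY_PATTERN = re.compile(r"^(.*)?EntryCategory=(.*)?&(.*)")
# The RewriteRule pattern, applied to the URL path; NC means ignore case.
PATH_PATTERN = re.compile(r"^/search_rss", re.IGNORECASE)


def feed_redirect(path, query_string):
    """Return the new feed URL, or None if the rule would not apply."""
    if not PATH_PATTERN.search(path):
        return None
    match = QUERY_PATTERN.search(query_string)
    if match is None:
        return None
    # group(2) plays the role of %2 in the RewriteRule substitution.
    return "/feeds/categories/%s/" % match.group(2)
```

Trying a few query strings this way also shows a quirk of the pattern: it requires an & after the category value, so a query string that ends with the category would not match.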

Next up is the mod_wsgi configuration. I use a fairly standard configuration, which is generally as follows:

WSGIScriptAlias / /var/websites/www.stereoplex.com/bin/django.wsgi
WSGIDaemonProcess stereoplex user=stereoplex group=stereoplex processes=3 threads=25 maximum-requests=1000 stack-size=524288
WSGIProcessGroup stereoplex

This runs the Stereoplex web application in its own process group, and with its own user and group membership. This is a safety net: if the site is compromised or (to be honest, more likely) I screw up and the app tries to write to the filesystem, its rights are limited by the host's file access control. The WSGI script file is generated by buildout. Probably the most interesting parameter here is the stack size. This is set lower than Linux's default value; for this sort of app, it doesn't need to be as large as the default. Setting this value to lower than the default led to a massive memory saving (remember, this is only a 512MB host; and this site is one of around half a dozen running on the same machine).

Memcached

Mingus makes extensive use of Django's caching support. Other Django sites on the same host use a Memcached instance, so I just pointed Stereoplex at that one. Memcached is essentially in a default configuration, except only bound to the localhost IP address, rather than the public IP address. It's configured to use a maximum of 64MB of RAM. This doesn't sound a great deal, but even with two or three websites using it, I haven't seen it go over 40MB.

So... All Done?

Nearly.

If I'm honest, the content editing experience isn't as nice in Django as it was in Plone. This is expected. Plone is a CMS, and has an administrative interface that's focussed on the business of managing content. Django isn't a CMS, it's a more general web framework. The experience is more like editing content in the ZMI.

That said, there are advantages. I've found third-party Django software much easier to integrate and customise than third-party Plone products ever were. I'm not fighting reams of configuration all the time. There's a strong mindset of developing apps to be reusable in the Django world; Mingus itself is little more than some UI glue (as, really, is Stereoplex). If I were tasked with delivering a large CMS with flexible authentication and authorisation, workflow, and so forth, I'd go for Plone in an instant. But for this job, many of Plone's strengths simply don't apply, and something simpler and more lightweight was more appropriate.

Anyway - I'm quite pleased with the results. There are still a few kinks to be worked out (pygmentize only seems to be being applied on the home page, not individual post pages, for example) but I'll get there over the next couple of weeks.

If there's anything you'd like to have more information on, then leave a comment (click on the article heading - yes, a proper link to comments is on the list!) or of course, just go and grab the code at Github.

Enjoy!

Jan 14

/etc/cron.daily scripts not running on Ubuntu

My Subversion backup cron job in /etc/cron.daily wasn't running. Finally I figured out why.

For a while, I've had a problem where a script that I'd dropped into my /etc/cron.daily directory on my Ubuntu Linux box stubbornly refused to run. It was a Python script, called backup-svn.py (guess what it does), and ran absolutely fine when called from the command line.

I was scratching my head about this again last night, staring at the list of jobs in that directory. Why were they all running except that one? There was a #! line at the top of the script, and it had executable permissions. However, it was the only script in the directory which had a dot (period) in its filename.

mv backup-svn.py backup-svn

... and then the job starts running.

If anyone knows what the rationale is behind not allowing dots in job names, I'd love to know - leave a comment below.
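For what it's worth, the run-parts man page on Debian and Ubuntu says that, by default, it only runs files whose names consist entirely of ASCII letters, digits, underscores and hyphens, which would explain the dot. A small Python 3 sketch of that rule can be used to check names before dropping a script into the directory. The function name is made up, and this only mirrors the default naming rule, not the --lsbsysinit or --regex modes:

```python
import re

# Default run-parts rule: ASCII letters, digits, underscores and hyphens only.
VALID_NAME = re.compile(r"[A-Za-z0-9_-]+")


def run_parts_would_run(filename):
    """Return True if the file name passes run-parts' default name check."""
    return VALID_NAME.fullmatch(filename) is not None
```

Running `run-parts --test /etc/cron.daily` on the box itself should give the definitive answer, since it lists the scripts that would be run without running them.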

For those interested, here's the script. Feel free to use and customise it. It looks for every repo under /var/repos, backs it up, and packs the lot into a tar.bz2 file to be transferred offsite.

 

#!/usr/bin/python

import os
import shutil
import subprocess
import tarfile

REPO_PARENT_DIR = '/var/repos'
BACKUP_LOCATION = '/var/backups'

def backup():
    repos = os.listdir(REPO_PARENT_DIR)
    # Hot-copy every repository into the backup directory
    for repo in repos:
        repo_path = os.path.join(REPO_PARENT_DIR, repo)
        backup_path = BACKUP_LOCATION
        if not os.path.exists(backup_path):
            os.mkdir(backup_path)
        ret = subprocess.call(['svn-hot-backup', repo_path, backup_path])
        if ret:
            raise ValueError('Error executing backup, exit code was ' + str(ret))
    # Find the backup directories: the part of the name before the
    # first '-' must be one of the repository names
    repo_backups = [ dirname for dirname in os.listdir(BACKUP_LOCATION)
                     if dirname[:dirname.find('-')] in repos ]
    # Pack all the backups into a single bzip2-compressed tarball
    tarname = os.path.join(BACKUP_LOCATION, 'svn-backup.tar.bz2')
    tar = tarfile.open(tarname, 'w:bz2')
    for repo in repo_backups:
        tar.add(os.path.join(BACKUP_LOCATION, repo), arcname=repo)
    tar.close()
    # Remove the unpacked backups now that they are in the tarball
    for repo in repo_backups:
        shutil.rmtree(os.path.join(BACKUP_LOCATION, repo))

if __name__ == '__main__':
    backup()
Nov 16

Running a test mail server

A quick tip to ease testing emails.

Just a quick one for today.

Often I find myself needing to test email sending functionality. I often don't have access to a full mail server, so it's handy to be able to run something up locally to test with. If you've got Python installed, then you can do this really easily. Assuming Python 2.6 on Linux:

python /usr/lib/python2.6/smtpd.py -n -c DebuggingServer localhost:8025

That will start a debugging mail server on port 8025 on your local machine. Simply configure your web app to use an SMTP server running on your dev machine on port 8025, and when the app sends an email, you should see it scroll by on your terminal.
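To check that the debugging server is listening, you can push a message at it by hand. Here's a minimal Python 3 sketch using smtplib; the function names, addresses and message text are placeholders made up for the example, and the host and port simply match the command above.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender, recipient):
    """Build a small plain-text message to send to the debugging server."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = "Test message"
    message.set_content("If you can read this, the debugging server works.")
    return message


def send_test_message(message, host="localhost", port=8025):
    """Send the message to the SMTP server listening on host:port."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(message)


# Usage, with the debugging server running in another terminal:
# send_test_message(build_test_message("app@example.com", "dev@example.com"))
```

The message should then appear in the terminal where the debugging server is running.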


Nov 08

Python, Unicode and UnicodeDecodeError

In the years I've been developing in Python, Unicode seems to be the topic which causes the greatest amount of confusion amongst developers. Hopefully much of this confusion should go away in Python 3, for reasons I'll come to at the end; but until then, the UnicodeDecodeError is the bane of many developers' lives.

Unicode and Encodings

OK, let's take a step away from text for a moment. I want you to think of a number between one and ten. Got one? Great - now, grab a pen and paper, and write it down.

What number did you think of? Well, I thought of the number six. And when I wrote it down, it looked like this:

[Image: the digit 6]


Of course, if I were an ancient Roman (or possibly a clockmaker), I could have written this:
   

[Image: bars]

   
They all mean the same thing - the number six. But we've written them in different ways. In other words, we've 'encoded' our idea of the number six in our head in three different ways - three different encodings.

The separation of the idea of 'the number six' from its actual representation is basically all Unicode is. The Unicode Character set (UCS) defines a set of things (loosely, a set of letters) that we can represent. How we represent each of those letters is called an encoding. There's only one Unicode, but there are many encodings. In Unicode parlance, each of those 'things' (letters) is known as a 'code point'. Unicode separates the characters' meaning from their representation.

For historical reasons, the most common encoding (in Western Europe and the US, anyway) is ASCII. This is also Python's default encoding.

Let's think about ASCII for a moment. It's an encoding that uses 7 bits, which limits it to 128 possible values. That's enough to represent the basic characters of English text (letters in both cases, the numbers, punctuation), but not characters with diacritics or symbols such as £. Therefore, Unicode strings that only include code points that are in these 128 ASCII characters can be encoded as ASCII. Conversely, any ASCII encoded string can be decoded to Unicode.

It's worth reiterating that terminology, as you come across it a lot: the transformation from Unicode to an encoding like ASCII is called 'encoding'. The transformation from ASCII back to Unicode is called 'decoding'.

    Unicode  ---- encode ----> ASCII
    ASCII    ---- decode ----> Unicode

Non-ASCII encodings

Most people don't live in the US or Western Europe, and therefore have a requirement to store more characters than can be represented with ASCII. What those folk need to represent *is* part of the Unicode set (Unicode is massive!) - so a different encoding is required. Common encodings have familiar names: UTF-8 and UTF-16. UTF-8, for example, uses a single byte for encoding all the ASCII values, then variable numbers of bytes to encode further characters. (The ins and outs of these encodings are beyond the scope of this article - check out their respective Wikipedia entries for the gory details.)

The fact that UTF-8 encodes the ASCII characters as the same single bytes that ASCII uses is important, since it means that the encoding is backwards-compatible with ASCII. However, it can mask problems in software. We'll come to this shortly.

Some terminology

Unicode-related terminology can get confusing. Here's a quick glossary:

  • To encode
    • Encoding (the verb) means to take a Unicode string and produce a byte string
  • To decode
    • Decoding (the verb) means to take a byte string and produce a Unicode string
  • An encoding
    • An encoding (the noun) is a mapping that describes how to represent a Unicode character as a byte or series of bytes. Encodings are named (like 'ascii', or 'utf-8') and are used both when encoding (verb!) Unicode strings and decoding byte strings.


    In other words, when you encode or decode, you need to specify the encoding that you're using. This will become clearer shortly.

    Python, bytes and strings

    You've probably noticed that there seems to be a couple of ways of writing down strings in Python. One looks like this:

      'this is a string'

    Another looks like this:

      u'this is a string'

    There's a good chance that you also know that the second one of those is a Unicode string. But what's the first one? And what does it actually mean to 'be a Unicode string'?

    The first one is simply a sequence of bytes. This byte sequence is, by convention, an ASCII representation (ie. encoding) of a string. The whole Python standard library, and most third-party modules, happily deal with strings natively in this encoding. As long as you live in the US or Western Europe, then that's probably fine for you.

    The second one is a representation of a Unicode string. This can therefore contain any of the Unicode code points. It's possible that whatever you're using to edit the Python code (or just view it) might not be able to display the entire Unicode character set - for instance, a terminal usually has an encoding that it assumes data it's trying to display is in. There's a special notation, therefore, for representing arbitrary Unicode code points within a Python Unicode string: the \u and \U escapes. These will be followed by four or eight hex digits; there's some subtlety here (see the Python string reference for further information) but you can simply think of the number after the \u (or \U) representing the Unicode code point of the character. So, for example, the following Python string:

      u'\u0062'

    represents LATIN SMALL LETTER B, or more simply:

      u'b'

    To summarise then: the Unicode character set encompasses all characters that we may wish to represent. Individual encodings (ASCII, UTF-8, UTF-16, etc.) are representations of all or some of that full Unicode character set.
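If you want to check which character a code point refers to, the standard library's unicodedata module will tell you its official name. A quick sketch in Python 3 (where every string literal is a Unicode string, so the u prefix isn't needed); the helper name `describe` is just for illustration:

```python
import unicodedata


def describe(character):
    """Return the code point and official Unicode name of one character."""
    return "U+%04X %s" % (ord(character), unicodedata.name(character))


print(describe("\u0062"))  # U+0062 LATIN SMALL LETTER B
print(describe("\u00a3"))  # U+00A3 POUND SIGN
```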

    Encoding and Decoding

    Byte strings and Unicode strings provide methods to perform the encoding and decoding for you. Remembering that you *encode* from Unicode to an encoding, you might try the following:

    >>> u'\u0064'.encode('ascii')
    'd'

    As you'd expect, the Unicode string has an 'encode' method. You tell Python which encoding you want ('ascii' in this case, there are lots more supported by Python - check the docs) using the first parameter to the encode() call.

    Conversely, byte strings have a decode() method:

    >>> 'b'.decode('ascii')
    u'b'


    Here, we're telling Python to take the byte string 'b', decode it based on the ASCII decoder and return a Unicode string.

    Note that in both these previous cases, we didn't really need to specify 'ascii' manually, since Python uses that as a default.
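For comparison, here is a sketch of the same round trip in Python 3, where every str is a Unicode string and byte strings are a separate bytes type. One difference from the sessions above: in Python 3, encode() and decode() default to UTF-8 rather than ASCII. The helper name `round_trip` is made up for the example.

```python
def round_trip(text, encoding):
    """Encode text to bytes, then decode those bytes back to text."""
    encoded = text.encode(encoding)     # str -> bytes
    decoded = encoded.decode(encoding)  # bytes -> str
    return encoded, decoded


print(round_trip("d", "ascii"))       # (b'd', 'd')
print(round_trip("\u00a3", "utf-8"))  # (b'\xc2\xa3', '£')
```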

    UnicodeEncodeError

    So, we've established that there are encodings which can represent Unicode, or more usually, a certain subset of the Unicode character set. We've already talked about how ASCII can only represent 128 characters. So, what happens if you have a Unicode string that contains code points that are outside that 128 characters? Let's try something all too familiar to UK users: the £ sign. The Unicode code point for this character is 0x00A3:

    >>> u'\u00A3'.encode('ascii')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3'
    in position 0: ordinal not in range(128)

    Boom. This is Python telling you that it encountered a character in the Unicode string which it can't represent in the requested encoding. There's a fair amount of information in the error: it's giving you the character that it's having problems with, what position it was at in the string, and (in the case of ASCII) it's telling you that the number it was expecting was in the range 0 - 127.

    How do you fix a UnicodeEncodeError? Well, you've got a couple of options:

    • Pick an encoding that does have a representation for the problematic character
    • Use one of the error handling arguments to encode()

    The first option is obviously ideal, although its practicality depends on what you're doing with the encoded data. If you're passing it to another system that (for example) requires its text files in ASCII format, you're stuck. In that case, you're left with the second option. You can pass 'ignore', 'replace', 'xmlcharrefreplace' or 'backslashreplace' to the encode call:

    >>> u'\u00A3'.encode('ascii', 'ignore')
    ''
    >>> u'\u00A3'.encode('ascii', 'replace')
    '?'
    >>> u'\u00A3'.encode('ascii','xmlcharrefreplace')
    '&#163;'
    >>> u'\u00A3'.encode('ascii','backslashreplace')
    '\\xa3'


    If you choose one of those options, you'll have to let the eventual consumer of your encoded text know how to handle these.
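The same error handlers exist in Python 3. As a sketch, this runs the pound sign through each of them in turn, so the results can be compared side by side; the names `HANDLERS` and `encode_with_handlers` are made up for the example.

```python
HANDLERS = ("ignore", "replace", "xmlcharrefreplace", "backslashreplace")


def encode_with_handlers(text, encoding="ascii"):
    """Encode text once per error handler and collect the results."""
    return {handler: text.encode(encoding, handler) for handler in HANDLERS}


for handler, result in encode_with_handlers("\u00a3").items():
    print(handler, result)
# ignore b''
# replace b'?'
# xmlcharrefreplace b'&#163;'
# backslashreplace b'\\xa3'
```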

    UnicodeDecodeError

    This one is probably more familiar to most developers. A UnicodeDecodeError occurs when you ask Python to decode a byte string using a specified encoding, but Python encounters a byte sequence in that string that isn't in the encoding that you specified (phew!). This one probably benefits from an example.

    Consider once more the ASCII encoding. Being a 7-bit representation, ASCII only has 128 characters, represented by the numbers 0 - 127. So let's imagine the ASCII-encoded string below:

    'Hi!'


    In terms of ASCII numbers, that is:

      72 105 33

    Or in actual Python:

    >>> s = chr(72) + chr(105) + chr(33)
    >>> s
    'Hi!'
    >>> s.decode('ascii')
    u'Hi!'

    That's all great. But what happens if we add a byte that's not in the ASCII range?

    >>> s = s + chr(128)
    >>> s.decode('ascii')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0x80
    in position 3: ordinal not in range(128)

    Boom. Python is saying that it encountered the byte 0x80 (which is 128 in hex, the one we added) at position 3 (counting from zero) in the source byte string, and that it was not in the range 0 - 127.

    This is normally caused by using the incorrect encoding to try to decode a byte string to Unicode. So, for example, if you were given a UTF-8 byte string, and tried to decode it as ASCII, then you might well see a UnicodeDecodeError.

    But why only might?

    Well, remember what I mentioned before - UTF-8 shares the first 128 characters with ASCII. That means that you can take a UTF-8 byte sequence, and decode it with the ASCII decoder, and *as long as there are no characters outside the ASCII range* it will work. *Only* when that byte string starts featuring characters which don't exist within the ASCII encoding do errors start being thrown.
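This is easy to demonstrate. The following Python 3 sketch takes UTF-8 encoded bytes and tries to decode them as ASCII; the helper name `decodes_as_ascii` is made up for the example.

```python
def decodes_as_ascii(data):
    """Return True if the byte string can be decoded with the ascii codec."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True


# Pure ASCII text produces identical bytes in UTF-8, so decoding works...
print(decodes_as_ascii("test".encode("utf-8")))    # True
# ...but the pound sign becomes two bytes, both outside the ASCII range.
print(decodes_as_ascii("\u00a3".encode("utf-8")))  # False
```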

    ASCII - the default codec

    Lots of Python programmers (well, US and Western European ones) can get quite a way into their Python careers converting byte strings to unicode like this:

    >>> unicode('hi!')
    u'hi!'

    What's going on here? Well, Python uses the ascii codec by default. So, the above is equivalent to:

    >>> 'hi!'.decode('ascii')
    u'hi!'

    And, because most US/European test data is composed of this byte string:

      'test'

    ... nobody notices the problem until the Japanese office complains the intranet is broken.

    Unicode Coercion

    If you try to interpolate a byte string with a Unicode string, or vice-versa, Python will try and convert the byte string to Unicode using the default (ie. ascii) codec. So:

    >>> u'Hi' + ' there'
    u'Hi there'
    >>> u'Hi %s' % 'there'
    u'Hi there'
    >>> 'Hi %s' % u'there'
    u'Hi there'

    These all work fine, because all the strings that we're working with can be represented with ASCII. Look what happens when we try a character which can't be represented with ASCII though:

    >>> u'Hi ' + chr(128)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0x80
    in position 0: ordinal not in range(128)


    Python sees we're trying to combine a Unicode string with a byte string, so tries to decode the byte string to Unicode using the ASCII codec. Since byte 128 (the Euro symbol in the Windows-1252 encoding, as it happens) isn't in the ASCII range, Python throws a UnicodeDecodeError.

    In my experience, Unicode coercion is often where UnicodeDecodeErrors manifest themselves. The programmer has a Unicode string (probably a template) into which they're trying to put some data from a database. Relational databases tend to supply byte strings. Usually the encoding is a property on the database connection. Often, however, developers simply assume it's ASCII (or don't do anything special at all, which in Python amounts to the same thing). They try to stick the data from the database (perhaps in UTF-8 or ISO-8859-1) into a Unicode string using the %s format specifier, Python tries to decode the byte string using the ascii codec, and the whole thing falls flat on its face.
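
    The way out of that trap is to decode the database value yourself before it goes anywhere near the template. A minimal sketch in Python 3 syntax follows; the connection encoding, the raw value and the template are all hypothetical stand-ins, not taken from any particular database driver:

```python
# Hypothetical values standing in for a row fetched from a database whose
# connection encoding is UTF-8.
connection_encoding = 'utf-8'
raw_name = b'Ren\xc3\xa9e'          # the bytes the driver handed back

template = 'Hello, %s!'             # Unicode template text

# Decode first, using the encoding the connection actually uses...
name = raw_name.decode(connection_encoding)
# ...then interpolate Unicode into Unicode, so no implicit ascii decode happens.
greeting = template % name
print(greeting)
```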

    Why do Python byte strings have an encode() method?

    The sharp-eyed amongst you will have noticed that byte strings have an encode() method as well as a decode() method. What does this do? Quite simply, it does a decode-then-encode. The byte string is first decoded to Unicode using the default (ascii) encoding, and the result is then encoded to the target encoding specified in the call to encode(). As you'd expect, fun and games ensue if the original byte string isn't actually encoded in ASCII at all.
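
    Spelled out as explicit steps, the behaviour described above looks roughly like this. It's a sketch in Python 3 syntax (where bytes has no encode() method at all), and py2_style_encode is a hypothetical helper written purely to imitate the Python 2 behaviour:

```python
def py2_style_encode(byte_string, target_encoding, default_encoding='ascii'):
    """Hypothetical helper imitating what Python 2's str.encode() does."""
    text = byte_string.decode(default_encoding)   # the implicit step
    return text.encode(target_encoding)           # the step you asked for

print(py2_style_encode(b'hi!', 'utf-16-le'))

# A UTF-8 byte string containing a non-ASCII character fails at the implicit
# decode, so a call that looks like an encode raises a *decode* error.
try:
    py2_style_encode('caf\u00e9'.encode('utf-8'), 'utf-8')
except UnicodeDecodeError as exc:
    failure = exc
```

    That last point is why an encode() call on a byte string can, confusingly, end in a UnicodeDecodeError.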

    Avoiding Unicode Errors

    So - this is really what you care about, right? How do you avoid these Unicode problems? Well, there are three simple rules:

    • Within your application, always use Unicode
    • When you're reading text in to your application, decode it as soon as possible with the correct encoding
    • When you're outputting text from your application, encode at that point and do it explicitly

    What does this mean in practice? Well, it means:

    • Whenever you're writing string literals in code, always use u''.
    • Whenever you read any text in, call .decode('encoding') on the byte string to obtain Unicode
    • Whenever you're writing text out, pick an appropriate encoding to handle whatever Unicode you're outputting - remember that ASCII can only represent a very limited subset
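
    As a rough illustration of those rules working together, here's a sketch in Python 3 syntax. The shout() function and the sample data are invented for the example, and UTF-8 is simply assumed to be the right encoding at both edges:

```python
import io

def shout(text):
    """Application logic works purely on Unicode text."""
    return text.upper()

# Way in: bytes arrive (here, pretend they came from a UTF-8 file or socket).
incoming = io.BytesIO('caf\u00e9'.encode('utf-8')).read()
text = incoming.decode('utf-8')          # decode as early as possible

result = shout(text)                     # Unicode everywhere in between

# Way out: encode explicitly, choosing an encoding that can represent the text.
outgoing = result.encode('utf-8')
print(outgoing)
```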


    There are more places than you probably realise that text can get into your application. Here's some:

    • An incoming request from a web browser
    • Some text read in from a data file on disk
    • A template file read in from disk
    • Some user's input from a form
    • Some data from a database
    • Data returned from a web services call


    Frameworks help a lot here. Many frameworks handle the common encoding and decoding cases (usually the template encoding, and data encoding from a database) for you, and just pass you back Unicode strings. Watch out for web request variables - many of those may be plain byte strings. Also watch out for web service responses; you might need to inspect the response headers to find out the encoding. And even then be careful; I've come across situations with in-house apps where declared encodings were simply wrong, leading to unexpected UnicodeDecodeErrors.

    Figuring out which encoding to use

    When you're faced with a byte string, how do you know which decoding to use? The answer is, unfortunately, simple: you don't. Some environments (such as the Web) may help you - HTTP requests and responses contain headers which specify the encoding used within them. You can inspect those, and if they're wrong - well, at least you've got someone else to blame.
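
    For HTTP, the encoding usually travels as the charset parameter of the Content-Type header. Here's one way to pull it out, sketched in Python 3 using the standard library's email.message module; the helper name and the sample header values are assumptions made for the example:

```python
from email.message import Message

def charset_from_content_type(header_value, default=None):
    """Pull the charset parameter out of a Content-Type header value."""
    message = Message()
    message['Content-Type'] = header_value
    return message.get_content_charset(default)

print(charset_from_content_type('text/html; charset=ISO-8859-1'))
print(charset_from_content_type('text/plain'))   # no charset declared

# Use the declared charset to decode the body bytes.
body = b'caf\xe9'
encoding = charset_from_content_type('text/plain; charset=ISO-8859-1')
text = body.decode(encoding)
```

    When no charset is declared the helper returns the default you pass in, which leaves the guessing (and the blame) with the caller.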

    If you're lucky, you know the byte string is encoding some XML. XML gets a lot of flak, but one of the things it does right is to specify explicitly a default encoding that's actually useful (UTF-8) and provide a mechanism to declare a different encoding. So with XML, you can scan the first few bytes of the file, decode using UTF-8, and look for the magic encoding declaration. If there isn't one, then you can safely decode the rest of the file using UTF-8. If there is one, then switch encoding. Of course, your XML library of choice will do all this for you, and should give you Unicode text back once you've read your XML in.
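
    If you're curious what that scan might look like, here's a deliberately simplified sketch in Python 3. It only handles ASCII-compatible declarations at the very start of the data and ignores byte order marks and UTF-16 input, which a real XML parser deals with for you; the function and constant names are made up:

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration at the start of the data.
XML_DECLARATION = re.compile(
    br'^<\?xml[^>]*?encoding=["\']([A-Za-z][A-Za-z0-9._-]*)["\']')

def sniff_xml_encoding(head, default='utf-8'):
    """Return the encoding declared in the first bytes of an XML document."""
    match = XML_DECLARATION.match(head)
    if match is None:
        return default          # no declaration: XML's default applies
    return match.group(1).decode('ascii').lower()

declared = sniff_xml_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>')
undeclared = sniff_xml_encoding(b'<a/>')
print(declared, undeclared)
```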

    If you're unlucky, then you've got two more options. First off, you can talk to the people who run your source (or destination) system - find out what encodings they're using, or accept, and use those.

    The final, last resort option is to simply have a range of common encodings to try. A list I often use is ASCII, UTF-8, UTF-16 and ISO-8859-1. Keep trying to decode with each of those in turn until one works. The order matters: ISO-8859-1 assigns a character to every possible byte, so it never fails, and belongs at the end of the list. Which encodings you pick of course depends on what kind of files you're expecting to see. You may also run into problems of course if you have a byte string in encoding X which also happens to be valid when decoded using encoding Y - in this case, you'll just get garbage data. This is the cause of many of the 'funny character' bugs you see in web applications: byte strings being decoded using an encoding which happened to work, but was in fact not the original encoding used to create the byte string.
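
    A minimal sketch of that last-resort loop, in Python 3 syntax. The function name and the default tuple are assumptions for the example: the tuple is shorter than the list above, and ISO-8859-1 goes last because it can decode any sequence of bytes and would otherwise stop anything after it from being tried:

```python
def decode_with_fallbacks(byte_string, encodings=('ascii', 'utf-8', 'iso-8859-1')):
    """Try each encoding in turn; return (text, encoding) for the first success."""
    for encoding in encodings:
        try:
            return byte_string.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError('none of the candidate encodings could decode the data')

print(decode_with_fallbacks(b'cafe'))
print(decode_with_fallbacks(b'caf\xc3\xa9'))   # valid UTF-8
print(decode_with_fallbacks(b'caf\xe9'))       # not valid UTF-8, falls through
```

    Bear in mind that 'the first one that didn't raise' is not the same as 'the right one', for exactly the garbage-data reason described above.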

    Python 3

    I'm not going to talk too much about Python 3, since I haven't actually used it yet.

    But - you rarely hear .NET or Java programmers complaining about Unicode errors. This is simply because both .NET and Java define a string to *be* Unicode in the first place. Anything involving the String class (in either runtime) is Unicode anyway; the developer sees encoding problems much less frequently as it's much less common for unexpected byte data to creep into applications. This doesn't mean the problems don't exist, of course: at the end of the day, text is still being encoded to and from byte strings; it's just done explicitly. (The fact that the default encoding on MS Windows, the OS on which many of these systems run, is UTF-16 helps here too - many more characters can be encoded in UTF-16 than ASCII).

    My understanding is that Python 3 takes this general approach. Python 2's byte-string 'str' type is gone. In its place, 'str' is now the Unicode text type (what Python 2 called 'unicode', and equivalent to Java and .NET's String class), and there is a separate 'bytes' type for byte strings. String operations are done on 'str' instances.
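
    For what it's worth, here's how the split looks in Python 3 as released, where the text type is spelled str and the byte type bytes. This is only a small sketch with made-up sample data:

```python
text = 'caf\u00e9'              # str: a sequence of Unicode code points
data = text.encode('utf-8')     # bytes: what actually goes to disk or the network

print(type(text).__name__, len(text))
print(type(data).__name__, len(data))

# Mixing the two is an error rather than an implicit ascii decode.
try:
    text + data
except TypeError as exc:
    mixing_error = exc
```

    So the coercion surprises described earlier can't happen silently: you have to decode or encode explicitly before the two types will combine.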

    Coding in a Unicode world

    Unicode is here to stay. The days of writing software that would only need to work in American universities, where the only language and script used was US English in Latin text, are long gone. There's no magic to Unicode and the various encodings, and once you understand what's going on, there's no reason to have that sick feeling in the pit of your stomach the next time you see a UnicodeDecodeError. Just remember these rules:

    • Decode on the way in
    • Unicode everywhere in your application
    • Encode on the way out

    Sep 23

    Firefox won't start after upgrading to Snow Leopard

    My upgrade from Leopard to Snow Leopard went pretty smoothly - except for Firefox, which steadfastly refused to start. No error messages, nothing. Just a single dock bounce, and goodnight.

    I bought a new MacBook Pro recently, with Leopard installed. I went ahead and bought it before the Snow Leopard release, since Apple have a programme called 'Up-to-date' where you can upgrade to the new OS for a reduced price, £7.50 in my case. I figured I'd pay the £7.50 for six weeks' extra usage out of the laptop.

    The laptop duly arrived, I migrated my account from my old MacBook over using Migration Assistant, and all was (mostly, but that's a topic for another post) well. Then the Snow Leopard DVD arrived. I upgraded, and all seemed good. The only problem I found was that Firefox wouldn't start. No errors, nothing. Just a couple of bounces in the Dock and then it disappeared. Starting up Firefox from another account worked fine, so I resigned myself to rebuilding my user account at some point.

    Then, I tried Spotify. Spotify also refused to start, with a helpful 'Internal Error 1' message. Googling indicated that this could be caused by Spotify being unable to write to its cache directory. This seemed strange, since I'd just installed it in a local ~/Applications directory - why would it be trying to write to a system location that it wouldn't have permission to touch?

    However, the reference to caches provided a clue. A quick ls -l of ~/Library revealed that (goodness knows how) the Caches directory within Library had somehow obtained root ownership, so my own user account didn't have permission to write to it. This was solved with the following command, run as an administrative user:

    chown -R dan:staff /Users/dan/Library/Caches

    I have no idea how the Caches directory came to have that ownership. After correcting the above, Spotify runs, Firefox starts, and the whole OS experience seems quicker. Not hugely surprising, now disk caches can actually be written!

    Apr 17

    Installing GeoDjango with PostgreSQL and zc.buildout

    The installation of the PostgreSQL requirements is somewhat daunting. I've spent a bit of time putting together a buildout.cfg to try to make this easier.

    I've been wanting to play with GeoDjango for a while, since my database of choice (PostgreSQL) has excellent spatial support. However, getting all the dependencies up and running is pretty complicated.

    I've been working on a buildout to get at least most of the steps done for you. There are a couple of manual steps at the end, which I hope to automate when I next have time to work on this.

    The buildout installs the following items:

    • PostgreSQL
    • PostGIS
    • GDAL
    • Proj
    • GEOS
    • psycopg2
    • Django

    It should also perform initial setup of the PostGIS database template, loading some sample SQL files, and set up some convenience symlinks for the PostgreSQL command-line programs.

    It's not finished - in particular, it just assumes that the user running the buildout is to be used as the database owner and such like. Anyway, here it is:

     

    [buildout]
    parts =
        postgresql
        postgis
        gdal
        init-pgsql
        pgsql-symlinks
        django

    eggs =
        psycopg2

    [postgresql]
    recipe = zc.recipe.cmmi
    url = http://wwwmaster.postgresql.org/redir/198/h/source/v8.3.7/postgresql-8.3.7.tar.gz
    extra_options =
        --with-readline
        --enable-thread-safety

    [postgis]
    recipe = hexagonit.recipe.cmmi
    url = http://postgis.refractions.net/download/postgis-1.3.5.tar.gz
    configure-options =
        --with-pgsql=${postgresql:location}/bin/pg_config
        --with-geos=${geos:location}/bin/geos-config
        --with-proj=${proj:location}

    [proj]
    recipe = zc.recipe.cmmi
    url = http://download.osgeo.org/proj/proj-4.6.1.tar.gz

    [geos]
    recipe = zc.recipe.cmmi
    url = http://download.osgeo.org/geos/geos-3.0.3.tar.bz2

    [gdal]
    recipe = zc.recipe.cmmi
    url = http://download.osgeo.org/gdal/gdal-1.6.0.tar.gz
    extra_options =
        --with-python
        --with-geos=${geos:location}/bin/geos-config

    [init-pgsql]
    recipe = iw.recipe.cmd
    on_install = true
    on_update = false
    cmds =
        ${postgresql:location}/bin/initdb -D ${postgresql:location}/var/data -E UNICODE
        ${postgresql:location}/bin/pg_ctl -D ${postgresql:location}/var/data start
        sleep 30
        ${postgresql:location}/bin/createdb -E UTF8 template_postgis
        ${postgresql:location}/bin/createlang -d template_postgis plpgsql
        ${postgresql:location}/bin/psql -d template_postgis -f ${postgis:location}/share/lwpostgis.sql
        ${postgresql:location}/bin/psql -d template_postgis -f ${postgis:location}/share/spatial_ref_sys.sql
        ${postgresql:location}/bin/psql -d template_postgis -c "GRANT ALL ON geometry_columns TO PUBLIC;"
        ${postgresql:location}/bin/psql -d template_postgis -c "GRANT ALL ON spatial_ref_sys TO PUBLIC;"
        ${postgresql:location}/bin/pg_ctl -D ${postgresql:location}/var/data stop

    [pgsql-symlinks]
    recipe = cns.recipe.symlink
    symlink_target = ${buildout:directory}/bin
    symlink_base = ${postgresql:location}/bin
    symlink =
        clusterdb
        createdb
        createlang
        createuser
        dropdb
        droplang
        dropuser
        ecpg
        initdb
        ipcclean
        pg_config
        pg_controldata
        pg_ctl
        pg_dump
        pg_dumpall
        pg_resetxlog
        pg_restore
        postgres
        postmaster
        psql
        reindexdb
        vacuumdb

    [django]
    recipe = djangorecipe
    version = 1.0.2
    project = project
    eggs =
        ${buildout:eggs}

    Note that running this will actually attempt to start up and shut down the database server, as it needs to be running in order for some of the initialisation scripts to run. That 'sleep 30' in the middle is to allow the database server to start, and (if you're on OS X and running it) to give you a chance to enter your username and password for the firewall!

    There are still some manual steps to be taken (which I'd like to automate in due course). These are the fairly standard things that you do when starting any Django project, plus an extra step for bootstrapping PostGIS.

    Create your database

    From the command line, you'll need to create the database for your application. You need to specify the PostGIS template, so use something like:

    $ bin/createdb -T template_postgis <db name>

    Change the settings for your application

    Edit the settings.py for your application, and make sure that you're using 'postgresql_psycopg2' as the database engine. Set the database name as appropriate for your application. You should also add 'django.contrib.gis' to your INSTALLED_APPS setting, and you'll also need to add the following two lines to your settings.py:

    GDAL_LIBRARY_PATH = '/path/to/buildout/parts/gdal/lib/libgdal.dylib'
    GEOS_LIBRARY_PATH = '/path/to/buildout/parts/geos/lib/libgeos_c.dylib'

    Add Google projection

    I'll confess: I'm only doing this because the GeoDjango docs say you should! I don't know enough about GeoDjango yet to understand why. But you should do the following:

    $ bin/django shell
    >>> from django.contrib.gis.utils import add_postgis_srs
    >>> add_postgis_srs(900913)
    >>> ^D
    $

    If you get an error when importing add_postgis_srs, then double check you got the GDAL_LIBRARY_PATH and GEOS_LIBRARY_PATH correct, and that the files specified were built. (I'm on Mac OS X - I suspect the exact file name may change depending on platform.)
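
    A quick way to do that double check is to ask Python whether the configured files exist. The sketch below (Python 3 syntax) is only a rough aid: both helper names are made up, and the suffix guesses are the usual conventions for each platform rather than anything GeoDjango-specific:

```python
import os
import sys

def shared_library_suffix(platform=sys.platform):
    """Guess the shared-library file extension for a platform string."""
    if platform == 'darwin':
        return '.dylib'
    if platform.startswith('win'):
        return '.dll'
    return '.so'

def missing_libraries(paths):
    """Return the configured library paths that do not exist on disk."""
    return [path for path in paths if not os.path.exists(path)]

# The placeholder paths from settings.py; substitute your real buildout path.
configured = [
    '/path/to/buildout/parts/gdal/lib/libgdal' + shared_library_suffix(),
    '/path/to/buildout/parts/geos/lib/libgeos_c' + shared_library_suffix(),
]
print(missing_libraries(configured))
```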

    Done!

    Once all that's done, you should hopefully be able to bin/django syncdb, start a new app (using fez.djangoskel, of course!) and start using GeoDjango.

    I shall refine the above process over time (in particular, there are some modifications I'd like to make to djangorecipe to remove the manual steps at the end), and I'll post extra parts when I've done that.

     

    Apr 15

    VMWare Fusion guests with a static IP

    The article that I followed to get a static IP for VMWare fusion guests seems to have been removed, so in the name of preserving this knowledge I'm reproducing the salient parts here.

    It's not straightforward to assign static IP addresses to guests in VMWare Fusion, and the article from which I took my instructions has been removed, and lives on only in Google caches and the like. Since I don't want to lose this information, I'm reposting the technical content here. Thanks to the original author, Gary Day, for his research.

    Gary, if you're reading this and want me to link to your canonical original version under a new URL, please contact me and I'll sort it out.

    Gary's original content appears below. Apologies for the slight formatting issues, it's a kupu-n-paste job.


    Guest Configuration Information

    Open a Finder window and navigate to your Virtual Machines folder, probably /username/Documents/Virtual Machines. Locate the VM package for the guest you want to use for this procedure (Virtual Machines are represented by a single file which is a package containing multiple disk and configuration files).
    CTRL-CLICK the Virtual Machine and select “Show Package Contents”; this displays the components of your Virtual Machine. Find Guest.vmx (in my case “UbuntuGnome.vmx”), CTRL-CLICK again and open with your text editor (in my case TextMate, can’t live without it). This action will show the default configuration of “UbuntuGnome”. Search this text file for “ethernet0.generatedAddress” and you will find the following (similar) information:

    ethernet0.generatedAddress = "00:0c:29:8b:a4:4f"

    This is your Virtual Machine’s “MAC” or Ethernet Hardware Address.
    Copy this information to a text file because I doubt most people can remember such things for more than a second or two.
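
    If you'd rather not search by hand, the same lookup can be scripted. This is a small sketch in Python 3; the function name is made up, and the sample text is an invented two-line stand-in for a real .vmx file:

```python
import re

def generated_mac(vmx_text):
    """Return the value of ethernet0.generatedAddress from .vmx file text."""
    match = re.search(
        r'^ethernet0\.generatedAddress\s*=\s*"([^"]+)"', vmx_text, re.MULTILINE)
    return match.group(1) if match else None

# Invented sample content; a real .vmx file has many more settings.
sample = ('displayName = "UbuntuGnome"\n'
          'ethernet0.generatedAddress = "00:0c:29:8b:a4:4f"\n')
print(generated_mac(sample))
```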

    Accessing VMWare Fusion’s DHCP Settings

    VMware Fusion’s DHCP configuration file is located in “Application Support”.
    Open a terminal and set a command to open this config file in your text editor of choice.

    ~ user$ mate "/Library/Application Support/VMware Fusion/vmnet8/dhcpd.conf"

    This is what you should see:


    # Configuration file for ISC 2.0b6pl1 vmnet-dhcpd operating on vmnet8.
    #
    # This file was automatically generated by the VMware configuration program.
    # If you modify it, it will be backed up the next time you run the
    # configuration program.
    #
    # We set domain-name-servers to make some DHCP clients happy
    # (dhclient as configued in SuSE, TurboLinux, etc.).
    # We also supply a domain name to make pump (Red Hat 6.x) happy.
    #
    allow unknown-clients;
    default-lease-time 1800; # 30 minutes
    max-lease-time 7200; # 2 hours
    subnet 172.16.27.0 netmask 255.255.255.0 {
        range 172.16.27.128 172.16.27.254;
        option broadcast-address 172.16.27.255;
        option domain-name-servers 172.16.27.2;
        option netbios-name-servers 172.16.27.2;
        option domain-name "localdomain";
        option routers 172.16.27.2;
    }

    Note the subnet range: we need to set a fixed address for our Virtual Machine outside of this range.
    We do this like so:
    Append the open file (dhcpd.conf) with the following, obviously using your own settings including the Ethernet Hardware Address you previously copied to a text file, the name of Guest.vmx and the IP address you wish to assign to this Virtual Machine.

    host UbuntuGnome {
        hardware ethernet 00:0c:29:8b:a4:4f;
        fixed-address 172.16.27.20;
    }
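
    If you have several guests to set up, the host block can be generated, and the chosen address checked against the dynamic range, with a few lines of script. This is a sketch in Python 3 using the standard ipaddress module; both function names are hypothetical, and the values are the ones from the example above:

```python
import ipaddress

def host_entry(name, mac, ip):
    """Format a dhcpd.conf host block like the one shown above."""
    return ('host %s {\n'
            '    hardware ethernet %s;\n'
            '    fixed-address %s;\n'
            '}\n' % (name, mac, ip))

def outside_dynamic_range(ip, range_start, range_end):
    """True if ip is not inside the DHCP pool [range_start, range_end]."""
    address = ipaddress.ip_address(ip)
    return not (ipaddress.ip_address(range_start) <= address
                <= ipaddress.ip_address(range_end))

print(outside_dynamic_range('172.16.27.20', '172.16.27.128', '172.16.27.254'))
print(host_entry('UbuntuGnome', '00:0c:29:8b:a4:4f', '172.16.27.20'))
```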

    Save this file; you will be prompted to enter your administrator password as we have opened dhcpd.conf as a read only file in TextMate.

    We now need to restart networking for VMWare Fusion:

    sudo "/Library/Application Support/VMware Fusion/boot.sh" --restart

    Configuring Hosts In Linux Guest

    That’s a confusing title I must admit.
    Fire up your Virtual Machine..
    NOTE:
    I am using this Ubuntu Guest as it was a machine already on my system, the following configuration information will differ slightly between distros and interfaces. For my “Micro-Network” experiments I will be using minimal, command line installations of CentOS 5.02 and I will cover this in later posts.

    If you are using the Gnome Desktop navigate to:

    System> Administration> Network

    Use your superuser password to unlock the network applet and select “Wired Connection (Properties)”.
    Disable roaming mode (it’s set this way by default on a new Ubuntu installation) and enter the settings for Configuration (Static IP Address).

    IP ADDRESS: (The Address You Set In The VMWare DHCP Settings)
    SUBNET: (Usually 255.255.255.0)
    GATEWAY: (The Address Of The VMware Fusion Server)*

    * “option routers XXX.XX.XX.X” in dhcpd.conf

    Save these settings and restart networking (or your Virtual Machine).

    This procedure can be repeated for each Virtual Machine you want to add to your “Virtual Network” by adding a host entry for each guest machine in “/Library/Application Support/VMware Fusion/vmnet8/dhcpd.conf”.

    Apr 15

    BBC iPlayer "...temporarily unavailable. Please try again later"

    I got the dreaded "(programme) is temporarily unavailable. Please try again later" while using the BBC iPlayer desktop AIR app. Fixing it was simple... once I found out how.

    The iPlayer is great. Like the PVR, it changes the way you watch television. And so I was excited when the desktop version of the software became available. I was somewhat disappointed to discover it was an AIR app (I've been burned by bad Adobe installers and problems with Flash on non-Windows platforms in the past) but, for the functionality it offered, I was prepared to overlook these problems.

    Unfortunately, after a while, it broke. Clicking on a downloaded title gave the unhelpful message: "(programme name) is temporarily unavailable. Please try again later."

    OK, so I know the drill about not scaring end users with incomprehensible messages, but please, give us something to go on.

    The solution, as it often is, was a simple delete and reinstall. What I didn't realise, however, was that merely deleting the application itself and the downloaded movies folder wasn't sufficient. There are some other directories you have to remove too, which are detailed in this BBC article on removal of the software.

    To save you clicking through, the directories that you have to remove in addition to the application are:

    • /Users/[your user name]/Library/Preferences/BBCiPlayerDesktop.61DB7A798358575D6A969CCD73DDBBD723A6DA9D.1/
    • /Users/[your user name]/Library/Application Support/Adobe/AIR/ELS/BBCiPlayerDesktop.61DB7A798358575D6A969CCD73DDBBD723A6DA9D.1/

    Perhaps I'm still a Mac newbie, but when I delete an application by dragging it to the trash, I expect all traces of it to go (aside from saved data files in my Documents folder). I've no idea whether this is an iPlayer thing or an Adobe AIR thing, but I would not be at all surprised to discover the latter.

    Mar 03

    Understanding imports and PYTHONPATH

    An understanding of PYTHONPATH is key when developing new Python modules, or installing third-party packages and eggs. This article gives an overview of PYTHONPATH and the way Python imports modules.

    Something I've heard a few times from developers coming to Python from languages such as PHP is that module importing and the PYTHONPATH is a bit of a mystery. I remember understanding PYTHONPATH when I learned Python since I'd done a bit of Java at university (and PYTHONPATH is conceptually the same as Java's CLASSPATH), but several flavours of import confused me. This post covers both; first we'll talk about the import statement, and then we'll cover PYTHONPATH.

    Understanding import and from ... import ...

    Python has two forms of import statement. They look something like this:

    import z3c.form.form
    from z3c.form import form

    Python is all about binding (or assigning) names to values, and the primary purpose of the import statement is to bind names to modules. The key difference between the two forms above is what names are made available. The first form lets you reference 'z3c.form.form' in your code; the latter lets you reference 'form' directly. Let's examine the first case:

    >>> import z3c.form.form
    >>> z3c.form.form
    <module 'z3c.form.form' from '/eggs/z3c.form-1.9.0-py2.5.egg/z3c/form/form.pyc'>
    >>> form
    Traceback (most recent call last):
      File "<console>", line 1, in <module>
    NameError: name 'form' is not defined

    Contrast this with the second case:

    >>> from z3c.form import form
    >>> z3c.form.form
    Traceback (most recent call last):
      File "<console>", line 1, in <module>
    NameError: name 'z3c' is not defined
    >>> form
    <module 'z3c.form.form' from '/eggs/z3c.form-1.9.0-py2.5.egg/z3c/form/form.pyc'>

    The only difference between the two cases is the name which is available after the import. (OK, that's a bit of a lie, but we'll gloss over that for now). The rule of thumb is that you will always refer to the bit after the 'import' in following code; so, when you say 'import z3c.form.form' you'll be able to refer to 'z3c.form.form', and when you say 'from z3c.form import form' you'll simply be able to refer to 'form'. In both cases, what you are actually dealing with when you actually use that name (be that 'z3c.form.form' or just 'form') is exactly the same - in this case, the 'form.pyc' module from the z3c.form package.
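
    You can check the 'exactly the same' claim for yourself. The sketch below uses a standard library package (xml.dom.minidom) in place of z3c.form, simply so that it runs without any eggs installed; it is written in Python 3 syntax, but the import behaviour it shows is the same:

```python
import sys

import xml.dom.minidom            # binds the name 'xml' in this namespace
from xml.dom import minidom       # binds the name 'minidom'

# Both names lead to the very same module object, which lives in sys.modules.
same_object = xml.dom.minidom is minidom
cached = sys.modules['xml.dom.minidom'] is minidom
print(same_object, cached)
```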

    PYTHONPATH

    So how does PYTHONPATH fit into this?

    PYTHONPATH is an environment variable, much like PATH. You can get a list of environment variables on UNIX-like operating systems by running the 'env' command. It's available in the Properties of My Computer in Windows. PYTHONPATH is similar to PATH in another way, in that it defines a search path. However, unlike PATH (which tells the operating system which directories to look for executable files in), PYTHONPATH is used by the Python interpreter to find out where to look for modules to import.

    This is probably best demonstrated with an example. Let's create a file called hello.py in a directory ~/pymodules:

    hornet:~ dan$ mkdir pymodules
    hornet:~ dan$ cd pymodules/
    hornet:pymodules dan$ emacs hello.py
    hornet:pymodules dan$ cat hello.py
    def print_hello():
        print 'hello!'

    Now, as my current directory is the pymodules directory, I can fire up Python, import my hello module, and run print_hello():

    hornet:pymodules dan$ python
    Python 2.5.1 (r251:54863, Jan 13 2009, 10:26:13)
    [GCC 4.0.1 (Apple Inc. build 5465)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import hello
    >>> hello.print_hello()
    hello!

    However, this doesn't work if I'm not in that pymodules directory:

    hornet:pymodules dan$ cd
    hornet:~ dan$ python
    Python 2.5.1 (r251:54863, Jan 13 2009, 10:26:13)
    [GCC 4.0.1 (Apple Inc. build 5465)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import hello
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ImportError: No module named hello

    To fix this, we need to tell Python to look in our new pymodules directory for libraries. We do this by setting the PYTHONPATH variable:

    hornet:~ dan$ export PYTHONPATH=$PYTHONPATH:/Users/dan/pymodules 
    hornet:~ dan$ python
    Python 2.5.1 (r251:54863, Jan 13 2009, 10:26:13)
    [GCC 4.0.1 (Apple Inc. build 5465)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import hello
    >>> hello.print_hello()
    hello!

    The magic is in that 'export' line, which is appending the path '/Users/dan/pymodules' to the environment variable PYTHONPATH. Note how we append, to avoid completely overwriting any existing values.
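
    The same experiment can be scripted, which is useful if you want to reproduce it without touching your shell's environment. This is a sketch in Python 3: it creates a throwaway directory standing in for ~/pymodules, then starts fresh interpreters with and without PYTHONPATH pointing at it. The run_import helper is made up for the example:

```python
import os
import subprocess
import sys
import tempfile

# A throwaway directory containing hello.py (a stand-in for ~/pymodules).
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, 'hello.py'), 'w') as handle:
    handle.write("def print_hello():\n    print('hello!')\n")

# An empty directory to run from, so the current directory can't help.
empty_dir = tempfile.mkdtemp()

def run_import(pythonpath):
    """Run 'import hello' in a fresh interpreter with the given PYTHONPATH."""
    env = dict(os.environ, PYTHONPATH=pythonpath)
    return subprocess.run(
        [sys.executable, '-c', 'import hello; hello.print_hello()'],
        env=env, cwd=empty_dir,
        stdout=subprocess.PIPE, stderr=subprocess.PIPE)

without_path = run_import('')
with_path = run_import(module_dir)
print(without_path.returncode, with_path.stdout)
```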

    Special cases and examining the search path

    There are a few 'special cases' to be aware of when thinking about where Python modules may be imported from. The first is that the Python installation's site-packages directory will always be placed on the search path automatically. Secondly, as we saw in the first 'hello' example, the current module's directory is placed on the search path, allowing relative imports (more on this shortly). Finally, the current directory is also placed on the search path.

    This begs the obvious question: how can you definitively find out where Python is looking for modules? Well, like this:

    hornet:~ dan$ python
    Python 2.5.1 (r251:54863, Jan 13 2009, 10:26:13)
    [GCC 4.0.1 (Apple Inc. build 5465)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys
    >>> from pprint import pprint as pp
    >>> pp(sys.path)
    ['',
     '/Library/Python/2.5/site-packages/virtualenv-1.0-py2.5.egg',
     '/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python25.zip',
     '/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5',
     '/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/plat-darwin',
     '/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/plat-mac',
     '/System/Library/Frameworks/Python.framework/Versions/2.5/Extras/lib/python',
     '/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/lib-tk',
     '/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/lib-dynload',
     '/Library/Python/2.5/site-packages',
     '/System/Library/Frameworks/Python.framework/Versions/2.5/Extras/lib/python/PyObjC']

    That's using my MacBook's system Python in a fresh terminal. As you can see, there are quite a lot of items in there! (Note also the use of pprint, the 'pretty printer' library, to format the output nicely.) Note that the first item is the empty string (meaning the current directory) - this is what made our very first 'import hello' work, as noted before. Also notice that the site-packages directory has been placed on the search path. Finally, there's a bunch of items on there that Apple set up. You can also see that I have the virtualenv package installed in my system Python.

    Contrast this with the list after we've set our PYTHONPATH manually as before:

    hornet:~ dan$ export PYTHONPATH=$PYTHONPATH:/Users/dan/pymodules
    hornet:~ dan$ cd /tmp
    hornet:tmp dan$ python
    Python 2.5.1 (r251:54863, Jan 13 2009, 10:26:13)
    [GCC 4.0.1 (Apple Inc. build 5465)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys
    >>> from pprint import pprint as pp
    >>> pp(sys.path)
    ['',
     '/Library/Python/2.5/site-packages/virtualenv-1.0-py2.5.egg',
     '/private/tmp',
     '/Users/dan/pymodules',
     '/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python25.zip',
     '/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5',
     '/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/plat-darwin',
     '/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/plat-mac',
     '/System/Library/Frameworks/Python.framework/Versions/2.5/Extras/lib/python',
     '/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/lib-tk',
     '/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/lib-dynload',
     '/Library/Python/2.5/site-packages',
     '/System/Library/Frameworks/Python.framework/Versions/2.5/Extras/lib/python/PyObjC']

    The two important things to note are:

    • My changed current working directory (/tmp) has been added to the search path
    • /Users/dan/pymodules has been added to the path, having been read from the PYTHONPATH environment variable.

    You can dynamically add and remove paths from sys.path to change where Python looks for modules on-the-fly. It's not really recommended, however, unless you're specifically writing package management software.
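
    For completeness, here's roughly what that on-the-fly manipulation looks like. It's a sketch in Python 3; the directory is a temporary one and the module name onthefly_demo is invented for the example:

```python
import importlib
import os
import sys
import tempfile

plugin_dir = tempfile.mkdtemp()
with open(os.path.join(plugin_dir, 'onthefly_demo.py'), 'w') as handle:
    handle.write("GREETING = 'hello!'\n")

sys.path.insert(0, plugin_dir)            # start searching this directory
importlib.invalidate_caches()             # make sure the new file is noticed
module = importlib.import_module('onthefly_demo')
print(module.GREETING)

sys.path.remove(plugin_dir)               # stop searching it again
# The already-imported module stays cached in sys.modules after removal.
still_loaded = 'onthefly_demo' in sys.modules
```

    Note the last line: taking a directory off sys.path doesn't un-import anything, which is one of the reasons this kind of manipulation gets confusing quickly.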

    Relative Imports

    There's one final type of import that we haven't considered yet: the relative import. Consider this:

    hornet:~ dan$ cd pymodules/
    hornet:pymodules dan$ mkdir p
    hornet:pymodules dan$ touch p/__init__.py
    hornet:pymodules dan$ touch p/m1.py
    hornet:pymodules dan$ touch p/m2.py
    hornet:pymodules dan$ emacs p/m1.py
    hornet:pymodules dan$ cat p/m1.py
    import m2
    hornet:pymodules dan$

    What's happened here?

    Well, we've gone back to our pymodules directory (which is now on the PYTHONPATH, so we can import things directly from it). We've created a package called 'p' by creating a directory of that name and adding an __init__.py, making that directory importable. We've then created two modules in that 'p' package, called m1 and m2. Now - look at the following session:

    hornet:pymodules dan$ cd /tmp
    hornet:tmp dan$ python
    Python 2.5.1 (r251:54863, Jan 13 2009, 10:26:13)
    [GCC 4.0.1 (Apple Inc. build 5465)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import m1
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ImportError: No module named m1
    >>> import m2
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ImportError: No module named m2
    >>> import p.m1
    >>>

    Firstly, we demonstrate that just typing 'import m1' or 'import m2' doesn't work. This is expected: the 'p' directory that contains them is not on the PYTHONPATH. 'import p.m1' does work, though; again, this is expected, since p's parent directory, pymodules, is on the PYTHONPATH.

    But hold on a minute - let's take another look at m1.py:

    hornet:pymodules dan$ cat p/m1.py
    import m2

    What's going on here? We just showed that 'import m2' by itself doesn't work, right?

    Well, almost. Python 2 allows implicit relative imports. Basically, you can say 'import m2' from inside m1.py because m1.py and m2.py are in the same package (same directory). This looks like a handy feature - if you use relative imports everywhere, then you can freely move your modules around and their relative imports will keep working. However, there is a danger here: if I were to create a file called 'os.py' in that 'p' directory, it would mask the standard library's 'os' module whenever another module in the package tries to import it. The problem is one of transparency: from looking at the import line, you can't tell whether an import is relative or absolute.

    It's worth noting that this behaviour has changed in Python 3: you now say 'from . import m2' to do a relative import of the m2 module. To put it another way, imports are always absolute unless you explicitly make them relative.
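
    Here's a small sketch of the explicit form, spelled 'from . import m2'. It builds a throwaway package on disk (called 'p_demo' here, a made-up name, so that it can't clash with the real 'p' package) and imports it:

```python
import os
import sys
import tempfile

# Build a throwaway package that mirrors the 'p' package from the text.
# The name 'p_demo' is made up for this illustration.
root = tempfile.mkdtemp()
pkg = os.path.join(root, 'p_demo')
os.mkdir(pkg)
for name, source in [
    ('__init__.py', ''),
    ('m2.py', "NAME = 'm2'\n"),
    # Explicit relative import: the leading dot means "this package".
    ('m1.py', "from . import m2\n"),
]:
    with open(os.path.join(pkg, name), 'w') as f:
        f.write(source)

# Only the package's parent directory goes on the search path.
sys.path.insert(0, root)
import p_demo.m1

# m1 found its sibling through the package, not through sys.path.
sibling = p_demo.m1.m2.NAME

# A bare 'import m2' from outside the package is absolute, so it fails.
try:
    import m2
    bare_import_worked = True
except ImportError:
    bare_import_worked = False
```

    Because the dot makes the intent visible on the import line itself, a stray 'os.py' sitting in the package can no longer be picked up by accident when a module says 'import os'.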

    Modules are only imported once

    The final thing to note is that modules are only imported once, the first time that they're used. You can generally forget about this fact, though remember that some modules (particularly in the Zope 2 world) have code which runs upon import. Let's modify hello.py to demonstrate this:

    hornet:pymodules dan$ cat hello.py
    print 'Importing hello'

    def print_hello():
        print 'hello!'
    hornet:pymodules dan$ python
    Python 2.5.1 (r251:54863, Jan 13 2009, 10:26:13)
    [GCC 4.0.1 (Apple Inc. build 5465)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import hello
    Importing hello
    >>> import hello
    >>>

    Even though we imported the hello module twice, the code at the module level was only executed once.
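
    Python remembers what it has already imported in the sys.modules dictionary. The sketch below is one way to see that for yourself; the module name 'hello_once' and its log file are invented for the example, and deleting entries from sys.modules is shown only to make the point, not as something to do in real code:

```python
import os
import sys
import tempfile

# Write a throwaway module whose module-level code records each execution.
# The module name 'hello_once' and the log file are made up for this sketch.
root = tempfile.mkdtemp()
log_path = os.path.join(root, 'import.log')
with open(os.path.join(root, 'hello_once.py'), 'w') as f:
    f.write(
        "import os\n"
        "log_path = os.path.join(os.path.dirname(__file__), 'import.log')\n"
        "with open(log_path, 'a') as log:\n"
        "    log.write('Importing hello_once\\n')\n"
    )


def count_loads():
    """Return how many times the module-level code has run so far."""
    with open(log_path) as log:
        return len(log.readlines())


sys.path.insert(0, root)
import hello_once
import hello_once  # does nothing new: the module is already in sys.modules
loads_after_two_imports = count_loads()

# Forgetting the cached module forces the next import to run the code again.
del sys.modules['hello_once']
import hello_once
loads_after_forced_reimport = count_loads()
```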

    Managing PYTHONPATH

    Not too long ago, that was pretty much all there was to know. To use a third-party package, you'd download it and either install it in your Python's site-packages directory (as that's already on the PYTHONPATH), or you'd create a new directory for it to live in, and add that directory to your PYTHONPATH in a wrapper script, or in your ~/.bash_profile. As we've demonstrated, this still works just fine.

    The problem with that approach, however, is that it affects every Python program you run. They'll all get the same (probably large) PYTHONPATH, and the chance of them accidentally importing code from an unexpected location rises. Unfun debugging sessions ensue.

    A better approach these days is to use virtualenv. As discussed in previous articles, a virtualenv gives you an isolated place to install packages without interfering with the main Python install; refer to those articles for more information on installing and using virtualenv.

    That about wraps up this look at imports and PYTHONPATH. As ever, more information on all this can be found in the official Python documentation.


    Jan 19

    Vista hanging on crcdisk.sys

    Badly seated memory found with smoking gun (or, memtest86 is your friend).

    So - after an extended period of it having been sat in our conservatory, with temperatures ranging from -5C to +35C, I booted up my venerable Vista PC. It hung.

    Damn.

    Tried safe mode, and the boot started - but hung when it tried to load a file called crcdisk.sys. Googling revealed likely hardware problems, but nothing really specific - some people found disabling the built-in wifi helped. This PC doesn't have built-in wifi.

    One thing that did crop up was that all the problems relating to this file appeared after the application of a Windows Update patch from Microsoft; and sure enough, pretty much the last thing I did last time the PC was on was run Windows Update. So I did the natural thing, cursed Microsoft again and downloaded the latest Ubuntu.

    Except that didn't boot either.

    This was getting annoying. I wanted to use the PC as a development server, and there it was, an electricity-sucking paperweight.

    However - one thing that the Ubuntu installer does have is a menu option to run memtest86. This is a program that, as the name suggests, tests your memory. I ran that, and the screen immediately turned to a sea of red. My thoughts returned to the somewhat extreme temperature variations, and how likely it was that repeated expansion and shrinkage had unseated some memory.

    memtest86 is now happily running as we speak with no failures, now I've removed and re-seated all the memory. So I owe Microsoft an apology - this time, I don't think it was Vista.

    I might still install Ubuntu on it though.

    Dec 02

    fez.djangoskel: Django projects and apps as eggs

    I've made an initial release of fez.djangoskel, which provides simple paster templates for egg-based Django projects and applications.

    This is just a brief note to say that I've released fez.djangoskel, a package which provides paster templates for creating egg-based Django projects and reusable applications. This is all part of my crusade to get the Django community to package software as eggs. As well as source and binary egg releases on PyPI, the code is available on GitHub (representing my first foray into git).

    The easiest way to get this is, as usual, with easy_install. (See my previous posts on how to set this up):

    easy_install fez.djangoskel

    Once that and its dependencies have installed, you should find that paster can now create Django projects and apps:

    $ paster create --list-templates
    Available templates:
      basic_package:   A basic setuptools-enabled package
      django_app:      Template for a basic Django reusable application
      django_project:  Template for a Django project
      paste_deploy:    A web application deployed through paste.deploy

    It's standard paste from here on in: to create a project, use:

    paster create -t django_project

    To create an app, use:

    paster create -t django_app

    Paster will then ask you a bunch of questions (the most crucial one being the name of your app/project!) and generate the file layout for you.

    I plan to do another release later this week which includes additional templates for creating Django eggs using namespace packages, as well as improved documentation.

    Nov 27

    Blob support in the ZODB with ZEO

    There's not a lot of documentation out there about ZODB blobs and how you configure a buildout with blob support in both ZEO and non-ZEO configurations for a pure Zope 3 application.

    There's not a huge amount of information out there yet on the practicalities of configuring Zope (particularly Zope 3) to use blobs, and then actually using them in an application. I hope the following goes towards remedying that.

    (I also hope the information I give below is correct; all I can say is that it works for me!)

    Blobs without ZEO

    The following buildout.cfg snippet will generate a Zope instance for you with blob support but without ZEO:

    [buildout]
    parts =
      zodb
      instance

    [zodb]
    recipe = zc.recipe.egg:script
    eggs = ZODB3

    [app]
    recipe = zc.zope3recipes:app
    base-site.zcml = <...snip site.zcml>

    [instance]
    recipe = zc.zope3recipes:instance
    application = app
    zope.conf =
        ${database:zconfig}
        devmode on
        <server>
          type HTTP
          address 127.0.0.1:8080
        </server>

    [database]
    recipe = zc.recipe.filestorage
    blob-dir = ${buildout:parts-directory}/database/blobs

    Blobs with ZEO

    Same again, but with ZEO, reusing the [zodb] and [app] sections. Note that you have to specify blob directories for the ZEO clients, as well as the server. I suspect that clients require a local area to stick blobs to before they're streamed over to the ZEO server.

    [zopectl]
    recipe = zc.zope3recipes:instance
    application = app
    devmode = off
    zope.conf =
       <zodb>
         <zeoclient>
           server 127.0.0.1:7100
         </zeoclient>
       </zodb>  

    [instance1]
    recipe = zc.zope3recipes:instance
    extends = zopectl
    address = 127.0.0.1:7080
    zope.conf =
       <zodb>
         <zeoclient>
           server 127.0.0.1:7100
           cache-size 1000MB
           blob-dir /var/instances/blobs/instance1
         </zeoclient>
       </zodb>

    [zeoserver]
    recipe = zc.zodbrecipes:server
    zeo.conf =
       <zeo>
          address 127.0.0.1:7100
       </zeo>
       <blobstorage 1>
         blob-dir /var/server/data/blobs
         <filestorage 1>
            path /var/server/data/Data.fs
         </filestorage>
       </blobstorage>

    Blob support in the application


    It then wasn't immediately obvious how to actually use blobs. In the end I started using z3c.blobfile. I defined a custom widget to wrap image file uploads in an instance of Image that supported blobs, and plugged that into a formlib custom_widget. The below snippet is actually collected from a number of files, but condensed here into one so you can see all the moving parts:


    import zope.app.file.interfaces
    from zope.app.form.browser.textwidgets import FileWidget
    from zope.formlib.form import Fields, FormBase
    from zope.i18nmessageid import MessageFactory
    from zope.interface import Interface
    from zope.schema import Object
    from z3c.blobfile.image import Image

    # Translation message factory; the 'myapp' domain is a placeholder.
    _ = MessageFactory('myapp')


    class IMemberInfo(Interface):
        image = Object(
            title=_(u"schema-image-title"),
            description=_(u"schema-image-description"),
            required=False,
            schema=zope.app.file.interfaces.IImage
        )


    class ImageWidget(FileWidget):
        def _toFieldValue(self, input):
            # Wrap the uploaded data in a blob-backed Image.
            value = super(ImageWidget, self)._toFieldValue(input)
            return Image(value)


    class MyForm(FormBase):
        form_fields = Fields(IMemberInfo)
        form_fields['image'].custom_widget = ImageWidget

    I'm not yet sure how this affects maintenance, such as backups - whether a regular repozo backup also catches the blobs, or if they need to be backed up separately. I'm also not yet sure how the ZEO client blob directories work; hopefully they're just used as transient storage for a blob after they've been (say) uploaded from a web browser and attached to the ZODB locally, but before being streamed over to the ZEO server. There may or may not be housekeeping activities associated with these directories, too.

    If you know the answers to any of those questions, please do post in the comments!

    Anyway - all that represents about an evening's worth of research, so I hope it saves you some time.

    Nov 12

    A Django Development Environment with zc.buildout

    This article will show you how to create a repeatable Django development environment from scratch using zc.buildout.

    Setting up environments is a pain. Whether it's Django, Zope, ASP.NET, whatever - a typical web stack often has dozens of components with dependencies on each other and underlying libraries. How do you manage this? How do you make sure that the software you're running on your development environment is configured the same way, and is the same version that gets into your production environment? How do you make sure that the third-party Python library you've just started using is correctly deployed?

    One answer is zc.buildout. Buildout is a tool for reliably creating reproducible software builds. It was originally developed by Zope Corporation, and is often used in Zope builds; however, there's no dependency on Zope. You can use it to build pretty much anything. And I'm going to show you how to get a Django build up and running using it.

    I shall use PostgreSQL as the database in my examples, but there's nothing stopping you using MySQL or any other Django-supported database, if you wish.

    You'll also need the standard development tools (gcc, etc.) available since we're going to be getting buildout to compile some binary eggs for us.


    The Basics: Python and PostgreSQL

    The only thing you need to get going is some version of Python installed. The system Python is probably fine, as long as it's version 2.3 or later. Get a database installed too: I'm using PostgreSQL here. (You can use buildout to install a database too; I'm not going to cover that here, though, since most people have their database installed through system packages.)

    We are going to install two packages in the system python: setuptools and virtualenv.

    (If you don't want to touch the system Python at all, that's fine; check out my earlier article on how to compile and install a local version of Python. You might want to do this if your system only offers an old version of Python. I do it as a matter of course, but then get shouted at by system admins who want to use a package manager to keep their Python up to date. Your mileage may vary.)

    Download ez_setup.py, and run it as a user who can write to the Python's site-packages directory (root if you're using your system python):

    wget http://peak.telecommunity.com/dist/ez_setup.py
    python ez_setup.py

    Let's also take the opportunity to create a database for the project. This is for PostgreSQL; obviously, substitute whatever's appropriate for your platform:

    createdb djangodevdb 


    virtualenv

    Before we get stuck into buildout, let's talk about virtualenv.

    You're probably going to have more than one project on the go. You want to keep them separate from each other, and want to avoid polluting your system Python installation with application-specific third-party modules. This is what virtualenv does. It lets you create isolated sandboxes: modules installed in one virtualenv don't interfere with other virtualenvs. Let's install virtualenv and create an environment for our Django project:

    easy_install virtualenv

    That will download and install virtualenv. Next up, let's create ourselves a sandbox called 'djangodev' to work in:

    hornet:2.4 dan$ virtualenv --no-site-packages djangodev
    New python executable in djangodev/bin/python
    Installing setuptools.............done.
    hornet:2.4 dan$

    Finally, we need to 'activate' the sandbox. This isolates the environment from the system Python, and ensures that any modules installed are local to this environment.

    hornet:2.4 dan$ cd djangodev/
    hornet:djangodev dan$ source bin/activate
    (djangodev)hornet:djangodev dan$

    Note how the command prompt changes as a visual indicator that we're now working in a virtualenv. You can type 'deactivate' if you want to exit the virtualenv.


    Initial Configuration

    Getting going with buildout is very straightforward. Create a top-level directory to hold your application and download the bootstrap.py file into it:

    mkdir app
    cd app
    wget http://svn.zope.org/*checkout*/zc.buildout/trunk/bootstrap/bootstrap.py

    Don't run this yet.


    A Basic Buildout Configuration

    Next, create a file in that same directory called buildout.cfg, and put the following into it:

    [buildout]
    parts =

    This is pretty much the simplest buildout configuration you can start with.


    Bootstrapping

    The very first time you use buildout, you have to bootstrap it. This installs buildout itself, and generates scripts to run the buildout. Bootstrapping the buildout is simply a matter of running the bootstrap.py file you downloaded earlier. You should see output resembling this:

    (djangodev)hornet:app dan$ python bootstrap.py 
    Creating directory '/Users/dan/opt/virtual/2.4/djangodev/app/bin'.
    Creating directory '/Users/dan/opt/virtual/2.4/djangodev/app/parts'.
    Creating directory '/Users/dan/opt/virtual/2.4/djangodev/app/develop-eggs'.
    Generated script '/Users/dan/opt/virtual/2.4/djangodev/app/bin/buildout'.
    (djangodev)hornet:app dan$

    That just creates buildout's initial scripts and directory layouts. You don't have to run it again for this environment.


    Installing Django

    So, all that was boring. And it probably seemed fiddly: after all, couldn't we just have installed Django in the virtualenv's site-packages manually? Yes, we could; but then we'd have had to do that every time we deployed an environment. We can now start to use buildout to automate our builds. Let's install Django 1.0 with buildout.

    Open up your buildout.cfg again, and change it so that it looks like this:

    [buildout]
    parts = django

    [django]
    recipe = djangorecipe
    version = 1.0

    Now, go ahead and run buildout. This will download the Django 1.0 distribution, so may take a few minutes depending on your connection:

    (djangodev)hornet:app dan$ bin/buildout 
    Unused options for buildout: 'download-directory'.
    Installing django.
    django: Downloading Django from: http://www.djangoproject.com/download/%s/tarball/
    Generated script '/Users/dan/opt/virtual/2.4/djangodev/app/bin/django'.
    (djangodev)hornet:app dan$

    Buildout (via the 'djangorecipe' extension) has done a couple of things for us:

    • It has created a script called bin/django to run the django management commands
    • It has created an initial Django project (called, imaginatively, 'project') for us with some default settings
    • It has installed Django

    Note that buildout created a script called 'django' in the bin directory. This script is the exact equivalent of django-admin.py, or running python manage.py when manually installing Django; that is, you can run bin/django syncdb, bin/django sqlall, everything you would expect. So let's go ahead and try to run the Django development server:

    (djangodev)hornet:app dan$ bin/django runserver
    Traceback (most recent call last):
      File "bin/django", line 20, in ?
        djangorecipe.manage.main('project.development')
      File "/Users/dan/.buildout/eggs/djangorecipe-0.13-py2.4.egg/djangorecipe/manage.py", line 15, in main
        management.execute_manager(mod)
      [ ... snip ... ]
      File "/Users/dan/opt/virtual/2.4/djangodev/app/parts/django/django/db/backends/sqlite3/base.py", line 26, in ?
        raise ImproperlyConfigured, "Error loading %s module: %s" % (module, e)
    django.core.exceptions.ImproperlyConfigured: Error loading pysqlite2 module: No module named pysqlite2

    Well - that didn't go so well!

    The problem here of course is that we've installed Django, but we haven't specified the database to connect to, or installed the python module required to connect to the database. Let's do both now.


    Installing Dependencies

    Configuring Django to connect to a database is well covered in the Django documentation, so for now I'll just tell you to edit the project/settings.py file and change DATABASE_ENGINE to 'postgresql_psycopg2', and to set your DATABASE_NAME, DATABASE_USER etc. appropriately for your installation.
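
    For reference, the relevant part of project/settings.py might end up looking something like the sketch below. The setting names are the Django 1.0 ones; the user name and the empty password, host and port are assumptions about a typical local PostgreSQL setup, so substitute your own values:

```python
# project/settings.py (excerpt), using Django 1.0-era setting names.
DATABASE_ENGINE = 'postgresql_psycopg2'  # talk to PostgreSQL via psycopg2
DATABASE_NAME = 'djangodevdb'            # the database created earlier with createdb
DATABASE_USER = 'dan'                    # assumption: replace with your own database user
DATABASE_PASSWORD = ''                   # assumption: no password needed for local access
DATABASE_HOST = ''                       # empty string means the local machine
DATABASE_PORT = ''                       # empty string means the default port
```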

    Installing the connector is more interesting. The connector for PostgreSQL is called psycopg2. Let's tell buildout to install that. Open up your buildout.cfg again, and change it so it now looks like this:

    [buildout]
    parts = django

    [django]
    recipe = djangorecipe
    version = 1.0
    eggs = psycopg2

    All we've done is add psycopg2 to the list of eggs to install with Django.

    Now just rerun buildout:

    (djangodev)hornet:app dan$ bin/buildout 
    Uninstalling django.
    Unused options for buildout: 'download-directory'.
    Installing django.
    Getting distribution for 'psycopg2'.
    warning: no files found matching '*.html' under directory 'doc'
    /Users/dan/opt/python-2.4.5/include/python2.4/datetime.h:186:
    warning: 'PyDateTimeAPI' defined but not used
    psycopg/typecast.c:37: warning: 'skip_until_space' defined but
    not used
    /Users/dan/opt/python-2.4.5/include/python2.4/datetime.h:186: warning:
    'PyDateTimeAPI' defined but not used
    ./psycopg/config.h:63: warning: 'Dprintf' defined but not used
    ./psycopg/config.h:63: warning: 'Dprintf' defined but not used
    /Users/dan/opt/python-2.4.5/include/python2.4/datetime.h:186: warning:
    'PyDateTimeAPI' defined but not used
    zip_safe flag not set; analyzing archive contents...
    Got psycopg2 2.0.8.
    Generated script '/Users/dan/opt/virtual/2.4/djangodev/app/bin/django'.
    django: Skipping creating of project: project since it exists

    As you can see, buildout noticed that we'd specified an extra requirement of psycopg2 and so downloaded it from PyPI, compiled and installed it for us. What buildout did is essentially analogous to running 'easy_install psycopg2'. Now we should be able to run the Django development server:

    (djangodev)hornet:app dan$ bin/django runserver
    Validating models...
    0 errors found

    Django version 1.0-final-SVN-unknown, using settings 'project.development'
    Development server is running at http://127.0.0.1:8000/
    Quit the server with CONTROL-C.

    It worked! You can now go ahead and check your buildout.cfg into source control. Anyone checking that out will get the same Django build as you.


    PIL

    PIL, the Python Imaging Library, is always a bit of a pain to install. Django requires it if you want to work with images. It's also not packaged with setuptools, let alone as an egg. How can we get this into our build?

    Fortunately, Chris McDonough has repackaged PIL with setuptools, making it relatively straightforward to add to our build. Open up buildout.cfg again, and edit it so that it looks like this:


    [buildout]
    parts =
        PIL
        django

    [django]
    recipe = djangorecipe
    version = 1.0
    eggs =
        psycopg2
        markdown
        PIL

    [PIL]
    recipe = zc.recipe.egg
    eggs = PIL==1.1.6
    find-links = http://dist.repoze.org/

    Rerun buildout, as before, and PIL should be downloaded, compiled and installed.

    That should be enough to get you going with buildout. You can find lots more on buildout and the Django recipe in the links below:

    Google found me those links, so I'm sure it can find more for you too! I'd particularly encourage you to read the djangorecipe documentation for more detail on how it can configure Django for you.

    In future articles, I intend to talk about

    • Starting Django applications as eggs
    • Configuring applications as development eggs inside buildout
    • Packaging applications and uploading them to PyPI, so that they're just an easy_install away
    • Dealing with third-party Django applications which have not been packaged as eggs
    • Using buildout to build non-python dependencies
    • How all this works with source control

    Until next time.

    Oct 12

    Modifying a Solaris SMF service

    How to modify properties of a Solaris SMF service.

    A customer of mine has an OpenSolaris machine, meaning I've just had my first exposure to Solaris from a system administration perspective. The Solaris documentation seems fairly complete, but is lacking in concrete examples. Listed below are some notes for working with Solaris services (mainly for my own benefit).

    I used the following cheatsheet heavily; I won't reproduce it here.

    OpenSolaris Cheat Sheet

    Service identifiers

    Services in Solaris are identified by what the documentation refers to as FMRIs: Fault Management Resource Identifiers. These are used throughout SMF, Solaris' Service Management Framework, for referring to services. Apache, for example, is referred to as svc:/network/http. Services can be listed using the svcs command.

    Services and Instances

    There can be several instances of a given service; usually, however, you just use the default one (usually called default). An exception to this (on the OpenSolaris machine I have access to, at least) is Apache; the instance name is actually 'apache', so the full service is called svc:/network/http:apache. Postfix is more conventional, with an instance name of svc:/network/postfix:default.

    The important thing to note is that properties can be set both at the service and instance level.
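
    To make the service/instance split concrete, here's a tiny Python sketch that pulls an FMRI apart. It's purely illustrative: 'split_fmri' is a made-up helper, not part of any Solaris tooling, and it assumes the simple 'svc:/service:instance' form used in the examples here.

```python
def split_fmri(fmri):
    """Split an FMRI of the form svc:/<service>[:<instance>].

    Returns a (service, instance) tuple; instance is None when the FMRI
    names the service as a whole. Hypothetical helper for illustration.
    """
    prefix = 'svc:/'
    if not fmri.startswith(prefix):
        raise ValueError('not a service FMRI: %r' % fmri)
    rest = fmri[len(prefix):]
    # Everything after the first colon (if there is one) is the instance.
    service, sep, instance = rest.partition(':')
    if not sep:
        instance = None
    return service, instance


apache = split_fmri('svc:/network/http:apache')
postfix = split_fmri('svc:/network/postfix:default')
whole_service = split_fmri('svc:/network/http')
```

    A property set against the bare service name applies at the service level; one set against a name with an instance suffix applies to that instance only.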

    Setting Properties

    Much of a service's behaviour can be configured by setting properties on the service. A key one, the command line used to start the service, is called 'start/exec'. The name is in two parts: 'start' is a property group, and 'exec' is a property name.

    The following command will set the command to execute to start a service:

     svccfg -s svc:/virtualmin/mysite_com/Trac setprop \
    start/exec = astring: \"/opt/local/bin/tracd\ -d\ \
    --hostname=localhost -p\ 8000\ /home/trac/trac/project\"

    Here we're setting the service up to run the standalone trac daemon. Note the usual shell escape characters. It's possible to run svccfg interactively - if you do so, all the escaping of course isn't required. Also note that in this case, the property is set on the service directly, not the instance.

    Once the property is set, you then need to refresh and restart the service:

    svcadm refresh virtualmin/mysite_com/Trac
    svcadm restart virtualmin/mysite_com/Trac

    More to follow...

    I'll use this blog post as a notepad for managing Solaris services.



    Jul 29

    Testing App Views

    How to easily test views in applications which don't have a urls.py file.

    First of all, thanks to those people who've been reading and pointing out improvements in the comments - it's much appreciated, and I've learned a lot from you!

    Django applications tend to have views, in a views.py file. These are generally looked up by URL. URLConfs (various urls.py files) are used to map request URLs to views.

    This can lead to a problem from a testing perspective. Not all applications have urls.py files; and for those that do, there's nothing stopping a project wiring up different URLs to those views through a custom urls.py. It therefore becomes difficult to use Django's test client to invoke a view using a GET or a POST because you don't know how that URL has been configured. After all, you run python manage.py test from your project, and the project's configuration is used.

    Let's say I have a views.py that looks like this:

    from django.shortcuts import render_to_response
    from django.template import RequestContext

    def say_hi(request):
        return render_to_response('app/hi.html', context_instance=RequestContext(request))

    I want to use Django's test client to ensure that when this view is called, it returns an HTTP 200 OK response. However, my app doesn't have a urls.py - the person using it is meant to wire this view into whatever URL they want to. How do I test the view? 

    The solution is deceptively simple: inject a test URLs module.

    Create a base test case that looks something like this:

    from django.conf import settings
    from django.test import TestCase

    class TestUrlsTestCase(TestCase):

        def setUp(self):
            self._old_root_urlconf = settings.ROOT_URLCONF
            settings.ROOT_URLCONF = 'app.testurls'

        def tearDown(self):
            settings.ROOT_URLCONF = self._old_root_urlconf

    Remember that setUp() is run just before each test, and tearDown() is run just after. What we're doing is replacing the URLConf just for the duration of the test. We have to put it back in the tearDown(), else we'll end up with test state 'leakage' into other tests.

    You then just have to add a testurls.py file to the 'app' application (the one I'm testing) which contains known URLs for the views I want to invoke. For example, my testurls.py might look like this:

    from django.conf.urls.defaults import *

    urlpatterns = patterns('app.views',
        (r'^foo/',  'say_hi'),
    )

    I can then write standard test client code to check that my views are working as expected:

    class RealTest(TestUrlsTestCase):
                   
        def testGet(self):
            response = self.client.get('/foo/')
            self.assertEqual(200, response.status_code)

    It now doesn't matter how the project integrator has configured their urls.py. When the tests are run, the URLConf that I have specified will always be used.


    Jul 01

    Django Unit Tests and Transactions

    While these are more properly integration tests than unit tests, it can be handy to have Django roll back the database transaction after each test method runs.

    Coming to automated testing in Django from the Zope and Plone world, I was pleased to find full support for all the testing machinery that I've become used to: regular Python unit tests, and doctests. Of course, these being unit tests, they don't do any 'framework' management out of the box.

    Unit tests are supposed to test your code, and just your code. However, once you're in a framework environment (be that Zope and Plone, Django, or anything else) then testing how your code integrates with that framework is vital. Zope and Plone provide unittest.TestCase subclasses (ZopeTestCase and PloneTestCase respectively) which provide a lot of scaffolding for you to be able to run integration tests. Part of that scaffolding is automatic transaction management. This hooks into Zope's transaction API to roll back the transaction after each test runs.

    I wanted to do something similar for my Django test cases; I was finding 'state pollution' between my unit test runs, since data created by one test method isn't automatically cleaned out.

    Django's transaction handling is much simpler than Zope's: it cares only about the one database transaction that the current request has, and only if the transaction support middleware is installed. This means that we can pretty easily crib the code from that middleware and use it in a test case base class:

    import unittest

    from django.db import transaction

    class TransactionalTestCase(unittest.TestCase):

        def setUp(self):
            super(TransactionalTestCase, self).setUp()

            # Start a managed transaction before each test.
            transaction.enter_transaction_management()
            transaction.managed(True)

        def tearDown(self):
            super(TransactionalTestCase, self).tearDown()

            # Throw away anything the test wrote to the database.
            if transaction.is_dirty():
                transaction.rollback()
            transaction.leave_transaction_management()

    UPDATE: Fixed an error in the call to the base class' tearDown() method, which caused open transactions to hang around and (among other things) prevented the test database being cleanly dropped at the end of the test run.

    After this, you can simply derive your test fixture classes from TransactionalTestCase, and make sure that you call the base setUp() and tearDown() methods if you do need to override them to perform your own setup and teardown.

    My next spare time (hah!) project will be to integrate Django's transaction management into repoze.tm (which is Zope's transaction management suitably WSGI-fied). This would let a Django application participate in transactions with other transaction-aware components, making integration at the WSGI layer much more straightforward.

    Jun 05

    Erlang Bus Error

    I've become interested in CouchDB, written in Erlang. My copy of the 'Programming Erlang' book arrived today, so I tried to fire up the Erlang shell - only to be greeted with a Bus Error.

    (Thanks to Jan Lehnardt on the couchdb-user mailing list for apparently being psychic and posting a solution just as I tried to run Erlang.)

    Yes, it's that big, bad old Leopard 10.5.3 update at it again. As well as breaking my Time Machine over AirDisk, it broke my Erlang shell.

    I've got Erlang installed using MacPorts, so fortunately the solution was as simple as:

    sudo port uninstall erlang
    sudo port install erlang +universal

    Erlang, back in business. Now all I have to do is learn it! (Good thing I did Haskell at uni - never thought I'd be saying that...)

    I'll keep you updated how my Adventures in Erlang go. I may even have to add a new Erlang category to the blog.

    Jun 03

    VMWare Fusion, Snapshots and Disk Space

    Snapshotting your VMWare Fusion machine can start eating up disk space. This is not widely known.

    My 40GB Windows XP VM had mysteriously grown to 50GB. I couldn't quite figure it out: 40GB disk, 1.5GB RAM, what more could it want to store?

    Answer: I'd taken a VM snapshot prior to applying XP SP3.

    Conceptually, a VMWare snapshot is a point-in-time image of your VM. However, you'll notice that taking a snapshot doesn't double the amount of disk space that your VM takes up. What actually appears to happen is that VMWare starts appending changes you make to your VM to a new 'differences' file within the VM package on disk, leaving your original VM file intact. If you ever revert to that snapshot, it can simply throw away this file containing the changes.


    This also means that as you change the contents of your VM, it will take more and more disk space as VMWare builds this 'differences' file. The solution to this is to discard the snapshot: select Discard Snapshot from the Virtual Machine menu. Be aware though that this operation can take a long time. VMWare has to go through the differences file and apply the changes to the original image. If a lot of data has changed, this will take a while. However, once the snapshot has been discarded, your VM will shrink back to its expected size.


    May 29

    Faster, Apple Mail, Faster!

    A quick way to improve the performance of Apple Mail.

    Leopard's incarnation of Mail.app is mostly lovely. However, when you load it up with tens of thousands of mail messages, it can get a little slow. The usual solution - Rebuild, from the Mailbox menu - wasn't doing it for me.

    However, I found this gem of a tip which I wanted to link to in order to improve its Google rank - it took me too long to find it!

    It goes without saying that you should back your data up before trying this.

    In essence, however:

    Shut down Mail

    Open Terminal, and enter the following:

    hornet:~ dan$ cd ~/Library/Mail
    hornet:Mail dan$ sqlite3 Envelope\ Index

    You'll then see the SQLite prompt appear. Enter 'vacuum subjects;' and press enter:

    SQLite version 3.4.0
    Enter ".help" for instructions
    sqlite> vacuum subjects;

    You'll then have to wait a bit - don't panic, this is normal.

    What's happening is that the SQLite database engine (used by Mail.app behind the scenes) is cleaning up data fragmentation and empty data pages within the database file itself. Doing this reduces the amount of disk activity required to read the database, improving performance.

    Once you get your sqlite prompt back, simply quit:

    sqlite> .quit

    Fire up Mail.app again, and you should notice a significant speed improvement. Sweet.
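
    If you'd like to see what a vacuum does before letting it loose on your mail index, here's a small sketch using Python's sqlite3 module against a throwaway database. The file name, table and row sizes are invented for the demonstration, and it runs a plain VACUUM over the whole file rather than naming a table.

```python
import os
import sqlite3
import tempfile


def vacuum_demo(rows=2000):
    """Fill a scratch database, empty it, and measure the file around VACUUM."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    conn = sqlite3.connect(path)
    # Make sure freed pages stay in the file until we vacuum explicitly.
    conn.execute("PRAGMA auto_vacuum = NONE")
    conn.execute("CREATE TABLE subjects (subject TEXT)")
    conn.executemany(
        "INSERT INTO subjects VALUES (?)",
        (("x" * 500,) for _ in range(rows)),
    )
    conn.commit()

    # Deleting rows frees pages inside the file but does not shrink it.
    conn.execute("DELETE FROM subjects")
    conn.commit()
    before = os.path.getsize(path)

    # VACUUM rebuilds the file without the free pages.
    conn.execute("VACUUM")
    conn.close()
    after = os.path.getsize(path)
    return before, after
```

    Calling vacuum_demo() returns the file size in bytes before and after the vacuum; the second number should be the smaller one.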

    May 13

    How To Kill an AirPort Extreme Base Station

    Apple products aren't infallible, you know.

    Here's how to kill a base station so badly that it needs power cycling:

    1. Create a few big Time Machine backups on a USB disk attached to your Mac.
    2. Unmount the backup and hang the drive off the back of your AirPort Extreme.
    3. Configure the AirPort Extreme to share the disk as an AirDisk
    4. Mount up the new AirDisk on your Mac. Note how you can browse the old backups. (They won't work as an AirDisk Time Machine backup though, those are sparse disk images.)
    5. In a terminal window, su to root and go to the Backups.backupdb directory.
    6. rm -rf <machinename>, to try to remove the old Time Machine backup.
    7. Boom!

    At this point, my Mac gets disconnected from the wireless network. A subsequent attempt to reconnect times out, and the base station then disappears completely. Yanking the power cord is the only way to fix it.

    Don't do this to other people's base stations, it's mean.

    (Hm - wonder if it's accessible if I attach via Ethernet? Might have to give that a go, in the spirit of inquiry...)


    May 07

    Creating a Python 2.4, Plone and Zope Development Environment on Mac OS X Leopard

    Compiling Python, Zope and Plone on Leopard isn't as easy as it is on Linux. Here's a walkthrough of the process, from a bare Leopard install right through to having a working Plone 3 development environment, using paster and buildout.

    UPDATED 27/01/2008: Added instructions on building PIL with buildout

    If you're new to Zope and Plone on the Mac, and just want to get up and running, stop reading right now! Instead, go and download the unified Plone installer for Mac OS X, Linux and Solaris on Plone.org. This article is for people who need more control over the installation, and need to build the core pieces from source.

    Still reading? Good.

    Note that the information here has been drawn together from a number of places on the web. However, it's not been presented as an end-to-end process. That's what I aim to do here.

    Install development tools

    Development tools (compilers and headers, basically) aren't installed by default on Leopard. Dig out your Mac OS X DVD, and install Xcode. This will give you everything you need. You should be able to do something like this:

    $ gcc
    i686-apple-darwin9-gcc-4.0.1: no input files

    You can see that the gcc compiler is installed. If instead you see 'command not found', then you don't have gcc installed.


    Compiling Python

    Python is the first hurdle. Leopard ships with Python 2.5, but Zope 2 still requires 2.4. Unfortunately, Python 2.4 doesn't compile out of the box with Leopard. You'll see something like this:

    hornet:Python-2.4.5 dan$ ./configure --prefix=/Users/dan/tmp/opt
    [snip lots of output]

    hornet:Python-2.4.5 dan$ make
    [compile, compile, explode]
    gcc -fno-strict-aliasing -Wno-long-double -no-cpp-precomp -mno-fused-madd -DNDEBUG -g -O3
    -Wall -Wstrict-prototypes -I. -I./Include  -DPy_BUILD_CORE  -c ./Modules/signalmodule.c
    -o Modules/signalmodule.o
    gcc -fno-strict-aliasing -Wno-long-double -no-cpp-precomp -mno-fused-madd -DNDEBUG -g -O3
    -Wall -Wstrict-prototypes -I. -I./Include  -DPy_BUILD_CORE  -c ./Modules/posixmodule.c
    -o Modules/posixmodule.o
    ./Modules/posixmodule.c: In function 'posix_setpgrp':
    ./Modules/posixmodule.c:3145: error: too few arguments to function 'setpgrp'
    make: *** [Modules/posixmodule.o] Error 1

    Ouch.

    Fortunately, the solution for this is actually on the Python.org site (if you scroll down a bit, which I didn't initially). Simply modify your configure line to include an extra define:

    ./configure MACOSX_DEPLOYMENT_TARGET=10.5

    You should find Python now compiles. You can then make install.

    Oh - but the fun isn't over yet! Have a play with your new Python. At best, you'll find your arrow keys don't work. At worst, you'll experience odd crashes with the helpful generic Bus Error message.

    Fortunately, Andreas Jung has the answer here: a broken or outdated version of the readline library. You can compile your own, or do what Andreas recommends and install one from DarwinPorts. Once that's installed, you simply need to do:

    $ sudo port -d selfupdate
    $ sudo port install readline

    You should now have a working readline. Next up, you need to configure your Python build environment to pick this up (otherwise it'll just pick up the broken OS X readline again.) I actually did this the way Andreas describes, a manual hack in the Makefile:

    CC = gcc -I/opt/local/include -L/opt/local/lib

    I suspect that you'd be able to pass --includedir=/opt/local/include and --libdir=/opt/local/lib to the configure script, and that would be a cleaner way of doing it. Once you've done that, run make and make install, and you should have a working Python, with working readline support, that doesn't crash.

    Remember to set your PATH up to use your new Python binary in preference to the system one - something like:

    export PATH=/Users/dan/opt/bin:$PATH

    What's next?

    setuptools

    A lot of stuff these days is installed using setuptools. Download ez_setup.py to somewhere convenient, and run it using your new python:

    python ez_setup.py

    This will download and install setuptools into your Python 2.4's site-packages directory, and configure it so that it's usable.

    To check it's installed correctly, type:

    which easy_install

    The path should be within your Python 2.4 install. If it's not then you've managed to install it into your system Python. First - why are you running as root; and second, check your PATH.

    Note that from now on we're going to be using a package called virtualenv to isolate any further package installs. You've gone to a lot of effort to get a working 2.4 install - let's keep it tidy!

    virtualenv

    virtualenv is a fantastic tool which creates a Python 'sandbox' to work in. This means that you can install packages in a virtualenv to your heart's content, and it won't pollute the main Python installation. It creates a new python binary and a lib/python2.4/site-packages structure to facilitate this.

    Installing it is dead easy:

    easy_install virtualenv

    virtualenv will be downloaded and installed. Let's immediately create a virtualenv to work in:

    $ mkdir -p ~/virtual/2.4
    $ cd ~/virtual/2.4
    $ virtualenv plone3
    New python executable in plone3/bin/python
    Installing setuptools............done.
    $ cd plone3
    $ source bin/activate
    (plone3)$

    Note how your prompt has changed - this is a visual clue that you're now working in a virtual environment. Look at the output for:

    which python

    You should see that the Python from the virtualenv is used. This of course means that any further easy_installs you do will be into this virtualenv.
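
    You can ask the interpreter itself the same question. This is a rough sketch, assuming the usual markers: classic virtualenv sets sys.real_prefix, while the venv module that later shipped with Python makes sys.base_prefix differ from sys.prefix. The function names are my own.

```python
import sys


def in_virtualenv():
    """Best-effort guess at whether this interpreter runs inside a sandbox."""
    # Classic virtualenv records the original installation here.
    if hasattr(sys, "real_prefix"):
        return True
    # The stdlib venv module leaves base_prefix pointing at the real install.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


def describe_interpreter():
    """Collect the details that 'which python' only hints at."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "virtualenv": in_virtualenv(),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print("%s: %s" % (key, value))
```

    Run it with the python on your PATH: the executable line should point into your virtualenv's bin directory once you've activated it.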

    As above, I find it useful to put my virtualenvs for Python 2.4 and Python 2.5 in separate directories. But that's just me - running source bin/activate in each virtualenv will do the right thing.

    ZopeSkel

    ZopeSkel is a package which will install paster, buildout, and some commands to help create Plone development instances (amongst other things, including Silva instances, Plone themes, etc.) Installing it is easy:

    (plone3)$ easy_install ZopeSkel

    This will whir for a while. Once it's installed, you should find that you have a paster installed:

    which paster

    ... will show you that it has, as with everything else, been installed into your virtualenv.

    Creating a Plone 3 development environment

    We're on the final leg of the journey now!

    paster and buildout do most of the heavy lifting for us here. First of all, we use paster to create a buildout for us:

    paster create -t plone3_buildout p3

    This, again, will whir for a while. plone3_buildout refers to the template to be used to create p3. If you're interested, you can see what other templates were installed:

    paster create --list-templates

    Of course, paster create --help will give you more information on available options.

    PIL

    Before we run our buildout, we need to install the Python Imaging Library, also known as PIL. Previous versions of this article described how to download and install this from source. However, there is now an easier way.

    Open the p3/buildout.cfg file and edit it as follows:

    [buildout]

    parts =
      PIL
      ....

    eggs =
      ...
      PIL

    [PIL]
    recipe = zc.recipe.egg
    egg = PIL==1.1.6
    find-links = http://dist.repoze.org/


    Building Zope and Plone

    Now we have our buildout, we're ready to go. Let's finally build Zope out, configured for Plone. During this process, you'll be asked for an initial administrative username and password for Zope:

    (plone3)$ cd p3
    (plone3)$ python bootstrap.py
    (plone3)$ bin/buildout

    Now go and make a cup of tea - that last step can take a while.

    What's happening is that buildout is using the buildout.cfg to locate and download all the eggs and parts which make up a Zope 2 and Plone 3 installation. It will then compile and install those for you. If you run buildout with multiple -v options (eg. -vvvv) then it'll give you a lot more information about what it's doing.

    When this finishes, you should be able to run up your Zope instance:

    (plone3)$ bin/instance fg

    Browse to http://localhost:8080 (assuming you kept the port the same) and you should see the familiar Zope Quick Start page.

    Creating a development egg

    It's likely that all new development that you'll be doing in Plone 3 will be egg-based. However, it's not immediately obvious how you get started.

    Development eggs live in the src/ directory of your buildout. As with the main Plone 3 buildout, we use paster to create the egg for us:

    (plone3)$ cd src/
    (plone3)$ paster create -t plone egg.name

    You'll get asked some questions. Set the namespace and package appropriately for your project (I'll assume you answered 'egg' and 'name' respectively), and say 'no' to the Zope 2 product and zip-safe questions. Paster will then go ahead and create the file structure for your egg in src.

    Finally (and I mean it this time) we have to tell buildout about the development egg. Go back up one level of directory, above src, and you should see a file called buildout.cfg. Edit this file, and add the following:

    [buildout]
    eggs =
        egg.name
    ...
    develop =
        src/egg.name
    ...
    [instance]
    zcml =
        egg.name

    Once this is saved, then run buildout again in offline mode:

    bin/buildout -o

    This will look at your buildout.cfg and modify the buildout to include your development egg.
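
    It's easy to forget one of the three places the egg has to be listed. As a quick sanity check, here's a sketch that reads a buildout.cfg-style file with Python's configparser and reports which entries are missing. Note the assumptions: this is not buildout's own parser, the sample text is a cut-down version of the fragment above, and the function names are invented.

```python
import configparser

# Cut-down sample mirroring the fragment above (no '...' placeholders).
SAMPLE_BUILDOUT_CFG = """\
[buildout]
eggs =
    egg.name
develop =
    src/egg.name

[instance]
zcml =
    egg.name
"""


def listed(config, section, option):
    """Return the whitespace-separated entries of a multi-line option."""
    return config.get(section, option, fallback="").split()


def missing_registrations(text, egg="egg.name"):
    """List the (section, option) pairs where the egg is not registered."""
    # Interpolation is switched off so buildout's ${...} syntax is left alone.
    config = configparser.ConfigParser(interpolation=None)
    config.read_string(text)
    expected = [
        ("buildout", "eggs", egg),
        ("buildout", "develop", "src/" + egg),
        ("instance", "zcml", egg),
    ]
    return [
        (section, option)
        for section, option, value in expected
        if value not in listed(config, section, option)
    ]
```

    An empty list from missing_registrations() means all three entries are present; anything else tells you which section and option still need the egg added.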

    And after that?

    If you've got to here, well done! You've now got a development environment. The final stage, once you've developed your egg, is to package it. This is pure setuptools. From within your egg:

    python setup.py sdist bdist_egg

    This will create a finished egg, placed in the dist/ directory, ready to use. It will also create a source distribution (ending in .tar.gz).

    There's a lot more to learn around this process - but I hope that's clarified the basic procedure, and workarounds for some of the issues specific to Mac OS X Leopard.

    Apr 15

    import this.NET

    The Zen of Python doesn't just apply to Python.

    PEP 20 is one of the most important documents that you read as a Python programmer. For the lazy, here's the meat:

    Beautiful is better than ugly. 
    Explicit is better than implicit.
    Simple is better than complex.
    Complex is better than complicated.
    Flat is better than nested.
    Sparse is better than dense.
    Readability counts.
    Special cases aren't special enough to break the rules.
    Although practicality beats purity.
    Errors should never pass silently.
    Unless explicitly silenced.
    In the face of ambiguity, refuse the temptation to guess.
    There should be one-- and preferably only one --obvious way
    to do it.
    Although that way may not be obvious at first unless you're Dutch.
    Now is better than never.
    Although never is often better than *right* now.
    If the implementation is hard to explain, it's a bad idea.
    If the implementation is easy to explain, it may be a good idea.
    Namespaces are one honking great idea -- let's do more of those!

    The interesting thing about this PEP, of course, is that it doesn't really apply to Python. It applies to the programmer. Therefore, it applies equally well to most software development; be it in Python, C#, SQL, or whatever. Whenever I find myself faced by a design decision where I'm unsure of the best path, applying the guidelines above usually helps me decide.

    Try it yourself. You'll be surprised how the above cuts through a lot of design cruft.

    (Having said that, I'm not sure the one about being Dutch really applies to .NET.)


    Apr 14

    Shared Development Databases

    One of the big culture shocks moving to a .NET development team was the amount of shared development infrastructure - particularly on the database side. In this post, I share some of my experiences and suggest ways that you can work around some of the impositions of shared infrastructure.

    First of all - long time no postee! Sorry about that. As well as starting to arrange my own wedding, I've been helping a couple of friends out with some interview questions and another ASP.NET application. All this adds up to very little time left over for blogging.

    Back to your scheduled transmission.

    I've been wearing the Enterprise .NET/SQL Server/Oracle developer hat for a little over a year now, and it still isn't very comfortable. One of the particularly uncomfortable aspects is the shared development database that we have to use.

    Yes, that's right - everyone develops on a single instance of Oracle or SQL Server. People are making structural and procedural changes all the time as we develop the software. Chaos ensues.

    Well - actually, less chaos than you would expect. We have some carefully orchestrated dances which minimise the impact of this restriction. The given reasons for this restriction are of varying levels of convincingness (new word made up today - check):

    • Centralised access control
    • Centralised maintenance and patching

    I can understand those arguments. However, for me, the massive inconvenience and danger of losing work significantly outweighs the benefits (especially considering the access control procedures are laughable - but that's a story for another day).

    The rhyme and reason aside, what do you do?

    Make sure you have scripts for everything

    If you're not physically sharing a database, then you need to be able to share changes. If you don't have this, take a snapshot of your current schema now and check it into source control.

    You should then add a table to your database to track schema version (optionally add fields to track who made changes and when, if you like).

    Finally, only make changes to the database schema via scripts. These scripts should update the database version table once they've run; they should probably sanity check that the database is at the expected version before running. If you use an appropriate naming convention (perhaps <version>.sql) then it becomes pretty easy to automate running these.
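
    To make that concrete, here's a minimal sketch of a version-tracked migration runner, written in Python against SQLite purely so that it runs anywhere. The schema_version table, the MIGRATIONS dictionary and the customer table are all invented for illustration; in real life the SQL would live in your <version>.sql files and run against Oracle or SQL Server.

```python
import sqlite3

# Hypothetical change scripts, keyed by the version they bring the schema to.
MIGRATIONS = {
    1: "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)",
    2: "ALTER TABLE customer ADD COLUMN email TEXT",
}


def current_version(conn):
    """Read the schema version, creating the tracking table if needed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)"
    )
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] or 0


def migrate(conn, migrations=MIGRATIONS):
    """Apply every script newer than the recorded version, in order."""
    version = current_version(conn)
    for target in sorted(migrations):
        if target <= version:
            continue  # already applied
        # Sanity check: refuse to skip over a missing script.
        if target != version + 1:
            raise RuntimeError(
                "expected script %d, found %d" % (version + 1, target)
            )
        conn.execute(migrations[target])
        conn.execute(
            "INSERT INTO schema_version (version) VALUES (?)", (target,)
        )
        conn.commit()
        version = target
    return version
```

    Running migrate() twice against the same database is harmless: the second call finds the version already at 2 and applies nothing.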


    Get Your Own Database Schema

    Even if you're stuck on shared kit and database instance, getting your own database schema (or even better, permission to create your own schemas) is the next best thing. You should be using the schema scripts to communicate change. Having an automated database build is really critical here. Wiring that into a continuous integration process (or at the very least, a nightly integration run) will also help you spot any conflicts.

    This works best if you're using a branch-for-feature (or branch-for-bug) strategy in source control. That way, when your piece of work is done, you drop your development database, rebuild it from trunk code, then merge your scripts into trunk and run them. This merge step gives you the opportunity to reconcile any conflicts.

    Of course - it always helps to talk to your colleagues! Even with the best branching and merging strategy in the world, conflicts are going to be difficult to resolve if two of you have created a column with the same name but different types.

    What next?

    Unfortunately, the above doesn't seem to be possible at The Day Job. We have to share schemas. We can't each have our own because disk is expensive.

    (*cough*)

    So we all share a single development schema (for each application), on a single database instance.

    If you're like this, then you're in 'cooperative multitasking' territory - that is, you have to agree with the rest of the team that, at all times, you work on different parts of the schema and different stored procedures. (It's even slightly worse on Oracle - tools tend to steer you towards scripting whole packages, so the 'atomic unit of work' tends to be much larger on Oracle than SQL Server.)

    You can (just about) make this work. It's painful and error-prone though - and one misstep will blow everyone's work away at once.

    Mar 07

    Django Redirects (Updated)

    This is a quick-n-dirty app to let you set up redirects on your Django site through the admin interface. (Updated to reflect my own oversight of django.contrib.redirects, which, er, does all this for you...)

    Update: A reader has kindly pointed out to me that django.contrib.redirects already exists, which does what I describe below! D'oh. I'm sure that wasn't there when I last looked. Don't use my code - use the Django one, of course. Consider this... another talked-through example of middleware. And a lesson in reading before you write.


    There are loads of ways to tell your audience that their content has HTTP 301: Moved Permanently. A line in your Apache/lighttpd/whatever configuration perhaps; or you could even wire in Django's redirect_to generic view in your urls.py file. These are the recommended approaches: directly configuring your web server is likely to be the most performant solution (as requests will never touch your app server cluster), and the urls.py method will even let you pick apart the incoming URL and plug arguments into the URL to be redirected to. All good stuff.

    Sometimes, however, you need to get a redirect up NOW. Perhaps a high-profile site has linked to an article of yours, but they've messed up the link, and it's 2am and you're in the UK and all your techs who would normally do this for you are in bed (or the pub) and all that potential advertising revenue is bouncing off your 404 page.

    Here's a tiny app (based heavily on flatpages) that lets you quickly and easily add a redirect to your site. You can download it from the Software section. Here's how you install it:

    1. Unpack the file and drop it somewhere in your PYTHONPATH, so Django can import it.
    2. Edit your settings file, and add 'stereoplex.flatredirects' to your INSTALLED_APPS
    3. While you're at it, add 'stereoplex.flatredirects.middleware.FlatredirectFallbackMiddleware' to your MIDDLEWARE_CLASSES
    4. Resync your database, and you should be good to go.

    It's pretty simple to use. You should see the 'Flatredirects' application appear in your admin interface, and it'll let you add Redirect objects.

    Your 'from url' should be the location which is currently 404'ing (ie. the incoming, broken link). It works in the same way as flatpages - so it's relative to the root of your site, and it should have a leading and trailing slash.

    The 'to url' is sent back to the browser - it can be relative, or absolute.

    Once you've saved that, try it - you should find that going to yoursite.com/redirectfrom (where redirectfrom is what you put in the 'from url' box) will now redirect you appropriately.


    How does it work?

    Flatredirects uses a very simple Django middleware (not to be confused with WSGI middleware) to do its work. I've talked about Django middleware in a previous post. In that example, I implemented a process_request() middleware, since I needed to augment the request before it was passed on.

    This time, we're dealing with the response, so we provide a process_response() method. Take a look at the files middleware.py and views.py in stereoplex/flatredirects.

    The logic is pretty simple: if the response is going to be a 404, then hand processing off to the flatredirect view. This view looks the URL up in the database, and uses Django's own redirect_to generic view to perform the redirection if there is a match. If there isn't, 404 is raised once more back to the middleware; and in this case, the middleware returns the original 404.
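
    Stripped of Django, the fallback logic looks roughly like the sketch below. To be clear about what's assumed: the Request and Response classes are bare stand-ins for Django's objects, a plain dictionary stands in for the database lookup, and none of this is the actual flatredirects code. It only shows the shape of a process_response() that falls back on a 404.

```python
class Request:
    """Stand-in for a Django request: only the path matters here."""

    def __init__(self, path):
        self.path = path


class Response:
    """Stand-in for a Django response."""

    def __init__(self, status_code, location=None):
        self.status_code = status_code
        self.location = location


class RedirectFallbackMiddleware:
    """Turn a 404 into a permanent redirect when a mapping exists."""

    def __init__(self, redirects):
        # Hypothetical mapping of 'from url' to 'to url'; the real app
        # looks these up in the database.
        self.redirects = redirects

    def process_response(self, request, response):
        # Anything other than a 404 passes straight through.
        if response.status_code != 404:
            return response
        target = self.redirects.get(request.path)
        if target is None:
            # No redirect configured: hand back the original 404.
            return response
        return Response(301, location=target)
```

    Because the lookup only happens once a 404 is already on its way out, pages that exist never pay for it; every broken link does, though.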


    Is this suitable for doing all your redirects?

    This product makes adding redirects very easy. However, there are some caveats to this approach:

    • In this implementation at least, there's no ability to dynamically generate the redirect URL.
    • More importantly, you should consider the performance implications of all your potential 404 accesses causing a database hit.

    So it's probably OK as a quick band-aid if nobody's around to fiddle with your web server configuration - but don't blame me if your database melts!

    Note that this is an alpha release, but feel free to let me know of any bugs, or features that you'd like to see.

    Get it at:

    http://www.stereoplex.com/software/flatredirects

    Feb 25

    Using output parameters with the Enterprise Library

    Trying to get stored procedure output parameters working when you're using the Enterprise Library 3 can be frustrating. Here's a couple of tips to help you along.

    It's fairly common to return an IDataReader from a stored procedure call to iterate through the result set. It gives the underlying data source the opportunity to return results lazily, and can potentially avoid your application having a whole data set in memory at once. This is clearly useful if you're dealing with very large data sets.

    Typically, code to get one of these data readers looks something like this:

    using (DbCommand cmd = db.GetStoredProcCommand("s_sProcName"))
    {
        db.AddInParameter(cmd, "@p_sFoo", DbType.String, "Bar");
        return db.ExecuteReader(cmd);
    }

    This works as you'd expect - the stored procedure s_sProcName executes, and the method returns an IDataReader from which you can fetch results.

    However, adding an out parameter doesn't work as you'd expect:

    using (DbCommand cmd = db.GetStoredProcCommand("s_sProcName"))
    {
        db.AddInParameter(cmd, "@p_sFoo", DbType.String, "Bar");
        db.AddOutParameter(cmd, "@po_iOut", DbType.Int32, 4);

        IDataReader reader = db.ExecuteReader(cmd);
        // Convert.ToInt32 maps a null parameter value to 0.
        int x = Convert.ToInt32(db.GetParameterValue(cmd, "@po_iOut"));
        return reader;
    }

    If you try this, you'll find that x is zero — and in fact, the call to GetParameterValue returned null.

    It seems that all the row data has to already have been returned by the DbCommand object before out parameters from the stored procedure actually become available. To maintain your interface, you therefore have to do something like this to get at your data:

    DataTable table = db.ExecuteDataSet(cmd).Tables[0];
    IDataReader reader = table.CreateDataReader();
    int x = Convert.ToInt32(db.GetParameterValue(cmd, "@po_iOut"));
    return reader;

    ExecuteDataSet() fetches all available data from the DbCommand object. Once this has been done, output parameters become available for use. Of course, this approach negates one of the key advantages to using IDataReader at all - that you don't pull all the data into memory at once.

    I'd love to hear from anyone with a better way of doing this. It's occurred to me that multiple result sets might be a way forward, though I don't know if multiple IDataReaders are supported in that scenario.

    I guess I'll just have to try it to find out.

    Feb 13

    Oops.

    Why having automated tests for your web site's functionality is a good idea.

    So I got fed up maintaining a postfix mail server with Spamassassin and ClamAV. Don't get me wrong, these are great pieces of software - but running them doesn't float my boat. So I moved all my mail hosting over to Google.

    Unfortunately, I forgot to reconfigure this blog. So for those of you who have been trying to sign up and getting errors like 'address family not supported by protocol' - sorry, my bad. The site was trying to connect to a non-existent mail server to send you a password. Thanks to those of you who emailed me about the problem.

    For now, I've reconfigured the site to allow you to specify your own password. If comment spam becomes ugly though, I'll have to turn it off again - and actually get on and rewrite this blog on something less weighty than Plone, which will also allow me to have you able to add comments without having to create an account. I hate it when sites make me do that, and it irks me that this software makes you guys do that.

    So for now, you should be able to join and add comments as you'd expect.

    Oops.

    Feb 12

    ASP.NET AJAX and Unknown Web Method

    The ASP.NET AJAX toolkit includes controls, such as the DynamicPopulateExtender, which invoke methods on the pages on which they are included to obtain data. This post looks at a common problem and its under-documented solution.

    When you work with open source projects, you get spoiled: sometimes there's good documentation, usually there's a good community, and you always get access to the code. Rarely is life so good when using the proprietary Microsoft stack.

    The DynamicPopulateExtender AJAX control (demonstration) is a control which lets you populate a region of the page with a string (probably HTML) in response to another element being clicked. I'm trying to use it to create a control similar to the autocomplete text boxes that one tends to come across these days, only I'd like it to be rendered as a drop-down list - I don't want to post back arbitrary text.

    The ListSearch is almost there; but I don't want to always download a 100,000-row dataset to my page to render. I would like the search to take place as I type, and for the server to populate the dropdown on the client with results in real time. I'm hoping that the DynamicPopulateExtender will allow me to squirt a list out to the client dynamically.

    The aspx page will look something like this:

    <asp:ScriptManager 
    runat="server"
    ID="_scriptManager"
    />

    <ajaxToolkit:DynamicPopulateExtender
    TargetControlID="_updateTarget"
    runat="server"
    ID="_searchExtender"
    PopulateTriggerControlID="_searchBox" 
    ServiceMethod="Foo"
    ServicePath="TestTA.aspx"
    />

    <asp:TextBox
    ID="_searchBox"
    runat="server"
    />               
    <asp:Panel ID="_updateTarget" runat="server">
        <em>Enter search phrase...</em>
    </asp:Panel>

    Pretty simple. The idea here is that when you click on the _searchBox TextBox, the browser invokes the 'Foo' method on the TestTA.aspx page. This method returns a string which is plugged right into the _updateTarget panel.

    According to the docs (and all the other blogs that are out there) your code behind should contain something like this:

    using System.Web.Services;
    using System.Web.Script.Services;

    namespace Test
    {
        public partial class TestTA : System.Web.UI.Page
        {
            [WebMethod]
            [ScriptMethod]
            public string Foo(string contextKey)
            {
                return "Hello world!";
            }

        }
    }

    Again - no surprises there. Note that you need to decorate your method with both [WebMethod] and [ScriptMethod].

    Except - it doesn't quite work. When you click the element on the page, you'll find an HTTP 500 error returned. Digging into your logs, you'll find something like this:

    Exception information: 
        Exception type: ArgumentException
        Exception message: Unknown web method Foo.
    Parameter name: methodName

    The key missing piece is that the Foo method must be marked as static. This wasn't mentioned explicitly in any how-tos that I found (although the code on an MSDN blog post includes it).

    Your final code-behind will therefore look something like this:


    using System.Web.Services;
    using System.Web.Script.Services;

    namespace Test
    {
        public partial class TestTA : System.Web.UI.Page
        {
            [WebMethod]
            [ScriptMethod]
            public static string Foo(string contextKey)
            {
                return "Hello world!";
            }

        }
    }


    I hope that stops someone else tearing their hair out.

    Feb 11

    djangopeople.net, GeoDjango and PostGIS

    Michael Trier's "This Week In Django" podcast is turning into something of a gem mine. I'm particularly taken with one website he mentioned, djangopeople.net. Plus, thoughts on GeoDjango and PostGIS, and the iPhone's Wi-Fi Locations technology.

    I don't know what it is about geographical data, but applications that use it are seriously cool.

    Mentioned on Michael Trier's "This Week In Django" podcast (also available - free - through the iTunes store), djangopeople.net is a great way to find out who's doing Django stuff near you.

    Thankfully, it's not trying to be yet another social networking site. It's simply a Google Maps mashup letting you view basic details about people, what they do, and where they're based.

    So I want to know where you are - if I'm doing Django work, and I need extra manpower, well, it sure helps if I can see you're nearby!

    PostGIS and GeoDjango

    Also mentioned on Michael's podcast are the GeoDjango branch and the PostGIS project that it's based on. PostGIS puts a number of spatial datatypes into PostgreSQL, and GeoDjango builds onto that. Given the current explosion in mobile technology, I would venture that this branch of Django is going to be the basis of some exciting, spatially-aware applications in the future.

    I'm definitely going to be checking this branch out, and perhaps even downloading the iPhone SDK when it arrives. With Jobs' announcement of iPhone/iPod Touch support for Wi-Fi locations, everything's starting to come together nicely. Exciting times!

    Feb 05

    Error compiling PostgreSQL 8.3 on Leopard: 'rl_completion_matches'

    This is a brief note for those having trouble compiling the latest version of PostgreSQL on Leopard.

    I'm currently setting up my development environment on my new Mac, and have got round to the database bit. As you probably know, PostgreSQL 8.3 was released a day or so ago, so I decided to grab that and compile it up.

    All initially went well - I'd already installed MacPorts to get libxml2 and libxslt. I'm compiling PostgreSQL by hand (I'll need multiple installs of different versions for working on older software). I issued the configure:

    ./configure --prefix=/Users/dan/opt --with-python --with-libxml --with-libxslt

    Then make - which failed with the following message:

    gcc -no-cpp-precomp -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline 
    -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -I.
    -I../../../src/interfaces/libpq -I../../../src/bin/pg_dump
    -I../../../src/include -I/opt/local/include/libxml2   -c -o tab-complete.o
    tab-complete.c
    tab-complete.c: In function 'psql_completion':
    tab-complete.c:601: warning: implicit declaration of function 'rl_completion_matches'
    tab-complete.c:601: warning: assignment makes pointer from integer without a cast

    It seems that PostgreSQL doesn't get on with the GNU readline library that's installed. Fortunately, you can ask configure to prefer the BSD readline library:

    ./configure --prefix=/Users/dan/opt --with-python \
    --with-libxml --with-libxslt --with-libedit-preferred

    Rebuilding after this worked like a charm.

    Feb 05

    Mac OS X command and manual search path (PATH and MANPATH for man)

    Mac OS X Leopard provides a simple way to manage your system-wide PATH and MANPATH for command and man pages.

    On traditional UNIX and Linux, setting the PATH and MANPATH is usually done on a per-shell basis. For example, bash (the default Linux shell) usually has a sensible default configured in /etc/profile. This is often extended or overridden in each user's own .bash_profile (or .bashrc).

    Mac OS X Leopard (I'm not sure if Tiger did this) provides a slightly different way to manage system-wide search paths. The /etc/profile is still there, but looks like this:

    # System-wide .profile for sh(1)

    if [ -x /usr/libexec/path_helper ]; then
            eval `/usr/libexec/path_helper -s`
    fi

    if [ "${BASH-no}" != "no" ]; then
            [ -r /etc/bashrc ] && . /etc/bashrc
    fi

    Note how the path_helper program is invoked.

    This program does something very simple: it looks through the contents of /etc/paths.d (or /etc/manpaths.d for man pages) and prints a string to stdout containing all the paths in those files. For example, my /etc/paths.d looks like this:

    hornet:~ root# ls -l /etc/paths.d/
    total 16
    -rw-r--r--  1 root  wheel  13 Sep 24 04:53 X11
    -rw-r--r--  1 root  wheel  15 Feb  5 12:36 macports
    hornet:paths.d root# cat X11
    /usr/X11/bin
    hornet:paths.d root# cat macports
    /opt/local/bin

    Running the path_helper command yields (sorry for the long lines):

    hornet:paths.d root# /usr/libexec/path_helper -s
    PATH="/bin:/usr/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/opt/local/bin"; export PATH
    MANPATH="/usr/share/man:/usr/local/share/man:/usr/X11/man"; export MANPATH

    You can see that path_helper has appended the contents of the files to the PATH and MANPATH.

    So - to add paths to the global search path in Leopard, just drop a new file containing your path in /etc/paths.d.
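    As a rough illustration of the idea (not a reimplementation of path_helper itself), the sketch below builds a search path from a directory of one-path-per-line files. The build_search_path name, the sorted file order and the throwaway demo directory are all assumptions made for the example:

```python
import os
import tempfile


def build_search_path(defaults, paths_dir):
    """Return a colon-separated path: defaults first, then each file's lines."""
    entries = list(defaults)
    # Files are read in sorted name order (an assumption); each line is one path.
    for name in sorted(os.listdir(paths_dir)):
        with open(os.path.join(paths_dir, name)) as handle:
            for line in handle:
                path = line.strip()
                # Skip blank lines and paths that are already present.
                if path and path not in entries:
                    entries.append(path)
    return ":".join(entries)


# Demo against a throwaway directory standing in for /etc/paths.d.
demo_dir = tempfile.mkdtemp()
for filename, content in [("X11", "/usr/X11/bin\n"),
                          ("macports", "/opt/local/bin\n")]:
    with open(os.path.join(demo_dir, filename), "w") as handle:
        handle.write(content)

demo_path = build_search_path(["/bin", "/usr/bin"], demo_dir)
print('PATH="%s"; export PATH' % demo_path)
```

    The output has the same shape as the string that /etc/profile passes to eval, which is why dropping a file into the directory is enough to extend the path.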

    See the man page for path_helper(8) for full details.

    Jan 01

    Django ModelForm and newforms

    Browsing the Django code after a recent svn up shows that newforms.form_for_instance and friends are deprecated and that you should use a ModelForm instead. This post gives a brief example of how to do this.

    Happy new year, everyone! I spent it huddled up with a cold and lots of water and hot lemon. I hope you had a better time than I did.

    To business. I noticed after a recent svn up that newforms.form_for_instance and friends are deprecated. The comment instructs you to use ModelForm instead. Unfortunately, at the time of writing, there aren't any documents on how to do this. I therefore thought it would be useful to share a brief example with you.

    This isn't meant to be comprehensive; it should, however, serve as an example to get you going. I'll also post it over on djangosnippets.org which has a bit more exposure than this blog.

    The Problem

    What form_for_instance and form_for_model did for you was to generate a form based on a Django instance or model. This saves a lot of messing around coding your own forms and keeping them in sync with your models. Doing all that manually clearly violates Django's DRY principle. For complex forms, you're still going to need to do this yourself; but if you're just creating add/edit forms for model instances then form_for_* are useful.

    These helper functions have now been replaced with a unified approach using ModelForm. ModelForm lives in django.newforms.models.

    ModelForm: The Theory

    To use ModelForm, you basically do the following:

    • Create a model as you normally would
    • Create a subclass of Django's ModelForm with a Meta inner class indicating the underlying model
    • Instantiate the form in view code

    It's all pretty straightforward - once the form has been created, data validation and cleaning work in exactly the same way as regular newforms.

    An Example

    Let's go through an example. First off, we start with a model. This would probably live in your models.py for your application, as normal.

    from django.db import models

    class Project(models.Model):
        title = models.CharField(max_length=50)
        created_on = models.DateTimeField(auto_now_add=True)
        description = models.TextField(max_length=5000)

        def __unicode__(self):
            return self.title

        class Admin:
            pass

    Hopefully nothing should be too scary there.

    I then created a views/forms.py module, and placed the following code in there:

    from pm.models import Project
    from django import newforms as forms

    class ProjectForm(forms.ModelForm):
           
        class Meta:
            model=Project

    Once again, very simple. The only thing to note is the specification of the model to use as the basis of the form in the Meta class. This follows the pattern used in regular model classes pretty closely.

    That's all the code there is to it on the forms side - on to the view. That is, once again, very simple:

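    from django.contrib.auth.decorators import permission_required
    from django.core.urlresolvers import reverse
    from django.http import HttpResponseRedirect
    from django.shortcuts import render_to_response
    from django.template import RequestContext
    from pm.forms import ProjectForm
    from pm.models import Project
    # project_detail (another view) and section() are defined elsewhere
    # in the project.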

    @permission_required('pm.add_project')
    def project_add(request):
        project = Project()
        if request.POST:
            form = ProjectForm(data=request.POST, instance=project)
            if form.is_valid():
                form.save()
                return HttpResponseRedirect(reverse(project_detail, args=(project.id,)))
            else:
                request.user.message_set.create(message='Please check your data.')
        else:
            form = ProjectForm(instance=project)

        context = section(request, 'projects')
        context['form'] = form
        return render_to_response('templates/pm/project_add.html', RequestContext(request, context))

    If you've done any newforms programming, the above should look very familiar; almost nothing differs from what you're used to. The only real difference is the use of the instance keyword argument to pass in a pre-created instance. It's not absolutely necessary to pass request.POST as a keyword argument; it'll also work as the first positional parameter.

    The template simply uses {{ form.as_p }} to render the finished form.

    Note I've also used the permission_required decorator. Always check security on your views - don't ever rely on someone just 'not knowing' your URLs.

    Note also that this code refers to something called project_detail - this is just another view. You can read about the fantastic reverse() function on the B-List.

    There's more that you can do with this - I recommend that you check out the newforms documentation and the Django sources themselves for more information. However, I hope that the above is enough to get you started with simple ModelForm forms.


    Dec 28

    'unicode' object has no attribute 'get_sql'

    I keep getting this error, and it always takes a moment of head-scratching to figure out what's wrong. Here's what's wrong, to save you some hair.

    Another tiny snippet for you.

    You fire up your Django app, only to be confronted with:

    AttributeError at /highwire/projects/1/tasks/add/
    'unicode' object has no attribute 'get_sql'

    You've probably got a line like this:

    foo = get_object_or_404(Foo, foo_id)

    get_object_or_404() actually requires key/value pairs after the first class argument, in the same way the .filter() method does.

    The code above should actually read:

    foo = get_object_or_404(Foo, pk=foo_id)
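    A rough illustration of where the odd message comes from, assuming (as the error text suggests) that anything passed positionally is treated as a query object and asked for its SQL. The Q and find_one names below are stand-ins invented for this sketch, not Django's actual implementation:

```python
class Q(object):
    """Stand-in for a query object: wraps keyword lookups."""

    def __init__(self, **lookups):
        self.lookups = lookups

    def get_sql(self):
        return dict(self.lookups)


def find_one(rows, *args, **kwargs):
    """Toy lookup: positional args must be Q-like, keywords are field tests."""
    conditions = dict(kwargs)
    for query in args:
        # Anything passed positionally is assumed to have get_sql().
        conditions.update(query.get_sql())
    for row in rows:
        if all(row.get(field) == value for field, value in conditions.items()):
            return row
    raise LookupError("no matching row")


rows = [{"pk": 1, "title": "first"}, {"pk": 2, "title": "second"}]

# Keyword form: works as expected.
found = find_one(rows, pk=2)

# Bare positional value: fails the same way the view did.
try:
    find_one(rows, 2)
    error = None
except AttributeError as exc:
    error = str(exc)
print(error)
```

    The bare value has no get_sql method, so the lookup blows up before any matching happens, which is why the message mentions the type of the ID rather than the missing keyword.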

    It's the simple things that eat the most time.

    Dec 28

    Adapters in Django and the Revenge of Zope

    Using Zope 3's component architecture can significantly reduce dependencies between applications within a Django project. Why stand on the shoulders of one giant, when you can stand on the shoulders of two?

    Adapters are great.

    They're so cool, I even wrote a .NET adapter registry called Adaptation, which I hope will come in useful on a forthcoming top-secret project (of which I have a couple in the works).

    Simple - up to now

    Up until now I haven't had to do anything particularly complex in Django. It's all been small-scale, fairly limited, and - crucially - not really designed to be reusable. Now I'm starting to develop Django applications that I want to reuse, and am running across age-old problems: how to write apps to be reusable, and how to reuse other Python code that wasn't really written with reuse in mind and doesn't offer the right extension points.

    django.contrib.contenttypes

    One part of the puzzle is provided by Django's contenttypes module (django.contrib.contenttypes). In a nutshell, this lets you declare 'fake' foreign keys in your models to other models that you don't yet know about. For example, one of my reusable applications is a small workflow component. I can't know in advance what Django models I'm going to be workflowing; however, the contenttypes' GenericForeignKey field lets me abstract that away. (More detail on this package is for another day; it's woefully underdocumented.)

    Actions in Django

    The second piece of the puzzle is best illustrated with an example.

    I have a bunch of models in an application called 'pm'. These are Task, Milestone, and other project-management related entities. I want to display them on the web, and I want each one to have some context-sensitive tabs - such as 'Overview', 'Detail', etc. Each of these tabs has a title and a URL.

    (If you're familiar with the Zope 2 CMF then yes, I'm basically talking about a simple form of actions.)

    Now - where do I put these definitions?

    I want them to be declarative in Python code, so I don't have a lot of ridiculous "if content type is foo then display these tabs" in template code.

    The first place that occurs to me to put them is therefore in the model itself - just have a list of dicts. Models would look something like this:

    class Task:
        actions = [
            { 'title': 'Overview',
              'url': reverse(taskoverview)
            }
        ]

        title = models.CharField(max_length=10)
        ... etc ...

    I don't really like this approach.

    • These actions are really UI-specific, and don't belong in the data model definition
    • We really need the URL to be context-specific - in this case, I probably need some form of Task ID in the 'url' value.

    Direct Attribute Setting

    Another option could be to simply set attributes on the model classes from outside. So in some module-level web code, have something like this:

    from pm.models import Task
    Task.actions = [{'title': 'Overview', ... etc ...

    This solves the problem of having UI code in the model; however, it still suffers from the same problem of not having an instance to compute variable URLs. Direct attribute setting also feels somewhat 'icky'.

    Zope 3 Adapters

    Zope 3 is not an application server in the same way as Zope 2 is - it's more like a set of libraries. And this means that you can take parts of Zope 3 and use them in other applications.

    In this case, we're going to take zope.component and zope.interface, and use them in our Django application.

    Installing the eggs

    Installation is easy.

    easy_install zope.interface
    easy_install zope.component

    Defining interfaces

    First of all, we define what we expect something that can have actions to provide in terms of an interface:

    from zope import interface
    from zope import component

    class IActionProvider(interface.Interface):
        """
        Interface specifying that this object can provide actions.

        The actions attribute is an iterable of instances of the
        Action class.
        """
        actions = interface.Attribute("Actions")

    Next, we define a marker interface for our Task, and declare that our task model implements it. We'll also need a similar interface for Django's HttpRequest - the reason for that will become clear shortly:

    from pm.models import Task
    from django.http import HttpRequest

    class ITask(interface.Interface): pass
    class IHttpRequest(interface.Interface): pass

    interface.classImplements(Task, ITask)
    interface.classImplements(HttpRequest, IHttpRequest)

    Note that we're not doing this in the models code - this need only happen in our UI code, which requires this actions functionality.

    The last piece is to define an adapter. The adapter provides the bridge between a Task instance and something that provides the 'actions' attribute specified in the IActionProvider interface. The adapter looks like this:

    from django.core.urlresolvers import reverse
    from web.views.project import task_detail, task_milestones

    class TaskActions(object):
            
        interface.implements(IActionProvider)
        component.adapts(ITask, IHttpRequest)
            
        def __init__(self, context, request):
            self.context = context
            self.request = request
            
        def _getActions(self):
            actions = []
            request = self.request
            actions.append(Action(request, 'Overview', task_detail, self.context.id))
            actions.append(Action(request, 'Milestones', task_milestones, self.context.id))
        
            return actions
        actions = property(_getActions)

    component.provideAdapter(TaskActions)

    Note how the adapter also takes the request. This is to allow us to use the request to generate a URL for the task that we're currently looking at.

    The call to component.provideAdapter() makes the adapter available to the component architecture.

    This adapter also refers to an Action class. This is a very simple data structure:

    class Action(object):
        def __init__(self, request, title, view, *args, **kwargs):
            self.title = title
            self.url =  reverse(view, args=args, kwargs=kwargs)  
            self.selected = request.META['PATH_INFO'] == self.url

    Using the Adapter

    Finally, we define a template tag to display the actions:

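    from django.template import Library
    from zope import component
    # IActionProvider is the interface defined above; import it from
    # wherever you put it.

    register = Library()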

    @register.inclusion_tag('templatetags/actions.html', takes_context=True)
    def actions(context):
        obj_name = context.get('obj_name', 'object')
        obj = context.get(obj_name)
        request = context['request']
        provider = component.getMultiAdapter((obj, request), IActionProvider)
        return {'actions': provider.actions}

    After some boilerplate to pull some items (notably the request) out of the context, the key lines are the final two in the function. The first performs an adapter lookup to obtain the adapter that we defined; the second accesses the actions attribute, which actually computes the actions for the context object. The result is then plugged into the context for the template.

    The template itself simply iterates over the Action instances:

    <ul class="object-actions">
      {% for action in actions %}
      <li>
        <a href="{{action.url}}"
           {%if action.selected%}
           class="selected"
           {%endif%}>
          {{action.title}}
        </a>
      </li>
      {% endfor %}
    </ul>

    That seems a lot of bother

    Well - it is overkill for a single model. However - adding support for actions to any model (or any other Python class) is now as simple as defining an appropriate adapter providing IActionProvider for that new type - and it will plug cleanly into this infrastructure.

    Judicious use of interfaces and adapters can make incorporating third-party code (or even your own code) extremely simple. In particular, it avoids the need to modify the code you're integrating.
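    To make the "one adapter per type" point concrete without pulling in the Zope packages, here is a deliberately tiny stand-in registry. It is not zope.component: provide_adapter, get_multi_adapter, the plain classes used in place of interfaces, and the Milestone example are all hypothetical names chosen for the sketch:

```python
_adapters = {}


def provide_adapter(factory, adapts, provides):
    """Register factory for a tuple of adapted classes and a provided role."""
    _adapters[(tuple(adapts), provides)] = factory


def get_multi_adapter(objects, provides):
    """Find a factory whose adapted classes match the objects, and call it."""
    for (adapts, role), factory in _adapters.items():
        if role != provides or len(adapts) != len(objects):
            continue
        if all(isinstance(obj, cls) for obj, cls in zip(objects, adapts)):
            return factory(*objects)
    raise LookupError("no adapter for %r" % (objects,))


class Request(object):
    def __init__(self, path):
        self.path = path


class Task(object):
    def __init__(self, id):
        self.id = id


class Milestone(object):
    def __init__(self, id):
        self.id = id


class TaskActions(object):
    def __init__(self, context, request):
        self.actions = [("Overview", "/tasks/%d/" % context.id)]


class MilestoneActions(object):
    def __init__(self, context, request):
        self.actions = [("Overview", "/milestones/%d/" % context.id)]


# Supporting a new type is one extra class plus one registration line.
provide_adapter(TaskActions, (Task, Request), "actions")
provide_adapter(MilestoneActions, (Milestone, Request), "actions")

request = Request("/tasks/3/")
task_actions = get_multi_adapter((Task(3), request), "actions").actions
milestone_actions = get_multi_adapter((Milestone(8), request), "actions").actions
print(task_actions, milestone_actions)
```

    Neither Task nor Milestone knows anything about actions here; the lookup code is the only thing that changes behaviour per type, which is the property the real component architecture provides in a far more complete form.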

    Further Reading

    I've glossed over a lot of details here about the ins and outs of the Zope 3 component architecture. If you want to find out more, read the README.txt files in zope.component and zope.interface (which are comprehensive, if somewhat terse) or alternatively buy Philipp von Weitershausen's excellent Component Development with Zope 3. Make sure you pick up the second edition.

    Dec 16

    GET and POST handling in Django views

    Django view functions often feature an initial stanza near the start in order to handle POST data differently. You often end up with quite a lot of logic in one method. Here I describe a cleaner way, using a decorated view class to replace the single function.

    It's not uncommon for a Django view to look something like this:

    def view(request, object_id):
        ob = get_object_or_404(Foo, pk=object_id)
        if request.POST:
            # do some processing of the POST
            return HttpResponseRedirect('/')

        # do standard view processing
        return render_to_response('template')

    With more complex view logic, this can get pretty messy.

    How about if you could write something like this:

    class FooView(BaseView):
        @get
        def getFoo(self, request, object_id):
            return object_detail(request, queryset=Foo.objects,
                                 object_id=object_id)

        @post
        def editFoo(self, request, object_id):
            # do POST processing
            return HttpResponseRedirect('/')

    This shows how you can still group logically-related operations into a single class while keeping the individual activities separate.

    Note how you simply indicate which method should be used for each HTTP method using the appropriate decorator.

    The first version of a module to let you do that is below. It works pretty well as it is, but needs some more work to make it work properly with other Django standard decorators like require_POST, permission_required, and so on. Put this in a file called httpmethod.py on your PYTHONPATH:

    _class_view_registry = {}

    class BaseView(object):

        def __new__(cls, *args, **kwargs):
            instance = super(BaseView, cls).__new__(cls)

            # Try to get the view registry for this class from the global view
            # registry. If it's not there, create one - just a dict - then iterate
            # through the class's members looking for objects which have been
            # annotated with an _httpmethod_name attribute. The value of this
            # attribute is the HTTP method name that we want to associate this
            # method with.
            #
            # If the registry is there, then we've processed this class before
            # and so all the mappings should be there.
            view_registry = _class_view_registry.get(cls, None)
            if view_registry is None:
                _class_view_registry[cls] = view_registry = {}

                for name in dir(instance):
                    obj = getattr(instance, name)
                    httpmethod_name = getattr(obj, '_httpmethod_name', None)
                    if httpmethod_name is not None:
                        view_registry[httpmethod_name] = obj

            return instance

        def __call__(self, request, *args, **kwargs):
            methodname = request.method.strip().upper()
            method = _class_view_registry.get(self.__class__).get(methodname)
            return method(request, *args, **kwargs)

    def httpmethod(name, func):
        if not name:
            raise ValueError('name must be set')

        func._httpmethod_name = name
        return func

    # Decorators for HTTP methods as defined in:
    # http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html
    def post(func):
        return httpmethod('POST', func)

    def get(func):
        return httpmethod('GET', func)

    def head(func):
        return httpmethod('HEAD', func)

    def delete(func):
        return httpmethod('DELETE', func)

    def options(func):
        return httpmethod('OPTIONS', func)

    def put(func):
        return httpmethod('PUT', func)

    def trace(func):
        return httpmethod('TRACE', func)

    def connect(func):
        return httpmethod('CONNECT', func)


    To use this, your URLConf lines should look similar to this:

    from django.conf.urls.defaults import *
    from project.app.views import FooView

    urlpatterns = patterns('',
        (r'^foo/(\d+)/$', FooView()),
    )
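    To see the dispatch idea in isolation, here is a cut-down variant that runs without Django. It is a sketch rather than the module above: the FakeRequest class is a hypothetical stand-in for HttpRequest, the per-class registry is replaced by a simple scan on each call, and the plain-string 405 fallback is an addition (the module above would hit a TypeError for an unregistered method, because the registry lookup returns None):

```python
def httpmethod(name):
    """Build a decorator that tags a function with an HTTP method name."""
    def decorator(func):
        func._httpmethod_name = name
        return func
    return decorator


get = httpmethod("GET")
post = httpmethod("POST")


class BaseView(object):
    def __call__(self, request, *args, **kwargs):
        wanted = request.method.strip().upper()
        # Scan this instance for a method tagged with the wanted HTTP method.
        for name in dir(self):
            candidate = getattr(self, name)
            if getattr(candidate, "_httpmethod_name", None) == wanted:
                return candidate(request, *args, **kwargs)
        # Nothing registered for this method.
        return "405 Method Not Allowed"


class FakeRequest(object):
    """Hypothetical stand-in for Django's HttpRequest."""

    def __init__(self, method):
        self.method = method


class FooView(BaseView):
    @get
    def show(self, request, object_id):
        return "showing %s" % object_id

    @post
    def edit(self, request, object_id):
        return "edited %s" % object_id


view = FooView()
get_result = view(FakeRequest("GET"), 7)
post_result = view(FakeRequest("post"), 7)
put_result = view(FakeRequest("PUT"), 7)
print(get_result, post_result, put_result)
```

    Scanning on every call is slower than caching the mapping per class, which is presumably why the full module builds its registry once in __new__.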

    I'll update the code to be compatible with Django's decorators as I need to - when that's done, I'll also stick it on djangosnippets.org. I've included decorators for all the HTTP methods.

    Until then, enjoy - and as ever, feedback is welcome!

    Dec 11

    TFS, Force Get, and Get Specific Version

    When TFS tells you something, you want to think twice about whether you believe it.

    So - you've all come across this, right?


    All files were downloaded


    Great - all the files in my workspace (workspaces - now there's another dumb idea, but I'll leave that for another day) are up to date.

    Except they're not, because I just deleted them.

    An important thing to realise about Team Foundation Server is that it keeps a lot of state about your checkout server-side. This is in marked contrast to something like Subversion, where the server has no awareness of what the client is up to*, or to Mercurial, Git et al. where you don't even need a server at all.

    This message actually translates to:

    "The server doesn't think you need to get any updated files because nothing that you downloaded last time has changed."

    The clue is in the final sentence in the dialog - you have to use the "yes, I really mean it" option to get the latest versions of your files.

    What does this mean in practice? It means that everyone does force get anyway, all the time. Any performance benefit that might have been realised from having an 'intelligent' source-code-getter is lost, and we're left with more UI cruft.

    Like most things with TFS, this isn't a showstopper, merely slightly annoying. But when you spend the whole day with a thousand 'slightly annoying' things, they all add up to 'very annoying'.

    I hear that TFS 2008 is meant to be better. That's good - TFS 2005 is clearly half-baked. At least it hasn't actually eaten any source code yet, so I'd rather use it than Visual SourceSafe (or possibly even CVS).

    Rant over.

    *That's a slight generalisation as you can optionally lock files. But the general case is that you can do what you like client-side.

    Dec 09

    Fez Consulting web site live

    The Fez Consulting web site has gone live - another dirt simple site built on Django.

    Just a quick note that the Fez Consulting web site has gone live. This is really for the benefit of those who want to pay me to do stuff for them on weekends and evenings.

    Not much to see there - it's another Django site, hooked up to Apache using mod_wsgi. I guess the main thing I'm noticing now is that I hardly spend any time with Django itself. Most of the time is spent tweaking HTML and CSS. I don't think the amount of time I spend on presentation has increased, so I'm forced to conclude that Django hacking really is a lot more productive than Zope 2 - it just seems to get out of the way.

    Having said that, I've not yet built what I'd consider a 'production-grade' Django site. The software is not finished until there is high test coverage, and I've simply not done that for the Django sites yet.

    Time will tell.

    (As a footnote, IE sucks. I built the site so it looked great in Firefox on the Mac and on Windows and on Mac Safari - and of course IE makes a dog's dinner of CSS. Sigh. Cue CSS butchering for IE's benefit.)

    Dec 04

    TextMate's Open Recent

    Sometimes it's the little things that make you say "wow".

    Like I said before, I'm a geek.

    I'm a big fan of TextMate. It's my standard text editor when I'm on my Mac, and I miss it when I'm at work using Visual Studio. It's a very unassuming piece of software - I remember wondering what all the fuss was about when I first started out using it.

    Then it gets to you, bit by bit. It's thoughtful little features that do it, such as the Recent Items menu.

    I structure my Django projects in a fairly standard way, which means that I'm often opening different folders with the same name - they'd just have a different parent. In this case, I've opened up two media directories, one under 'mint' and one under 'fez'. Stupid programs would just show you either 'media' twice, or try to squeeze the whole file path into the menu, performing various textual gymnastics to do so.

    TextMate shows you just enough detail to be useful, and no more. A picture speaks a thousand words:

    Screen grab showing TextMate's Open Recent menu

    Dec 04

    Programming with Passion

    Software engineering shouldn't be a 9 to 5 job. It should be a source of constant intellectual stimulation, in an environment of constant change. However, experience tells me that there are plenty of people for whom programming is the process by which a specification document is turned into a paycheck.

    It's no secret: I'm a geek.

    I start work at around nine at the Day Job, designing, writing and testing code, and fitting in reading and research when I can or need to.

    When I finish for the day (usually around five - work smart, not long) I often launch into some personal project or other. Or I make sure I'm up-to-date with the programming reddit, or some of my favourite technical blogs, or just plain spend time thinking.

    (And no: I don't have children. I guess my schedule would be a little different if I did.)

    The point is that, for me, programming is more than a job - it's a lifestyle, something I think about most of the time, and to a certain extent defines my identity.

    I have found that not everyone is like me.

    There are innumerable programmers (I suspect in the majority, squirrelled away in IT departments in companies for whom IT is non-core) who come in every day, turn a specification into a paycheck, and go home again. Jeff Atwood calls these guys "the 80%".

    We - you, I, anyone else who reads or writes blogs for interest, for whom programming goes beyond the daylight hours - are the 20%, the other part of the Pareto split.

    What does this mean?

    First, I don't think it means that most programmers don't care. I've known plenty of people in the 80% who cared that their software worked, and was delivered on time.

    However, for the 20%, "worked" and "delivered on time" simply aren't good enough. We strive to create software that's faster, smaller, more elegant, and more beautiful. We want it to delight you. Even more, we want it to delight us. We will spend all our waking hours doing this. We love the process where software gets bigger as you write it, but then gets smaller and smaller until it does what it is meant to do and nothing more.

    (Microsoft? No? Thought not.)

    Our software is beautiful.

    Software produced by the 80% is not beautiful. Typically, these are in-house, custom solutions. From the outside, they tend to be ugly, with difficult-to-use user interfaces. Internally, they're invariably sprawling monsters, grown from an initial ill-thought-out requirement that was never questioned, merely implemented. Fix is layered on feature, and performance goes down the tubes.

    Beautiful software is flexible through simplicity and transparency. There is no magic; actions are clear to users, be it a web page, a batch processing system, or a desktop application. This is the design ethic that Apple (mostly) understands, and it underpins the way they approach everything from hardware right up the stack to end-user software.

    This is hard. It requires you to think. You need to be able to challenge what you're being asked to do, and to say "no". Beware of the WIBNI, and scope creep.

    But mostly it requires you to be passionate, to care about your craft.

    Imagine what we could do if we got that 80% to be passionate too.


    Nov 20

    How Big Is Yours?

    Lamenting the size of Logitech's mouse driver.

    In these days of ubiquitous broadband (well, for the privileged ubiquiterati, anyway), download file sizes are becoming less of an issue.

    But 60MB for a mouse driver? A mouse driver?

    I guess it'll have to wait until my 2746MB download of SQL Server Developer Edition from MSDN finishes.

    You could almost imagine Intel being right at home as an ISP.



    Nov 07

    Cookieless Django: Sessions and authentication without cookies

    Django, by design, maintains its session state by storing a cookie on the client. This is a good design decision. Sometimes, however, you need to track sessions for clients that don't support cookies.

    Django's session implementation requires cookie support on the browser. It doesn't fall back to putting session IDs in the query string, the way PHP might do. There's a very good reason for this, and it's mentioned at the bottom of the sessions documentation:

    The Django sessions framework is entirely, and solely, cookie-based. It does not fall back to putting session IDs in URLs as a last resort, as PHP does. This is an intentional design decision. Not only does that behavior make URLs ugly, it makes your site vulnerable to session-ID theft via the “Referer” header.

    Note particularly the last sentence: session IDs in URLs leave your visitors more vulnerable to session-ID theft. I ran into the cookie requirement when using FancyUpload, a Flash-based multiple file upload widget. Setting it up is pretty easy: after including the required JavaScript files, you pop one of these in your page:

    var upload = new FancyUpload(input, {
      swf: '{{ MEDIA_URL }}/Swiff.Uploader.swf',
      url: 'http://localhost:8000/1/images/upload/'
      /* options */
    });

    (MEDIA_URL is a Django setting, exposed to the template here.)

    Note the upload URL location - this is where the upload widget will POST the files to. This works pretty much as expected.

    The problem came when I protected the image upload view (using the @permission_required decorator from django.contrib.auth.decorators). Permission checks require authentication; authentication requires sessions; sessions require cookies. And the upload widget doesn't post any cookies, even when posting back to the same domain. I'm not sure whether this is a bug or a security feature. Either way, you end up in the authentication machinery with an Anonymous user. Django then helpfully serves up a 302 redirect to your login page.

    Darn.

    The Solution

    My solution was to write a piece of custom Django middleware. Middleware lets you plug into the Django request/response pipeline, and perform processing before a view is resolved and processed.

    I figured that if I could write a piece of middleware which pulled the session ID out of the query string and put it in the cookies collection before the session and authentication middlewares ran, then I'd be able to get the session machinery to use a URL-based session ID.

    I created a FakeSessionCookie middleware to do exactly that. It's pretty simple.

    from django.conf import settings

    class FakeSessionCookieMiddleware(object):
        """Copy a session ID from the query string into request.COOKIES."""

        def process_request(self, request):
            name = settings.SESSION_COOKIE_NAME
            # Only inject when the client sent no real session cookie.
            if name not in request.COOKIES and name in request.GET:
                request.COOKIES[name] = request.GET[name]

    I then just had to edit the MIDDLEWARE_CLASSES setting in my project's settings.py to include a reference to this middleware:

    MIDDLEWARE_CLASSES = (
        'django.middleware.transaction.TransactionMiddleware',
        'django.middleware.common.CommonMiddleware',
        'mint.middleware.FakeSessionCookieMiddleware',
        'django.contrib.sessions.middleware.SessionMiddleware',
        'django.contrib.auth.middleware.AuthenticationMiddleware',
        'django.middleware.doc.XViewMiddleware',
        'django.contrib.flatpages.middleware.FlatpageFallbackMiddleware',
    )


    Middleware can provide a number of methods. One of these is process_request. It can either return an HttpResponse (in which case no further middlewares are processed) or None, causing the next middleware to be processed as normal. The order of middlewares is therefore important: in this case, the FakeSessionCookie middleware has to be placed before the SessionMiddleware. This lets it inject the fake session cookie into the request.COOKIES collection before the session is processed.

    Note that this middleware is conservative: it only injects the query string value into the cookies collection if there isn't already one there.
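
    The injection rule is easy to check in isolation. Here's a minimal sketch in plain Python, with no Django involved: the two dictionaries stand in for request.COOKIES and request.GET, and the helper name inject_session_cookie is made up for illustration. The default cookie name 'sessionid' matches Django's default SESSION_COOKIE_NAME.

```python
def inject_session_cookie(cookies, query, cookie_name="sessionid"):
    """Copy a session ID from the query string into the cookies mapping.

    `cookies` and `query` are plain dicts standing in for request.COOKIES
    and request.GET. Nothing happens if a real cookie is already present.
    """
    if cookie_name not in cookies and cookie_name in query:
        cookies[cookie_name] = query[cookie_name]
    return cookies


# No cookie sent, session ID in the URL: the value is injected.
print(inject_session_cookie({}, {"sessionid": "abc123"}))

# A real cookie wins over whatever is in the query string.
print(inject_session_cookie({"sessionid": "real"}, {"sessionid": "forged"}))
```

    The second call shows the conservative behaviour: a value in the query string never overrides a cookie that the browser actually sent.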

    I then tweaked the script generated for the upload widget, so it looked like this:

    var upload = new FancyUpload(input, {
      swf: '{{ MEDIA_URL }}client-dev/Swiff.Uploader.swf',
      url: '{{ BASE_URL }}{% url mint.propertylisting.views.uploadImage listing.id %}/?{{session_cookie_name}}={{session_cookie_value}}'
    });

    (BASE_URL is provided by another custom context processor, which injects the base absolute URL part of the full request URL.)

    The final piece of the puzzle is to put session_cookie_name and session_cookie_value in the context for the view. This was pretty trivial:

    @permission_required('propertylisting.add_propertylistingimage')
    def uploadImages(request, property_id):
        ...
        context = {
            'session_cookie_name': settings.SESSION_COOKIE_NAME,
            'session_cookie_value': request.COOKIES[settings.SESSION_COOKIE_NAME],
        }

        return render_to_response('propertylisting/uploadImages.html',
                                  RequestContext(request, dict=context))


    That completed the puzzle - I then had an authenticated user in the view handling the image upload.

    Is this approach safe?

    Well - the short answer is 'no'. URLs - and therefore the session IDs - are being transmitted in the clear with the request.

    However, there's little danger of the session ID appearing in some other site's referrer logs. (I won't say no danger, because you never quite know exactly what the client browser will do!). The upload request is coming from the Flash component, not the browser; the browser never 'knows' about the upload URL in any meaningful sense.

    I'd therefore consider this method to be relatively safe in this usage. As ever, you have to use your own judgement.


    Oct 30

    A Django image thumbnail filter

    Django filters are a pretty cool idea. I needed to be able to generate image thumbnails. I found some existing code, and fixed it up to, well, work, and also behave better with custom upload_to locations.

    Django's filters are pretty cool. Django comes with a bunch of useful default filters, letting you do stuff like lowercase text, centre text in a block, and so on. I was quite surprised to find that there wasn't an image thumbnail filter. This seemed like a pretty common requirement.

    Always loth to write code when I don't have to, I set about searching for an existing filter. I came across a post about something else completely that happened to contain such a filter.

    Naturally, it didn't quite work for me in its existing form. In particular:

    • It assumes your thumbnails directory is directly under your MEDIA_ROOT
    • It's hardcoded to use UNIX-style path separators
    • It needlessly recomputes an image URL twice
    • There's a bug so that if a thumbnail already exists, then it won't return any URL at all.

    I've fixed all these problems, and you can find my code below (I'll also post it on djangosnippets.org, since Google hasn't visited my blog for a while now!).

    I'm not 100% happy with the solution: I don't like the way that you have to effectively repeat the upload_to in the parameters that you have already coded in your model. I also don't like the way that multiple parameters are encoded in the single string parameter (although that's more a consequence of the way Django filters work: you can only have zero or one parameters. I guess it prevents you from being tempted to put too much logic in these things).

    Anyway, here's the code. It seems to work for me, let me know if you find any bugs.

    (Apologies if some of the lines are truncated. You should still be able to highlight them for copy and pasting. One day I'll fix the site's CSS...!)

    import os

    from django import template
    from django.conf import settings
    from PIL import Image

    register = template.Library()

    THUMBNAILS = 'thumbnails'
    SCALE_WIDTH = 'w'
    SCALE_HEIGHT = 'h'

    def scale(max_x, pair):
        # Scale (x, y) so that x becomes max_x, preserving the aspect ratio.
        x, y = pair
        new_y = (float(max_x) / x) * y
        return (int(max_x), int(new_y))

    # Thumbnail filter based on code from http://batiste.dosimple.ch/blog/2007-05-13-1/
    @register.filter
    def thumbnail(original_image_path, arg):
        if not original_image_path:
            return ''

        # arg is "<size><w|h>" or "<size><w|h>,<upload path>"
        if ',' in arg:
            size, upload_path = [a.strip() for a in arg.split(',')]
        else:
            size = arg
            upload_path = ''

        if size.lower().endswith(SCALE_HEIGHT):
            mode = SCALE_HEIGHT
        else:
            mode = SCALE_WIDTH

        # defining the size
        size = size[:-1]
        max_size = int(size.strip())

        # defining the filename and the miniature filename
        basename, format = original_image_path.rsplit('.', 1)
        basename, name = basename.rsplit(os.path.sep, 1)

        miniature = name + '_' + str(max_size) + mode + '.' + format
        thumbnail_path = os.path.join(basename, THUMBNAILS)
        if not os.path.exists(thumbnail_path):
            os.mkdir(thumbnail_path)

        miniature_filename = os.path.join(thumbnail_path, miniature)
        miniature_url = '/'.join((settings.MEDIA_URL, upload_path, THUMBNAILS, miniature))

        # if the image wasn't already resized, resize it
        if not os.path.exists(miniature_filename) \
                or os.path.getmtime(original_image_path) > os.path.getmtime(miniature_filename):
            image = Image.open(original_image_path)
            image_x, image_y = image.size

            if mode == SCALE_HEIGHT:
                image_y, image_x = scale(max_size, (image_y, image_x))
            else:
                image_x, image_y = scale(max_size, (image_x, image_y))

            # Image.ANTIALIAS is the PIL name; newer Pillow calls it Image.LANCZOS
            image = image.resize((image_x, image_y), Image.ANTIALIAS)

            image.save(miniature_filename, image.format)

        return miniature_url
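
    The two fiddly bits of the filter, parsing the single string argument and scaling while keeping the aspect ratio, can be tried out without Django or PIL. This is only a sketch: parse_thumbnail_arg is a name I've made up for the parsing step, and the 'listings' upload path is a hypothetical example value.

```python
def parse_thumbnail_arg(arg):
    """Split a filter argument such as '200w,listings' into its parts.

    Returns (max_size, mode, upload_path); mode is 'w' or 'h'.
    """
    if ',' in arg:
        size, upload_path = [part.strip() for part in arg.split(',', 1)]
    else:
        size, upload_path = arg.strip(), ''
    mode = 'h' if size.lower().endswith('h') else 'w'
    # The last character is always treated as the w/h suffix.
    return int(size[:-1]), mode, upload_path


def scale(max_x, pair):
    """Scale (x, y) so that x becomes max_x, preserving the aspect ratio."""
    x, y = pair
    return (int(max_x), int((float(max_x) / x) * y))


print(parse_thumbnail_arg('200w,listings'))  # (200, 'w', 'listings')
print(scale(200, (800, 600)))                # (200, 150)
```

    Note that the size must carry a w or h suffix, because the last character is always stripped before the number is parsed.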

    Oct 28

    Django newforms file upload

    Existing documentation about updating request.POST with request.FILES is now obsolete.

    There are several places documenting that to get newforms working with file uploads, you have to do something like this (where NewformsClass is, you guessed it, a newforms class):

    new_data = request.POST.copy()
    new_data.update(request.FILES)
    form = NewformsClass(new_data)

    This is now out-of-date, and has been simplified to the following:

    form = NewformsClass(request.POST, request.FILES)

    Much neater.

    Oct 04

    Dumping Autosys JIL

    Note to self: how to dump the JIL for an Autosys job.

    This is another Autosys snippet for my own benefit. It's often useful to dump the JIL for a job that's already in the Autosys system, perhaps if someone's added it through the (dreadful) web interface.

    autorep -J <jobname> -q

    If <jobname> is a box job, then the JIL for the box and all jobs in the box will be dumped to stdout.

    Sep 16

    Django with mod_wsgi - live!

    My first Django app has gone live, using the new mod_wsgi.

    This is just a quick note to mention that my first Django application, the Photography Qwacks Diary, has gone live.

    It's using Django SVN and mod_wsgi 1.0, with a PostgreSQL backend. When I get back from Malta in two weeks' time, I'll write some posts describing the experience: from coding a Django app, through installing and configuring mod_wsgi and moving a live Gallery install, to getting the Apache configuration right.

    I guess I now have an 'interesting' Apache configuration, as it's serving up static files, a Zope site (this one), a couple of PHP applications and a Django application.

    Apache rocks.

    Sep 11

    Subversion: MKCOL 405 Method Not Allowed

    Having fixed extensive breakage after an update to Apache 2.2, I was left with one final, strange error message to solve.

    I hope this helps some other poor person before they tear all their hair out.

    This was driving me up the wall.

    $ svn co -N https://repo/svn/fez
    $ cd fez
    $ mkdir -p qwacks/trunk qwacks/tag
    $ cp -R ~/qwacks/* qwacks/trunk
    $ svn add qwacks
    A    qwacks
    A    qwacks/trunk
    AM   qwacks/trunk/manage.py
          ..etc...
    $ svn commit -m "initial import"
    Adding         qwacks
    svn: Commit failed (details follow):
    svn: MKCOL of '...': 405 Method Not Allowed (https://repo)
    svn: Your commit message was left in a temporary file:
    svn:    '/Users/dan/tmp/fez/svn-commit.tmp'
    $

    What?! What does that mean:

    svn: MKCOL of '...': 405 Method Not Allowed

    Much Googling suggested that either a firewall was blocking DAV requests, or my Apache server was configured to reject them. The latter I considered a possibility, since I'd just spent the last hour fixing my config. Poring over the files yielded nothing.

    Even worse, adding other files and folders to the repository worked.

    Which - had I not had a beer first, I would have realised much sooner - meant there was something wrong with the 'qwacks' directory specifically.

    The directory already existed in the Subversion repository - I hadn't realised that I'd already added it, and of course I did a non-recursive initial fetch, so I didn't spot it.

    Ultimately, the error was therefore mine - trying to add a directory that already existed in source control. However, a more descriptive error message would have been nice!


    Sep 07

    TF42053: Build Machine and AllowedTeamServer

    Problems encountered when reconfiguring a Team Build server to work from a different TFS instance.

    In the Day Job, we're moving from our dodgy, flakey, unsupported pilot TFS instance to our dodgy, flakey, supported production TFS instance. I foolishly volunteered to move our projects across and get all the builds working again.


    Stage 1: Moving the Source

    Of course, since Microsoft didn't deign to provide a tool to move repositories between servers while retaining history (no svndump for you!) this is pretty straightforward. A little care is required to make sure you don't accidentally pollute the repository with rogue files, but the basic process is as follows. I'll call the old TFS OLD, and the new one NEW.

    1. Make sure everyone's checked their outstanding code in.
    2. Create a new workspace in OLD, and map the root of the repository in this workspace to a new folder on disk.
    3. Do a Get Latest to fetch all the source code.
    4. Connect VS2005 to NEW.
    5. Create a new workspace in NEW, and map the root of the repository in this workspace to a new folder on the disk.
    6. Copy all the files from the OLD workspace to the NEW workspace
    7. Take this opportunity to delete all those junk .vspcc files that you've accidentally checked in
    8. Back in VS2005, add all the files using the Add dialog. Watch out that the default filter doesn't exclude things like DLLs, if you're checking those in.
    9. Commit.

    That's it - pretty straightforward.


    Stage 2: Reconfiguring the Build Server


    The next step is to reconfigure the build server to look at the production TFS instance. I'm not going to cover creating a Team Build type here, but if you do so and run it against your build server without changing the build server configuration, you'll see this message:


    [Screenshot: error message when build server is misconfigured]

    Yep - a build server can only connect to a single TFS instance.

    Unlike most TFS error messages, this one is actually useful. If you follow it (to the letter) then it'll work. When you edit the AllowedTeamServer setting in the TFSBuildService.exe.config file, ensure that the server name is entered exactly as shown in the error message. It's quite picky. Then restart the build service.

    After that, the build should work - except that it probably won't.

    A variety of things can happen at this point. The problems I've heard about all seem to boil down to the workspace cache that the build server keeps. The symptom I saw was TFS showing the correct source download location during the CreateWorkspaceTask output in the build log, but then downloading the new source code to the wrong (i.e. old) location.

    The solution to this is to dig into the cache directory kept by the build server. It'll be in the following directory:

    C:\Documents and Settings\<user>\Local Settings\Application Data\Microsoft\Team Foundation\1.0\Cache\

    Obviously, substitute <user> with whatever username your TFS build service runs as.

    Look for a file called VersionControl.config. Open it up, and you'll see references to your old build location.

    Simply delete this file, or rename it - and hopefully your build should now work.
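
    If you'd rather script the rename than do it by hand, something like the following would do. It's a rough sketch only: retire_workspace_cache is a made-up helper name, the '.old' suffix is my own choice, and you pass in whatever cache directory applies to your build service account.

```python
import os


def retire_workspace_cache(cache_dir, name="VersionControl.config"):
    """Rename the cached workspace file so it is no longer picked up.

    Returns the new path, or None if there was no cache file to rename.
    """
    source = os.path.join(cache_dir, name)
    if not os.path.exists(source):
        return None
    target = source + ".old"
    if os.path.exists(target):
        os.remove(target)  # discard a backup left by an earlier run
    os.rename(source, target)
    return target
```

    Renaming rather than deleting means you can put the file back if it turns out not to be the culprit.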

    Incidentally, if you know of any tools to move repositories between servers, then do let me know.


    Sep 07

    Facebook Exposing User Profiles

    Facebook is opening up its user lists to public search engines such as Google. But it's OK, you can opt out. You did know that you can opt out, didn't you?

    I've been reading recently that Facebook is intending to open up its profile lists to the public - and in particular, public search engines.

    Of course, you already know about this because Facebook contacted you, explaining what they were doing, what the impact might be on your personal information, and how you could disable this if you wanted to.

    What do you mean - you haven't heard anything?

    Funnily enough, neither have I. I happen to know about it because I'm a geek who cares about privacy (nothing's on Facebook which isn't public domain anyway) but I doubt that my sister does. Or anyone else who's not a geek for that matter.

    Having been bullied by various friends to join the site, I've actually found it pretty compelling (read: addictive) to use. My main irk, however, was that all the privacy settings were turned off by default.

    There is of course a good reason for this - the site's success is based upon the fact that people can search for and find their friends. Maxing out your privacy settings would of course prevent this, and as we all know, people tend not to change the default settings (especially with the very fine-grained control that Facebook actually does give you).

    I therefore wish Facebook were a little more communicative about what they were doing with their users. I'd like to see a clear message sent explaining the changes they are making to the display of potentially sensitive information. I think it's unreasonable to expect users to regularly go to their privacy pages of their own accord to check their settings.

    Other system announcements - new features and so forth - appear at the top of a news feed in a different colour. There's no reason this couldn't happen for privacy changes too.

    To ensure that there is some public service element to what otherwise would be a ranting blog post, here's how to turn it off.

    Log into Facebook, and go to your privacy page. About halfway down you should see the following:

    [Screenshot: Facebook's search engine privacy]

    Note how your public search listing is, by default, shown to external search engines. Uncheck that, press Save, and you're done.

    And finally - never give personal information to web sites that you do not trust.

    Sep 03

    IOError: [Errno 6] Device not configured

    A cryptic error message on Zope startup, but with a simple solution

    A friend of mine was reporting this error after he migrated his Plone site to a new server. It's one I'd not come across before, so I thought I'd record the problem and solution here for posterity.

    The error is because Zope was started in the foreground, but can't write (via the Python logging module) to the terminal. How could this be - surely there must be a terminal for Zope to be started in the foreground?

    Not if someone does:

    bin/zopectl fg &
    exit

    The solution: don't do that.

    Hope that's just saved someone some digging!


    Aug 19

    Facebook: Photography Qwacks

    I'm building a small site for a photography group I'm a member of.

    You may or may not have come across the software crack that is the Facebook web site. I'm pretty sure that you'll at least have heard of it. Well, I'm a member of it, and am also a member of a number of groups - one of which is Photography Qwacks. (I think you might need to be logged into Facebook for that link to work.) This is a group for Bristol (that's Bristol, UK) based photographers to get together, chat about photography and show off their pictures.

    Last week, one of the group's organisers sent out a message to all members, asking for someone to help out with a website. I offered to lend a hand.

    I've just finished prototyping the first part of this - a simple gallery application. Now, I've not written this from scratch - it's just the Gallery2 application that I've used a couple of times before. It's not even a Python or .NET app, it's written in the dreaded PHP, using a PostgreSQL backend.

    Anyway, the site is here, so do feel free to take a look. If you like my photos, all the ones I care to show to the world live on my own gallery site.

    The second part of this is a small events app. I thought about just using something like Event Wax but I want to be able to integrate into the gallery (or vice-versa) a bit better. Now from the little I've played with it, Django seems really great at handling date-based content. I'm therefore going to try building a Django site for tracking events, linking into Facebook and integrating the gallery.

    It'll be the first time out for me for Django, and I'm looking forward to seeing what it can do when used in anger.

    Aug 13

    Don't Learn Python

    Only joking. I just found a blog post which articulates a lot of the frustration I feel at the moment.

    Two posts in a day! Can you tell I'm having to write PL/SQL?

    You know when you're reading a blog, and something resonates really strongly? Well, I just stumbled across this blog post which precisely articulates the frustration I feel doing C# coding. You know there's a better way, but you can't go there.

    Perhaps C# 3.0 (the version that ships with .NET 3.5) will address a lot of my frustrations. Time to re-learn Haskell too though, I think.

    Aug 13

    TODO: Varnish this site

    I need to find some time to try out the Varnish web cache.

    As you can undoubtedly tell from the HTTP response headers, this site is running on Plone. Plone's great for easy and speedy customisation (well, when you know your way around it!) but it's certainly not the quickest piece of software out of the box.

    I've spent quite a long time with Squid in front of Zope 2 and Plone before, but to be honest new development on it seems to have stalled somewhat. Features that I'd really love to see, such as wildcard invalidation, haven't materialised. So I'm planning to try Varnish, an HTTP accelerator. Hopefully I should get to see how well it works (particularly on the measly virtual server this site lives on!) and you should get a faster site response.

    The architect's note is a particularly interesting read.

    I'll report back when it's up and running.


    Aug 01

    TFS and Mercurial

    So the TFS server has fallen over. Again. What's one to do?

    In the day job, we use TFS as our source control system. It's officially blessed, and will be supported Real Soon Now. Until then, we're stuck on a crummy pilot version that falls over pretty frequently. Most people down tools when this happens.

    However, there does seem to be a way forward. I use one of the new breed of distributed source control systems to continue working locally while the main source control server is down. I happen to have picked Mercurial (due to Windows support) but in theory this approach should work with any source control system which doesn't require a central repository. You get that sense of safety once more, that you can change code, roll back, and version. In fact, I can tag a version that I will eventually want to get reviewed and check into TFS as a changeset, and then continue working on it.

    Note that this solution simply lets you continue working locally, safely. If you don't have the files that you need to hand before your departmental source control server falls over, then you're stuck. Obviously you won't be able to commit back to it, either. And if (like TFS) your source control system fiddles with client-side state (file permissions, etc.) then you will probably have to fix that all up when your source control server does come back online.

    So how do you do it?

    First of all, download and install Mercurial, and initialise your workspace as a Mercurial repository:

    > cd path\to\workspace
    > hg init

    That creates a Mercurial repository in your current directory. Next, add and commit all the existing files to give you a starting point:

    > hg add *
    > hg commit -m "Initial commit"

    Now you can go ahead and make your changes. Remember that you'll have to hg add and hg remove any added and removed files (or run hg addremove to do both in one go).

    When you get to the point where you would normally check in to TFS, commit your changes to Mercurial (hg commit) and then make a tag:

    > hg tag -m "Tests now pass" tests-pass

    You can now go ahead and continue making more changes, tagging each time you'd normally check in.


    The TFS server is back!

    Eventually, the TFS server (or whatever centralised source control system you use) will come back, and it's time to commit all the changes you've been working on.

    First of all, commit and tag your final changes into the Mercurial repository as you have been doing before.

    Now it's time to roll back the repository to each of your tags in turn, and commit into your TFS repository. First, get a list of all your commits to find out the tag names (there's probably a neater way of doing this, but I'm an hg novice!)

    > hg log
    changeset:   4:9a2bc8c4cd6e
    tag:         tip
    user:        dan
    date:        Wed Aug 01 11:25:30 2007 +0100
    summary:     Added tag BAR for changeset 4ed3893351e8

    changeset:   3:4ed3893351e8
    tag:         BAR
    user:        dan
    date:        Wed Aug 01 11:25:24 2007 +0100
    summary:     added foobar

    changeset:   2:46a175a2952b
    user:        dan
    date:        Wed Aug 01 11:24:51 2007 +0100
    summary:     tagging

    changeset:   1:129cee9d9958
    tag:         FOO
    user:        dan
    date:        Wed Aug 01 11:24:36 2007 +0100
    summary:     added world

    Here you can see I created two tags, FOO and BAR. First bring the working copy back to the state it was for tag FOO (changeset 1):

    > hg update -C FOO
    0 files updated, 0 files merged, 2 files removed, 0 files unresolved

    Commit that to TFS. Now bring your working copy up to the next tag you made:

    > hg update -C BAR
    2 files updated, 0 files merged, 0 files removed, 0 files unresolved

    Rinse, and repeat. And watch your colleagues' faces as you seem to be generating large amounts of code in a very short space of time!
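
    If you have more than a couple of tags, the replay order can be worked out from the hg log output rather than by eye. Here's a small sketch that does so; tags_oldest_first is a name I've made up, and the sample text is a trimmed copy of the log shown above.

```python
def tags_oldest_first(log_text):
    """Return tag names from `hg log` output, oldest changeset first.

    `hg log` lists the newest changeset first, so the order is reversed.
    The automatic 'tip' tag is skipped.
    """
    tags = []
    for line in log_text.splitlines():
        key, _, value = line.strip().partition(':')
        if key == 'tag' and value.strip() != 'tip':
            tags.append(value.strip())
    tags.reverse()
    return tags


sample = """\
changeset:   4:9a2bc8c4cd6e
tag:         tip
summary:     Added tag BAR for changeset 4ed3893351e8

changeset:   3:4ed3893351e8
tag:         BAR
summary:     added foobar

changeset:   1:129cee9d9958
tag:         FOO
summary:     added world
"""

# Print the update commands in the order they should be replayed.
for tag in tags_oldest_first(sample):
    print('hg update -C ' + tag)
```

    For the sample log this prints the FOO update first and the BAR update second, which is the order to check in to TFS.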

    Jul 07

    Python Genius Available

    I've written a lot of .NET code recently. I want to do some more Python.

    You've noticed that I've been concentrating on .NET recently - with Adaptation, and more recently, with Forms. I'm starting to miss that Python goodness.

    So this post is kind of an advert. If you have something that you want doing, in Python (including Zope 2/3 or Plone), with no particular timescales attached (hah!), with payment in general goodwill, then let me know.

    There are just three conditions:

    • As I mentioned, I can't guarantee timescales. I'm doing this in my own time, for my own interest.
    • I'll retain the copyright, and I'll release the software under the new BSD license.
    • I have a day job. If what you ask me to do conflicts with the obligations I have for that, then clearly I won't be able to help.

    If you think that you can live with those three conditions then drop me a mail! Contact details are on the About page.

    Jul 07

    Forms - Automatic Form Generation for ASP.NET

    I've started working on Forms, a package for ASP.NET to automatically generate forms based on an interface or class.

    Continuing on my theme of Things I Miss From Zope, I've started work on a Forms library for ASP.NET. There are a couple of forms libraries for Zope (primarily zope.formlib and z3c.form) which make it extremely easy to automatically generate forms.

    I'm starting off by working on form generation based on a .NET interface definition.

    Of course, this won't look too much like the aforementioned Zope libraries: ASP.NET has a substantially different architecture, and of course a huge library of controls already available. One of the design goals for the project is to allow these controls to be used seamlessly within the forms framework. Adaptation should provide the underlying framework for this.

    I'm at the early prototype stage at the moment, but as soon as I have something worth looking at, I'll release an alpha - that'll appear in the Software section, and for those using an RSS feed of this blog, I'll announce it here too.

    Right, back to the practice for the British Grand Prix!

    Jul 01

    Adaptation 1.0.0 released!

    Adaptation 1.0.0 has been released. Adaptation provides Zope 3 style adapter registries and lookup for the .NET framework.

    I'm pleased to announce that version 1.0.0 of Adaptation has been released. This is now feature-complete. It's likely that any further improvements will go on a v2 branch.

    Adaptation provides Zope 3 style adapter registries and lookup for the .NET framework.
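
    If you've not met adapter registries before, the idea is roughly this. The sketch below is in Python and only illustrates the concept; it is not Adaptation's API, and every name in it (AdapterRegistry, ISized, TextSize) is invented for the example.

```python
class AdapterRegistry(object):
    """Map (source type, target interface) pairs to adapter factories."""

    def __init__(self):
        self._factories = {}

    def register(self, source_type, target, factory):
        self._factories[(source_type, target)] = factory

    def adapt(self, obj, target):
        # Walk the method resolution order so that an adapter registered
        # for a base class also applies to its subclasses.
        for klass in type(obj).__mro__:
            factory = self._factories.get((klass, target))
            if factory is not None:
                return factory(obj)
        raise LookupError('no adapter from %s to %s'
                          % (type(obj).__name__, target.__name__))


class ISized(object):
    """Marker for 'something that can report its size'."""


class TextSize(ISized):
    """Adapts a string to ISized."""

    def __init__(self, context):
        self.context = context

    def size(self):
        return len(self.context)


registry = AdapterRegistry()
registry.register(str, ISized, TextSize)
print(registry.adapt('hello', ISized).size())  # 5
```

    The calling code asks for "this object as an ISized" and never needs to know which adapter class does the work.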

    Please download and use this release. I don't know of any bugs, but if you find one, put it in the tracker. Patches are welcome, especially with test cases. Don't worry if you can't do a test case, I'll try to add one as I look at your patch.

    Documentation will be coming Real Soon Now (I promise!), but it's getting too late tonight to do it now.

    Download it from the Adaptation section in Software.

    Enjoy.

    Jun 29

    Safari on Windows

    Why I think Apple released Safari on Windows

    People have been wondering why Apple released Safari on Windows.

    Steve's presentation made it fairly clear that they want to build browser market share.

    I think it's also so that they can get Windows developers on board writing web apps for the iPhone. Safari is the nearest thing to an IDE for the iPhone.

    And no, I've not tried it yet. My Windows box runs Vista, and Safari runs just great on my venerable G4 Powerbook.

    Just my tuppence worth.

    Jun 29

    Get back, Web 2.0!

    Support for that browser stalwart, the Back button, seems to be declining in the current crop of Web 2.0 applications. Should we try to support the browser back button, or is it a relic of Web 1.0? And who's that other dark figure in the corner?


    HTTP, URIs, HTML, Oh My!

    Tim Berners-Lee's definitions of HTTP, URIs and HTML ensured that the web uses a page-based metaphor. Until recently, this has stuck with us pretty much unchanged. Sure, we had frames, and they died a death. The page request/response cycle is baked hard into our web developer psyche, and the web browser's UI is predicated on pages and URIs.

    Roll forward to today, and techniques such as AJAX challenge this. Pages are dynamically generated and changed on the client, and what you see on the page is not just represented by the URI and server state.

    The result is that we've broken the back button. And we've broken bookmarks. And this is a bad thing.

    AJAX and friends have broken the page metaphor, and as a consequence, have broken some of the key controls that users have over their web experience.

    While technology changes daily, people don't. And people get confused when the back button on their browser doesn't work, or they suddenly can't add a favourite or a bookmark to the page that they're currently looking at.

    There are some ways around it, though it's pretty rare to find them implemented. GMail, for example, seems to work around it through clever page design: whenever you visit a page which logically can have a predecessor, the app does a full page load (thus maintaining the behaviour of the back button). AJAX-y functionality is reserved for 'drilldown' operations, such as opening up all the messages in a thread.

    Sadly, GMail is the exception, despite browser back button support increasingly being built into open source toolkits.

    I should point out, of course, that this is hardly a new problem. A quick Google returns plenty of articles from 2005. But the problem doesn't seem to have gone away quite yet.


    The Cloaked Figure

    The other great enemy of the browser back button is ASP.NET.

    ASP.NET is the cloaked figure in the corner.

    In case you haven't used it, ASP.NET is obsessed with POSTs. Completely. It even supplies a LinkButton control that effortlessly converts dirt simple hyperlinks into form posts. It does this because the ASP.NET model attempts to layer a stateful processing model (in the style of traditional desktop applications) on top of the stateless HTTP by sticking great lumps of state in hidden form fields. This is why ASP.NET apps can feel 'sticky' to use: your browser is uploading and downloading large chunks of state information with each and every request. And it's also why a carelessly written ASP.NET web site will break without JavaScript enabled - it's used to change all those link clicks into form submits.
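
    To make the hidden-field idea concrete, here is a minimal sketch of state riding along in a form. It is not ASP.NET's actual format or API: the function names and the base64-encoded JSON encoding are assumptions chosen purely for illustration.

```python
import base64
import json


def encode_state(state):
    """Serialise the page state into a string that fits in a hidden field."""
    raw = json.dumps(state).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def read_state(posted_blob):
    """Recover the state from the hidden field's posted value."""
    return json.loads(base64.b64decode(posted_blob).decode("utf-8"))


def render_form(state):
    """Return HTML for a form that carries `state` in a hidden input."""
    blob = encode_state(state)
    return (
        '<form method="post">'
        f'<input type="hidden" name="state" value="{blob}">'
        '<input type="submit" value="Next">'
        "</form>"
    )


# Hypothetical page states: one tiny, one holding a grid's worth of rows.
small = {"page": 1}
large = {
    "page": 1,
    "grid_rows": [{"id": i, "name": f"Customer {i}"} for i in range(200)],
}

# The page gets heavier as the state grows, and the browser has to send
# the whole blob back with every POST.
print(len(render_form(small)), len(render_form(large)))
```

    The bigger the state, the bigger every request and response becomes, which is the 'sticky' feeling described above.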

    This tendency to use a POST for everything, combined with the default behaviour of immediate cache expiry, breaks the Back button in a new and exciting way. (Chris Shiflett has a nice explanation of why this is, and how to work around it. So good, I've pinched his picture. You can have it back if you want, Chris!)

    Does this look familiar?

    POST error

    The text differs between browsers, but the meaning is the same - you pressed Back (or Forward, but nobody uses that) and the browser can't display the page without resending some data which has expired from the cache. Which might mean you order that La-Z-Boy twice. Trouble is, when everything's a POST, you're probably going to see this more often, because caching is hard to get right.
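
    One widely used way to avoid the resend prompt is the Post/Redirect/Get pattern: handle the POST, then redirect the browser to a page it can safely re-fetch with GET. The sketch below is framework-free and purely illustrative; the `handle` function, the paths and the in-memory `orders` list are all hypothetical.

```python
def handle(method, path, form, orders):
    """Return (status, headers, body) for a tiny order form."""
    if method == "POST" and path == "/orders":
        # The side effect happens here, once per submitted form.
        orders.append(form["item"])
        order_id = len(orders) - 1
        # 303 See Other asks the browser to fetch the result with GET,
        # so Back and Refresh repeat the GET rather than the POST.
        return 303, {"Location": f"/orders/{order_id}"}, ""
    if method == "GET" and path.startswith("/orders/"):
        order_id = int(path.rsplit("/", 1)[1])
        return 200, {}, f"Order {order_id}: {orders[order_id]}"
    return 404, {}, "Not found"


orders = []
status, headers, _ = handle("POST", "/orders", {"item": "La-Z-Boy"}, orders)

# The browser follows the redirect; revisiting that page is harmless.
print(handle("GET", headers["Location"], {}, orders))
print(handle("GET", headers["Location"], {}, orders))
print(len(orders))  # still one order, however often the page is revisited
```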


    So what to do?


    It's pretty simple.

    Jun 25

    Adaptation 0.0.2 released!

    The first alpha of Adaptation has been released.

    I'm pleased to announce that version 0.0.2 of Adaptation has been released. Adaptation aims to provide Zope 3-style adapters for the .NET framework. You can download it from the Software section.

    Currently only simple use cases are covered. In particular, you can't yet register an adapter for a class, nor does the lookup have any intelligence about adapters registered for subclasses or subinterfaces. There's no provision yet for named adapters.
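
    To show what the missing subclass awareness means in practice, here is a small Python sketch. It is not Adaptation's code: the registry class, its method names and the example types are all hypothetical. It contrasts an exact-type lookup with one that walks up the inheritance chain.

```python
class AdapterRegistry:
    """Hypothetical registry keyed on (source type, target interface)."""

    def __init__(self):
        self._factories = {}

    def register(self, source, target, factory):
        self._factories[(source, target)] = factory

    def lookup_exact(self, obj, target):
        # Only matches an adapter registered for the object's own type.
        return self._factories.get((type(obj), target))

    def lookup_inherited(self, obj, target):
        # Walks the method resolution order, most specific type first,
        # so an adapter registered for a base type is still found.
        for cls in type(obj).__mro__:
            factory = self._factories.get((cls, target))
            if factory is not None:
                return factory
        return None


class Document:
    pass


class Invoice(Document):
    pass


class IPrintable:
    """Stand-in for a target interface."""


class DocumentPrinter:
    def __init__(self, context):
        self.context = context


registry = AdapterRegistry()
registry.register(Document, IPrintable, DocumentPrinter)

invoice = Invoice()
print(registry.lookup_exact(invoice, IPrintable))      # nothing registered for Invoice itself
print(registry.lookup_inherited(invoice, IPrintable))  # finds the Document adapter
```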

    There's also no documentation yet; however, the unit test suite should give you a flavour of how to use the registry.

    The current level of functionality is basic, but still (I think!) useful. Comments and criticisms are welcome. If you find any bugs, please just email me or leave them in the comments for now. I need to get an issue tracker up and running, as well as a public Subversion repository.

    Jun 22

    VS2005, Unit Testing and Deployment Items

    Is it me or are DeploymentItems supremely dodgy?

    So there you are, happily coding up your unit tests to check that the new feature that you're about to implement is going to work (because, of course, you always practice test-driven development), using DeploymentItem attributes to push test data files into the test output directory. Maybe your code looks something like this:

    [TestMethod()]
    [DeploymentItem(@"TestLoader\badquotes.csv")]
    [ExpectedException(typeof(ReaderException))]
    public void TestBadQuotes()
    {
      TestLoader loader = new TestLoader("badquotes.csv");
      loader.Load();
    }

    This test is already working great. So you add a new test, and a new data file, to demonstrate another case you've just thought of:

    [TestMethod()]
    [DeploymentItem(@"TestLoader\badquotes2.csv")]
    [ExpectedException(typeof(ReaderException))]
    public void TestBadQuotes2()
    {
      TestLoader loader = new TestLoader("badquotes2.csv");
      loader.Load();
    }

    You run the tests... and both of them fail!

    And this time it's not because you've forgotten that DeploymentItem paths are relative to the containing solution (argh!).

    If anyone knows what's going on here, please do let me know. In the meantime I've added it to my list of things for which the only workaround is to restart Visual Studio.

    If only that list weren't so long. Maybe it's time to try out SharpDevelop...


    Jun 21

    Layering

    Developing software in service layers is generally a good idea, but only when the problem demands it.

    I've now spent some time working with substantial existing "Enterprise" codebases, written in .NET and talking to a relational database (usually Oracle, but increasingly SQL Server too). Something I see over and over again is the separation of the web UI layer, a business logic layer, and a data layer. Then the database will have a set of stored procedures, and finally the underlying tables.

    I wasn't surprised by this. Even while I was working on my previous Zope project, I understood that this was regarded as Best Practice by the Microsoft web/database development crowd.


    But then I start to see code like this in OnPreRender:


    GridView grid;
    grid.DataSource = BLL.Customers.GetCoolCustomers();
    grid.DataBind();

    Digging into the BLL.Customers class, we see something like:

    public DataView GetCoolCustomers()
    {
      CustomerDao dao = new CustomerDao();
      return dao.GetCoolCustomers();
    }

    So we follow that into the data access layer:

    public DataView GetCoolCustomers()
    {
      DataView dv = null;
      // ... do some ADO.NET dancing to populate dv ...
      return dv;
    }


    Hmm. Seems a bit redundant to me. Never mind, I can understand keeping a consistent set of layers (kind of, more on this later). So we go and have a look at the stored procedure that GetCoolCustomers ends up invoking...

    ... and it's invariably a 500-line monster, doing everything from filtering out customers not considered cool to formatting their names so that they display nicely on the screen.

    What happened to the UI layer doing the display logic, and the business layer doing business logic? (At least the data layer actually does something in the above example.)
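
    For contrast, here is a rough sketch of each layer doing only its own job. It is in Python rather than C# for brevity, and the sample rows, the "cool" rule and all function names are made up for illustration.

```python
# Data layer: fetch raw rows and nothing else (stubbed with sample data).
def fetch_customers():
    return [
        {"first": "ada", "last": "lovelace", "orders": 12},
        {"first": "alan", "last": "turing", "orders": 1},
    ]


# Business layer: decides who counts as cool (a made-up rule).
def get_cool_customers(rows, min_orders=5):
    return [row for row in rows if row["orders"] >= min_orders]


# UI layer: display formatting only.
def format_name(row):
    return f'{row["last"].title()}, {row["first"].title()}'


cool = get_cool_customers(fetch_customers())
print([format_name(row) for row in cool])  # ['Lovelace, Ada']
```

    With the rules split like this, changing how names are displayed never touches the query, and changing what "cool" means never touches the page.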

    So how does one end up with a design like this? I think the following factors all have a part to play:

    • The seductive ease with which ASP.NET lets you bind web controls to underlying data sources. No code required! Except that you do need the code, as shown by the fact you've got a 500-line stored procedure driving this thing.
    • People slavishly follow Best Practice without thinking whether it's appropriate in their situation. A layered design works great for large systems, where there's significant 'thinking' to be done at each layer. If you're just splatting tables on the web (so actually that proc is a 3-liner, not a 500-liner), excessive layering is a waste of time.

    Done well, a layered approach can (maybe) bring you closer to the current holy grail of Service Oriented Architecture (SOA), where each layer in your system can work independently, providing services to other layers and components.

    Done badly, it takes you three times as long to find anything when you're developing, and then when you find it, it's buried in a 500-line stored procedure.

    Rant over.

    Jun 19

    Adaptation - adapters for .NET

    I've started work on Adaptation, a small framework to implement adapter registries and object adaptation in .NET. It's based (as far as C# allows) on the adapter functionality in Zope 3.

    I've always liked the way Zope 3 adapters work. They pave the way to being able to write small, reusable components. If you don't know much about how this works in Zope, Jeff Shell posted a great explanation of it.

    So how is this different from the decorator pattern that you're probably already familiar with? Well, a decorator usually completely wraps the context object, providing both its own interface (usually by implementing lots of proxies) and the target interface. Adapters are slightly different in that they only provide methods and properties specified by the interface being adapted to. This makes the code for an adapter substantially smaller than that of the equivalent decorator.
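
    The difference is easier to see side by side. The Python sketch below is illustrative only; the `Customer` class and both wrappers are hypothetical and do not come from Zope or Adaptation.

```python
class Customer:
    def __init__(self, name, email):
        self.name = name
        self.email = email

    def rename(self, name):
        self.name = name


class LoggingCustomer:
    """Decorator: re-exposes the wrapped object's interface and adds logging."""

    def __init__(self, wrapped):
        self._wrapped = wrapped
        self.log = []

    # Proxies for everything Customer offers...
    @property
    def name(self):
        return self._wrapped.name

    @property
    def email(self):
        return self._wrapped.email

    def rename(self, name):
        # ...plus the extra behaviour the decorator exists for.
        self.log.append(f"rename -> {name}")
        self._wrapped.rename(name)


class CustomerMailable:
    """Adapter: offers only the target 'mailable' interface."""

    def __init__(self, context):
        self.context = context

    def address_line(self):
        return f"{self.context.name} <{self.context.email}>"


customer = Customer("Ada", "ada@example.com")
print(CustomerMailable(customer).address_line())  # Ada <ada@example.com>
```

    The decorator has to grow whenever `Customer` does; the adapter only ever needs the one method its target interface asks for.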

    Adapters come into their own when trying to reuse code. Traditionally, inheritance has been one of the principal mechanisms of code reuse. If you wanted to use a class in your code, you would simply derive your own class from it with the functionality that you required, and use that. This causes your code to become tightly coupled with that from which you are deriving, and of course if a new base type comes along you have to write another class to derive from it.

    Adapters can potentially remove that tight coupling - particularly the way Zope does them. The developer makes declarations (either in Python or ZCML, the Zope Configuration Markup Language) about which adapters are available to adapt from what interface (or class) to another interface.

    Now, you may be thinking "Interface? This is Python! Python doesn't have interfaces!" And you'd be right. However, Zope provides the zope.interface module - to find out more, go and read Jeff's post on the details.

    So I'm planning to port the idea to .NET. I'm not sure that the clean Python syntax will go across:

    adapter = IMyInterface(context)

    I suspect the .NET version will have a static registry, exposing methods such as:

    Registry.Register(typeof(IAdaptsFrom), typeof(IAdaptsTo), typeof(Adapter));
    IAdaptsFrom adaptsFrom = new AdaptsFromImpl();
    IAdaptsTo adapter = (IAdaptsTo)Registry.Adapt(adaptsFrom);

    But I guess that's why we all prefer Python to C#!
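
    For the curious, the call syntax works because an interface can be made callable. The sketch below shows the general idea with a metaclass and a plain dictionary; it is an assumption-laden toy, not how zope.interface is actually implemented, and every name in it is hypothetical.

```python
_registry = {}  # (source type, target interface) -> adapter factory


class InterfaceMeta(type):
    def __call__(cls, context):
        # Calling an interface looks up an adapter for the context object
        # instead of instantiating the interface itself.
        factory = _registry.get((type(context), cls))
        if factory is None:
            raise TypeError(
                f"Could not adapt {type(context).__name__} to {cls.__name__}"
            )
        return factory(context)


class Interface(metaclass=InterfaceMeta):
    """Base class for interface markers."""


def register(source, target, factory):
    _registry[(source, target)] = factory


class IGreeter(Interface):
    pass


class Person:
    def __init__(self, name):
        self.name = name


class PersonGreeter:
    def __init__(self, context):
        self.context = context

    def greet(self):
        return f"Hello, {self.context.name}!"


register(Person, IGreeter, PersonGreeter)

adapter = IGreeter(Person("Ada"))
print(adapter.greet())  # Hello, Ada!
```

    C# has no equivalent of calling a type to get something else back, which is why the .NET version ends up with an explicit registry call and a cast.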

    Anyway, enough for now - I'll post a little more when I've got the bones of it working. I'll be making a release on this web site when I've got that far.


    Jun 18

    Autosys Dependencies

    Figuring out how dependencies work in Autosys

    This is really just a note for myself, since it took me ages to find.

    In Autosys, you can have jobs which depend on other jobs. It's not obvious how you express dependencies between jobs (and the confusing Java applet web UI doesn't help).

    One stanza in the documentation happens to mention the dependency syntax while talking about something else. A dependency condition looks like this:

    success(my_job_name) and success(my_other_job)

    Nice.

    There's also a relatively useful Autosys Cheat Sheet from CA.

    (UPDATED to remove ^ mark - I understand this is used to resolve cross-instance dependencies. Thanks to our knowledgeable Autosys guy for spotting that!)