
Migrating to Django Mingus

by Dan Fairs last modified Jan 14, 2010 11:38 PM
I've migrated my blog from a creaking Plone 2.5 to a fork of Django Mingus. This is the process I went through.

I've been running a blog since 2007 (check the archives!). At the time, it made most sense for me to go for a Plone-based blog. Plone was what I was most familiar with, and there was a simple blog product out there that I could use called, sensibly enough, SimpleBlog. In fact, you can still grab it - it's where you'd expect, in the Plone Products section. And as you can see, the release there is the one that I used at the time: SimpleBlog 2.0, for Plone 2.5.

Fast-forward to the start of 2010, and things have moved on. Plone's moved on, for sure. Plone 4 is just around the corner, and there's some really, really cool stuff in there: Dexterity, a new content types framework, finally looks like it'll make content type creation as easy as it should be, and simple types and behaviours can be created through the web. Deco, slated (last time I heard) for Plone 5, is quite literally going to make publishers wet themselves. And Deliverance has great potential for helping to unify the many disparate systems which make the typical corporate user's daily life such a grind. Thing is, I'm not a big corporate user; a lot of what makes Plone great for that environment is overhead for me and my little blog.

The big change for me personally, though, is that more and more of my work is now Django. About two thirds of 2009 was Zope 2/Five, and a third Django; 2010 is looking like it'll be the other way around. So, having decided that my blog was looking a little tired, and seemed to be a bit of a spam magnet, I decided to bite the bullet and do a complete rebuild in Django.

I didn't want to write yet-another-blog from scratch, so I picked what seemed to have the most buzz around it at the time - Django Mingus - and started from there. And this is how I did it.

Oh, before I get stuck in, all the code for this site is open source. Fortunately, there's not much of it. There's no documentation as such (this is open source, after all) because the target audience is, well, me. I've stayed true to my Zope roots and used buildout; the buildout and Django project can be checked out from GitHub:

Stereoplex buildout and project

Onwards.

Planning

There's more to moving a blog than just installing your blog software and bashing out new articles. I had to consider a number of migration issues, specifically:

  • Obviously, I needed to move all the content over from my old blog. That's not just articles: that's images, comments, the whole shebang.
  • I needed to either keep all the old article URLs from my previous blog working, or have them redirect to the new URLs.
  • Similarly, there are lots of RSS URLs out in the wild; I've handed a couple of them out to the Django community aggregator and the Planet Plone aggregator. There's also the main site feed.
  • I chose not to move user accounts over, as I'd already decided not to require a login to comment. I was going to go with a ReCaptcha integration instead.

So, the first step was obviously going to be to extract all the content from my old blog.

Getting the content out of the ZODB

Plone stores data in the ZODB. The ZODB is amazing. It was years ahead of its time, and provides a really natural way to store and interact with data in document-oriented systems. It's only really in the last year or so that we've seen the rise of broadly similar data stores, most of which talk HTTP and don't have the fine-grained transaction control that ZODB provides. The only thing to remember with the ZODB is that you can only access it directly with Python. That meant that I had to write a Python script to dump out all my content into a platform-neutral form. (I could have written a script which read the ZODB directly and created Django models, I guess. Using an XML intermediate format seemed easier though, and was probably quicker to work with - loading up the Plone 2.5 codebase is pretty slow.)

Most Plone applications, SimpleBlog included, store their data using the Archetypes content type framework. Archetypes (often just AT) is a schema-driven approach to creating content. This is a pretty handy approach (even if AT's implementation was a bit awkward in places), as it makes it really simple to write code that introspects that content. This was invaluable for writing an export script.

The result is on GitHub: Simple-AT-XML-Dump. I should note that AT does have support for native XML dump and load. Thing is, I've got an ancient version of AT, and SimpleBlog used CMF (a layer underneath Plone) comments. I had no idea whether XML marshalling would work properly, so I just did my own custom XML format. It's an instance script, and it should work pretty well on any folderish Archetypes type. Invoke it like this:

bin/instance run Simple-AT-XML-Dump/run.py -p /zodb/path/to/plone/content -o out.xml

/zodb/path/to/plone/content is the physical path in the ZODB to dump. This needs to be the root folderish AT item. out.xml is the output file.

I won't go into the specifics of how the script works (if there's anything you're interested in especially, mail me or post a comment below). In a nutshell though, it'll dump out AT types by schema (including ImageFields with base64-encoded data) and any CMF discussions that are associated with them.
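The actual XML format of Simple-AT-XML-Dump isn't described here, so the following is only a sketch of the general idea under made-up assumptions: each content item becomes an item element, every schema field becomes a field element (with binary image data base64-encoded), and comments are nested underneath. The element and attribute names (item, field, encoding, comments, comment) are hypothetical, and a plain dict stands in for an Archetypes schema.

```python
import base64
import xml.etree.ElementTree as ET


def dump_item(fields, comments):
    """Serialise one content item to a (hypothetical) XML element.

    fields   -- dict of field name -> str, or bytes for binary data such as images
    comments -- list of dicts with "author" and "text" keys
    """
    item = ET.Element("item")
    for name, value in fields.items():
        field = ET.SubElement(item, "field", name=name)
        if isinstance(value, bytes):
            # Binary data (e.g. an image field) is base64-encoded so the
            # whole dump stays plain text.
            field.set("encoding", "base64")
            field.text = base64.b64encode(value).decode("ascii")
        else:
            field.text = value
    # Comments attached to the item are nested under a single element.
    discussion = ET.SubElement(item, "comments")
    for comment in comments:
        node = ET.SubElement(discussion, "comment", author=comment["author"])
        node.text = comment["text"]
    return item
```

ET.tostring(dump_item(...), encoding="unicode") gives the text to write to the output file; an importer can reverse the process with ET.parse and base64.b64decode.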

Starting with Django Mingus

Many years of pain have taught me to automate my build. The XML dumper instance script above is pretty much the smallest thing I'll do without a build (and even now, I feel a twinge of guilt at not having packaged it properly.) For me, therefore, the first thing to do was to get a build up and running.

The system du jour seems to be to use pip and virtualenv. Pip (a replacement for easy_install) lets you specify requirements files, which define which Python packages your application uses and which versions of them, and it can install packages directly from the major source control systems. (This ability seems to have caused lots of requirements files with GitHub links in them to spring up in 'released' packages, especially in the Django world. I regard this as a Bad Thing; but that's an opinion piece for another time.) However, grizzled Zope veterans tend to reach for buildout in such a circumstance so, wanting to restrict New Fangled stuff to learning how Mingus was put together, I stuck with that. Buildout predates pip and virtualenv, and so it also does things you'd normally just use virtualenv for, like creating an isolated environment. Buildout config files are also a touch more verbose than requirements files. On the other hand, I'm pretty sure that buildout is the more extensible system, with a vast array of custom recipes; though, in common with a lot of software from the Zope world, its narrative documentation is atrocious.

But I know buildout, and I'm not giving up these scars so easily, so that's what I went with, basing my buildout config on the requirements file that comes with Mingus. This process was actually pretty simple:

  1. Set up a standard Django buildout using djangorecipe
  2. Pop mr.developer in as an extension, add [sources] for all of the editable (-e) eggs in the pip requirements file, and add them all as auto-checkout
  3. Put all the non-editable eggs from the requirements file into the eggs section of the buildout config
  4. Use the versions supplied in the requirements file to create a [versions] section in the buildout config
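Those four steps are mechanical enough to script. Here's a rough sketch (not the process actually used for this site) that turns pip requirements lines into a buildout.cfg skeleton: -e lines become mr.developer [sources] entries, with pip's git+/hg+/svn+ prefix mapped onto mr.developer's source kind, and are added to auto-checkout; pinned lines go into the eggs list and [versions]. It assumes every -e line carries an #egg= fragment and ignores other pip options such as -r; the part name django is arbitrary, and a real djangorecipe part will need whatever further options your djangorecipe version expects.

```python
def requirements_to_buildout(lines):
    """Build a buildout.cfg skeleton (as text) from pip requirements lines."""
    sources = {}   # egg name -> "kind url", in mr.developer's [sources] syntax
    eggs = []
    versions = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("-e "):
            # e.g. -e git+git://github.com/user/repo.git#egg=name
            kind, _, rest = line[3:].strip().partition("+")
            url, _, name = rest.partition("#egg=")
            sources[name] = f"{kind} {url}"
        else:
            # e.g. Django==1.1.1 (an unpinned name just goes into eggs)
            name, _, version = line.partition("==")
            name = name.strip()
            eggs.append(name)
            if version:
                versions[name] = version.strip()

    out = [
        "[buildout]",
        "parts = django",
        "extensions = mr.developer",
        "auto-checkout = " + " ".join(sources),
        "versions = versions",
        "",
        "[sources]",
    ]
    out += [f"{name} = {spec}" for name, spec in sources.items()]
    out += ["", "[django]", "recipe = djangorecipe", "eggs ="]
    # Checked-out packages are listed too, so they end up on the
    # generated scripts' path.
    out += [f"    {name}" for name in eggs + list(sources)]
    out += ["", "[versions]"]
    out += [f"{name} = {version}" for name, version in versions.items()]
    return "\n".join(out) + "\n"
```

Feeding it the lines of a requirements file (for example open("requirements.txt").read().splitlines()) prints a starting point that still needs the project-specific djangorecipe settings filling in by hand.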

This gives me a buildout.cfg file that mirrors the Mingus requirements file, but will also:

  • Create the django management script and django WSGI file automatically
  • Create a project and select an appropriate settings file for me

The end result of this is the GitHub project I linked to above, stereoplex-buildout.

Stereoplex

I created another egg, called stereoplex, to contain all my site-specific customisations and scripts. I used another package of mine, fez.djangoskel, to create the basic layout. Specifically, it has:

  • A Django management command to import the XML file created with Simple-AT-XML-Dump
  • A Django ModelAdmin subclass (actually a basic.blog.admin.PostAdmin subclass) to let me use TinyMCE as my editor
  • A ReCaptcha Django form field, widget, and custom comment form
  • A single extra view, which returns all items posted on the blog
  • A URLConf that brings together Mingus' URLs and those for the extra view and for TinyMCE
  • And of course, all the template overrides and CSS, JavaScript and images required for the new Stereoplex look and feel
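The custom comment form's ReCaptcha field has to check each submission with Google's servers. The sketch below shows that check using only the standard library, written against the current reCAPTCHA siteverify endpoint (POST secret, response and an optional remoteip, then read success from the JSON reply). That is not necessarily the API the original 2010 field used, and in a Django project the call would sit inside the form field's clean() method. The opener parameter is a hypothetical hook so the network call can be swapped out in tests.

```python
import json
import urllib.parse
import urllib.request

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def verify_recaptcha(secret, token, remote_ip=None, opener=urllib.request.urlopen):
    """Ask the reCAPTCHA service whether a submitted token is valid.

    secret    -- the site's private key
    token     -- the value the widget posted with the form
    remote_ip -- optional client IP address
    opener    -- callable used to send the request (urlopen by default)
    """
    params = {"secret": secret, "response": token}
    if remote_ip:
        params["remoteip"] = remote_ip
    data = urllib.parse.urlencode(params).encode("ascii")
    # Supplying data makes this a POST request.
    request = urllib.request.Request(VERIFY_URL, data=data)
    with opener(request, timeout=10) as response:
        result = json.load(response)
    return bool(result.get("success"))
```

A form field's clean() would raise a ValidationError whenever this returns False.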

Importing the Data

The next step was to write a data importer. This had to do a number of things (data migrations are never simple!):

  • Import all the images in the XML data file, creating basic.media.models.Photo instances for each of them
  • Rewrite all image links in the body text of posts to contain <inline> elements used by Mingus
  • Create basic.blog.models.Post instances for each blog post in the file
  • Create django.contrib.comments.models.Comment instances for every comment, and associate them with the appropriate post
  • Create django.contrib.redirects.models.Redirect objects for each imported post, to allow existing inbound links to be redirected to the new location.
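The redirect step boils down to working out, for every imported post, its old Plone path and its new Mingus path. Here's a minimal sketch of that mapping, with plain Python objects standing in for models. The URL scheme (/blog/year/three-letter month/day/slug/) and the example paths are assumptions for illustration. In a real importer each pair would become a django.contrib.redirects Redirect with its site, old_path and new_path set, which the redirects app's RedirectFallbackMiddleware then applies when the old URL would otherwise return a 404.

```python
from dataclasses import dataclass
from datetime import date

# Lower-case month abbreviations, avoiding locale-dependent strftime("%b").
MONTHS = ("jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec")


@dataclass
class ImportedPost:
    old_path: str      # path on the old Plone site (hypothetical)
    slug: str
    published: date


def new_path(post):
    """Assumed Mingus-style URL: /blog/<year>/<mon>/<day>/<slug>/"""
    d = post.published
    return f"/blog/{d.year}/{MONTHS[d.month - 1]}/{d.day}/{post.slug}/"


def redirect_pairs(posts):
    """Return (old_path, new_path) pairs for every post whose URL changed."""
    pairs = []
    for post in posts:
        target = new_path(post)
        if post.old_path != target:
            pairs.append((post.old_path, target))
    return pairs
```

Skipping posts whose path hasn't changed avoids creating redirects that point at themselves.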

Automated content import is one of those things that you tell clients is usually impossible. And when they're migrating from a legacy CMS platform (or indeed, hand-maintained HTML), that's usually right. The inevitable gigabytes of hand-rolled HTML are at best poorly formed, and at worst represent content which needs throwing away anyway.

I was more fortunate. I didn't have that much content to migrate - 60-odd posts, and a few images - and Plone 2.5's default editor Kupu is actually pretty good at producing good HTML. I was able to directly use the existing HTML, and only needed to replace the <img> tags with the appropriate <inline> expected by Mingus.
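Because Kupu's output is consistent, machine-generated HTML, that rewrite can be done with a regular expression rather than a full HTML parser. The sketch below assumes a mapping from each old image src to the id of the Photo created for it, and assumes the inline markup takes the form <inline type="media.photo" id="..." />; check the exact attributes against the django-basic-apps inlines code you're using. Images with no known mapping are left untouched.

```python
import re

# Matches an <img> tag and captures its double-quoted src attribute.
IMG_TAG = re.compile(r'<img\b[^>]*?\bsrc="([^"]+)"[^>]*>', re.IGNORECASE)


def rewrite_images(html, photo_ids):
    """Replace <img> tags with (assumed) Mingus <inline> elements.

    photo_ids -- dict mapping an old image src to the new Photo's id
    """
    def replace(match):
        photo_id = photo_ids.get(match.group(1))
        if photo_id is None:
            return match.group(0)  # unknown image: leave the tag alone
        return f'<inline type="media.photo" id="{photo_id}" />'

    return IMG_TAG.sub(replace, html)
```

A regex like this is only reasonable for tidy, predictable markup; hand-written HTML would need a real parser.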

Changes to Mingus packages

I did make some modifications to Mingus' packages. These were as follows:

  • django-mingus
  • django-basic-apps
  • django-sugar

All really very minor.

Server Configuration

Apache

The Plone instance used a standard small setup: a single ZEO client talking to a ZEO server, fronted by Apache with the magic RewriteRule. The Zope configuration had been tweaked slightly to be usable on a small (by Plone's standard!) 512MB RAM host, but I hadn't had time to do any other optimisations (for example, serving static files from Apache) that I would normally do.

Django forces you to do at least some of these. Static files are always served by an external web server (unless you are really determined to sail through all the warnings in the documentation). I also finally got around to turning on gzip compression.

One remaining task that I planned to do in the Apache configuration was to provide redirects for my RSS feeds. This wasn't as easy as I might have liked, since the old feed URLs had query string elements; and indeed, it was some of those values that I needed to formulate a correct redirect. I ended up with the following:

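RewriteEngine On
RewriteCond %{QUERY_STRING} (^|&)EntryCategory=([^&]+)
RewriteRule ^/search_rss /feeds/categories/%2/? [R=permanent,L,NC]

The RewriteCond regex captures the value of the EntryCategory query parameter (the category slug) as its second match group, hence %2. The first group only anchors the parameter name to the start of the query string or to an & separator, and [^&]+ stops the slug at the next parameter, wherever EntryCategory appears in the query string. The RewriteRule then issues a permanent redirect to the new URL; the trailing ? on the substitution stops Apache from appending the old query string to it. We have to use RewriteCond, because RewriteRule regexes won't match a query string, only the URL path.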
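It's easy to get this kind of regex subtly wrong, so it's worth checking which group captures the slug before deploying. For simple patterns like these, Python's re module behaves the same way as the PCRE engine Apache uses, so a quick sketch can compare a greedy pattern, ^(.*)?EntryCategory=(.*)?&(.*), with an anchored one, (^|&)EntryCategory=([^&]+). The query strings in the checks are made-up SimpleBlog-style examples.

```python
import re

# Greedy: the second group runs up to the *last* "&" in the query string.
GREEDY = re.compile(r"^(.*)?EntryCategory=(.*)?&(.*)")
# Anchored: the slug stops at the next "&" (or the end of the string).
ANCHORED = re.compile(r"(^|&)EntryCategory=([^&]+)")


def category_slug(pattern, query_string):
    """Return what %2 would be for this query string, or None if no match."""
    match = pattern.search(query_string)
    return match.group(2) if match else None
```

The greedy form only yields the right slug when EntryCategory is followed by exactly one more parameter: with more, it swallows the extra parameters, and when EntryCategory comes last it doesn't match at all.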

Next up is the mod_wsgi configuration. I use a fairly standard configuration, which is generally as follows:

WSGIScriptAlias / /var/websites/www.stereoplex.com/bin/django.wsgi
WSGIDaemonProcess stereoplex user=stereoplex group=stereoplex processes=3 threads=25 maximum-requests=1000 stack-size=524288
WSGIProcessGroup stereoplex

This runs the Stereoplex web application in its own process group, and with its own user and group membership. This is a safety net: if the site is compromised or (to be honest, more likely) I screw up and the app tries to write to the filesystem, its rights are limited by the host's file access control. The WSGI script file is generated by buildout. Probably the most interesting parameter here is the stack size, given in bytes (524288 bytes is 512KB). This is well below Linux's default (typically 8MB); for this sort of app, the threads don't need stacks anywhere near that large. Lowering it led to a massive memory saving (remember, this is only a 512MB host, and this site is one of around half a dozen running on the same machine).
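The size of that saving is easy to estimate, because each thread reserves its whole stack when it starts. Here's a back-of-the-envelope sketch, assuming the common Linux default of 8MB (what ulimit -s usually reports) and counting only the 3 processes of 25 request threads in the configuration above. This is reserved address space, an upper bound rather than memory actually touched, so how much of it becomes a real saving depends on how the host accounts for memory.

```python
KIB = 1024
MIB = 1024 * KIB


def stack_reservation(processes, threads, stack_size):
    """Address space reserved for thread stacks across all daemon processes."""
    return processes * threads * stack_size


default = stack_reservation(3, 25, 8 * MIB)   # assumed Linux default stack size
tuned = stack_reservation(3, 25, 524288)      # stack-size=524288 from the config
print(f"default: {default / MIB:.1f} MiB, tuned: {tuned / MIB:.1f} MiB")
# prints "default: 600.0 MiB, tuned: 37.5 MiB"
```

On a 512MB host, the first figure alone exceeds physical memory, which is why the stack size is worth tuning.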

Memcached

Mingus makes extensive use of Django's caching support. Other Django sites on the same host use a Memcached instance, so I just pointed Stereoplex at that one. Memcached is essentially in a default configuration, except that it's bound only to the localhost IP address rather than the public one. It's configured to use a maximum of 64MB of RAM. That doesn't sound like a great deal, but even with two or three websites using it, I haven't seen it go over 40MB.
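For reference, pointing a Django site at a shared, localhost-only Memcached looks roughly like the settings fragment below; memcached itself would be started with options along the lines of -l 127.0.0.1 -m 64 to bind it to localhost and cap it at 64MB. The fragment uses the CACHES setting and pymemcache backend from much later Django releases (3.2 and up); the 2010-era equivalent was a single CACHE_BACKEND = "memcached://127.0.0.1:11211/" string. Since several sites share one instance, a per-site KEY_PREFIX keeps their keys apart; the prefix value is an assumption.

```python
# Django settings fragment (hypothetical values, modern Django syntax).
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        # Memcached listens on localhost only, on its default port.
        "LOCATION": "127.0.0.1:11211",
        # Namespaces this site's keys in the shared instance.
        "KEY_PREFIX": "stereoplex",
    }
}
```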

So... All Done?

Nearly.

If I'm honest, the content editing experience isn't as nice in Django as it was in Plone. This is expected. Plone is a CMS, and has an administrative interface that's focussed on the business of managing content. Django isn't a CMS; it's a more general web framework. The experience is more like editing content in the ZMI.

That said, there are advantages. I've found third-party Django software much easier to integrate and customise than third-party Plone products ever were. I'm not fighting reams of configuration all the time. There's a strong mindset of developing apps to be reusable in the Django world; Mingus itself is little more than some UI glue (as, really, is Stereoplex). If I were tasked with delivering a large CMS with flexible authentication and authorisation, workflow, and so forth, I'd go for Plone in an instant. But for this job, many of Plone's strengths simply don't apply, and something simpler and more lightweight was more appropriate.

Anyway - I'm quite pleased with the results. There are still a few kinks to be worked out (pygmentize only seems to be applied on the home page, not on individual post pages, for example), but I'll get there over the next couple of weeks.

If there's anything you'd like to have more information on, then leave a comment (click on the article heading - yes, a proper link to comments is on the list!) or of course, just go and grab the code at Github.

Enjoy!

Kevin says:
Sep 30, 2010 04:15 PM
Thanks so much for writing this. Glad you like Mingus. I'll make sure to merge your few patches. Keep up the good work!
Carl Meyer says:
Sep 30, 2010 04:15 PM
Great writeup! I particularly love seeing a writeup with "I made these fixes in the software, here are the commits/patches."

Totally agree about DVCS links in requirements files. I learned the hard way and have now removed all the ones I used to have. Almost wish pip didn't have that feature...

Btw, your comment preview seems to be broken.
Dan Fairs says:
Sep 30, 2010 04:15 PM
@Kevin - Thanks!

@Carl - the feature's pretty handy for that inevitable period where you need to use some software but a proper release hasn't been made. But it *will* come back and bite you if you're not careful. Thanks for the heads-up about the comment preview; I'll add it to the list and get it fixed. I'm sure it worked last time I looked! :)

(I feel bad about not writing tests now. I usually do...)
Sven Deichmann says:
Sep 30, 2010 04:15 PM
There is a simple reason for the lack of new releases of blog software for Plone: Plone 3 already contains most of the functionality needed for a blog. The blog-specific extras just need a few lines in the eggs section and you have a full-blown blog. But in general, you are right: Plone is overkill for just a blog. Then again, there are benchmarks suggesting that Plone 4 will be way faster than WordPress.
Tunix says:
Sep 30, 2010 04:15 PM
Thanks for this great post.

Can you elaborate on your dislike of DVCS links in requirements files? I use those links all the time and would love to learn what could possibly be wrong with using them.
Dan Fairs says:
Sep 30, 2010 04:15 PM
@Sven: Plone can indeed make a fine blog by itself; you're probably right about that being the reason for no more SimpleBlog releases!

There are a couple more reasons I steered clear of Plone this time around.

  - With Deco around the corner, and Deliverance increasing in maturity, I expect the theming story for Plone to change significantly soon. For the better, I might add; but I've been out of everyday Plone for long enough that I'd have to learn the Plone 3 way, then relearn how Deco fits together. I'm not leaving Plone behind forever :)
  - As I mentioned, much of my client work is now Django. It makes sense to get experience of as many Django apps as possible.

@Tunix - there's nothing wrong with such links, as long as you manage them properly. They're fine in development of course. I've seen many links (such as those in Mingus' requirements files, for example) which just point to whatever happens to be the head or tip of someone else's project in source control. All that has to happen is that someone pushes a commit to that repository that doesn't work properly, or introduces a backwards incompatibility, and *your* build breaks.

Fortunately, there is a way around this: fork the project, or clone it and maintain your own known-good copy. Even so, this isn't perfect, since you can't tell from the requirements files exactly what version of the software is needed. I believe that you can specify a particular commit in requirements files as a version to fetch from, though I've rarely seen that used. The best approach is to stick to released versions of software that are on PyPI and its mirrors. And if releases haven't been made, bug the authors to make them!
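As a rough illustration of the problem described above, a short script can flag VCS requirement lines that don't name any revision after an @. It's a sketch under simple assumptions: one requirement per line, pip's vcs+url@rev#egg=name form, and only git, hg, svn and bzr URLs. It can't tell a tag or commit hash from a branch name, and a branch still moves, so a line that passes isn't necessarily pinned to a fixed version.

```python
import re

# Matches e.g. "-e git+https://github.com/user/repo.git@v1.0#egg=repo".
VCS_LINE = re.compile(r"^(?:-e\s+)?(git|hg|svn|bzr)\+(\S+)")


def unpinned_vcs_requirements(lines):
    """Return VCS requirement lines that name no revision after '@'."""
    flagged = []
    for raw in lines:
        line = raw.strip()
        match = VCS_LINE.match(line)
        if not match:
            continue  # not a VCS requirement
        url = match.group(2).split("#", 1)[0]    # drop the "#egg=..." fragment
        after_scheme = url.split("://", 1)[-1]
        # Skip the host part, which may itself contain "@" (e.g. git@github.com).
        _, _, path = after_scheme.partition("/")
        if "@" not in path:
            flagged.append(line)
    return flagged
```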
kevin says:
Sep 30, 2010 04:15 PM
Dan, Mingus only pointed to HEAD/TIP on some external projects because of a bug in pip when generating "frozen" requirements files. From the Mingus 0.9 release onwards, all releases will be based on frozen versions of external apps, so the issue you mentioned above won't arise.
tunix says:
Sep 30, 2010 04:15 PM
@Dan Thanks for your answer. I feared there was something flawed with pip itself. But it's not necessarily the fault of the tool if the user can't use it correctly. Not only can you specify a particular commit, you can also specify a branch or a tag. And if releases haven't been tagged, bug the author to tag them ;)


If there aren't any other faults with pip, I think its (correct) use should be encouraged even more than it currently is.
Dan Fairs says:
Sep 30, 2010 04:15 PM
@Kevin - excellent, great to hear!

@tunix - glad to hear you can indeed specify a branch or tag! And indeed, blame the user, not the tool. I think that bears out my choice to use buildout rather than pip though: I know buildout better :) Pragmatism made the choice for me.

I still plan to evaluate buildout vs pip in the future (yes, I know there's a pip recipe for buildout).