Mobile API Design - Thinking Beyond REST

by Dan Fairs last modified Jun 19, 2011 11:13 AM
This article explores the problems of optimising REST APIs for mobile device performance, and suggests a way of allowing clients to request alternate representations.

Nate Aune and Anna Callahan gave a great talk at this year's EuroDjangoCon about a service that they'd built in 24 hours, valentun.es. Along with a great story, the meat of the talk was about the concessions you have to make with a mobile API with respect to data transfer rates and connectivity. Some of the things they said struck a chord with my own experiences of designing mobile APIs, and inspired me to write this post about those experiences, principles, problems, solutions - and an idea for the future.

What's a REST API?

There are probably as many definitions of what a REST API is as there are implementations (so naturally I'll add my own), but they all share some common characteristics:

  • They recognise that the web is composed of resources, and are structured around that
  • They use the underlying HTTP methods (GET, POST, PUT, DELETE) to interact with resources
  • Representations of those resources are what actually flow back and forth between REST API servers and clients
  • URIs (usually URLs, which I'll use for the remainder of the article) are used to identify resources, and application state - particularly information on what the client can do next - is contained within the representations received from the server.

Take a look at the Wikipedia article on REST for one description of what it means to be RESTful.

Resources

When you design a REST API, the first task is usually to consider what resources you want to expose to web clients. This is often pretty straightforward: a reasonable approach is often to use the same arrangement as your underlying data model (be that tables in a relational database, document types in a document management system, and so on). This doesn't have to be the case, of course - in particular, you'd be unlikely to expose a link table that facilitates a many-to-many relation in a relational database as an individual resource, for example.

Let's say we're building an API for a pizza shop. We're probably going to have a Pizza resource, a Topping resource, and an Order resource.

Resources tend to be arranged hierarchically, probably due to the nature of the URLs used to address them. It's therefore common to have resources in an API that work as collections of other resources. The URL for a topping resource might be `http://pizza.com/toppings/cheese`. The URL of the collection resource for all toppings would be `http://pizza.com/toppings/`. (The presence or absence of a trailing slash is meaningless, but for the purpose of this article I'll use the convention of collection resources having a trailing slash.)

Representations

Representations are how the client and server talk about a resource. Prior to the advent of machine-usable APIs, HTML was the most common form of representation. These days, XML and JSON representations are commonly used for machine-readable representations. PDF, JPEG, and MP3 are also all perfectly good representations of a resource as is, of course, good old HTML.

Keep in mind that a resource may have more than one representation. You might be able to fetch a resource as HTML (useful for humans) and JSON (useful for machines). It's also important to realise that representations are not simply content types: there's no reason a resource can't have multiple JSON representations, for example. This idea becomes important a little later on.

Once you've defined your resources, therefore, thinking of representations is usually the next step. A simple JSON representation, with a key per resource attribute, is a common default. That will do for now. A representation of our Topping resource might therefore look like this:

{
"calories": 100,
"name": "Cheese"
}

Resources may need to refer to other resources. This should be done in a self-describing way: the client should not need to have any knowledge of the server application to build its own URLs. Seeing keys like 'foo_id' in a JSON representation is usually a sign of this design error. For example, this is a reasonable representation:

{
"favourite_topping": "/toppings/cheese"
}

This isn't so good:

{
"favourite_topping_id": "36"
}

That 'favourite_topping_id' is meaningless to the client - it has to know how to construct topping resource URLs to be able to use that data.

Incidentally, note that the examples in this article only include URL paths, rather than full, absolute URLs. Either is fine, as long as the client can resolve them.
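As a minimal sketch of what 'resolving' a path-only link means for a client, Python's standard `urllib.parse.urljoin` combines the link with the URL the representation was fetched from. The `resolve_link` helper and the order URL used here are illustrative assumptions, not part of any real API.

```python
from urllib.parse import urljoin


def resolve_link(fetched_from, link):
    """Resolve a link found in a representation against the URL it came from."""
    return urljoin(fetched_from, link)


# A representation fetched from this (hypothetical) order URL...
order_url = "http://pizza.com/orders/432544"
representation = {"favourite_topping": "/toppings/cheese"}

# ...yields an absolute URL without the client knowing how URLs are built.
topping_url = resolve_link(order_url, representation["favourite_topping"])
print(topping_url)
```

The same call leaves an already absolute link untouched, so a client written this way copes with either style.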

Interactions

The next piece of the usual REST API design process is to consider the operations available for each resource, and what they mean. There are four key operations provided by HTTP - GET, POST, PUT and DELETE. (Actually, HTTP does provide more, but this quartet is what is most normally used in RESTful API design.)

These are often compared to the core SQL-based relational database operations of SELECT, INSERT, UPDATE and DELETE. This is slightly misleading. SELECT and GET are fairly similar, as are the two DELETEs; POST and PUT are different beasts though. POST is used for a write operation on a resource that has side-effects. PUT writes to a resource, but has no side-effects.

Put another way, PUT is idempotent - if you do the same PUT twice (and there are no other state changes in between) then the system state will be the same. POST carries no such guarantee.

A good example might be to compare the creation of a new Topping resource with the creation of an Order for a pizza. Creating a Topping would probably consist of something like the following:

PUT http://pizza.com/toppings/jalapenos
{
"calories": 25,
"name": "Jalapeños"
}

This would create the Topping resource at http://pizza.com/toppings/jalapenos. Re-running the request would not make any difference. Changing the 'calories' field in the JSON to a new value would replace the existing resource with the new one. So - PUT has a 'create-or-update' semantic.

The response for this request would probably simply be an HTTP 200 response, with an empty body (or more strictly, a 204, which tells the client to maintain its view of the representation.)

Contrast this with what creating an order might look like:

POST http://pizza.com/orders/
{
"toppings": [
"/toppings/cheese",
"/toppings/jalapenos"
],
"card": "1234567890"
}

We've made this a POST request because it has side-effects: it bills your credit card, and sets off the process of making a delicious pizza. Making that same request twice would bill your card twice, and get you two pizzas. POST is not idempotent.

The HTTP response here would probably be 201 Created, with the representation of some Confirmation resource in the body, perhaps looking like this:

201 Created
{
"order": "/orders/432544"
}

Note once more how the response contains a self-describing URI, rather than some opaque order ID, which is not meaningful out of context.
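The difference between the two verbs can be sketched with an in-memory store standing in for the server. Everything here (the `put_topping` and `post_order` helpers, the way order URLs are numbered) is an assumption made for illustration, not how any particular framework does it.

```python
import itertools

toppings = {}                     # URL path -> representation
orders = {}                       # URL path -> representation
_order_ids = itertools.count(1)   # hypothetical source of new order numbers


def put_topping(path, representation):
    """PUT: create or replace the resource at a URL chosen by the client."""
    toppings[path] = representation


def post_order(representation):
    """POST: the server picks a new URL each time, so repeats add up."""
    path = "/orders/%d" % next(_order_ids)
    orders[path] = representation
    return path


cheese = {"calories": 100, "name": "Cheese"}
put_topping("/toppings/cheese", cheese)
put_topping("/toppings/cheese", cheese)   # state is the same as after the first PUT

first = post_order({"toppings": ["/toppings/cheese"]})
second = post_order({"toppings": ["/toppings/cheese"]})   # a second, separate order

print(len(toppings), len(orders))
```

Repeating the PUT leaves one topping; repeating the POST leaves two orders, each with its own URL.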

Reality Bites

So we've identified our resources, the representations of them, and what operations the HTTP verbs actually correspond to. We're good to go, right?

Well, yes, basically. It'll work. But you'll almost certainly run into some problems. Before we get to the meat of selecting resource representation though, let's take a minute to consider a couple of real-world implementation problems you're likely to encounter.

Aside 1: Bad HTTP clients

There are some broken HTTP clients out there. I've run into one: Flash (circa 2009). Flash gets upset if it doesn't receive a 200 response from the server. In particular, if your server returns a 4xx HTTP code in an API response, Flash will not even pass the response to the Flash application.

On the Django project on which we ran into this, we ended up writing a custom middleware that looked for the presence of a magic query string parameter on the URL and, if it was found, replaced the status code with a 200 and put the real status code on the first line of the body. The Flash app then parsed out the response code from the response body. Ugly, but workable.

Flash also (at the time, it may have changed) was unable to perform PUT or DELETE requests. Our solution was similar: the Flash application would always perform a POST when it actually wanted to do a PUT or DELETE, and the real intended method went into another magic query string parameter. The aforementioned middleware would then rewrite the HTTP method on incoming API requests that carried this flag.
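The two workarounds can be sketched as a single piece of middleware. This is not the Django middleware described above: it is a plain WSGI version using only the standard library, and the query string parameter names (`_method`, `_status_in_body`) are hypothetical, since the original names aren't given here.

```python
from urllib.parse import parse_qs

# Hypothetical parameter names, chosen for this sketch only.
METHOD_PARAM = "_method"
STATUS_PARAM = "_status_in_body"


class BrokenClientMiddleware:
    """WSGI middleware for clients that can only POST and only read 200s."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # parse_qs drops blank values, so the flags must be sent as e.g. "=1".
        params = parse_qs(environ.get("QUERY_STRING", ""))

        # Rewrite POST to the method the client really intended.
        override = params.get(METHOD_PARAM, [""])[0].upper()
        if environ.get("REQUEST_METHOD") == "POST" and override in ("PUT", "DELETE"):
            environ["REQUEST_METHOD"] = override

        if STATUS_PARAM not in params:
            return self.app(environ, start_response)

        # Capture the real status, then always answer 200 and put the
        # real status code on the first line of the body.
        captured = {}

        def capture(status, headers, exc_info=None):
            captured["status"] = status
            captured["headers"] = headers

        body = b"".join(self.app(environ, capture))
        code = captured["status"].split(" ", 1)[0].encode("ascii")
        body = code + b"\n" + body
        headers = [(k, v) for k, v in captured["headers"] if k.lower() != "content-length"]
        headers.append(("Content-Length", str(len(body))))
        start_response("200 OK", headers)
        return [body]


def demo_app(environ, start_response):
    """Toy app: reports the method it saw and always answers 404."""
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [environ["REQUEST_METHOD"].encode("ascii")]


wrapped = BrokenClientMiddleware(demo_app)
seen = []
environ = {"REQUEST_METHOD": "POST", "QUERY_STRING": "_method=DELETE&_status_in_body=1"}
body = b"".join(wrapped(environ, lambda status, headers, exc_info=None: seen.append(status)))
print(seen[0], body)
```

A client that doesn't send the flags passes straight through, so well-behaved clients still see real status codes and methods.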

Aside 2: Rich Error Handling

The standard approach to expressing errors in a REST API is to use HTTP status codes. As is often the case with REST, this works fine for simple systems, but is simply too limiting for more sophisticated systems, particularly those which might submit a JSON or XML document to describe a POST request. If a client does submit such a rich request, which perhaps does not validate on the server, it is useful to be able to provide more than just a 400 Bad Request error.

Since I primarily use Django these days, and Django uses forms and formsets for validation, I have found that providing a standardised JSON representation of form and formset errors works well. The format can be specified in advance, and allows the server to inform the client of rich validation errors (down to field-level validation, with decent error messages) even over an API. I hope to write more on this in a future post.
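To give a flavour of the idea, here is a minimal sketch in plain Python rather than Django forms. The validation rules, the `validate_order` and `error_response` helpers, and the `{"errors": {field: [messages]}}` layout are all assumptions made for illustration; the actual format is not described in this article.

```python
import json


def validate_order(data):
    """Return a dict of field name -> list of error messages (empty if valid)."""
    errors = {}
    if not data.get("toppings"):
        errors.setdefault("toppings", []).append("Choose at least one topping.")
    card = str(data.get("card", ""))
    if not card.isdigit():
        errors.setdefault("card", []).append("Card number must contain only digits.")
    return errors


def error_response(errors):
    """Build a (status, body) pair describing a failed validation."""
    return 400, json.dumps({"errors": errors}, sort_keys=True)


status, body = error_response(validate_order({"toppings": [], "card": "12ab"}))
print(status, body)
```

The status code still says 400 Bad Request, but the body now tells the client which fields to fix.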

Normal service resumes...

Normalised resource representations

OK, let's say we've got an iPhone app which uses our REST API to place orders and display the calorific content of the toppings we added. Let's look at the response that it might receive from a GET request to our Order resource:

GET /orders/432544

200 OK
{
"toppings": [
"/toppings/cheese",
"/toppings/jalapenos"
]
}

That's cool - the client can see that there are a couple of toppings there, so it fetches each one to get the calorific content:

GET /toppings/cheese
200 OK
{
"calories": 100,
"name": "Cheese"
}

And then:

GET /toppings/jalapenos
200 OK
{
"calories": 25,
"name": "Jalapeños"
}

That's great. Our iPhone app can tell our user that the pizza they ordered has 125 calories.

What's not so great is that our iPhone app has had to make three separate requests. This works fine in development, across the office wifi network to the dev server. It doesn't work so well when a user's ordering a pizza on the train home from work, and the train goes into a tunnel halfway through this multi-request conversation (and the user was outside a 3G signal anyway).
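The cost is easy to see with a stand-in for the server that counts round trips. The `get` function below is a fake that reads from a dictionary rather than a real HTTP client, and `order_calories` is a hypothetical client-side helper.

```python
# A stand-in for the server: URL path -> representation.
RESOURCES = {
    "/orders/432544": {"toppings": ["/toppings/cheese", "/toppings/jalapenos"]},
    "/toppings/cheese": {"calories": 100, "name": "Cheese"},
    "/toppings/jalapenos": {"calories": 25, "name": "Jalapeños"},
}

requests_made = []


def get(path):
    """Pretend to perform an HTTP GET, recording each round trip."""
    requests_made.append(path)
    return RESOURCES[path]


def order_calories(order_path):
    """Total calories for an order, following each topping link in turn."""
    order = get(order_path)
    return sum(get(link)["calories"] for link in order["toppings"])


total = order_calories("/orders/432544")
print(total, len(requests_made))
```

One request for the order plus one per topping: the number of round trips grows with the size of the pizza.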

Nesting resource representations

The natural response (and probably the right response) is simply to extend the representation of an Order resource to include the required representations of our toppings. This means that our Order resource representation now looks like this:

GET /orders/432544

200 OK
{
"toppings": [
{
"calories": 100,
"name": "Cheese"
},
{
"calories": 25,
"name": "Jalapeños"
}
]
}

That's cool. Our iPhone app now only needs to make one request, and it gets all the information on the toppings as well. We've traded the brevity of the original representation of the Order resource for not having to make multiple requests. The individual Topping representations are still available, of course.

To (ab)use database parlance, we've denormalised our Order representation, trading size for performance: the representation is larger and takes longer to download, but it arrives in a single round trip.

Now, wind the clock forward a few months. We've extended our iPhone app and the supporting REST API to cover table reservations, meal pre-ordering, and so on. We've run into the problem described above, so we've heavily optimised our API responses to minimise HTTP round trips. Life is good. Right?

We're approached by a company who have developed an Android app that can use our API, wondering if we'd be happy to make it the official Android app. It's an awesome piece of software. They've thought about the user experience in a totally different way, and it works well on Android (though the current iPhone app approach works best on iPhone). The only real problem is that some of the API requests they make download a ton of data that they simply don't need; and they have to make lots of other smaller requests to make other parts of their app work as they want.

In other words, they want a different set of optimisation choices for the API.

What do we do? Add a new API version with different optimisations (even though it's still dealing with the same set of resources) in the representations? That doesn't sound so great, as we'd be maintaining two APIs. It doesn't scale, either - what happens when someone writes an app for the TV, which needs another set of tradeoffs?

A Possible Solution: Choosing Representations

(Note that this section outlines a possible solution - I don't have an implementation for this yet.)

The key insight is that the applications do not require different resources. They merely need different representations of those resources. Some frameworks allow this to be done by specifying an extension on the end of the URL. If we did that, we'd end up with:

http://pizza.com/toppings/cheese.xml
http://pizza.com/toppings/cheese.json

While not strictly wrong, this isn't ideal when thinking in resources. Different URLs mean different resources. Just because both XML and JSON representations are available doesn't double the number of resources - it's all the same cheese topping. We need some way of expressing what representation we want for a given resource. Ideally, we shouldn't change the URL.

Fortunately, HTTP gives us the tools to do this: the HTTP Accept header. It's usually used for specifying content types: text/html, application/json, and so on. However, the specification allows for extra parameters. Let's take a look (from http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html):

Accept         = "Accept" ":"
                 #( media-range [ accept-params ] )

media-range    = ( "*/*"
                 | ( type "/" "*" )
                 | ( type "/" subtype )
                 ) *( ";" parameter )
accept-params  = ";" "q" "=" qvalue *( accept-extension )
accept-extension = ";" token [ "=" ( token | quoted-string ) ]

That means that a client could send a request like this:

GET /orders/432544
Accept: application/json;q=1;depth=1,application/json

Here, we've added a depth parameter to the acceptable types. This depth parameter works a little like the numeric depth parameter to a Django queryset's `select_related` method call - it asks the server to traverse one level deep into related resources. The standard also specifies that more specific content types take precedence over less specific types - which means that we have a mechanism for the client to allow fallback to a normal JSON representation of the resource. If the server cannot provide a representation that fulfils any of the values in the Accept header, it would return a 406 Not Acceptable.
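On the server side, the first step would be to pull the depth values out of the header. The following is a deliberately small sketch: `parse_accept` and `requested_depths` are hypothetical helpers, the parser ignores quoted strings and q-value ordering, and treating a media range with no depth parameter as depth 0 (the normal form) is my own assumption.

```python
def parse_accept(header):
    """Parse an Accept header into a list of (media_type, params) pairs.

    A small parser for illustration: it does not handle quoted strings
    that contain commas or semicolons.
    """
    ranges = []
    for item in header.split(","):
        parts = [part.strip() for part in item.split(";")]
        media_type = parts[0].lower()
        params = {}
        for part in parts[1:]:
            name, _, value = part.partition("=")
            params[name.strip().lower()] = value.strip()
        ranges.append((media_type, params))
    return ranges


def requested_depths(header, media_type="application/json"):
    """Depths the client will accept for media_type; 0 means the normal form."""
    depths = set()
    for accepted_type, params in parse_accept(header):
        if accepted_type == media_type:
            depths.add(int(params.get("depth", 0)))
    return depths


header = "application/json;q=1;depth=1,application/json"
print(sorted(requested_depths(header)))
```

For the header above this yields both depth 1 and depth 0, which is the fallback behaviour the client asked for.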

Aside: q?

What's that 'q' parameter? It's what appears to be a historical quirk of the spec, where q is used to specify a 'quality' - used to allow the choice of a variety of representations based on degrading quality, e.g. sample rate for sound. It's baked into the accept-params definition, so we include it here.

Example Requests

Let's compare two requests, one before using the proposed Accept convention, and one after:

GET /orders/432544
Accept: application/json

200 OK
{
"toppings": [
"/toppings/cheese",
"/toppings/jalapenos"
]
}

This is exactly the same request as we saw before, but with an explicit Accept header, specifying that the client is prepared to accept the normal form of the response. Let's see what might happen if a depth is specified:

GET /orders/432544
Accept: application/json,application/json;q=1;depth=1

200 OK
{
"toppings": [
{
"calories": 100,
"name": "Cheese"
},
{
"calories": 25,
"name": "Jalapeños"
}
]
}

We can now see the client expressing that it can accept resource representations nested to a single level - and the server responds in kind. Note that the client also specified that it could accept the normal form - so it would be valid for the server to respond with the first, normal form even to this second request.
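How the server produces the nested form is up to the implementation. One possible sketch, assuming an in-memory `RESOURCES` table and the (simplistic) rule that any string value matching a known resource path is a link, is a recursive `expand` function. Both names are hypothetical.

```python
# A stand-in for the data store: URL path -> normal-form representation.
RESOURCES = {
    "/orders/432544": {"toppings": ["/toppings/cheese", "/toppings/jalapenos"]},
    "/toppings/cheese": {"calories": 100, "name": "Cheese"},
    "/toppings/jalapenos": {"calories": 25, "name": "Jalapeños"},
}


def expand(representation, depth):
    """Return the representation with links nested 'depth' levels deep."""
    if depth <= 0:
        return representation
    expanded = {}
    for key, value in representation.items():
        if isinstance(value, list):
            expanded[key] = [_expand_value(item, depth) for item in value]
        else:
            expanded[key] = _expand_value(value, depth)
    return expanded


def _expand_value(value, depth):
    """Replace a known link with its (recursively expanded) representation."""
    if isinstance(value, str) and value in RESOURCES:
        return expand(RESOURCES[value], depth - 1)
    return value


order = RESOURCES["/orders/432544"]
print(expand(order, 0))
print(expand(order, 1))
```

With depth 0 the links come back untouched; with depth 1 each topping link is replaced by that topping's representation.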

What if the server could not fulfil the request? Perhaps the backend is unable to perform the join required to provide the nested data, and the client specified that it could only accept that nested representation:

GET /orders/432544
Accept: application/json;q=1;depth=1

406 Not Acceptable

It might be sensible for the server to provide the normal representation in the body anyway, in case the client was able to process it.

In particular, this response might be used to prevent (or mitigate) DoS attacks; depending on the application, the server might impose a depth limit of 2 or 3 levels.
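The decision itself might be sketched as follows. The limit of 2, the `negotiate_depth` and `respond` helpers, and the policy of serving the deepest acceptable depth are all assumptions for illustration; a real server could equally prefer the shallowest.

```python
MAX_DEPTH = 2  # hypothetical server-side limit


def negotiate_depth(requested, supported_max=MAX_DEPTH):
    """Pick the deepest requested depth the server supports, or None."""
    acceptable = [d for d in requested if 0 <= d <= supported_max]
    return max(acceptable) if acceptable else None


def respond(requested, supported_max=MAX_DEPTH):
    """Return the status line the server would send for the requested depths."""
    depth = negotiate_depth(requested, supported_max)
    if depth is None:
        return "406 Not Acceptable"
    return "200 OK (depth=%d)" % depth


print(respond({1}))                    # nested form only, within the limit
print(respond({5}))                    # nested form only, beyond the limit
print(respond({0, 5}))                 # too deep, but the normal form is acceptable
print(respond({1}, supported_max=0))   # backend cannot nest at all
```

A client that lists the normal form as a fallback never sees the 406; one that insists on a depth the server won't provide does.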

Thoughts: Advantages, Limitations and Questions

The approach outlined above would afford clients some flexibility in 'denormalising' the data that they receive on request, avoiding both the need for developers to create custom code to nest related data and the tendency for APIs to become over-specialised to the needs of one particular API client over time.

It's also cacheable - the Accept header should form part of the cache key - and it provides graceful degradation for servers which cannot perform the data nesting request.
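In HTTP terms, the server signals this to caches by sending a `Vary: Accept` response header. For an application-level cache, a minimal sketch of such a key might look like this. The `cache_key` helper is hypothetical, and lower-casing the whole header is a simplification that would be wrong for case-sensitive parameter values.

```python
def cache_key(method, path, accept):
    """Cache key that keeps different representations of one URL apart."""
    # Normalise whitespace and case so equivalent headers share an entry.
    normalised = ",".join(part.strip().lower() for part in accept.split(","))
    return (method.upper(), path, normalised)


normal = cache_key("GET", "/orders/432544", "application/json")
nested = cache_key("GET", "/orders/432544", "application/json;q=1;depth=1")
same_nested = cache_key("get", "/orders/432544", " Application/JSON;q=1;depth=1")

print(normal != nested, nested == same_nested)
```

The normal and nested forms of the same URL get separate cache entries, while trivially different spellings of the same header share one.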

However, it's not a silver bullet: in particular, it's not clear how one might implement an analogue to Django's .select_related('foo', 'bar') form, where the API consumer could specify which resources it wished to be nested in the main resource requested. Instead, all the client can specify is a simple depth, and may therefore receive far more nested information than it might actually need.

As far as I can see, there are two problems to solve before being able to implement this more specific second form:

  • How to provide a self-describing way for a server to indicate that this facility is available for linked resources
  • How a client can tell the server which individual resource links to expand
    • Perhaps including an XPath-like parameter in the Accept header might work here (tweaked suitably to apply to JSON documents too); but how would the client know which paths were expandable without requesting the normal form to start with?

 

Implementation details aside, I do think that setting a convention (even of a limited form, such as the depth approach described above) would increase the usefulness of APIs. Perhaps I'll even get around to a Tastypie/Piston extension to implement it!

Comments on the approach, improvements and especially criticisms, are welcome in the comments.

Chris says:
Jun 16, 2011 11:11 AM
Check out this link http://www.odata.org/ - it may give you a couple ideas.

They resolved the "How a client can tell the server which individual resource links to expand" question by using query params so the following:

/orders?$expand=toppings

would tell the server which entity to include in the response
Dan Fairs says:
Jun 16, 2011 11:43 AM
Thanks for the link - I've not had the chance to dive into that.

My immediate response is that works fine for 'top-level' keys, but JSON and XML representations can be considerably more complex than that - hence me referring to an XPath-like syntax for expressing that.

(The other central idea of my suggestion is that stuff describing the format of the resource should go in the Accept header rather than querystring parameters - but that's bikeshedding, really.)

It's possible that's all already addressed on odata.org, of course - I need to take a read!
Vasiliy Faronov says:
Jun 16, 2011 01:27 PM
Thanks for the article. I really like your idea of using Accept parameters to ask for arbitrary constraints on representations. I wasn't even aware HTTP made this possible.

One small comment, though: the HTTP spec actually makes a distinction between media type parameters (like "charset" for "text/html") and "accept-params" (like your "depth"), and it requires that the latter be preceded by the "q" (quality factor) value. So your example ought to read:

Accept: application/json;q=1;depth=1

Your friendly protocol police :)
Dan Fairs says:
Jun 16, 2011 02:44 PM
I did wonder about that myself, actually. I based that on an example in the RFC (http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html) itself:
Accept: text/*, text/html, text/html;level=1, */*

I *think* that it'll parse as the final part of media-range, but whether that'll break existing server implementations, I'm not sure. I suspect you're right to be honest - I'll do some more research and update the article.

Thanks for pointing it out.
Vasiliy Faronov says:
Jun 16, 2011 04:17 PM
"level" is an obsolete parameter of the text/html media type: see http://tools.ietf.org/html/rfc1866#section-4.1
Dan Fairs says:
Jun 16, 2011 05:46 PM
OK, looks like I was misled by the example! I'll clean that up over the next day or two. Thanks for pointing that out.
Vasiliy Faronov says:
Jun 16, 2011 01:30 PM
Oh and one more thing. Apparently your blog software has problems with Unicode: it rejected my comment with U+2019, U+201C and U+201D ("The system could not process the given value."), but accepted it when I replaced those characters with plain-ASCII punctuation.
Dan Fairs says:
Jun 16, 2011 02:48 PM
Thanks. I see what you mean. I'd need to look at the comments component, could be some over-strict input validation (Plone - the backend for this blog - has excellent unicode support in general.)
Vasiliy Faronov says:
Jun 17, 2011 09:54 AM
A couple more thoughts... hope you don't mind :)

The client would certainly want to know the result of negotiation--which features it ended up with. Content-Type is for the media type only, so we can't put arbitrary stuff in there. How then do we communicate this to the client?

1. In the entity body--requires special treatment for every media type, might not be possible for some, plus it creates an asymmetry between the request and the response (e.g. can't do HEAD to check).

2. In a non-standard response header like X-Entity-Params--carries all the disadvantages of a non-standard header.

There is this thing known as Transparent Content Negotiation[1]--I think it provides a framework for doing all sorts of feature negotiation, but personally I've never seen it used in practice.

[1] http://tools.ietf.org/html/rfc2295
Fijian says:
Jun 18, 2011 01:40 PM
"The response for this request would probably simply be an HTTP 200 response, with an empty body."

Shouldn't this really be 204 if the body is empty?
Dan Fairs says:
Jun 19, 2011 11:09 AM
Yes, it probably should - I've never actually had to use an empty body before, and hadn't noticed that there was a code for it. Thanks!
Anonymous says:
Jun 19, 2011 12:02 PM
I've updated the article to reflect your comment. Thanks again.
Bill H. says:
Jun 22, 2011 05:58 PM
Neat Idea.

Perhaps I am missing something, but why not use custom media types?

application/vnd.mycorp.com.orders+xml
application/vnd.mycorp.mobile.orders+xml
application/vnd.mycorp.tv.orders+xml

You could even go as far as to version each media type.
 
Dan Fairs says:
Jul 02, 2011 11:29 AM
I did consider this, but decided against it for the following reasons:

- Discoverability - how does the client know what types are available before the request? (This is also the problem with my XPath-like mini-language)
- Too specialised - the serverside developer has to know all possible combinations of what a client may request. With something like the depth parameter on the Accept header, the serverside application developer can essentially ignore this whole problem, and framework code can handle it.
karl says:
Jun 23, 2011 04:56 AM
There is a big missing part for it to be REST. Basically you are using HTTP and that is cool. The major property of a RESTful system is to be Hypertext driven.

Check this blog post by Roy Fielding
http://roy.gbiv.com/[…]/rest-apis-must-be-hypertext-driven

and read the comments too.
Jonathan Badeen says:
Jun 23, 2011 06:40 PM
I'd love to hear your thoughts on resources that essentially require paging. For instance, how would you go about browsing the list of possible ingredients (100 at a time perhaps) if there are a million entries?
vincent says:
Aug 01, 2011 04:58 PM
I know this is not exactly the topic but in a more general way, how would you implement restrictions related to actions such as toppings creation or retrieve an order ?

If for instance you wanted to only allow the creator of the order to retrieve its information (at a later time). would you pass the authentication information through the request (POST for instance) knowing that this information has nothing to do with the actual resource other than restricting access to it ? or would you rather use another mechanism ?

Many thanks !
makuchaku says:
Dec 10, 2011 04:17 PM
Great article! I wish you would have written it when I was designing my API's :)
lvh says:
Dec 12, 2011 06:56 PM
Isn't this what SPDY is for?
