Decentralizing Media Types

Feb 9, 2008

A while back, Sanjiva Weerawarana proposed (via email) a way to decentralize media types. I think the proposal is excellent, and Dan Diephouse’s latest blog post reminded me of it again. Here’s a brief introduction to a possible solution for “decentralizing media types”.

The Problem

In a plain HTTP interaction, the Content-Type and Accept headers carry the media type of the data being sent and of the data the client is willing to accept, respectively. You’ve seen these media types in numerous examples; a typical request or response might, for instance, carry a Content-Type header with the value application/xml.

The problem with this approach is that media types have to be registered centrally with IANA. This means that while you can invent your own media types, nobody will know about them — unless you go through the time-consuming process of actually having your media type registered.

What’s wrong with application/xml? Nothing, really, except that it tells you nothing beyond the fact that the body is XML: you have no way to tell which XML vocabulary it is unless you actually parse it and, for example, look at the XML namespace of the root element.
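To make this concrete, here is a small Python sketch (standard library only) of what a client receiving plain application/xml has to do to learn which vocabulary it got: parse the body and inspect the root element’s namespace. The Atom namespace is real; the un-namespaced order document is a made-up example.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def root_namespace(document: str) -> Optional[str]:
    """Return the namespace URI of the root element, or None if it has none."""
    root = ET.fromstring(document)
    # ElementTree reports qualified names in "Clark notation": {namespace}local
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return None


# Both bodies could arrive labelled as plain application/xml;
# only parsing them reveals which vocabulary each one uses.
atom_feed = '<feed xmlns="http://www.w3.org/2005/Atom"><title>News</title></feed>'
order = "<order><item>42</item></order>"  # made-up example, no namespace

print(root_namespace(atom_feed))  # http://www.w3.org/2005/Atom
print(root_namespace(order))      # None
```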

The Solution

What Sanjiva (and his collaborators, Paul Fremantle, Jonathan Marsh and James Clark) propose is this: define a single new media type, application/data-format, with a required parameter uri that points to a definition of the data format.

The uri is an HTTP URI that points to an RDDL (Resource Directory Description Language) document; in other words, you can do an HTTP GET on it and retrieve documentation of the data format that is both human-readable and machine-processable.
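A minimal sketch of producing and reading such a header with Python’s standard library, assuming the parameter is spelled uri as in the proposal and that its value is quoted (characters such as ':' and '/' aren’t allowed in an unquoted MIME parameter value). The format URI http://example.org/formats/purchase-order is hypothetical.

```python
from email.message import Message
from typing import Optional, Tuple

# Hypothetical format URI, for illustration only.
FORMAT_URI = "http://example.org/formats/purchase-order"


def make_content_type(format_uri: str) -> str:
    """Build a Content-Type value for the proposed application/data-format type."""
    # The URI is quoted because ':' and '/' may not appear in an unquoted
    # MIME parameter value.
    return f'application/data-format; uri="{format_uri}"'


def parse_content_type(value: str) -> Tuple[str, Optional[str]]:
    """Split a Content-Type value into (media type, uri parameter or None)."""
    msg = Message()  # reuse the standard library's MIME header parsing
    msg["Content-Type"] = value
    return msg.get_content_type(), msg.get_param("uri")


header = make_content_type(FORMAT_URI)
print(header)
# application/data-format; uri="http://example.org/formats/purchase-order"
print(parse_content_type(header))
# ('application/data-format', 'http://example.org/formats/purchase-order')
```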

My Opinion

I think this is an excellent proposal, specifically because it does not rely on a centralized authority and re-uses the namespacing concepts of the Web. It’s also fully agnostic towards any specific data format: you can use your own binary or text format, something like JSON or YAML, and if you pick XML, you’re free to use DTDs, RELAX NG schemas, Schematron or even XML Schema to document it. Finally, it lets clients with different levels of knowledge about a particular format each handle it as best they can. One client might be hard-coded against the complete string; another might retrieve the RDDL document, look for an XSD, and dynamically render some fancy visual representation.
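The two kinds of clients described above can be sketched in Python. This assumes RDDL 1.0 conventions (resources marked up as rddl:resource elements, the resource’s nature in xlink:role, and the XML Schema namespace URI as the nature of an XSD). The format URIs, the sample RDDL document and the handler names are hypothetical, and fetching is passed in as a function so the sketch runs without network access.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List

RDDL_NS = "http://www.rddl.org/"
XLINK_NS = "http://www.w3.org/1999/xlink"
# In RDDL, xlink:role holds a resource's "nature"; an XML Schema is
# identified by the XML Schema namespace URI.
XSD_NATURE = "http://www.w3.org/2001/XMLSchema"

# Hypothetical: formats this client was hard-coded against.
KNOWN_FORMATS = {"http://example.org/formats/invoice": "built-in invoice handler"}


def schema_links(rddl_document: str) -> List[str]:
    """Return the hrefs of all rddl:resource entries whose nature is XML Schema."""
    root = ET.fromstring(rddl_document)
    links = []
    for resource in root.iter(f"{{{RDDL_NS}}}resource"):
        if resource.get(f"{{{XLINK_NS}}}role") == XSD_NATURE:
            href = resource.get(f"{{{XLINK_NS}}}href")
            if href:
                links.append(href)
    return links


def choose_strategy(format_uri: str, fetch_rddl: Callable[[str], str]) -> str:
    """Decide how to handle a body, given the uri parameter of its media type."""
    # Client 1: hard-coded against the complete URI string.
    if format_uri in KNOWN_FORMATS:
        return KNOWN_FORMATS[format_uri]
    # Client 2: dereference the URI, read the RDDL, and look for an XSD.
    schemas = schema_links(fetch_rddl(format_uri))
    if schemas:
        return "schema-driven rendering using " + schemas[0]
    # Nothing usable: treat the body as opaque data.
    return "opaque data"


# A made-up RDDL document describing a made-up purchase-order format.
SAMPLE_RDDL = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:rddl="http://www.rddl.org/"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <body>
    <rddl:resource xlink:type="simple"
                   xlink:role="http://www.w3.org/1999/xhtml"
                   xlink:href="http://example.org/formats/purchase-order.html">
      <p>Prose documentation of the purchase-order format.</p>
    </rddl:resource>
    <rddl:resource xlink:type="simple"
                   xlink:role="http://www.w3.org/2001/XMLSchema"
                   xlink:href="http://example.org/formats/purchase-order.xsd">
      <p>XML Schema for purchase orders.</p>
    </rddl:resource>
  </body>
</html>"""

print(choose_strategy("http://example.org/formats/purchase-order", lambda uri: SAMPLE_RDDL))
# schema-driven rendering using http://example.org/formats/purchase-order.xsd
```

In a real client, fetch_rddl could be something like `lambda uri: urllib.request.urlopen(uri).read().decode("utf-8")`; keeping it injectable also makes it easy to cache RDDL documents, since the same format URI will show up again and again.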

I think the concept could even be extended to let clients query which media types a resource supports: you could just do a GET on the resource with an Accept header of application/data-format and get back the link to the RDDL document (if there is one).
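A rough server-side sketch of that query in Python. How the RDDL link should be returned isn’t specified anywhere; this sketch assumes it comes back in the uri parameter of the response’s Content-Type, with 406 Not Acceptable when the resource has no documented format. The path-to-format table is hypothetical, and q-values in the Accept header are ignored for brevity.

```python
from typing import Dict, Optional, Tuple

# Hypothetical: which resources have a documented format, and where its RDDL lives.
FORMAT_OF = {"/orders/17": "http://example.org/formats/purchase-order"}

Response = Tuple[int, Dict[str, str], bytes]


def accepts(accept_header: str, media_type: str) -> bool:
    """True if media_type is listed in an Accept header (q-values are ignored)."""
    ranges = [part.split(";", 1)[0].strip().lower() for part in accept_header.split(",")]
    return media_type in ranges


def answer_format_query(path: str, accept_header: str) -> Optional[Response]:
    """Handle a GET asking for application/data-format; None means serve normally."""
    if not accepts(accept_header, "application/data-format"):
        return None
    format_uri = FORMAT_OF.get(path)
    if format_uri is None:
        # The resource has no documented format, so this type can't be produced.
        return 406, {}, b""
    # Assumption: the RDDL link travels in the uri parameter; the body is empty.
    return 200, {"Content-Type": f'application/data-format; uri="{format_uri}"'}, b""


print(answer_format_query("/orders/17", "application/data-format"))
print(answer_format_query("/orders/17", "text/html"))
```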

Maybe there’s something immediately, obviously wrong with this idea — but if so, I can’t see it. It will be interesting to see what others say …

On February 11, 2008 8:47 AM, said:

I think this sounds like an interesting concept, and it might work out great. However, there is something counter-intuitive here: “application/xml” actually gives you more information about the format than “application/data-format” does, unless you already know the URI given in the parameter. Would it be possible to add another parameter, called ‘base-type’ or similar, to specify which “raw” MIME type the format is based on?

Another nit, at least with both of these parameters: it doesn’t look very pretty. Perhaps Content-Type doesn’t need to be overloaded at all? Perhaps an entirely new header could carry the format information instead, while Content-Type stays as simple as:

Content-Type: application/xml; charset=utf-8

On February 25, 2008 12:43 AM, Frank said:

Asbjornu certainly has a point: with application/data-format, generic XML clients might not recognize that the returned data is actually XML. But a new HTTP header is not necessary. How about agreeing on a content type parameter instead? Here are examples where “data-format” is the parameter name and the value is a URI:

Some XML: application/xml;data-format=
Some plain text: text/plain;data-format=
Some generic data: application/octet-stream;data-format=

As far as I know, content type parameters are not regulated anywhere. One example of adding more information via content type parameters is Geography Markup Language: the content type for version 3.1.1 is “text/xml; subtype=gml/3.1.1”.
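The difference between the two proposals shows up in how a generic client reacts. The following Python sketch uses a hypothetical format URI and a simplified notion of “generic XML client” (one that only recognizes application/xml, text/xml and +xml types and never looks at parameters):

```python
from email.message import Message
from typing import Optional, Tuple

XML_TYPES = {"application/xml", "text/xml"}


def split_content_type(value: str) -> Tuple[str, Optional[str]]:
    """Return (media type, data-format parameter or None) for a Content-Type value."""
    msg = Message()  # the standard library's MIME header parser
    msg["Content-Type"] = value
    return msg.get_content_type(), msg.get_param("data-format")


def generic_xml_client_accepts(value: str) -> bool:
    """A generic XML client only looks at the media type, never at parameters."""
    media_type, _ = split_content_type(value)
    return media_type in XML_TYPES or media_type.endswith("+xml")


# Hypothetical format URI.
fmt = "http://example.org/formats/purchase-order"

original = f'application/data-format; uri="{fmt}"'   # the original proposal
with_parameter = f'application/xml; data-format="{fmt}"'  # parameter on a registered type
gml = "text/xml; subtype=gml/3.1.1"                   # the GML precedent

for value in (original, with_parameter, gml):
    print(generic_xml_client_accepts(value), split_content_type(value)[1])
# False None
# True http://example.org/formats/purchase-order
# True None
```

With the parameter attached to a registered type, the generic client still sees XML, while a format-aware client can pick up the URI from the same header.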

On March 5, 2008 11:05 PM, Bill Burke said:

I agree with Frank. It should be something like registered-mime-type; data-format=…

I think one of the powers of REST is that you can point your browser at a resource, and if registered mime types are used, you might be able to get a new rendering of the representation.

I’d be interested in any other conversations happening on this.