link and self

The Atom Syndication Format defines an XML element named “link” that carries a “rel” and an “href” attribute (among others). Possible values for the rel attribute are either URIs (IRIs, to be exact) or simple text strings. In the latter case, they have to be one of the values registered at IANA, and they should be considered equivalent to the URI created by appending the value to “http://www.iana.org/assignments/relation/” (which makes me wonder why IANA is unable to map this URI to the registry, but I digress).
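To make the equivalence rule concrete, here is a minimal Python sketch of how a client might normalize rel values before comparing them. The function names `rel_to_iri` and `same_relation` are my own, not part of any library, and the colon check relies on the assumption that registered names never contain a colon while absolute IRIs always do.

```python
IANA_RELATION_BASE = "http://www.iana.org/assignments/relation/"


def rel_to_iri(rel=None):
    """Expand an Atom link relation value to its full IRI form (hypothetical helper)."""
    if rel is None:
        # RFC 4287: a link without a rel attribute is treated as rel="alternate".
        rel = "alternate"
    # Assumption: a simple registered name has no colon, an absolute IRI has
    # one after its scheme, so the colon tells the two forms apart.
    if ":" in rel:
        return rel
    return IANA_RELATION_BASE + rel


def same_relation(a, b):
    """Compare two rel values after expanding both to IRIs."""
    return rel_to_iri(a) == rel_to_iri(b)
```

With this, `same_relation("self", "http://www.iana.org/assignments/relation/self")` holds, while an extension relation such as “http://www.example.org/rels/person-with-addressbook” is passed through unchanged.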

I’m wondering about the “self” relation. According to the IANA registry, its definition is the one from Atom, which says:

The value “self” signifies that the IRI in the value of the href attribute identifies a resource equivalent to the containing element.

The reason for my wondering is that I just re-read Subbu’s post on using URIs for identity. He argues that using URIs for identity is problematic because you might retrieve the “same” object in different ways. For example, you might retrieve a person’s information in two ways, either with just the basic data:

GET /person/abc  
Host: www.example.org  

200 OK  
Content-Type: ...  

<person>  
  <link href="http://www.example.org/person/abc" rel="self"/>  
  <link href="http://www.example.org/person/abc?include=addressbook"  
    rel="http://www.example.org/rels/person-with-addressbook"/>  
  <first-name>Subbu</first-name>  
  <last-name>Allamaraju</last-name>  
  <email>[email protected]</email>  
  ...  
</person>

or including some additional information:

GET /myapp/person/abc?include=addressbook  
Host: www.example.org  

200 OK  
Content-Type: ...  

<person>  
  <link href="http://www.example.org/person/abc?include=addressBook" rel="self"/>  
  <first-name>Subbu</first-name>  
  <last-name>Allamaraju</last-name>  
  <addresses>  
    <address>  
      ...  
    </address>  
    ...  
  </addresses>  
</person>

The problem, as described by Subbu, is this:

Let me start with the “self” links. The person has a self link in each case, but they are all different. The client can not determine that the person with name Subbu Allamaraju found in the search results is the same as the one in the first or the second response. So, self links are useless to implement these scenarios.

But is this really a problem? The way I’ve used “self” links so far is to refer to a canonical resource, i.e. the one that represents the object or entity itself in the default way.

In other words, I’d turn Subbu’s second example into this:

GET /myapp/person/abc?include=addressbook  
Host: www.example.org  

200 OK  
Content-Type: ...  

<personWithAddressBook>
  <link href="http://www.example.org/person/abc?include=addressBook" rel="self"/>  
  <person>  
    <link href="http://www.example.org/person/abc" rel="self"/>  
    <first-name>Subbu</first-name>  
    <last-name>Allamaraju</last-name>  
  </person>
  <addresses>  
    <address>  
      ...  
    </address>  
    ...  
  </addresses>  
</personWithAddressBook>

Arguably, another option could be:

GET /myapp/person/abc?include=addressbook  
Host: www.example.org  

200 OK  
Content-Type: ...  

<person>  
  <link href="http://www.example.org/person/abc" rel="self"/>  
  <first-name>Subbu</first-name>  
  <last-name>Allamaraju</last-name>  
  <addresses>  
    <address>  
      ...  
    </address>  
    ...  
  </addresses>  
</person>

This has the downside that retrieving the resource identified by the outermost self link would yield something different than what you’re looking at. Whether this is acceptable or not depends on whether you consider these two resources to be “identical”. I’m undecided whether the official definition of the self link relation allows for this usage.
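To illustrate how a client could use the canonical self link as an identity key, here is a rough Python sketch. It assumes the wrapped `personWithAddressBook` variant shown earlier, with the XML trimmed and made well-formed; the helper names `self_link` and `person_identity` are hypothetical, and the sample documents are shortened versions of the examples above.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed versions of the two representations discussed above.
BASIC = """
<person>
  <link href="http://www.example.org/person/abc" rel="self"/>
  <first-name>Subbu</first-name>
  <last-name>Allamaraju</last-name>
</person>
"""

WRAPPED = """
<personWithAddressBook>
  <link href="http://www.example.org/person/abc?include=addressBook" rel="self"/>
  <person>
    <link href="http://www.example.org/person/abc" rel="self"/>
    <first-name>Subbu</first-name>
    <last-name>Allamaraju</last-name>
  </person>
  <addresses/>
</personWithAddressBook>
"""


def self_link(element):
    """Return the href of the element's own rel="self" link, or None."""
    # findall("link") only looks at direct children, so a nested person's
    # self link is never mistaken for the wrapper's.
    for link in element.findall("link"):
        if link.get("rel") == "self":
            return link.get("href")
    return None


def person_identity(document):
    """Return the self link of the person element, wrapped or not."""
    root = ET.fromstring(document.strip())
    person = root if root.tag == "person" else root.find("person")
    return self_link(person) if person is not None else None
```

Under these assumptions, `person_identity(BASIC)` and `person_identity(WRAPPED)` yield the same URI, so the client can tell it is looking at the same person, while the wrapper’s own self link still points at the representation that was actually retrieved.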

But in conclusion, I stand by my opinion that URIs can and should be used for identity – whatever “identity” might mean for you.

Comments

On January 16, 2009 2:41 PM, Aristotle Pagaltzis said:

Your use of self is precisely in the spec’s intent. The original motivation for self links was that if feeds had a proper media type, it would be possible to click the feed and have the browser save a temporary copy on which an aggregator is invoked, because the aggregator is registered as an application that handles this media type. But then the aggregator would have no external indication of the URI, which would therefore need to be inside the feed.

This is exactly how net radio streaming works – you download a playlist, which is handled by the MP3 player app on your system, and which contains the URL of the stream.

In other words, the self link advertises the URI of the feed which you want aggregators to subscribe to.

On January 16, 2009 4:58 PM, Subbu Allamaraju said:

I struggled with the definition in the Atom RFC, and concluded that the “self” relation is a URI for the link’s context, whatever that may be. See http://tools.ietf.org/id/draft-nottingham-http-link-header-03.txt. 4287 defines “self” as identifying a resource equivalent to the containing element. Neither of these defines a canonical resource.

Part of my interpretation is also based on how atom:id is specified. 4287 says “If multiple atom:entry elements with the same atom:id value appear in an Atom Feed Document, they represent the same entry.” It does not say that “if those entries have the same self link”. That is, it relies solely on atom:id for equivalence and not URIs.

I am not so sure about Aristotle’s last sentence. Feed subscribers rely on “alternate” relation for discovering feed links.

It would be interesting to see what Mark and other contributors to 4287 think about this.

On January 16, 2009 6:32 PM, Aristotle Pagaltzis said:

Feed subscribers rely on “alternate” relation for discovering feed links.

In the scenario I described, which is what “self” was originally for, they can’t.

On January 17, 2009 11:35 AM, Stefan Tilkov said:

Mark’s old entry brings up a relevant issue:

An entry ID should never change, even if the permalink changes. “Permalink changes”? Yes, permalinks are not as permanent as you might think. Here’s an example that happened to me. My permalink URLs were automatically generated from the title of my entry, but then I updated an entry and changed the title. Guess what, the “permanent” link just changed! If you’re clever, you can use an HTTP redirect to redirect visitors from the old permalink to the new one (and I did). But you can’t redirect an ID.

I’m not sure I agree with this. Redirection is exactly what enables IDs to live much longer than any domain value-based key: you can have meaningful URIs (possibly with more than one identifying the same resource) that are unique without requiring centralized minting, without fear of leaving dangling references elsewhere.

On January 18, 2009 12:11 AM, alex.james.myopenid.com said:

…So I was reading Stefan Tilkov’s latest post, about link and self. Good stuff. Now the bit that prompted this post was his discussion of the possibility of having different representations of the same thing…

On January 18, 2009 4:32 AM, Subbu Allamaraju said:

Here is a response that is longer than a comment :)

http://www.subbu.org/blog/2009/01/uris-vs-identifiers-take-two

In brief, IMO, the question is whether URI non-equivalence implies resource non-equivalence.

On January 21, 2009 1:01 AM, bdargan.myopenid.com said:

IMHO URI non-equivalence does not imply resource non-equivalence. And if that is really important to your application there SHOULD be ways to handle it.

I agree with Stefan on providing a canonical resource.

You can argue both ways: person and person with address book are either two representations of a person resource or two different resources. That is the great thing about the Web.

For this case, in the Atom self link, what about using the rev attribute to identify the canonical resource that makes sense for your application?

Since rev also accepts a space-separated list of link types, you could mark it with both the type and the URI of the canonical resource.


<link href="http://www.example.org/person/abc?include=addressBook" rel="self" rev="canonical http://www.example.org/person/abc"/>

As to whether two different entities returned from different URIs are based on the same version of the canonical resource:

I would use an entity tag that encodes some value of the resource state and some value of the representation. E.g., the template for an XHTML representation may change without the resource state changing, and the ETag must change to reflect that.

To KISS.

If you had an ETag consisting of something like “resourceVersion=20,reprVersion={date}”, then your application could pick out the self links with identical rev attributes and extract the resourceVersion from the ETag.

I have a longer response, not quite yet ready: http://brettdargan.com/blog/2009/01/21/link-and-self/