innoQ

Vladimir's Tech Blog


Google Groups vs. HTTP as Application protocol

December 22, 2007

While advising customers on the use of RESTful Web Services, I notice over and over again how powerful the HTTP protocol is and how rich the communication patterns it describes are. For example, if you wish to protect a resource with a user name and password, you can simply use basic or digest authentication (as described in RFC 2617, the companion to the HTTP/1.1 specification, RFC 2616). If somebody or something (a user via a web browser, a feed aggregator, or a different application in a B2B scenario) tries to access the resource without providing credentials, the client will receive '401 Unauthorized'. That means that the request requires user authentication, and the response contains a WWW-Authenticate header field with a challenge applicable to the requested resource, so the client knows which authentication scheme it has to use.
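To make the challenge/response idea concrete, here is a minimal sketch of what a client might do with a WWW-Authenticate header. The helper names parse_challenge and basic_authorization are my own, the realm "feeds" is made up, and the parsing is deliberately naive (quoted parameters only), so treat it as an illustration rather than a complete implementation.

```ruby
# Split a WWW-Authenticate challenge such as 'Basic realm="feeds"'
# into its scheme and its quoted parameters.
# Naive on purpose: it only understands key="value" pairs.
def parse_challenge(header)
  scheme, rest = header.strip.split(/\s+/, 2)
  params = {}
  rest.to_s.scan(/(\w+)="([^"]*)"/) { |key, value| params[key] = value }
  { scheme: scheme, params: params }
end

# Build the Authorization header value for the Basic scheme:
# "user:password", Base64-encoded without line breaks.
def basic_authorization(user, password)
  'Basic ' + ["#{user}:#{password}"].pack('m0')
end

challenge = parse_challenge('Basic realm="feeds"')
puts challenge[:scheme]            # the scheme the server asks for
puts challenge[:params]['realm']   # the protection space it names
puts basic_authorization('Aladdin', 'open sesame')
```

A client that sees the Basic scheme would then repeat the request with the Authorization header set to that value; a Digest challenge carries additional parameters (such as a nonce) that the same parser would expose.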

Now to the actual problem. At our company we have started using Google Groups for internal (not customer-related) technical discussions. It is possible to set up a private discussion group at Google that is only accessible to explicitly listed users. And Google Groups provides an Atom feed containing recent group messages. So far so good. Unfortunately, I could access the feed neither from my favorite web aggregator nor from my Mozilla Thunderbird feed reader. After some debugging I found out why.

There are a lot of ways to trace HTTP communication: various Firefox plug-ins and all sorts of freeware programs. My favorite method is to use a tiny script. Ruby's HTTP API is structured according to the HTTP definition; it does not try to abstract HTTP away. For example, you have to provide the host name and the request path separately (see the example). The former is used to establish the connection to the server and the latter is used to access the particular resource.

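require 'net/http'

# The host name is used to open the connection to the server ...
Net::HTTP.start("groups.google.com") do |http|
  # ... and the path identifies the resource requested from that host.
  response = http.get('/group/innoq-dev/feed/atom_v1_0_msgs.xml')
  puts "Code = #{response.code}"
  puts response.body
end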

After sending the request without a user name and password, I would expect anything but '200 OK'. Guess what? I received '200 OK', but without the content of the feed (no XML content in the response). Instead there was a huge amount of JavaScript crap inside, like

var is_opera = (agt.indexOf("opera") != -1);
var is_ie = (agt.indexOf("msie") != -1) && document.all && !is_opera;
var is_ie5 = (agt.indexOf("msie 5") != -1) && document.all;

and then it outputs a different representation of a log-in form depending on the detected browser.
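Since the status code cannot be trusted here, a client has to look at the response itself to tell a feed from a log-in page. The following is a rough sketch of such a check; the method name atom_feed? and the list of accepted media types are my assumptions, and I did not record which Content-Type Google actually sent.

```ruby
# Media types under which an Atom feed is plausibly served (an assumption).
XML_MEDIA_TYPES = ['application/atom+xml', 'application/xml', 'text/xml'].freeze

# Heuristic: the response counts as a feed only if it is declared as XML
# and the body contains an Atom <feed> element.
def atom_feed?(content_type, body)
  media_type = content_type.to_s.split(';').first.to_s.strip.downcase
  XML_MEDIA_TYPES.include?(media_type) && body.to_s.include?('<feed')
end

feed = '<?xml version="1.0"?><feed xmlns="http://www.w3.org/2005/Atom"></feed>'
page = '<html><script>var is_opera = false;</script></html>'

puts atom_feed?('application/atom+xml; charset=UTF-8', feed)
puts atom_feed?('text/html', page)
```

With a proper '401 Unauthorized' none of this guessing would be necessary; the check only exists to work around a '200 OK' that carries the wrong content.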

Smells like Google Web Toolkit. I think this approach (a huge switch statement listing all the known browsers) is inherently broken. This is why most Google applications do not work on most of the devices I use for accessing the web. Some of them do not work on my Nokia N95 mobile phone, others not on my (older) T-Simpad or my (network-connected) DVD player. All the other sites based on more "primitive" technology like HTML and HTTP (e.g. http://www.useit.com/) work perfectly: they render perfectly and, if needed, can be syndicated and aggregated perfectly.

The power of the web rests on the ability to link content together and to use it in ways the original content creator or application developer never anticipated, not on listing all the user agents approved by some higher authority (e.g. Microsoft or Google).

I'll probably write an adapter that injects the authentication cookie (extracted from Firefox) and returns the feed to a trusted client. In my current project it is the other way around: we have to cover the "good" interface with a "bad" one.
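The core of such an adapter could look roughly like the sketch below. The method names feed_request and fetch_feed are hypothetical, and the cookie value is a placeholder: I have not worked out which cookies Google Groups actually requires, so whatever Firefox holds for the site would have to be pasted in.

```ruby
require 'net/http'

FEED_HOST = 'groups.google.com'
FEED_PATH = '/group/innoq-dev/feed/atom_v1_0_msgs.xml'

# Build a GET request for the feed that carries a cookie string
# copied from the browser (e.g. "name=value; other=value").
def feed_request(cookie)
  request = Net::HTTP::Get.new(FEED_PATH)
  request['Cookie'] = cookie
  request
end

# Send the request and return the response. Not called here, because it
# needs network access and a valid cookie.
def fetch_feed(cookie)
  Net::HTTP.start(FEED_HOST) do |http|
    http.request(feed_request(cookie))
  end
end

request = feed_request('placeholder=cookie-from-firefox')
puts "#{request.method} #{request.path}"
puts request['Cookie']
```

A real adapter would wrap fetch_feed in a small local HTTP server, so that the aggregator talks plain HTTP to the adapter and never sees the log-in page.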

