Stefan Tilkov's Random Stuff

Google vs. Web Architecture

I’m really sad to see that Google continues its approach of tolerating, and thus, in my view, encouraging, people who build sites that break the Web’s architecture. First, we had this hashbang mess (tl;dr version: Ajax/JS-only sites suck, long version here); now the Google crawler will issue POST requests (no kidding).

Sure, there are worse things happening in the world, but from a REST perspective this is so utterly, totally wrong that it makes me really mad. A GET request is the only thing a crawler should ever issue if it intends to conform to the architecture of the Web, as these requests are safe. Issuing POSTs just because so many people don’t understand the distinction between GET and POST (or use crappy web frameworks that don’t) means that even more people will ignore it. In the end, everyone will have to use heuristics to find out what can be called safely and what can’t, effectively trading specified behavior for the typical kind of crap you usually only get when something evolves without any architectural vision.
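To make the distinction concrete, here is a minimal sketch (TypeScript with Express, hypothetical routes) of why a crawler may blindly issue GETs but must never issue POSTs: the GET handler only reads state, while the POST handler changes it on every call.

```typescript
import express from "express";

const app = express();
const comments: string[] = ["First!"];

// Safe: a crawler can fetch this any number of times without side effects.
app.get("/comments", (_req, res) => {
  res.json(comments);
});

// Unsafe: every request changes server state. Only a deliberate client
// action should trigger this; a crawler "testing" it would create junk data.
app.post("/comments", express.text(), (req, res) => {
  comments.push(req.body);
  res.status(201).end();
});

app.listen(3000);
```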

Google’s very core business was enabled by the Web’s architecture; now they’re slowly helping to ruin it.

“Do no evil” my ass.

Comments

On November 2, 2011 6:58 PM, Dong Liu said:

“The Web is more a social creation than a technical one.”

This robot POST abuse might force more people to read the specification and make their servers conform to the standard and to security requirements. The Google robot now offers a free test not only for GETs but also for POSTs.

On November 2, 2011 7:27 PM, BrendelConsult said:

I agree with Dong that a properly implemented API only allows POST requests by authorized users. However, Stefan’s point is valid: Even more people will now be inclined to use POST requests for the wrong reasons. “Not accessible to search engines” was at least one additional incentive for people to design their sites correctly. By removing this incentive, we will be stuck with badly designed sites/APIs for longer.

Or, even more likely, fewer people will be aware that there actually is a difference between GET and POST. There are enough people already who think that GET is just a “restricted version of POST”: an HTTP request that doesn’t allow you to send a message body. “Hey, POST is more flexible, since I can put data in the URI as well as in the message body!” they think. “Let me just standardize on POST for everything and things will be easy!” they think. This sort of ignorance is now further promoted by Google’s official endorsement of this use of POST.
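For illustration, a short sketch (plain fetch calls in TypeScript, hypothetical endpoint) of that misconception: syntactically a POST can indeed carry both a query string and a body, but tunneling a read through POST hides from caches, intermediaries, and crawlers the fact that the request is safe.

```typescript
// Hypothetical search endpoint; both calls perform the same read.
async function searchBothWays(): Promise<void> {
  // Wrong: a pure read tunneled through POST. Nothing in the request tells
  // caches, proxies, or crawlers that repeating it is harmless.
  await fetch("https://example.org/search?lang=en", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query: "rest" }),
  });

  // Right: the same read as a GET. The method itself advertises that the
  // request is safe, so it can be cached, retried, and crawled freely.
  await fetch("https://example.org/search?lang=en&query=rest");
}
```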

On November 2, 2011 8:22 PM, Sergey Shishkin said:

Not to mention Google’s SPDY stateful protocol.

On November 3, 2011 1:20 AM, Kevin Reid said:

Note that this is not submitting POST forms, or following “links” built with POST; those will still not work with the Googlebot, or at least the article does not say they will.

This is doing POSTs that, if the page were opened in an actual browser, would be done unconditionally on page load without any user input. It seems a reasonable position to me that those pages have effectively published a machine-readable statement that those POSTs have “safe” semantics.
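For context, a minimal sketch (TypeScript, hypothetical endpoint) of the case described here: a page whose content is fetched via a POST fired unconditionally on load, with no user input involved; this is the kind of request the crawler now replays.

```typescript
// Hypothetical page script: a POST fired on every page load, without any
// user interaction. By wiring it up this way, the page effectively declares
// that this POST is safe to repeat, which is what the crawler relies on.
window.addEventListener("load", async () => {
  const response = await fetch("/api/page-content", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ page: "home" }),
  });
  const container = document.querySelector("#content");
  if (container) {
    container.innerHTML = await response.text();
  }
});
```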

(Disclosure: I have been a Google employee, but not working on search.)

On November 3, 2011 4:42 AM, Viswanath Durbha said:

I agree that we should encourage web developers to follow the RESTful architecture constraints for a better web. But these constraints are unfortunately self-imposed and cannot be forced on anyone. It reminds me of this paragraph from Roy Fielding’s blog.

“REST constraints do not constrain Web architecture — they constrain RESTful architectures (including those found within the Web architecture) that voluntarily wish to be so constrained. HTTP/1.1 was designed to enable and improve RESTful architectures, just as REST was designed to reflect and explain all of the best things about Web architecture. That does not mean that HTTP/1.1 is constrained to a single style; it means those other styles are not part of the design (i.e., we don’t care if future changes to HTTP will cause them to break). Only some of the architectures found on the Web are RESTful, but that doesn’t change the fact that RESTful architectures do work better on the Web than any other known styles. They work better because REST induces the architectural properties that the Web needs most — reusability, anarchic scalability, evolvability, and synergistic growth — and thus the Web architecture has been updated over time to promote RESTful styles over all others, by design.”

The post I’m referring to is titled “On software architecture” (http://roy.gbiv.com/untangled/2008/on-software-architecture).

I don’t think Google is doing the right thing here. But the real problem needs to be addressed on the web development side. I’m not too sure the technical merits of following REST constraints alone would encourage developers to adopt them. If not, then I’m not really sure how their value can be communicated effectively.