So I wrote Gadget, which is basically an XML parser on BerkeleyDB steroids. It fragments the XML and saves its XPath projections. You can then reconstruct the “skeleton” of the XML, which is the list of all the XPaths that the parser ever encountered.
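To make the “skeleton” idea concrete, here is a rough sketch in Python. It is not Gadget's code and does not use BerkeleyDB: it just streams through a document with the standard library's `iterparse` and collects every distinct element path it sees. The function name `xml_skeleton` and the sample document are made up for illustration, and attributes and namespaces are ignored.

```python
import io
import xml.etree.ElementTree as ET

def xml_skeleton(source):
    """Return the sorted list of distinct element paths seen in `source`."""
    paths = set()
    stack = []  # tags of the currently open elements, root first
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
            paths.add("/" + "/".join(stack))
        else:
            stack.pop()
            elem.clear()  # drop the closed element's text and children
    return sorted(paths)

# Hypothetical sample document, for illustration only.
sample = b"<feed><entry><title>a</title></entry><entry><title>b</title><link/></entry></feed>"
skeleton = xml_skeleton(io.BytesIO(sample))
print(skeleton)
```

For the sample above, the skeleton has four paths: `/feed`, `/feed/entry`, `/feed/entry/link` and `/feed/entry/title`. Repeated entries collapse into one path each, which is what keeps the skeleton small even when the document is not.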
Works on arbitrarily large XML files; might prove very useful. Also interesting:
Referee is a command-line application that reads your web server logs and automatically finds out what other web pages have to say about your own. Unlike trackbacks, Referee is a completely automated tool: all you need is your server logs and a network connection, and Referee will do the rest. It works not only for blogs but for any URL on your web site, no matter what program generated it.
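The log-reading half of that idea is easy to sketch. The snippet below is not Referee's code; it assumes logs in the common Apache "combined" format and simply groups external referrer URLs by the local path they point to. The name `external_referrers`, the host names and the sample log lines are all invented for illustration, and the step where the referring pages get fetched and read is left out.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Assumed layout (combined log format):
# ... "METHOD path PROTOCOL" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|POST|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "(?P<referer>[^"]*)"'
)

def external_referrers(lines, own_host):
    """Map each local path to the set of off-site URLs that referred to it."""
    found = defaultdict(set)
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        referer = match.group("referer")
        host = urlsplit(referer).hostname  # None for "-" or an empty referer
        if host and host != own_host:
            found[match.group("path")].add(referer)
    return found

# Hypothetical log lines, for illustration only.
sample_log = [
    '1.2.3.4 - - [10/Oct/2004:13:55:36 +0000] "GET /post/1 HTTP/1.1" 200 2326 "http://other.example/review" "Mozilla/5.0"',
    '1.2.3.4 - - [10/Oct/2004:13:56:01 +0000] "GET /post/1 HTTP/1.1" 200 2326 "http://my.example/index" "Mozilla/5.0"',
    '1.2.3.4 - - [10/Oct/2004:13:57:12 +0000] "GET /about HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]
referrers = external_referrers(sample_log, "my.example")
print(dict(referrers))
```

Here only `/post/1` ends up with a referrer, `http://other.example/review`; the same-site hit and the request with no referrer are ignored.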
The ultimate ego-surfing support — something I’ll try out for sure.