Stefan Tilkov's Random Stuff

XML Performance

An excellent paper (PDF? Why?) by Michael Kay about the performance of his Saxon XQuery processor [via James Clark]. Favorite excerpt:

  • Investigate every user-supplied performance problem in depth. There is no better raw material for understanding how the code behaves, and without such understanding there can be no improvement.
  • Optimize the code that typical users write, whether it is well-written code or not. Try to educate users on how to write code that works well on your product, but recognize that you will only reach a small minority.
  • Never make performance improvements to the code without measuring the impact. If you cannot measure a positive impact, revert the change (easily said, but psychologically very difficult when you’ve put a lot of effort in). Keep records of what you learnt in the process.
  • Avoid performance improvements that rely on user-controlled switches. Most users (including people who publish comparative benchmarks) will never discover the switch exists; of the remainder, a good number will set the switch sub-optimally.
  • Remember that every optimization you make to your code is likely to require a substantial investment in new test material, and even then, is likely to result in several new bugs escaping into the field. Do not do it unless the gain is worth it.
  • Maintain a set of performance regression tests to ensure that performance gains made in one release are not lost in the next.
  • Separately, maintain tests to show that query optimizations are taking place as intended. In Saxon this is done by outputting an XML representation of the query execution plan for test queries, and checking assertions about these plans expressed as auxiliary queries.
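
To make that last point a bit more concrete, here is a minimal sketch of what "assertions about an execution plan" might look like. It is not how Saxon does it: the plan format, the element names (`indexLookup`, `sequentialScan`) and the helper `plan_satisfies` are all made up for illustration, and the checks use the small XPath subset in Python's standard `xml.etree.ElementTree` rather than real auxiliary XQuery queries.

```python
import xml.etree.ElementTree as ET

# Hypothetical plan dump. Element and attribute names are invented for this
# sketch and do not reflect Saxon's actual plan output.
PLAN_XML = """
<plan query="//item[@id = $key]">
  <filter>
    <indexLookup index="item-by-id" key="$key"/>
  </filter>
</plan>
"""


def plan_satisfies(plan_xml, must_contain=(), must_not_contain=()):
    """Return a list of failure messages; an empty list means the plan passed.

    Each entry in must_contain / must_not_contain is a path in the limited
    XPath subset understood by ElementTree.
    """
    root = ET.fromstring(plan_xml)
    failures = []
    for path in must_contain:
        if root.find(path) is None:
            failures.append(f"expected {path!r} in plan")
    for path in must_not_contain:
        if root.find(path) is not None:
            failures.append(f"unexpected {path!r} in plan")
    return failures


# The optimization under test: the predicate should be answered from an
# index, not by scanning every item.
failures = plan_satisfies(
    PLAN_XML,
    must_contain=[".//indexLookup[@index='item-by-id']"],
    must_not_contain=[".//sequentialScan"],
)
```

The appeal of this style of test is that it fails when an optimization silently stops firing, even if the query still returns the right answer and the timing difference is too small to show up in a performance regression run.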