This is a single archived entry from Stefan Tilkov’s blog.

QCon SF 2009: Stu Charlton, From Agile Development to Agile Operations

Stefan Tilkov

These are my unedited notes from Stu Charlton's talk "From Agile Development to Agile Operations" at QCon SF 2009.

  • Cloud computing changing the game between development and operations
  • Suggested design goals for cloud computing
  • Integrated approach to application design, development and operations
  • Tennis match going on between the dev and ops side
  • Performance, scale and availability are the result of both design and operational decisions
  • You usually can't just tell the platform to scale your app
  • The offerings of commercial companies are mostly the result of buying companies that cover either operations or development
  • How can agile practices be applied to operations?
  • (Nice quote "Mimicking the illusion of working software by building a lot of documents")
  • Development values what is built; operations values what does not happen
  • Automated build, test, integration - what's the test environment in operations?
  • Not really test, rather planning and rehearsal
  • Autonomous teams – in operations, there are always a lot of legacy dependencies and a need for situational awareness
  • Continuous integration - in operations, what's the source code?
  • Examples: Why can't two servers communicate? security, server configuration, network configuration, firewall …
  • Example: What do I need to scale out? Easy, simply start up more machines … no, not really: impacts on other systems, e.g. security systems, load balancers, monitoring, CMDB, service desk. Architectural issues: stateful or stateless nodes, repartitioning; limiting the scale out
  • Example: What is the authoritative reality? What's the difference between the current state and the one I want?
  • In operations, transitional states matter a lot more than in development
  • What we have now: on demand provisioning of commodity infrastructure and constrained applications
  • What we still need to consider: configuration as data and as code; collaboration on design, development and operations
  • What funds a project is usually very different from what funds operations
  • IT complexity is overwhelming - not sure whether this is accidental or inherent complexity
  • Little tooling for collaboration in operations
  • Integrated view of operations and design: Different planes – management plane, cloud control plane, application plane
  • All of the vendors are working on building a platform for controlling cloud resources
  • Key question: what's the source code?
  • Bottom-up approach (based on scripts, recipes, runbooks)
  • Chef: DSL for describing infrastructure
  • Puppet used by Google to standardize all OS X desktops
  • Trying to use Maven in operations
  • Top-down (modeled viewpoints, enterprise architecture, configuration models)
  • UML profiles, MS uses Oslo to describe different viewpoint models
  • Configuration models: W3C SML - now that it's been standardized, nobody's using it
  • Model-driven Collaborative Application Design
  • "All modeling is programming, all programming is debugging" (Neil Gunther)
  • Chef is very popular because it's easy; Puppet is declarative, which makes it hard to debug
  • Analogy: SQL query plan; tools could derive a plan from a declaratively specified model
  • Accounting barriers to Agile operations
  • Capex vs. Opex is only partially addressed in reality, as HW is only part of the cost
  • Promising approach: Time-driven activity-based costing; activity-based costing was an approach used to make consultants rich in the 80s, but in its time-driven form it seems useful
  • How to arrive at an integrated approach:
  • distributed, autonomous descriptions of the complete configuration
  • document-based description as the basis for collaboration
  • The way to enable collaboration of autonomous owners is to link configuration pieces via hyperlinks [he is a REST guy, after all]
  • Model-driven approach because something is needed that's both data and code
  • Problem with data: hard to debug
  • Problem with code: hard to see what's in it
  • Mentions Lisp as an example of "data is code, code is data"; it's been done before
  • Elastra approach: "Elastic Modeling Languages" (Open Source licensing): EMML, ECML, EDML - doesn't expect these to become standards, but part of the debate
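
To make the "current state vs. the one I want" question and the SQL query plan analogy from the notes above a bit more concrete, here is a minimal sketch of deriving a plan from a declaratively specified model. This is not from the talk and is not how Elastra, Chef or Puppet work; the `Action` type, the `derive_plan` and `apply_plan` functions and the resource names are all hypothetical, and a real tool would also have to deal with dependencies, ordering and transitional states.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Action:
    """One step of a plan (hypothetical type, for illustration only)."""
    kind: str                      # "create", "update" or "delete"
    resource: str                  # name of the resource the step touches
    config: Optional[dict] = None  # target configuration, None for deletes


def derive_plan(current: dict, desired: dict) -> list:
    """Compare the current state with the desired model and derive a plan.

    Both arguments map a resource name to its configuration.
    """
    plan = []
    # Resources in the model that do not exist yet.
    for name in sorted(desired.keys() - current.keys()):
        plan.append(Action("create", name, desired[name]))
    # Resources that exist but differ from the model.
    for name in sorted(desired.keys() & current.keys()):
        if current[name] != desired[name]:
            plan.append(Action("update", name, desired[name]))
    # Resources that exist but are no longer part of the model.
    for name in sorted(current.keys() - desired.keys()):
        plan.append(Action("delete", name))
    return plan


def apply_plan(current: dict, plan: list) -> dict:
    """Apply a plan to a copy of the current state and return the result."""
    result = dict(current)
    for action in plan:
        if action.kind == "delete":
            result.pop(action.resource)
        else:
            result[action.resource] = action.config
    return result


# Assumed example data: what is running now vs. what the model asks for.
current = {
    "web-1": {"role": "web", "size": "small"},
    "db-1": {"role": "db", "size": "large"},
    "old-cache": {"role": "cache", "size": "small"},
}
desired = {
    "web-1": {"role": "web", "size": "medium"},
    "web-2": {"role": "web", "size": "medium"},
    "db-1": {"role": "db", "size": "large"},
}

if __name__ == "__main__":
    for step in derive_plan(current, desired):
        print(step.kind, step.resource)
```

The model only says what should exist; the steps to get there are computed, much like a query planner turns a declarative SQL statement into an execution plan. Being able to print the derived plan before applying it is one way to soften the "declarative is hard to debug" complaint.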

  • Q. Applicable to private clouds? A. Very much so.

  • Q. There's a trend of expanding Continuous Integration to Continuous Deployment. Does this apply? A. Modeling does not conflict with an agile approach; small changes could go into production, no need to do things in a monolithic way. Both exist and need to co-exist.
  • Q. (rather a comment) One can start with a DSL, validate it, check dependencies etc.; bottom-up is not in conflict with this. A. A textual DSL is just a model.
  • Q. Would "structure" be a better term than model? A. That would only be part of it. "Model" has many connotations people don't like, which is why people started using the term DSL.
  • Q. Connection of OSS/Telco experience? A. One example is Erlang and Mnesia showing up as a technology in the Cloud space.
  • Q. Are there new technologies in the security space? A. Federated ID technologies are getting some traction, e.g. Azure using WS-Federation; SAML and OAuth are both growing. Directories are still the primary way.
  • Q. Is there a directory in the cloud? A. Concept of "virtual identity" instead, e.g. OpenID. SAML can be used with some Google apps, some Salesforce.com apps
  • Q. As an alternative to complex tooling, can co-locating/integrating developers and operations people help? A. Two approaches. First: let's not do ops, let's just have developers do operations. Not good, as they usually have a different value system. Second: co-locate them and create autonomous teams. Good approach, larger Web shops do this, but there is still a shared service team. Classic scaling problem: lots of interdependencies between teams. Tooling can help. Sometimes you even have to separate teams for regulatory reasons.