QCon SF 2009: Max Ross, Mapping Relational Data Patterns to the App Engine Datastore

Nov 19, 2009

These are my unedited notes from Max Ross's talk about Mapping Relational Data Patterns to the App Engine Datastore at QCon SF 2009.

  • Datastore is transactional, natively partitioned, hierarchical, schema-less, based on BigTable – not a relational database
  • Goals: Simplify storage by simplifying development, management
  • Even though Datastore is based on the ridiculously scalable BigTable, you don't need to have scalability problems to benefit from it
  • Scale always matters - the problem is not in the second step, it's the first step
  • Free to get started (not only for the first 30 days), pay only for what you need
  • Let someone else manage upgrades, redundancy, connectivity
  • Let someone else handle problems
  • Detailed post-mortem of GAE downtime available somewhere
  • Scale automatically to any point on the scale curve
  • Trying to get people out of the business of managing their database in production
  • Basic entity: Kind, Entity group, key, age, + any number of properties
  • Datastore is schemaless - soft schema model. Much of the stuff available in the DB (constraints, type checking, schema) needs to move up to the app layer (but is usually replicated there anyway)
  • primary benefit of the schemaless datastore: much faster iterations
  • soft schemas can give you type safety despite using a simple key/value store underneath
  • JPA annotations provide a soft schema - even though they are targeted at creating DB information, GAE can benefit from them
  • JPA annotations are a data definition language (proof: relational DB schema can be created from annotations)
  • Primary keys in the datastore contain the kind and are hierarchical, e.g. /Person:13/Pet:Ernie
  • Analogy: Hierarchical datastore keys are similar to composite primary keys
  • Surrogate keys are harder to move - dropping is often not an option. Mapping options: 1) make surrogate part of the key a property 2) make surrogate key primary key, put rest into property
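
The hierarchical key idea above can be illustrated with a small, self-contained sketch. It assumes nothing beyond the /Person:13/Pet:Ernie example from the talk; `KeyPath` and its methods are hypothetical names made up for illustration and are not the App Engine key API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a datastore key; not the App Engine Key class.
final class KeyPath {
    // Each element is "Kind:idOrName", ordered from root to leaf.
    private final List<String> elements;

    private KeyPath(List<String> elements) {
        this.elements = elements;
    }

    // Creates a root key such as /Person:13.
    static KeyPath root(String kind, Object idOrName) {
        List<String> e = new ArrayList<>();
        e.add(kind + ":" + idOrName);
        return new KeyPath(e);
    }

    // Creates a child key below this one, e.g. /Person:13/Pet:Ernie.
    KeyPath child(String kind, Object idOrName) {
        List<String> e = new ArrayList<>(elements);
        e.add(kind + ":" + idOrName);
        return new KeyPath(e);
    }

    // The root element of the path; entities sharing it are in one entity group.
    String entityGroup() {
        return "/" + elements.get(0);
    }

    // Kind of the entity itself (taken from the last path element).
    String kind() {
        String last = elements.get(elements.size() - 1);
        return last.substring(0, last.indexOf(':'));
    }

    @Override
    public String toString() {
        return "/" + String.join("/", elements);
    }
}
```

With this sketch, `KeyPath.root("Person", 13).child("Pet", "Ernie")` prints as /Person:13/Pet:Ernie, and both the person and the pet report /Person:13 as their root, which is what ties the key structure to the transaction notes that follow.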


  • transactions in the Datastore apply to a single Entity Group
  • Entities in the same Entity Group share the same root part of the key
  • This makes Entity Group selection a critical design choice, with obvious effects on transactions
  • Too coarse hurts throughput, too fine limits usefulness of transactions
  • Datastore does optimistic concurrency checks at the Entity Group level
  • [Strong relationship between data modeling and transaction processing – reminds me of the old debate on EJB 2.0 pre-final entity beans and dependent objects]
  • Unreleased new feature: Transactional tasks can update multiple entity groups, a task in a queue can participate in a DB transaction
  • Example: Deferred, transactional, async balance update (eventual consistency) as well as synchronous
  • Two-phase commit protocol algorithm implemented at Berkeley, implemented by a Google developer (Erick Armbrust)
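
To make the entity group trade-off more concrete, here is a minimal in-memory sketch of optimistic concurrency checked per entity group. It is an assumption-laden toy model, not the Datastore implementation: `GroupStore`, `Txn`, and the per-group version counter are hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model; not the App Engine API.
final class GroupStore {
    // One version counter per entity group root, e.g. "/Person:13".
    private final Map<String, Long> versions = new HashMap<>();
    private final Map<String, String> data = new HashMap<>();

    // A transaction remembers the group version it started from.
    final class Txn {
        private final String group;
        private final long startVersion;
        private final Map<String, String> writes = new HashMap<>();

        private Txn(String group) {
            this.group = group;
            this.startVersion = versions.getOrDefault(group, 0L);
        }

        // Buffers a write; keys outside the transaction's group are rejected.
        void put(String key, String value) {
            if (!key.equals(group) && !key.startsWith(group + "/")) {
                throw new IllegalArgumentException("key outside entity group: " + key);
            }
            writes.put(key, value);
        }

        // Commits only if no other transaction committed to this group meanwhile.
        boolean commit() {
            synchronized (GroupStore.this) {
                if (versions.getOrDefault(group, 0L) != startVersion) {
                    return false; // conflict: the caller is expected to retry
                }
                data.putAll(writes);
                versions.put(group, startVersion + 1);
                return true;
            }
        }
    }

    synchronized Txn begin(String group) {
        return new Txn(group);
    }

    synchronized String get(String key) {
        return data.get(key);
    }
}
```

In this model, two transactions that start on /Person:13 conflict even when they write different entities in that group, while a transaction on /Person:14 is unaffected. That is the "too coarse hurts throughput" side of the trade-off; the rejected write to a key outside the group is the "too fine limits usefulness" side.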


  • Letting a framework manage relationships can simplify code for RDBMS, but especially for App Engine Datastore
  • Goal: make handling relationships with JPA as easy as possible
  • Google's JPA implementation has some sensible defaults: Ownership implies entities are placed in the same Entity Group
  • E.g. Person with a @OneToMany to Pet (with a back reference of @ManyToOne) makes both part of the same Entity Group


  • Testing set membership – requires a join table with an RDBMS, can use a multi-value property in the GAE datastore (select from User where hobbies = 'yoga')
  • Other than that, no joins supported
  • Conflict: Google promises that query performance scales linearly with the size of the result set; not possible when cross products are needed to fulfill queries
  • Making good progress on support for a subset of joins, not released yet - nowhere near ready for production
  • RDBMS encourage cheap writes and expensive reads; the datastore encourages expensive writes and cheap reads. Denormalization encouraged where it makes sense
  • Obvious problems with denormalized data
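
The set membership point can be sketched without a join table. The example below is a plain in-memory illustration of the semantics behind the `hobbies = 'yoga'` query from the talk, where an equality filter on a multi-valued property matches if any one value is equal. `UserEntity` and `MultiValueQuery` are hypothetical names, and a real datastore would answer such a query from an index rather than by scanning as done here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory model of an entity with a multi-valued property.
final class UserEntity {
    final String name;
    final List<String> hobbies;

    UserEntity(String name, String... hobbies) {
        this.name = name;
        this.hobbies = Arrays.asList(hobbies);
    }
}

final class MultiValueQuery {
    // Mirrors "select from User where hobbies = 'yoga'": the filter matches
    // an entity if any one of its hobbies values equals the argument.
    static List<String> namesWithHobby(List<UserEntity> users, String hobby) {
        List<String> result = new ArrayList<>();
        for (UserEntity u : users) {
            if (u.hobbies.contains(hobby)) {
                result.add(u.name);
            }
        }
        return result;
    }
}
```

Storing the hobbies directly on the user is itself a small denormalization: the read is a single-kind lookup, at the price of rewriting the user entity whenever the list changes.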

Taking code somewhere else

  • App Engine is in general more restrictive
  • Suggestion: Decide early whether or not portability matters to you
  • Shows examples of portable code - somewhat ugly
  • Congratulations, you have already sharded your data model

Key takeaways

  • App Engine datastore simplifies persistence
  • JPA adds typical RDBMS features to the datastore
  • Important to understand how the datastore is different
  • Easier to move apps off than on
  • If portability is important, plan for it
  • http://gae-java-persistence.blogspot.com


  • Q. Does the shown transaction example really solve the problem? A. No, not to the full extent. A lot of Google's billing software is built without multi-row transactions
  • Q. Is JPA a good model when starting from scratch? A. Many people like the low-level API, then start building an ORM on top of it … possibly better to start using an existing one.
  • Q. What kind of apps are on GAE? A. Not really known, many backend applications for iPhone apps, Facebook, … Obama virtual town hall meeting peaked at 700 req/s
  • Q. Export features? A. Some bulk import/export, but there should be more
  • Q. Caching? A. No direct support for JPA caching using memcached, but should be pluggable
  • Q. Is Python going to be replaced by Java? A. Absolutely not, the Java team rather has to fight to be accepted as an equal citizen
  • Q. Restrictions on some JDK features relevant? A. No.
  • Q. Staging area? A. No, not yet.
  • Q. JDO? A. GAE supports both, DataNucleus supports both; JPA was chosen randomly for this talk today.
  • Q. Can apps be run offline? A. You can run the app SDK locally, but it won't scale; the stub implementations are pluggable, though, and could be replaced.