innoQ

Vladimir's Tech Blog


JRuby ActiveRecord performance

June 11, 2008

In our current project we are loading a fairly large amount of XML data into the database. The XML parsing itself is very fast because we use the streaming flavor of REXML, like this:

require 'rexml/document'

# fp is the path to the XML file; the listener receives the parse events
source = File.new(fp)
REXML::Document.parse_stream(source, ImportListener.new)

class ImportListener

  def tag_start(name, attrs)
    @tags.unshift name
    @langs.unshift attrs['xml:lang']
    @origin = extract_id(attrs['rdf:about']) if attrs['rdf:about']
    relation_name = nil
    case name
    when 'rdf:Description' # Concept
      @pref_label = ''
      @definition = ''
  ...

  def current_language
    @langs.detect do |l|
      !l.nil? && !l.empty?
    end
  end

  def text(t)
    case current_tag
    when 'rdfs:label'
      @label += t.strip
  ...
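For readers who want to try the streaming approach in isolation, here is a minimal, self-contained sketch. It is not the import code from the project: the LabelListener class, the sample XML and the example.org URI are made up for illustration, and the listener only collects the text of rdfs:label elements.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener (not the project's ImportListener):
# collects the text content of every rdfs:label element.
class LabelListener
  include REXML::StreamListener # no-op defaults for callbacks we don't define

  attr_reader :labels

  def initialize
    @tags = []   # stack of open element names, innermost first
    @labels = []
  end

  def tag_start(name, attrs)
    @tags.unshift name
    @labels << '' if name == 'rdfs:label'
  end

  def tag_end(name)
    @tags.shift
  end

  def text(t)
    # Text may arrive in several chunks, so append to the current label
    @labels[-1] += t.strip if @tags.first == 'rdfs:label'
  end
end

# Made-up sample document
xml = <<~XML
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
    <rdf:Description rdf:about="http://example.org/concept/1">
      <rdfs:label>Forest</rdfs:label>
    </rdf:Description>
  </rdf:RDF>
XML

listener = LabelListener.new
REXML::Document.parse_stream(xml, listener)
p listener.labels
```

Including REXML::StreamListener means the parser can call callbacks the class does not define without raising, so the listener only has to implement the events it cares about.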

So most of the time is consumed by ActiveRecord, in calls like find_or_create_by_xxx. The whole import took 20 minutes / 14 minutes / 52 seconds (real / user / sys) with MySQL running on the same machine. I hoped it would go faster with JRuby, so I ran it as time jruby -S rake xxxx:reimport. I'm using JRuby 1.1.2 built from source (rev 6586) with the jdbcmysql adapter. With JRuby it takes 24 minutes / 18 minutes / 44 seconds (real / user / sys), which is about 20% slower in wall-clock time.
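To see why find_or_create_by_xxx calls add up, remember that each call issues a SELECT and, on a miss, an INSERT as well. The sketch below only counts those statements. FakeTable is a hypothetical in-memory stand-in, not ActiveRecord, and the sample names are invented.

```ruby
# Hypothetical in-memory stand-in for a table; this is NOT ActiveRecord.
class FakeTable
  attr_reader :statements

  def initialize
    @rows = {}
    @statements = 0
  end

  # Mimics the SELECT-then-INSERT pattern of a find_or_create_by_* call
  def find_or_create_by_name(name)
    @statements += 1 # the lookup: SELECT ... WHERE name = ?
    row = @rows[name]
    return row if row

    @statements += 1 # the INSERT on a miss
    @rows[name] = { id: @rows.size + 1, name: name }
  end
end

labels = FakeTable.new
%w[forest river forest lake river].each { |n| labels.find_or_create_by_name(n) }
puts labels.statements # 3 misses * 2 + 2 hits * 1 = 8
```

Against a real database every one of those statements is a round trip, so an import with many records pays that cost per record regardless of how fast the XML parsing is.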
