premcache: caching and precaching with memcached

Courtenay : October 14th, 2006

Let's imagine that you have a list view of some data which takes a long time to generate, one second or so to suck the data from the database. This is unreasonable for a site of any proportion, and you certainly don't want to re-run the query every time someone hits the page. The easy solution here is to cache the data, but today I'm going to show you a trick to _precache_ the data so that each page load is blazing fast without sacrificing user experience.

I'm going to assume you know how to use memcached. Note that you can of course use all types of caching, but for the purposes of this exercise we're assuming they don't exist.

h2. step one: some handy memcached code

This is the land of ruby, and we have blocks to do our abstractification. This is similar to other memcached libraries out there.

module CabooseMemcached

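  # Fetch +key+ from memcached, or run the block and cache its result
  # for +timeout+ seconds. Called without a block, deletes +key+.
  def memcache_me(key, timeout=600, &blk)
    unless defined?(CACHE) # no memcache
      return block_given? ? yield : nil
    end

    # no block given, just delete the cache
    unless block_given?
      CACHE.delete(key)
      return
    end

    # cache hit, block was not evaluated
    # (assign first: "return results if results = ..." raises NameError,
    # because results is not yet a local when the left side is parsed)
    results = CACHE.get(key)
    return results if results

    # otherwise, set the cached item
    results = yield
    CACHE.set(key, results, timeout)
    results
  end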
end

So, now we can do things like:

@blah = memcache_me('foo') { Blah.find(23) }
memcache_me('foo') # delete the cache item
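
To see the hit, miss and delete behaviour without a memcached server, here is a minimal sketch. `FakeCache` is a hypothetical in-memory stand-in that only mimics the `get`/`set`/`delete` calls used above and ignores expiry; the helper is repeated as a plain method so the snippet runs on its own.

```ruby
# Hypothetical stand-in for a memcached client: just enough of
# get/set/delete to exercise memcache_me. Expiry is ignored.
class FakeCache
  def initialize
    @store = {}
  end

  def get(key)
    @store[key]
  end

  def set(key, value, _expiry = 0)
    @store[key] = value
  end

  def delete(key)
    @store.delete(key)
  end
end

CACHE = FakeCache.new

def memcache_me(key, timeout = 600)
  return (block_given? ? yield : nil) unless defined?(CACHE)

  unless block_given?
    CACHE.delete(key) # no block: delete the entry
    return
  end

  results = CACHE.get(key)
  return results if results # hit: block is never evaluated

  results = yield # miss: run the block and store what it returns
  CACHE.set(key, results, timeout)
  results
end

calls = 0
first  = memcache_me('foo') { calls += 1; 'expensive result' } # miss, block runs
second = memcache_me('foo') { calls += 1; 'expensive result' } # hit, block skipped
memcache_me('foo')                                             # delete
third  = memcache_me('foo') { calls += 1; 'expensive result' } # miss again
```

One thing the truthiness check implies: a block that returns `nil` or `false` is never treated as cached, so it is re-run on every call.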

Back to the controller, where we're running some expensive queries.

class MonkeysController

  def list
    @monkey_pages, @monkeys = paginate(:monkeys, :per_page => 25, :select => '...')
  end

end

Now we can memcache the result set so that the next time someone hits that page, it's not going to hit the db at all.

class MonkeysController

  def list
    @monkey_pages, @monkeys = memcache_me("monkeys_page#{params[:page].to_i}") { 
      paginate(:monkeys, :per_page => 25, :select => '...') 
    }
  end

end

Victory! However, each initial page load still takes that whole second to generate, and your app listener process will be blocked and unresponsive until it's completed. If you have 5 app listeners and 5 users visiting that page, your entire site will be 'down' until the query is finished.

* note: if your data is user-context sensitive, you should make that key something like "monkeys_page#{params[:page].to_i}_#{current_user.login}"

h2. precache the next page's results

Since memcached is essentially a global shared object storage, we can preload the next page's data so that when the user hits that page, the data will already be in the cache. Your controller action now looks something like this:

class MonkeysController

  def list
    params[:page] ||= 1
    @monkey_pages, @monkeys = memcache_me("monkeys_page#{params[:page].to_i}") { 
      paginate(:monkeys, :per_page => 25, :select => '...') 
    }
    next_page = params[:page].to_i + 1
    # key must name the next page, or this just hits the entry cached above
    memcache_me("monkeys_page#{next_page}") {
      params[:page] = next_page # hack to get the next page number
      paginate(:monkeys, :per_page => 25, :select => '...')
    }
  end

end

But wait! This takes _twice_ as long for the first page and the same amount of time for successive pages! What's the use in that? This is where we take advantage of memcached's "global" memory storage.

h2. fire and forget

Some of this code is taken from the excellent "daemonize":http://grub.ath.cx/daemonize/ library, and "_why":http://redhanded.hobix.com/inspect/iThoughtProcessDetachWasMyFriend.html

  def fire_and_forget(&block)
    pid = fork do
      begin
        yield
      ensure
        Process.exit!
      end
    end
    Process.detach pid
  end
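
A quick way to check that the caller really does not wait is to run the helper outside Rails. The sketch below repeats it so the snippet stands alone; the marker file and the `sleep` are stand-ins for the slow query and are not part of the original code. It assumes a platform where `fork` is available.

```ruby
require 'tmpdir'

def fire_and_forget
  pid = fork do
    begin
      yield
    ensure
      Process.exit! # leave the child at once, skipping exit handlers
    end
  end
  Process.detach(pid) # returns a thread that reaps the child when it exits
end

marker = File.join(Dir.mktmpdir, 'precache.done')

reaper = fire_and_forget do
  sleep 0.5                  # stand-in for the slow query
  File.write(marker, 'done') # stand-in for the cache write
end

# The parent gets here straight away; the child is still sleeping.
done_at_return = File.exist?(marker)

reaper.join # only this demo waits; a controller action would just carry on
done_after_join = File.exist?(marker)
```

`Process.detach` matters here: without it, or an explicit wait, finished children would linger as zombies.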

This fires up another ruby (rails!) process, an evil clone of your current process, does your bidding, then dies.

h2. the final code

class MonkeysController

  def list
    params[:page] ||= 1
    @monkey_pages, @monkeys = memcache_me("monkeys_page#{params[:page].to_i}") { 
      paginate(:monkeys, :per_page => 25, :select => '...') 
    }
    fire_and_forget do 
      next_page = params[:page].to_i + 1
      # key must name the next page, or this just hits the entry cached above
      memcache_me("monkeys_page#{next_page}") do
        params[:page] = next_page # hack to get the next page number (child process only)
        paginate(:monkeys, :per_page => 25, :select => '...')
      end
    end
  end

end

Now, every action (after the first page) will be immediately loaded. A background process will be fired off, loading the next page's data into memcached. When the user hits that next page, the data is already sitting in memcached waiting for consumption.

h2. finally

A variation of this code is in production and gets hojillions of hits every day. Some of it may have gotten munged in the telling. I kinda made up the pagination stuff. Comments? Bugs?

update(1) : you may need to reconnect to the database. The app in question actually uses an xmlrpc backend which is beyond the scope of this article, so we never had to deal with connections. Completely untested.

update(2) : if you want logging, you need to reopen the logger class, because file handles don't stay open in the fork.

update(3) : this'd be an awesome way to reduce the startup lag when, say, spawning mongrel clusters.. PDI!
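
For update(1) and update(2), one possible shape is to give the helper a hook that runs in the child before the block, so the child re-opens whatever it should not share with the parent. This is only a sketch: the `reconnect:` parameter is hypothetical, and a log file stands in for the database connection or logger. In a Rails app the hook might call `ActiveRecord::Base.connection.reconnect!`, though one commenter below reports trouble with exactly that, so treat it as untested.

```ruby
require 'tmpdir'

# Variation on fire_and_forget: the child first runs an optional
# +reconnect+ callable (hypothetical, not in the original helper).
def fire_and_forget(reconnect: nil)
  pid = fork do
    begin
      reconnect.call if reconnect # re-open resources inside the child
      yield
    ensure
      Process.exit!
    end
  end
  Process.detach(pid) # returns a thread that reaps the child
end

log_path = File.join(Dir.mktmpdir, 'precache.log')
handles  = {}

# Stand-in for "reconnect to the database / reopen the logger".
reopen = lambda { handles[:log] = File.open(log_path, 'a') }

parent_pid = Process.pid
reaper = fire_and_forget(reconnect: reopen) do
  handles[:log].puts "precached in child #{Process.pid}"
  handles[:log].close # close explicitly so the line is written before exit!
end

reaper.join # only so this demo can read the file afterwards
logged = File.read(log_path)
parent_untouched = handles.empty? # the child's handle never shows up in the parent
```

Because the child is a copy of the parent, anything the hook opens lives only in the child and disappears with it.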

6 Responses to “premcache: caching and precaching with memcached”

  1. Amr Malik Says:

    Great Writeup! I think the same can be done for the root pages as part of the application startup. So, I’d prolly do something like an ‘initialcache’ which would read a list of pages/queries to load when we’re starting the application. Might be better to decouple it from the mongrel cluster setup but that is six of one half dozen of the other.

    Sometimes you just want to restart your cluster, but not really mess with backend cached data, sometimes.. oh well.. I’m rambling now :)

  3. paul Says:

    this code is amazing. exactly what i was looking for.

  4. Neil Wilson Says:

    It'll be the default way of doing Rails clusters once Ruby can do copy on write properly, because forking will save a ton of memory.

  5. renuka Says:

    hi, i'm trying to do caching with this code but it's not working. The first time it works, but in the log file i can't see anything like cache set, and after the first try it gives me an error like undefined class/module Regobject. So can anyone help me please? thanks in advance

  6. Andreas Fuchs Says:

    This is a very interesting approach. I tried it in a rails 2.1 application (using Rails.cache), and got pretty nice speedups initially. However, after the first few requests, this causes all kinds of weird errors, which ensured that this code didn't make it to our production servers.

    If I do reopen the DB connection (using ActiveRecord::Base.connection.reconnect!), I get "DB server unexpectedly closed the connection" messages after the fourth or so pre-cache run... definitely not what I'd expect (-:

    If I don't reopen the DB connection, it actually makes it to about 8 pre-cache runs, until I get ruby compilation errors in my application's files when running in development mode; when I turn on class and template caching, it gets even further, and then dies with unexpected memcache messages: I suppose the memcache client connection will have to be reset, as well.

    Right now, I don't know if the horrible entrail-groping inside Rails that this approach requires is worth the speed gains. )-:
