A simple Rails slow-query logger

Courtenay : September 29th, 2008

A few years ago I wrote a simple addition to ActiveRecord that does two things: it chops out the eager loading "t1_t2 AS foo", and it shows the number of records returned for every query you run against the database. You can view the file here

Today I was profiling a site and wanted to quickly find the slow database queries, but didn't have access to mysql's config directly, so I patched that file above to record all queries over 500ms and save it to a log file. I'll warn you now, it ain't pretty, but it works pretty well.

Here's how it works: First, throw this in a file in config/initializers. I open up the rails abstract adapter

module ActiveRecord
  module ConnectionAdapters
    class AbstractAdapter

And add in a new logger.

        def slow_query; 0.5; end # number of seconds
        def slow_query_logger
          @slow_query_logger ||= Logger.new("log/slow_queries.log")
        end

Ideally of course this would all be configurable.

Next, I copy the logging code out of the latest ActiveRecord, and patch it to return the number of records. This is a bit of a hack, too, but we can either look at "num_rows" from the resultset or the actual size of an array.

              s = result && (result.respond_to?(:num_rows) ? result.num_rows : \
                 (result.respond_to?(:size) ? result.size : 0)) || 0

Finally, I rewrite the actual log method so that it checks the benchmark against our threshold

        def log_info(sql, name, runtime, result_size = 0)
          if runtime > slow_query && slow_query_logger
            slow_query_logger.debug "Slow query: (#{runtime}) [#{result_size}] #{sql}"
          end

And add the number of results to the regular rails log, while snipping out the annoying eager-loading code.

          if @logger && @logger.debug?
            if name =~ /Load Including Associations$/
              sql = sql.scan(/SELECT /).to_s + ' ...<snip>... ' + sql.scan(/(FROM .*)$/).to_s
            end

            name = "#{name.nil? ? "SQL" : name} (#{sprintf("%f", runtime)}) [#{result_size.to_i}]"
            @logger.debug format_log_entry(name, sql.squeeze(' '))
          end
        end

Here's the full file.

module ActiveRecord
  module ConnectionAdapters # :nodoc:
    class AbstractAdapter
      protected
        # todo: config this
        def slow_query; 0.5; end
        def slow_query_logger
          @slow_query_logger ||= Logger.new("log/slow_queries.log")
        end

        alias_method :old_log, :log

        def log(sql, name, &block)
          if block_given?
            #if @logger and @logger.level <= Logger::INFO
              result = nil
              seconds = Benchmark.realtime { result = yield }
              @runtime += seconds
              s = result && (result.respond_to?(:num_rows) ? result.num_rows : \
                 (result.respond_to?(:size) ? result.size : 0)) || 0 
              log_info(sql, name, seconds, s)
              return result
            #end
          else
            log_info(sql, name, 0, 0)
            nil
          end
          # old_log(sql, name) { yield }
        rescue Exception => e
          @last_verification = 0
          message = "#{e.class.name}: #{e.message}: #{sql}"
          log_info(message, name, 0)
          raise ActiveRecord::StatementInvalid, message
        end

        alias_method :old_log_info, :log_info
        def log_info(sql, name, runtime, result_size = 0)
          if runtime > slow_query && slow_query_logger
            slow_query_logger.debug "Slow query: (#{runtime}) [#{result_size}] #{sql}"
          end
          if @logger && @logger.debug?
            if name =~ /Load Including Associations$/
              sql = sql.scan(/SELECT /).to_s + ' ...<snip>... ' + sql.scan(/(FROM .*)$/).to_s
            end

            name = "#{name.nil? ? "SQL" : name} (#{sprintf("%f", runtime)}) [#{result_size.to_i}]"
            @logger.debug format_log_entry(name, sql.squeeze(' '))
          end
        end
      end
    end
  end

Would this work as a plugin? As a patch to Rails itself? Or did somebody else already implement a cross-platform slow query logger?

The awesomest filter and sort ever

Courtenay : August 26th, 2008

Update 2: seems like only one or two people knew about what can_search does :) I hope we’re all a little better educated.

Update: yes, I’m using these named scopes throughout the app in other places – they aren’t used only in this one controller.

Often you have an index action where you want to sort records, filter by a parameter, and maybe join on some other tables to get a result. Let’s say you’re looking at a videos controller (where videos are acts_as_taggable) and you want to filter by user_id, filter by tag name, order by video title, or rating. Maybe later, you’ll add a roles (hm:t) association and need to only show videos viewable by a certain user. How complex!

To solve this, we’re going to play with some things you may know, and finish up with a bam! pow! that’ll take your breath away.

Rather than build up some form of frankenquery with all sorts of conditionals and cases, joins, and other messing about, let’s use a brand-new bleeding edge feature of Rails: named scopes.

First, build up individual named scopes for each axis on which you wish to filter. Make sure and put the table name in that query.



    named_scope :by_user, lambda { |user_id| 
      { :conditions => ['videos.user_id = ?', user_id] }
    }

    named_scope :tag_name, lambda { |tag_name|
      { :joins => { :taggable => :tag },
      { :conditions => ['tags.name = ?', tag] }
    }

    named_scope :rating, lambda { |rating| 
      { :conditions => ['ratings_count > ?', rating] }
    }

OK, I cheated on the last one, but let’s assume you have a counter_cache on ratings count.

Now, if you have more than one scope with joins in it, you’ll need to apply this patch to your rails installation, or upgrade past 2.1.1. This will allow you to have as many joins as you like in your scopes.

Now, here’s where the magic happens: in the controller. Big shout out to protocool for this method. Let’s build up a set of all the possible scopes that we might want to use, in an array form like [ named_scope, argument ]

def index
  scopes = []
  scopes << [ :by_user, params[:user_id] ] if params[:user_id]
  scopes << [ :tag_name, params[:tag_name] ] if params[:tag_name]
  scopes << [ :rating, params[:rating] ] if params[:rating]
end

Easy, right? Very readable.

How about some ordering?

  order = { 'name' : 'videos.name ASC' }[params[:order]] || 'videos.id DESC'

Now, as you know, you can chain named scopes. So you could say Video.by_user(2).tag_name('monkeys') Let's take advantage of this, building up a chain of scopes dynamically using 'inject', starting from Video, and adding each scope we added to the array above. This is really fun magic, because it doesn't run any of the queries until the whole thing is built. I don't even know how this works, but it does. Swimmingly.

  @videos = scopes.inject(Video) {|m,v| m.scopes[v[0]].call(m, v[1]) }.paginate(:all, :order => order)

The final method looks like this:

def index
  scopes = []
  scopes << [ :by_user, params[:user_id] ] if params[:user_id]
  scopes << [ :tag_name, params[:tag_name] ] if params[:tag_name]
  scopes << [ :rating, params[:rating] ] if params[:rating]

  order = { 'name' : 'videos.name ASC' }[params[:order]] || 'videos.id DESC'

  @videos = scopes.inject(Video) {|m,v| m.scopes[v[0]].call(m, v[1]) }.paginate(:all, :order => order, :page => params[:page])
end

One final caveat. Sometimes :joins doesn’t know where to get the video id from, so if you’re using id in your app, you’ll need a slight workaround involving manually getting the pagination count, and forcing :select => &#8216;distinct videos.*&#8217; in the paginate call.

If this works for you, it’s really easy to add new filtering, ordering, or even scoping to your query. For example, you can add some form of role hackery to your video


    named_scope :viewable_by, lambda { |user| 
      { :joins => { :permissions => :roles },
        :conditions => [ "roles.user_id = ? AND permissions.role = ?", user.id, "view"
    }

Controller, you replace the first scope definition with this

scopes = [ :viewable_by, current_user ]

Or, you modify the scope inject statement


    @videos = scopes.inject(Video.viewable_by(current_user)) { |m,v| ... }

If you consider this a giant hack, you’re probably at least partly right. However, the alternative in building up a complex query with many possible moving parts is just hideous. And consider this: you can unit test each part of the query on its own, in the model specs.

Taking a vanilla rails application from one box and up is a fun process. The exact path you’ll take depends on the nature of your data, and the ratio of database reads to writes. I’m going to cover some of the more common use cases. If you don’t want to get your hands dirty and it’s kind of an emergency, look at stage zero, then skip to the end where I tell you who you can just pay to fix it.

The path you’ll take also depends on how much money you have to play with, and how quickly your site is growing. For example, if you’re sitting on a mountain of cash, and the facebook users are coming in like lemmings, then you can just throw hardware at it. However, if things are tight, and it’s a nice linear growth curve, then you can play around with caching.

Let’s assume you have a slice or VPS (if you’re on shared hosting, the first step should be to get a dedicated box or at least a xen instance).

Stage zero: fix any ‘duh’ errors

Make sure you’re on a database that can handle the load. This doesn’t include sqlite. I’m going to suggest MySQL in this article, because it’s where I have the most experience.

Make sure you’re not serving up static files through mongrel. This will happen if you are proxying everything through the webserver.

Upgrade your webserver to something like nginx. Alternatively, you might use pound as a load balancer, pointing dynamic requests at mongrels, and static requests at lighttpd. (Interested? I can write an article on this. Let me know.)

Move off that $20/month shared box and get your own server. You can lease a phat server in a data center on 100mbit pipe for $100/month. If you want to colo, I recommend Corporate Colo in Los Angeles.

Move any slow actions into a dedicated process. For example, you have some code that takes 4 seconds to update a bunch of tables? You probably want to fire an asynchronous event to a BackgrounDRB process that handles this exclusively.

Move your uploads to a dedicated merb cluster – it’s like a cut-down rails with less magic and more speed.

Stage one: clean up your database

Take a look at your logs – are you performing over 10 database calls per request? You need to fix this. Are you performing over 90? You’re a dumbass. (yes, even I am guilty of this).

Generally you can reduce the number of requests by denormalizing; for example, you have a list of users and a count of how many comments they’ve made.

<% @users.each do |user| %>
  <%=h user.name %> (<%= user.comments.count %>)
<% end %>

You’re performing a “COUNT” for every single user, every time the page loads. Yuk! This is a “read-optimizable” situation, since there are many READS for each WRITE (comments don’t get created that often).

Add a counter_cache to the comments belongs_to :user association and change this to

<%=h user.name %> (<%= user.comments_count %>)

You can do this in other situations where you’re chaining through associations.

Your task is <%=h @list_item.task.name %>.

This call has to find the list_item task, and then grab the name. You can fix this by either adding :include to the ListItem.find, or, you can denormalize the task name.

Include isn’t always an option, and can be slow. Nothing’s faster than denormalizing. Add a “taskname” to the listitem model.

Your task is <%=h @list_item.task_name %>.

Then make sure to update that field if the task gets updated.

class Task < ActiveRecord::Base
  has_many :list_items
  after_save :update_list_item_names

  def update_list_item_names
    list_items.each { |li| li.update_attribute(:name, self.name) }
  end
end

Yes, there are faster ways of doing this, and yes, I should probably wrap that in a transaction. But you get the point. (note to self: if rails had dirty-field checking, this would be much better)

Stage two: cache the hell out of it

Next thing you want to do: caching. If you haven't already, install memcached, use the cache-fu plugin, and start saving the results of long-running or frequent queries into the cache. Set the TTL (timeout) at about 5 minutes; that way you won't need to write any expiry code (it's lazy, but you're busy!) You'll immediately notice a drop in load. If you have time, write some cache-expiring observers and up the TTL to 15 minutes or even an hour.

Eventually you want to have memcached sitting between your application and the database. Most of your database calls' results will be stored for at least 5 minutes, and maybe forever, in memcached.

If you can, add some action caching. Action cache is like page caching, but it runs any filters you may have. Action caching isn't always easy, particularly if you have "current_user" dependent code in your views. I have a solution for this which I'll be releasing soon, but in the mean time, you may not be able to action cache. Any action-cached pages will be vastly beneficial to your load, and combining memcached with action caches means that you can virtually eliminate any database slowness and is almost as good as the page cache.

If you can action cache, then you can probably page cache. A page-cached site will get you about 3,000 requests per second, thereabouts, and a simple GET request won't even hit your application; you're serving raw html through the webserver. You will soon start thinking of rails as an HTML generator, rather than an app server.

However, all these caching measures won't hide a basic problem: you are performing lots of database queries, and it's harshing your mellow.

Stage three: move the database to another server

This should be fairly painless. Get yourself a fat database server. By fat I mean, super-fast disks, plenty of RAM, and the fastest networking you can afford.

Set it up so that it's only accessible from your main box, which will now be known as the app server. Point your database.yml at the IP of the database server.

Now your app server has much less load, so you can increase the number of mongrels. Add some more RAM to the app server box, too, if you can.

How many mongrels?

Here's a simple formula to follow.

A. Take the (average or median) request time, in seconds.  Say, 0.250 seconds (250ms)
B.  How many requests do you want to handle at peak?  (e.g. 10,000 a minute, 166 a second)
C.  Multiply A x B :  0.250 * 166 = 41.5

So you need about 40 mongrels to handle the load. At about 60MB per listener, that's 2.4GB of RAM, plus a bit of room for leakiness and swapping. Ezra at Engine Yard suggests "about 10 dogs per CPU core", which means that if we have a 4-core opteron box with 3GB of RAM, then this is possible on one box.

Your mileage will vary, which means, if the box is lagging, remove a few mongrels.

Stage four: add more servers as necessary

Here's where it gets interesting. Which of your servers has the most load?

If it's the app server, then setup another box as an exact copy. Now you have app1 and app2. You will need to load balance between app1 and app2. You can do this with a hardware load balancer, or you can use pound on app1 to balance to listeners on app2. (You'll have a single point of failure on app1 if you do it this way)

If the db server is the most heavily loaded box, things start getting interesting: you'll either need some kind of replication, or you'll need to shard (partition) your data.

Replication vs Sharding

Take a look at the data in your application. If it were a person, would it be "extroverted" or "introverted"? That is, could you split the data into many sections (no friends, introverted), or is it all cross-linked (lots of friends)?

For example, you are hosting subversion repositories. You can easily send half the records to one database and half to the other. Or, you host thousands of social networks, each with about 50 users (collectivex, I’m looking at you.)

In this case, one database box would handle all users with names A–L, and another box from M–Z.

If you have a social networking site where anyone can be friends with anyone else, you’re going to have difficulty partitioning the data. (Astute readers will instantly think about denormalizing to make this still possible).

If you have one shared table (users) but the rest of the data can be sharded, then you will want some bastard stepchild method.

Replication: Master-Slave

So, replication (MySQL only from here on). Let’s say you have a few writes (inserts, updates) and a lot of SELECTs. Most people are just viewing things, not updating records. This is fun and easy.

You set up one database box as the “master”. This box will behave as normal. You can read and write data as before.

You then set up as many “slave” boxes as you like. These boxes will be read-only, but because you have a large amount of reads, then much of the load can be pushed out to these slaves. You’ll need to hack at your rails app to direct simple reads at a slave DB. Luckily, someone’s already done the work and called it acts_as_readonlyable.

The problem here is that the slaves will always be lagging, depending on load. Under light load, they may only be 100ms behind. Under heavy load, you can’t be guaranteed of any sort of synchronization. In this case, you’ll want to use memcached heavily. Here’s some cache-fu code.

class Category < ActiveRecord::Base
  acts_as_cached

  def after_save
    Category.set_cache(id, self)
  end
end

When you save the category, it pushes the record (self) into memcached. That means, with a long TTL, you'll never need to do a simple “find” on category from the database, and replication lag won’t matter.

Finally, you’ll want to load-balance to the database servers, an exercise left to the reader.

      / write==[db1] master
[app1]
     \                /==[db2] slave
      \ read==LB==[db3] slave
                       \==[db4] slave

Replication: Simple Master-Master

In this situation, you have two sets of stacks. Each stack has an app box and a database box. They are almost identical; the app server is wired to one database server. There is no crossover. ASCII-tabulous diagram:

   /[app1]====[db1]  master+slave
LB                        |    replicate
   [app2]====[db2]  master+slave

Both databases in this case are masters. That means, both act as masters, but both act as slaves as well. You can even set up the boxes so that if one goes down, the other fails over and takes on both IP addresses.

Because the setup is asynchronous, you need to assign each database a separate set of autoincrement keys. DB1 will increment values like 11, 21, 31, 41, 51, and DB2 ids will increment like 12, 22, 32, 42, 52. You set these with auto_increment_increment and auto_increment_offset.

Take care! If you have a UNIQUE index on other, non-auto-increment fields, you need to make sure that the same database will be used for CREATEs. You’ll need some algorithm, such as checking the final character or number of the unique field. You’ll also need some way of redirecting writes to a specific database, as well as dealing with load balancers. You may find MySQL Proxy useful here – you can use Lua to control the load-balancing at the db layer.

Master:Master replication doesn’t really scale past 10 boxes, because the databases will be so busy updating that they won’t be able to serve requests. However, 99% of rails applications won’t get to this stage.

And remember – there will be some replication lag between the boxes, so your code will need to be tolerant of this issue.

Stage Six: More boxes!

At this stage, you should have most of your data stored in memcached, and it’s time to get yourself a dedicated memcached box with gigabit networking and a metric crapton of RAM.

Your data should be nicely segmented (sharded, or partitioned) into separate databases.

Stage Seven: You’re Going To Need Help

If you’ve roughly followed all of the above steps, and your site is still lagging, you either didn’t follow the instructions, or you’re beyond this and need to bring in some experts. Replicating and sharding should cover most people’s scaling needs, such that you just keep adding stacks of app+db and expanding the memcached cluster.

You can hire skilled rails consultants to clean up your code (there are plenty of #caboosers with the requisite experience), or you can use a hosting service like Engine Yard (staffed almost exclusively with caboosers) where they will have your app running on a cluster pretty much like I’ve described above, only bigger and faster. It’s going to cost you, but you get what you pay for. Hell, they even deploy your app for you.

Developers, if you’re not an avid reader of the MySQL Performance Blog, go subscribe now. They are also for hire, and I hear they’re most excellent. Pricey, but worth every damn penny.

Got any more tricks? Hook me up in the comments.

Rspec notes from the trenches-2

Courtenay : June 18th, 2007

In followup to the previous article, another way I'm using rspec. This is a variation on the "fat model" idea, but I'm pushing it a little further.

In controllers, we frequently see code like this:

def index
  @people = Person.find(:all, :order => 'id desc', :conditions => ['activated=?', true])
end

def show
  @person = Person.find(:first, :conditions => ['id = ? and activated = ?', params[:id], true])
end

Now, to my eye, that looks like we're leaking commands from the database (which should be hidden away under the model). I even think that in many cases, the 'find' call should be a protected method to the model.

Exposing find and its parameters allows the coder to make database calls from the controller, which makes the spec incredibly brittle and rigid.

def index
  @people = Person.find_all_activated
end

def show
  @person = Person.find_activated( params[:id] )
end

The model will look something like:

def find_activated( person_id )
  find(:first, :conditions => ['id = ? and activated = ?', person_id, true]
end

We can now stub out the method happily. Also, this interface -- User.find_activated -- will never need to change.

it "should render show" do
  User.should_receive(:find_activated).and_return mock_model(User)
  get :show
  response.should be_success
end

You can also make that return a User.new object or whatever mock you want.

implement sweet sparklines with plotr

Courtenay : May 6th, 2007

requires: plotr.

You have a bunch of data you want to plot in a sparkish fashion. Lets say they are sparse, one or two every 4 hours. So, lets pull some data clustered in 4-hour increments and spark it up.

Create the data like thusly:

distance = 4.hours # how much we are clustering the results
@sparks = @store.orders.find(:all, 
  :select => "count(id) as count, round(UNIX_TIMESTAMP(created_at)/#{distance}) as timestamp", 
  :group => "round(UNIX_TIMESTAMP(created_at) / #{distance})")

Install plotr javascripts, include them, then in your view dump the placeholder:

<div><canvas id="spark" width="510" height="50"></canvas></div>

And write some javascript

<% baseline = sparks[0]['timestamp'].to_i %>

<script type="text/javascript">
  var dataset = { 'spark': [ <% @sparks.each do |spark| %>
    [<%= spark['timestamp'].to_i - baseline %>, <%= spark.count %>],<% end %>
    []
  ]};
  var spark = new Plotr.BarChart('spark', { colorScheme:'blue', yNumberOfTicks:0,xNumberOfTicks:0,padding:{left:0,right:30,top:0,bottom:10}});
  spark.addDataset(dataset);
  spark.render();
</script>

Mine looked something like this:

If you're interested in keeping a lot of historical statistical data in rails, watch out! Weed is being rewritten by a crack team of monkeys.. I mean, a team of monkeys on crack. By which I mean me.

The storage of data is now automatically aggregated. This means, speed! For those in the know, I borrowed the idea from rrdtool.

an rrdtool primer

If you've never heard of rrdtool, here's how I decided it works. (note: this may be inaccurate)

Normally, you store a bunch of pageview data like this:

created_at: 01-jan 00:25
ip_address: 12.43.522.343
user_agent: Mozila blah blah

created_at: 01-jan 01:45
ip_address: 12.43.522.343
user_agent: Mozila blah blah

And so on. To build a pretty graph of this data, you have to do a sql grouping statement (in mysql) like

select count(id), concat(year(created_at), '-', dayofyear(created_at))
from hits 
group by year(created_at), dayofyear(created_at)

or in rails like

Hit.find :all, 
  :select => "count(id), concat(year(created_at), '-', dayofyear(created_at))", 
  :group => "year(created_at), dayofyear(created_at)"

which builds up a table something like

    +-----------+----------+
    | count(id) | date     |
    +-----------+----------+
    |        13 | 2005-256 | 
    |       313 | 2005-257 | 
    |       349 | 2005-258 | 
    |       260 | 2005-259 | 
    |       163 | 2005-260 |

Now, this is fine. Until you get to about 300,000 records (which is my test data), and that query takes 400ms. Imagine it now with a few million records and it becomes prohibitive to generate that data on a pageload.

If you think about it, the data doesn't actually change; we can't go back in time and hit those pages again. So we keep around a cached version in a new table, like so:

date: 01-jan-2007
hits: 20

date: 02-jan-2007
hits: 45

That way, the query is a straightforward select, no counts or groups which on my dataset runs at less than one millisecond. Perfect.

Now, let's abstract that out a little. What if, instead of storing daily values, we wanted to store weekly values?

date: 01-jan-2007
timespan: 86400 # one day in seconds
hits: 20

date: 01-jan-2007
timespan: 604800 # one week in seconds
hits: 190

So, provided you're regularly aggregating data, you can easily count a year's worth of data by summing the weeks, not the days (which is 1/52 less work) or months (1/12th as much work). This may not seem much but over millions of records, it matters; and even more so if you're performing work on the value, such as doing Math.log on the hitcount.

Automatically updating the aggregate table

Lets imagine we have an Aggregate model

class Rrd::Aggregate < ActiveRecord::Base

And we want to add a method to cache those counts. On mysql we can use the "insert into .. select ..." sql like this:

def self.update
  connection.execute("
    insert into aggregates (timespan, name, value, time_from)
    select '86400', concat(...), count(hits.id), DATE_FORMAT(created_at, '%Y%m%d000000')
      from hits
      group by dayofyear(created_at), year(created_at)
  ")
end

This populates the aggregates table with the data we want. Now, wrap it up in a class method, and oh! we don't want to count all records every time, only records since the last update.

def self.last_update
  if last = find(:first, :order => 'time_from desc')
    last.time_from
  end
end

The eagle-eyed amongst you may realise that we may have been part-way through a day before, so that just by updating the aggregate table, the stats get skewed. So, my hackish solution is to first extract that date_format string, and then delete all records matching the last_update's time_from. This way we clear out the last day's data and re-evaluate it.

def self.update(domain_id)
  date_format = case @@precision
  when 1.day; "%Y%m%d000000"
  ...
  delete_all(["timespan = ? and name=DATE_FORMAT(?, '#{date_format}')", @@precision, last_update(domain_id)])
  connection.execute("  ...  ")
end

Finally, let's add in some more assumptions; the aggregate table is abstract and uses STI; and there's a domain for stats, so we can grab stats for more than one site.

For the STI, I make sure to define "aggregate_table" in the subclass (in this case, it would be 'hits') and have to do some magic to get the class name.

Note I changed the SQL a little to use a subquery, such that it's less work to define some of the functions. Probably a case of too much DRY, but it feels subjectively better.

For the domain, we just have to pass it around to the methods.

# Table name: aggregates
#
#  id        :integer(11)   not null, primary key
#  timespan  :integer(11)   
#  name      :string(255)   
#  value     :integer(11)   
#  domain_id :integer(11)   
#  time_from :datetime      
#  type      :string(255)   

class Rrd::Aggregate < ActiveRecord::Base
  belongs_to :domain
  @@precision = 1.day

  def self.last_update(domain_id)
    if last = find(:first, :conditions => ['domain_id=?', domain_id], :order => 'time_from desc')
      last.time_from
    end
  end

  # Update the aggregate table with any new data.
  def self.update(domain_id)
    date_format = case @@precision
    when 1.day; "%Y%m%d000000"
    else
      raise "Precision value not supported"
    end

    # delete the last day's records since they may have changed.
    delete_all(["timespan = ? and name=DATE_FORMAT(?, '#{date_format}')", @@precision, last_update(domain_id)])

    connection.execute("
      insert into #{table_name} (timespan, name, value, domain_id, time_from, type)
      select '#{@@precision}', name, count, #{domain_id}, name, '#{self.name.split(":")[-1]}'
      from (
        select DATE_FORMAT(created_at, '#{date_format}') as name, count(#{aggregate_table}.id) as count from #{aggregate_table} 
        where #{aggregate_table}.domain_id = #{domain_id} and #{aggregate_table}.created_at >= '#{last_update(domain_id)}'
        group by name) #{aggregate_table}_2
    ")    
  end
end

class Rrd::AggregateUnique < Rrd::Aggregate
  def self.aggregate_table # verb; this model aggregates ....
    "uniques"
  end
end

class Rrd::AggregatePageView < Rrd::Aggregate
  def self.aggregate_table # verb; this model aggregates ....
    "page_views"
  end
end

how to use it

The domain model (through which everything is queried) looks something like this:

class Domain < ActiveRecord::Base
  has_many :page_views do
    def count(*args)
      @owner.rrd_aggregate_page_views.sum :value, *args
    end
  end
  has_many :uniques do
    def count(*args)
      @owner.rrd_aggregate_uniques.sum :value, *args
    end
  end

  has_many :rrd_aggregate_page_views, :class_name => 'Rrd::AggregatePageView', :conditions => ["timespan = ?", 1.day]
  has_many :rrd_aggregate_uniques,    :class_name => 'Rrd::AggregateUnique', :conditions => ["timespan = ?", 1.day]

I can then create a graph with plotr by sending this data set:

 @domain.rrd_aggregate_uniques.find(:all, { :conditions => ["time_from > ?", days.days.ago] })

Bonus points for anyone who works out what I'm missing from the data set. I'll post more on this in the coming days.

Final question: Do you have a better way to do this?

ANN: Independent migrations plugin

Courtenay : March 27th, 2007

My independent migrations patch has been sitting on the rails trac for 4 months, so I’m releasing it as a plugin.

Code plz thx

ticket: http://dev.rubyonrails.org/ticket/6799
plugin: svn://caboo.se/plugins/court3nay/independent_migrations

I plan to add a generator, one day, if anyone cares.

wtf

In simple terms, here’s what it does. To paraphrase zenspider, if two migrations have the same number, Rails will bitch at you. This plugin removes the bitching. To get it working, install, then edit your migrations so they look like this

class AddProject < ActiveRecord::IndependentMigration

That’s it. Now your migration directory can look something like

002_add_project.rb
002.add_monkeys.rb
003.fix_monkeys.rb

and as long as all 002 are independent, then the migration will run. This patch is only guaranteed for Rails 1.2.3 and may change as ActiveRecord changes as it relies heavily on monkeypatching (or reopening classes, as David Black would have you believe) some private methods.

Why?

If you’re still wondering, “Why?”, the usage came from heavy usage of branches. Basically, if there are two branches being developed in parallel, and both are getting migrations, it can be assumed that those migrations are, in fact, in parallel, and won’t conflict, so can have the same number.

Alternate solutions

There is of course more ways to skin a cat, and the other contender for your migrations is Françoise Beausoleil’s “Timestamped Migrations”, while it isn’t a plugin (yet), numbers your migrations like it says on the box (with timestamps).

http://blog.teksol.info/articles/search?q=migration

simultaneous migrations

Courtenay : December 7th, 2006

If you do a lot of branch work, or mirror a remote repository, you'll always be clashing migration numbers. No more! I've written a patch to migration code so you can have multiple migrations with the same number. This means that these migrations can effectively be run in parallel, so long as all the migrations with that number have the IndependentMigration class. There are a number of caveats, but nothing too crazy. 1. you're at #48 and you merge in a branch where there are two #32. Your future migrations fail since not all #32 are Independent Migrations. Edit the offending #32, migrate down, migrate back up. You'll need to temporarily comment out your own #32 and the existing #32 down method, so you can run them properly. This is painful, but too much magic means you destroy your database. 2. you're at #55 and someone merges another #55 into your branch. Easy, just change the class of both files to IndependentMigration. The patch is here: http://dev.rubyonrails.org/ticket/6799

hacking at associations in role-based systems

Courtenay : November 19th, 2006

Your application, ContrivedExample (beta!), has a variety of roles, one being administrator. Your projects are associated with one user (the owner of the project). Users can only view their own projects; however, the admin can view all projects. You want to use the simplest interface possible, the standard has_many association, so you can paginate, and keep the logic in your models. First, a test case

class UserTest < Test::Unit::TestCase

  def test_user_project_association_exists
    user = User.find(5)
    assert user.respond_to?(:projects) 
  end
  
  def test_project_count_per_role
    project_count = Project.count

    user = NormalUser.find :first
    assert user.is_a?User

    # normal users should not have access to the entire set of projects
    assert_not_equal project_count, user.projects.size

    # admin users should be able to view all projects
    user = AdminUser.find :first
    assert user.is_a?User
    assert_equal project_count, user.projects.size
  end
end
What does this test suite tell us? First, users all inherit from User; second, user/projects have a common interface, but the result count differs per-role. How do we achieve this simply?

class User < ActiveRecord::Base
end

class NormalUser < User # STI
  has_many :projects
end

user.projects.find(:all, :limit => 5, :offset => params[:page])

class Admin < User
  ???
end
The easy solution is to define some finder sql and extend the association's methods with the splat.

class Admin < User
  has_many :projects, :finder_sql => 'select * from projects' do 
    def find(*args)
      Project.find(*args)
    end
  end
end  

user = User.find(3)
user.projects # success!
For homework, create a proc that you just :extend, so it can be shared around like a cheap floozy.
Here are a few 'rails' ways to optimize some of the queries in your application. Remember: optimize last! Say you have a topic containing many posts, or a category with many articles, or a user with many orders, and you want to be able to show a summary something like this:

| topic            | last post   | number of posts |
|------------------|-------------|-----------------|
| Monkeys are cool | #12 by Joe  | 25              |
| I like bananas   | #33 by Fred | 8               |
|------------------|-------------|-----------------|
The data model for this schema looks something like

class Topic < ActiveRecord::Base
  has_many :posts
end
The above table is going to be relatively expensive in terms of database queries, since we need to perform a number of actions # load all the topics # load the posts, find the last one # count the posts Luckily there are a number of optimizations, some built into rails. h2. the counter cache You're probably familiar with the counter cache, but here's a refresher. First, add a 'posts_count' integer field to Topic; then in the post model, we set up

# post.rb
class Post < ActiveRecord::Base
  belongs_to :topic, :counter_cache => true
end
Counter cache is quite interesting, in that it saves the number (count) of associated records so you don't have to look it up. Taking a look at the source to 'belongs_to' reveals some interesting metaprogramming. I've added some comments inline.

# File vendor/rails/activerecord/lib/active_record/associations.rb, line 707

707:  if options[:counter_cache]

# Sets up the count column. You can specify your own, or have it automagic.
708:    cache_column = options[:counter_cache] == true ?
709:      "#{self.to_s.underscore.pluralize}_count" :
710:      options[:counter_cache]
711: 

# Create an after_create filter on the model that automatically increments the counter on the other table
712:    module_eval(
713:      "after_create '#{reflection.name}.class.increment_counter(\"#{cache_column}\", #{reflection.primary_key_name})" +
714:      " unless #{reflection.name}.nil?'"
715:    )
716: 

# Set up a before_destroy filter so that we decrement the associated counter, too.
717:    module_eval(
718:      "before_destroy '#{reflection.name}.class.decrement_counter(\"#{cache_column}\", #{reflection.primary_key_name})" +
719:      " unless #{reflection.name}.nil?'"
720:    )          
721:  end
Remember, calling "before_destroy " just evals the string. If you were to hand-code this, it'd look something like:

class Post < ActiveRecord::Base
  belongs_to :topic
  after_create :increment_topic_counter
  
  def after_create 
    Topic.increment_counter('topic_count', topic_id)
  end
end
For kicks, look up the source to increment_counter and you'll find it does a quick update_all, which is my favorite way of executing some fast-running SQL.

# File vendor/rails/activerecord/lib/active_record/base.rb, line 525
525:       def increment_counter(counter_name, id)
526:         update_all "#{counter_name} = #{counter_name} + 1", "#{primary_key} = #{quote_value(id)}"
527:       end
So, what does this mean? Firstly, the after_create / before_destroy callbacks are a powerful way to implement association-based caching. This is particularly useful when you are read-optimizing (more reads, less writes). It slows down the write process (one extra query on inserts) but speeds up the reads. Secondly, metaprogramming with module_eval is a good way to hide common functionality. h2. extending the counter cache idea To retrieve the last post in a topic, as in the table above, there are a number of methods. # Eager loading.

    Topic.find(:all, :include => :posts) 
  
This will load all of the post objects as well as topics. This is fine for a small forum, but definitely won't scale. You could write some custom sql, eager loading just the last post, but that will be difficult to maintain (who wants to get their mitts on someone else's sql!?) # Load the posts and group-by.
Post.find(:all, :group => 'topic_id', :order => 'created_at desc', :include => 'post') 
This is a fun query, since it'll probably only work on mysql, (hah! eat it postgres!) -- we're flipping around the semantics of the query, performing a find on posts instead of topics, with group-by on the foreign key, and eager loading. Empty topics won't be loaded. If you want to load topics too, you can start joining. (yuk, again) # Extend counter cache Add a 'last_post_id' to the topic model, and ..

    class Post < ActiveRecord::Base
      belongs_to :topic, :counter_cache => true
      def after_create 
        topic.update_attribute(:last_post_id, id)
      end
    end
    
    class Topic < ActiveRecord::Base
      has_many :posts
      has_one :last_post
    end
  
Now when you Topic.find(:all, :include => :last_post) and you'll have that last post without any extra queries. Note: before_destroy requires some trickiness, because you'll need to find the most recent post before the current one; and you'll also need to check whether the current post is the last one attached to the topic. acts_as_list is a good candidate here. Sounds like a good idea for a plugin which extends has_many/belongs_to!
Mmm.. lunch! They're not accepting new features into rails until after 1.2 is out, so I pluginized the awesome fixtures hack that lets you do

id: 1
  user: :joe
  name: stuff
in your fixtures. Give it a twirl and tell me if it works..! Remember you have to handle dependencies yourself, so fixtures :users, :products, :monkeys if they all rely on each other

script/plugin install svn://caboo.se/plugins/court3nay/awesome_fixtures
This is all viewable on the caboose svn repo ebola collaboa. If you're a facebook 'booser, login and it (should) give you ticketing access. It's a nasty hack that does the grunt work. Let me know if you have a plugin you want caboose svn hosted and I'll give you some authz.

Are Foreign Keys Worth Your Time?

Liquid error: undefined method `login' for nil:NilClass : May 1st, 2006

I currently work on a project supporting both Oracle and Postgres. One thing that was suggested to me when I arrived was that we continue to leverage foreign keys like they had in the past. I found a "foreign key plugin":http://wiki.rubyonrails.org/rails/pages/Foreign+Key+Schema+Dumper+Plugin on the "rails wiki":http://wiki.rubyonrails.org and had to hack it to support Oracle. This did a decent job of representing our foreign keys and it even allowed us to continue using the :ruby schema format. However you face a big problem with respect to tests and the order that fixtures are loaded into your tests db. Foreign keys are cool and all, but I won't be using them any time soon. Read the rest of this entry
Do you have a massive list of items in a scaffold-type arrangement (e.g. in a typo-style admin)? Here's a way to quickly and easily filter and sort your big-assed list. Note: this requires the Hash#to_sql function posted here a few weeks ago .. told you it was useful! More after the jump.. Read the rest of this entry

"your site doesn't look too good in (x) browser"

Courtenay : November 12th, 2005

I'm trying to actively discourage you from actually visitng the site. You're supposed to be reading it via rss. In other news,

class Array
  def smoosh
    self.flatten.compact.uniq
  end
end
lets you do

@somevar = [@somevar, stuff, [other, stuff]]
in lots of different views so you can dynamically build a list of stylesheets or included javascripts then smoosh it all together later for a nice flat list. And while we're extending ruby,

class Hash
  def to_sql
    sql = keys.sort {|a,b| a.to_s<=>b.to_s}.inject([[]]) do |arr, key|
      arr[0] << "#{key} = ?"
      arr << self[key]
    end
    [sql[0].join(' AND ')] + sql[1..-1]
  end
end

conditions = { :somekey => 'value',  :someotherkey => 'othervalue }

find(:all, :conditions => conditions.to_sql)
This lets you do all kinds of dynamic sql jiggery-pokery.

"your site doesn't look too good in (x) browser"

Courtenay : November 12th, 2005

I'm trying to actively discourage you from actually visitng the site. You're supposed to be reading it via rss. In other news,

class Array
  def smoosh
    self.flatten.compact.uniq
  end
end
lets you do

@somevar = [@somevar, stuff, [other, stuff]]
in lots of different views so you can dynamically build a list of stylesheets or included javascripts then smoosh it all together later for a nice flat list. And while we're extending ruby,

class Hash
  def to_sql
    sql = keys.sort {|a,b| a.to_s<=>b.to_s}.inject([[]]) do |arr, key|
      arr[0] << "#{key} = ?"
      arr << self[key]
    end
    [sql[0].join(' AND ')] + sql[1..-1]
  end
end

conditions = { :somekey => 'value',  :someotherkey => 'othervalue }

find(:all, :conditions => conditions.to_sql)
This lets you do all kinds of dynamic sql jiggery-pokery.