A simple Rails slow-query logger
Courtenay : September 29th, 2008
A few years ago I wrote a simple addition to ActiveRecord that does two things: it chops out the eager loading "t1_t2 AS foo", and it shows the number of records returned for every query you run against the database. You can view the file here
Today I was profiling a site and wanted to quickly find the slow database queries, but didn't have access to mysql's config directly, so I patched that file above to record all queries over 500ms and save it to a log file. I'll warn you now, it ain't pretty, but it works pretty well.
Here's how it works: First, throw this in a file in config/initializers. I open up the rails abstract adapter
module ActiveRecord
module ConnectionAdapters
class AbstractAdapter
And add in a new logger.
def slow_query; 0.5; end # number of seconds
def slow_query_logger
@slow_query_logger ||= Logger.new("log/slow_queries.log")
end
Ideally of course this would all be configurable.
Next, I copy the logging code out of the latest ActiveRecord, and patch it to return the number of records. This is a bit of a hack, too, but we can either look at "num_rows" from the resultset or the actual size of an array.
s = result && (result.respond_to?(:num_rows) ? result.num_rows : \
(result.respond_to?(:size) ? result.size : 0)) || 0
Finally, I rewrite the actual log method so that it checks the benchmark against our threshold
def log_info(sql, name, runtime, result_size = 0)
if runtime > slow_query && slow_query_logger
slow_query_logger.debug "Slow query: (#{runtime}) [#{result_size}] #{sql}"
end
And add the number of results to the regular rails log, while snipping out the annoying eager-loading code.
if @logger && @logger.debug?
if name =~ /Load Including Associations$/
sql = sql.scan(/SELECT /).to_s + ' ...<snip>... ' + sql.scan(/(FROM .*)$/).to_s
end
name = "#{name.nil? ? "SQL" : name} (#{sprintf("%f", runtime)}) [#{result_size.to_i}]"
@logger.debug format_log_entry(name, sql.squeeze(' '))
end
end
Here's the full file.
module ActiveRecord
module ConnectionAdapters # :nodoc:
class AbstractAdapter
protected
# todo: config this
def slow_query; 0.5; end
def slow_query_logger
@slow_query_logger ||= Logger.new("log/slow_queries.log")
end
alias_method :old_log, :log
def log(sql, name, &block)
if block_given?
#if @logger and @logger.level <= Logger::INFO
result = nil
seconds = Benchmark.realtime { result = yield }
@runtime += seconds
s = result && (result.respond_to?(:num_rows) ? result.num_rows : \
(result.respond_to?(:size) ? result.size : 0)) || 0
log_info(sql, name, seconds, s)
return result
#end
else
log_info(sql, name, 0, 0)
nil
end
# old_log(sql, name) { yield }
rescue Exception => e
@last_verification = 0
message = "#{e.class.name}: #{e.message}: #{sql}"
log_info(message, name, 0)
raise ActiveRecord::StatementInvalid, message
end
alias_method :old_log_info, :log_info
def log_info(sql, name, runtime, result_size = 0)
if runtime > slow_query && slow_query_logger
slow_query_logger.debug "Slow query: (#{runtime}) [#{result_size}] #{sql}"
end
if @logger && @logger.debug?
if name =~ /Load Including Associations$/
sql = sql.scan(/SELECT /).to_s + ' ...<snip>... ' + sql.scan(/(FROM .*)$/).to_s
end
name = "#{name.nil? ? "SQL" : name} (#{sprintf("%f", runtime)}) [#{result_size.to_i}]"
@logger.debug format_log_entry(name, sql.squeeze(' '))
end
end
end
end
end
Would this work as a plugin? As a patch to Rails itself? Or did somebody else already implement a cross-platform slow query logger?
The awesomest filter and sort ever
Courtenay : August 26th, 2008
Update 2: seems like only one or two people knew about what can_search does :) I hope we’re all a little better educated.
Update: yes, I’m using these named scopes throughout the app in other places – they aren’t used only in this one controller.
Often you have an index action where you want to sort records, filter by a parameter, and maybe join on some other tables to get a result.
Let’s say you’re looking at a videos controller (where videos are acts_as_taggable) and you want to filter by user_id, filter by tag name, order by video title, or rating.
Maybe later, you’ll add a roles (hm:t) association and need to only show videos viewable by a certain user. How complex!
To solve this, we’re going to play with some things you may know, and finish up with a bam! pow! that’ll take your breath away.
Rather than build up some form of frankenquery with all sorts of conditionals and cases, joins, and other messing about, let’s use a brand-new bleeding edge feature of Rails: named scopes.
First, build up individual named scopes for each axis on which you wish to filter. Make sure and put the table name in that query.
named_scope :by_user, lambda { |user_id|
{ :conditions => ['videos.user_id = ?', user_id] }
}
named_scope :tag_name, lambda { |tag_name|
{ :joins => { :taggable => :tag },
{ :conditions => ['tags.name = ?', tag] }
}
named_scope :rating, lambda { |rating|
{ :conditions => ['ratings_count > ?', rating] }
}
OK, I cheated on the last one, but let’s assume you have a counter_cache on ratings count.
Now, if you have more than one scope with joins in it, you’ll need to apply this patch to your rails installation, or upgrade past 2.1.1. This will allow you to have as many joins as you like in your scopes.
Now, here’s where the magic happens: in the controller. Big shout out to protocool for this method.
Let’s build up a set of all the possible scopes that we might want to use, in an array form like [ named_scope, argument ]
def index
scopes = []
scopes << [ :by_user, params[:user_id] ] if params[:user_id]
scopes << [ :tag_name, params[:tag_name] ] if params[:tag_name]
scopes << [ :rating, params[:rating] ] if params[:rating]
end
Easy, right? Very readable.
How about some ordering?
order = { 'name' : 'videos.name ASC' }[params[:order]] || 'videos.id DESC'
Now, as you know, you can chain named scopes. So you could say Video.by_user(2).tag_name('monkeys') Let's take advantage of this, building up a chain of scopes dynamically using 'inject', starting from Video, and adding each scope we added to the array above. This is really fun magic, because it doesn't run any of the queries until the whole thing is built. I don't even know how this works, but it does. Swimmingly.
@videos = scopes.inject(Video) {|m,v| m.scopes[v[0]].call(m, v[1]) }.paginate(:all, :order => order)
The final method looks like this:
def index
scopes = []
scopes << [ :by_user, params[:user_id] ] if params[:user_id]
scopes << [ :tag_name, params[:tag_name] ] if params[:tag_name]
scopes << [ :rating, params[:rating] ] if params[:rating]
order = { 'name' : 'videos.name ASC' }[params[:order]] || 'videos.id DESC'
@videos = scopes.inject(Video) {|m,v| m.scopes[v[0]].call(m, v[1]) }.paginate(:all, :order => order, :page => params[:page])
end
One final caveat. Sometimes :joins doesn’t know where to get the video id from, so if you’re using id in your app, you’ll need a slight workaround involving manually getting the pagination count, and forcing :select => ‘distinct videos.*’ in the paginate call.
If this works for you, it’s really easy to add new filtering, ordering, or even scoping to your query. For example, you can add some form of role hackery to your video
named_scope :viewable_by, lambda { |user|
{ :joins => { :permissions => :roles },
:conditions => [ "roles.user_id = ? AND permissions.role = ?", user.id, "view"
}
Controller, you replace the first scope definition with this
scopes = [ :viewable_by, current_user ]
Or, you modify the scope inject statement
@videos = scopes.inject(Video.viewable_by(current_user)) { |m,v| ... }
If you consider this a giant hack, you’re probably at least partly right. However, the alternative in building up a complex query with many possible moving parts is just hideous. And consider this: you can unit test each part of the query on its own, in the model specs.
Scale rails from one box to three, four and five
Courtenay : July 29th, 2007
Taking a vanilla rails application from one box and up is a fun process. The exact path you’ll take depends on the nature of your data, and the ratio of database reads to writes. I’m going to cover some of the more common use cases. If you don’t want to get your hands dirty and it’s kind of an emergency, look at stage zero, then skip to the end where I tell you who you can just pay to fix it.
The path you’ll take also depends on how much money you have to play with, and how quickly your site is growing. For example, if you’re sitting on a mountain of cash, and the facebook users are coming in like lemmings, then you can just throw hardware at it. However, if things are tight, and it’s a nice linear growth curve, then you can play around with caching.
Let’s assume you have a slice or VPS (if you’re on shared hosting, the first step should be to get a dedicated box or at least a xen instance).
Stage zero: fix any ‘duh’ errors
Make sure you’re on a database that can handle the load. This doesn’t include sqlite. I’m going to suggest MySQL in this article, because it’s where I have the most experience.
Make sure you’re not serving up static files through mongrel. This will happen if you are proxying everything through the webserver.
Upgrade your webserver to something like nginx. Alternatively, you might use pound as a load balancer, pointing dynamic requests at mongrels, and static requests at lighttpd. (Interested? I can write an article on this. Let me know.)
Move off that $20/month shared box and get your own server. You can lease a phat server in a data center on 100mbit pipe for $100/month. If you want to colo, I recommend Corporate Colo in Los Angeles.
Move any slow actions into a dedicated process. For example, you have some code that takes 4 seconds to update a bunch of tables? You probably want to fire an asynchronous event to a BackgrounDRB process that handles this exclusively.
Move your uploads to a dedicated merb cluster – it’s like a cut-down rails with less magic and more speed.
Stage one: clean up your database
Take a look at your logs – are you performing over 10 database calls per request? You need to fix this. Are you performing over 90? You’re a dumbass. (yes, even I am guilty of this).
Generally you can reduce the number of requests by denormalizing; for example, you have a list of users and a count of how many comments they’ve made.
<% @users.each do |user| %>
<%=h user.name %> (<%= user.comments.count %>)
<% end %>
You’re performing a “COUNT” for every single user, every time the page loads. Yuk! This is a “read-optimizable” situation, since there are many READS for each WRITE (comments don’t get created that often).
Add a counter_cache to the comments belongs_to :user association and change this to
<%=h user.name %> (<%= user.comments_count %>)
You can do this in other situations where you’re chaining through associations.
Your task is <%=h @list_item.task.name %>.
This call has to find the list_item task, and then grab the name. You can fix this by either adding :include to the ListItem.find, or, you can denormalize the task name.
Include isn’t always an option, and can be slow. Nothing’s faster than denormalizing. Add a “taskname” to the listitem model.
Your task is <%=h @list_item.task_name %>.
Then make sure to update that field if the task gets updated.
class Task < ActiveRecord::Base
has_many :list_items
after_save :update_list_item_names
def update_list_item_names
list_items.each { |li| li.update_attribute(:name, self.name) }
end
end
Yes, there are faster ways of doing this, and yes, I should probably wrap that in a transaction. But you get the point. (note to self: if rails had dirty-field checking, this would be much better)
Stage two: cache the hell out of it
Next thing you want to do: caching. If you haven't already, install memcached, use the cache-fu plugin, and start saving the results of long-running or frequent queries into the cache. Set the TTL (timeout) at about 5 minutes; that way you won't need to write any expiry code (it's lazy, but you're busy!) You'll immediately notice a drop in load. If you have time, write some cache-expiring observers and up the TTL to 15 minutes or even an hour.
Eventually you want to have memcached sitting between your application and the database. Most of your database calls' results will be stored for at least 5 minutes, and maybe forever, in memcached.
If you can, add some action caching. Action cache is like page caching, but it runs any filters you may have. Action caching isn't always easy, particularly if you have "current_user" dependent code in your views. I have a solution for this which I'll be releasing soon, but in the mean time, you may not be able to action cache. Any action-cached pages will be vastly beneficial to your load, and combining memcached with action caches means that you can virtually eliminate any database slowness and is almost as good as the page cache.
If you can action cache, then you can probably page cache. A page-cached site will get you about 3,000 requests per second, thereabouts, and a simple GET request won't even hit your application; you're serving raw html through the webserver. You will soon start thinking of rails as an HTML generator, rather than an app server.
However, all these caching measures won't hide a basic problem: you are performing lots of database queries, and it's harshing your mellow.
Stage three: move the database to another server
This should be fairly painless. Get yourself a fat database server. By fat I mean, super-fast disks, plenty of RAM, and the fastest networking you can afford.
Set it up so that it's only accessible from your main box, which will now be known as the app server. Point your database.yml at the IP of the database server.
Now your app server has much less load, so you can increase the number of mongrels. Add some more RAM to the app server box, too, if you can.
How many mongrels?
Here's a simple formula to follow.
A. Take the (average or median) request time, in seconds. Say, 0.250 seconds (250ms)
B. How many requests do you want to handle at peak? (e.g. 10,000 a minute, 166 a second)
C. Multiply A x B : 0.250 * 166 = 41.5
So you need about 40 mongrels to handle the load. At about 60MB per listener, that's 2.4GB of RAM, plus a bit of room for leakiness and swapping. Ezra at Engine Yard suggests "about 10 dogs per CPU core", which means that if we have a 4-core opteron box with 3GB of RAM, then this is possible on one box.
Your mileage will vary, which means, if the box is lagging, remove a few mongrels.
Stage four: add more servers as necessary
Here's where it gets interesting. Which of your servers has the most load?
If it's the app server, then setup another box as an exact copy. Now you have app1 and app2. You will need to load balance between app1 and app2. You can do this with a hardware load balancer, or you can use pound on app1 to balance to listeners on app2. (You'll have a single point of failure on app1 if you do it this way)
If the db server is the most heavily loaded box, things start getting interesting: you'll either need some kind of replication, or you'll need to shard (partition) your data.
Replication vs Sharding
Take a look at the data in your application. If it were a person, would it be "extroverted" or "introverted"? That is, could you split the data into many sections (no friends, introverted), or is it all cross-linked (lots of friends)?
For example, you are hosting subversion repositories. You can easily send half the records to one database and half to the other. Or, you host thousands of social networks, each with about 50 users (collectivex, I’m looking at you.)
In this case, one database box would handle all users with names A–L, and another box from M–Z.
If you have a social networking site where anyone can be friends with anyone else, you’re going to have difficulty partitioning the data. (Astute readers will instantly think about denormalizing to make this still possible).
If you have one shared table (users) but the rest of the data can be sharded, then you will want some bastard stepchild method.
Replication: Master-Slave
So, replication (MySQL only from here on). Let’s say you have a few writes (inserts, updates) and a lot of SELECTs. Most people are just viewing things, not updating records. This is fun and easy.
You set up one database box as the “master”. This box will behave as normal. You can read and write data as before.
You then set up as many “slave” boxes as you like. These boxes will be read-only, but because you have a large amount of reads, then much of the load can be pushed out to these slaves. You’ll need to hack at your rails app to direct simple reads at a slave DB. Luckily, someone’s already done the work and called it acts_as_readonlyable.
The problem here is that the slaves will always be lagging, depending on load. Under light load, they may only be 100ms behind. Under heavy load, you can’t be guaranteed of any sort of synchronization. In this case, you’ll want to use memcached heavily. Here’s some cache-fu code.
class Category < ActiveRecord::Base
acts_as_cached
def after_save
Category.set_cache(id, self)
end
end
When you save the category, it pushes the record (self) into memcached. That means, with a long TTL, you'll never need to do a simple “find” on category from the database, and replication lag won’t matter.
Finally, you’ll want to load-balance to the database servers, an exercise left to the reader.
/ write==[db1] master
[app1]
\ /==[db2] slave
\ read==LB==[db3] slave
\==[db4] slave
Replication: Simple Master-Master
In this situation, you have two sets of stacks. Each stack has an app box and a database box. They are almost identical; the app server is wired to one database server. There is no crossover. ASCII-tabulous diagram:
/[app1]====[db1] master+slave
LB | replicate
[app2]====[db2] master+slave
Both databases in this case are masters. That means, both act as masters, but both act as slaves as well. You can even set up the boxes so that if one goes down, the other fails over and takes on both IP addresses.
Because the setup is asynchronous, you need to assign each database a separate set of autoincrement keys. DB1 will increment values like 11, 21, 31, 41, 51, and DB2 ids will increment like 12, 22, 32, 42, 52. You set these with auto_increment_increment and auto_increment_offset.
Take care! If you have a UNIQUE index on other, non-auto-increment fields, you need to make sure that the same database will be used for CREATEs. You’ll need some algorithm, such as checking the final character or number of the unique field. You’ll also need some way of redirecting writes to a specific database, as well as dealing with load balancers. You may find MySQL Proxy useful here – you can use Lua to control the load-balancing at the db layer.
Master:Master replication doesn’t really scale past 10 boxes, because the databases will be so busy updating that they won’t be able to serve requests. However, 99% of rails applications won’t get to this stage.
And remember – there will be some replication lag between the boxes, so your code will need to be tolerant of this issue.
Stage Six: More boxes!
At this stage, you should have most of your data stored in memcached, and it’s time to get yourself a dedicated memcached box with gigabit networking and a metric crapton of RAM.
Your data should be nicely segmented (sharded, or partitioned) into separate databases.
Stage Seven: You’re Going To Need Help
If you’ve roughly followed all of the above steps, and your site is still lagging, you either didn’t follow the instructions, or you’re beyond this and need to bring in some experts. Replicating and sharding should cover most people’s scaling needs, such that you just keep adding stacks of app+db and expanding the memcached cluster.
You can hire skilled rails consultants to clean up your code (there are plenty of #caboosers with the requisite experience), or you can use a hosting service like Engine Yard (staffed almost exclusively with caboosers) where they will have your app running on a cluster pretty much like I’ve described above, only bigger and faster. It’s going to cost you, but you get what you pay for. Hell, they even deploy your app for you.
Developers, if you’re not an avid reader of the MySQL Performance Blog, go subscribe now. They are also for hire, and I hear they’re most excellent. Pricey, but worth every damn penny.
Got any more tricks? Hook me up in the comments.
Rspec notes from the trenches-2
Courtenay : June 18th, 2007
In followup to the previous article, another way I'm using rspec. This is a variation on the "fat model" idea, but I'm pushing it a little further.
In controllers, we frequently see code like this:
def index
@people = Person.find(:all, :order => 'id desc', :conditions => ['activated=?', true])
end
def show
@person = Person.find(:first, :conditions => ['id = ? and activated = ?', params[:id], true])
end
Now, to my eye, that looks like we're leaking commands from the database (which should be hidden away under the model). I even think that in many cases, the 'find' call should be a protected method to the model.
Exposing find and its parameters allows the coder to make database calls from the controller, which makes the spec incredibly brittle and rigid.
def index
@people = Person.find_all_activated
end
def show
@person = Person.find_activated( params[:id] )
end
The model will look something like:
def find_activated( person_id )
find(:first, :conditions => ['id = ? and activated = ?', person_id, true]
end
We can now stub out the method happily. Also, this interface -- User.find_activated -- will never need to change.
it "should render show" do
User.should_receive(:find_activated).and_return mock_model(User)
get :show
response.should be_success
end
You can also make that return a User.new object or whatever mock you want.
implement sweet sparklines with plotr
Courtenay : May 6th, 2007
requires: plotr.
You have a bunch of data you want to plot in a sparkish fashion. Lets say they are sparse, one or two every 4 hours. So, lets pull some data clustered in 4-hour increments and spark it up.
Create the data like thusly:
distance = 4.hours # how much we are clustering the results
@sparks = @store.orders.find(:all,
:select => "count(id) as count, round(UNIX_TIMESTAMP(created_at)/#{distance}) as timestamp",
:group => "round(UNIX_TIMESTAMP(created_at) / #{distance})")
Install plotr javascripts, include them, then in your view dump the placeholder:
<div><canvas id="spark" width="510" height="50"></canvas></div>
And write some javascript
<% baseline = sparks[0]['timestamp'].to_i %>
<script type="text/javascript">
var dataset = { 'spark': [ <% @sparks.each do |spark| %>
[<%= spark['timestamp'].to_i - baseline %>, <%= spark.count %>],<% end %>
[]
]};
var spark = new Plotr.BarChart('spark', { colorScheme:'blue', yNumberOfTicks:0,xNumberOfTicks:0,padding:{left:0,right:30,top:0,bottom:10}});
spark.addDataset(dataset);
spark.render();
</script>
Mine looked something like this:

weed v2 is coming.. statistical aggregates on rails
Courtenay : April 15th, 2007
If you're interested in keeping a lot of historical statistical data in rails, watch out! Weed is being rewritten by a crack team of monkeys.. I mean, a team of monkeys on crack. By which I mean me.
The storage of data is now automatically aggregated. This means, speed! For those in the know, I borrowed the idea from rrdtool.
an rrdtool primer
If you've never heard of rrdtool, here's how I decided it works. (note: this may be inaccurate)
Normally, you store a bunch of pageview data like this:
created_at: 01-jan 00:25
ip_address: 12.43.522.343
user_agent: Mozila blah blah
created_at: 01-jan 01:45
ip_address: 12.43.522.343
user_agent: Mozila blah blah
And so on. To build a pretty graph of this data, you have to do a sql grouping statement (in mysql) like
select count(id), concat(year(created_at), '-', dayofyear(created_at))
from hits
group by year(created_at), dayofyear(created_at)
or in rails like
Hit.find :all,
:select => "count(id), concat(year(created_at), '-', dayofyear(created_at))",
:group => "year(created_at), dayofyear(created_at)"
which builds up a table something like
+-----------+----------+
| count(id) | date |
+-----------+----------+
| 13 | 2005-256 |
| 313 | 2005-257 |
| 349 | 2005-258 |
| 260 | 2005-259 |
| 163 | 2005-260 |
Now, this is fine. Until you get to about 300,000 records (which is my test data), and that query takes 400ms. Imagine it now with a few million records and it becomes prohibitive to generate that data on a pageload.
If you think about it, the data doesn't actually change; we can't go back in time and hit those pages again. So we keep around a cached version in a new table, like so:
date: 01-jan-2007
hits: 20
date: 02-jan-2007
hits: 45
That way, the query is a straightforward select, no counts or groups which on my dataset runs at less than one millisecond. Perfect.
Now, let's abstract that out a little. What if, instead of storing daily values, we wanted to store weekly values?
date: 01-jan-2007
timespan: 86400 # one day in seconds
hits: 20
date: 01-jan-2007
timespan: 604800 # one week in seconds
hits: 190
So, provided you're regularly aggregating data, you can easily count a year's worth of data by summing the weeks, not the days (which is 1/52 less work) or months (1/12th as much work). This may not seem much but over millions of records, it matters; and even more so if you're performing work on the value, such as doing Math.log on the hitcount.
Automatically updating the aggregate table
Lets imagine we have an Aggregate model
class Rrd::Aggregate < ActiveRecord::Base
And we want to add a method to cache those counts. On mysql we can use the "insert into .. select ..." sql like this:
def self.update
connection.execute("
insert into aggregates (timespan, name, value, time_from)
select '86400', concat(...), count(hits.id), DATE_FORMAT(created_at, '%Y%m%d000000')
from hits
group by dayofyear(created_at), year(created_at)
")
end
This populates the aggregates table with the data we want. Now, wrap it up in a class method, and oh! we don't want to count all records every time, only records since the last update.
def self.last_update
if last = find(:first, :order => 'time_from desc')
last.time_from
end
end
The eagle-eyed amongst you may realise that we may have been part-way through a day before, so that just by updating the aggregate table, the stats get skewed. So, my hackish solution is to first extract that date_format string, and then delete all records matching the last_update's time_from. This way we clear out the last day's data and re-evaluate it.
def self.update(domain_id)
date_format = case @@precision
when 1.day; "%Y%m%d000000"
...
delete_all(["timespan = ? and name=DATE_FORMAT(?, '#{date_format}')", @@precision, last_update(domain_id)])
connection.execute(" ... ")
end
Finally, let's add in some more assumptions; the aggregate table is abstract and uses STI; and there's a domain for stats, so we can grab stats for more than one site.
For the STI, I make sure to define "aggregate_table" in the subclass (in this case, it would be 'hits') and have to do some magic to get the class name.
Note I changed the SQL a little to use a subquery, such that it's less work to define some of the functions. Probably a case of too much DRY, but it feels subjectively better.
For the domain, we just have to pass it around to the methods.
# Table name: aggregates
#
# id :integer(11) not null, primary key
# timespan :integer(11)
# name :string(255)
# value :integer(11)
# domain_id :integer(11)
# time_from :datetime
# type :string(255)
class Rrd::Aggregate < ActiveRecord::Base
belongs_to :domain
@@precision = 1.day
def self.last_update(domain_id)
if last = find(:first, :conditions => ['domain_id=?', domain_id], :order => 'time_from desc')
last.time_from
end
end
# Update the aggregate table with any new data.
def self.update(domain_id)
date_format = case @@precision
when 1.day; "%Y%m%d000000"
else
raise "Precision value not supported"
end
# delete the last day's records since they may have changed.
delete_all(["timespan = ? and name=DATE_FORMAT(?, '#{date_format}')", @@precision, last_update(domain_id)])
connection.execute("
insert into #{table_name} (timespan, name, value, domain_id, time_from, type)
select '#{@@precision}', name, count, #{domain_id}, name, '#{self.name.split(":")[-1]}'
from (
select DATE_FORMAT(created_at, '#{date_format}') as name, count(#{aggregate_table}.id) as count from #{aggregate_table}
where #{aggregate_table}.domain_id = #{domain_id} and #{aggregate_table}.created_at >= '#{last_update(domain_id)}'
group by name) #{aggregate_table}_2
")
end
end
class Rrd::AggregateUnique < Rrd::Aggregate
def self.aggregate_table # verb; this model aggregates ....
"uniques"
end
end
class Rrd::AggregatePageView < Rrd::Aggregate
def self.aggregate_table # verb; this model aggregates ....
"page_views"
end
end
how to use it
The domain model (through which everything is queried) looks something like this:
class Domain < ActiveRecord::Base
has_many :page_views do
def count(*args)
@owner.rrd_aggregate_page_views.sum :value, *args
end
end
has_many :uniques do
def count(*args)
@owner.rrd_aggregate_uniques.sum :value, *args
end
end
has_many :rrd_aggregate_page_views, :class_name => 'Rrd::AggregatePageView', :conditions => ["timespan = ?", 1.day]
has_many :rrd_aggregate_uniques, :class_name => 'Rrd::AggregateUnique', :conditions => ["timespan = ?", 1.day]
I can then create a graph with plotr by sending this data set:
@domain.rrd_aggregate_uniques.find(:all, { :conditions => ["time_from > ?", days.days.ago] })
Bonus points for anyone who works out what I'm missing from the data set. I'll post more on this in the coming days.
Final question: Do you have a better way to do this?
ANN: Independent migrations plugin
Courtenay : March 27th, 2007
My independent migrations patch has been sitting on the rails trac for 4 months, so I’m releasing it as a plugin.
Code plz thx
ticket: http://dev.rubyonrails.org/ticket/6799
plugin: svn://caboo.se/plugins/court3nay/independent_migrations
I plan to add a generator, one day, if anyone cares.
wtf
In simple terms, here’s what it does. To paraphrase zenspider, if two migrations have the same number, Rails will bitch at you. This plugin removes the bitching. To get it working, install, then edit your migrations so they look like this
class AddProject < ActiveRecord::IndependentMigration
That’s it. Now your migration directory can look something like
002_add_project.rb
002.add_monkeys.rb
003.fix_monkeys.rb
and as long as all 002 are independent, then the migration will run. This patch is only guaranteed for Rails 1.2.3 and may change as ActiveRecord changes as it relies heavily on monkeypatching (or reopening classes, as David Black would have you believe) some private methods.
Why?
If you’re still wondering, “Why?”, the usage came from heavy usage of branches. Basically, if there are two branches being developed in parallel, and both are getting migrations, it can be assumed that those migrations are, in fact, in parallel, and won’t conflict, so can have the same number.
Alternate solutions
There is of course more ways to skin a cat, and the other contender for your migrations is Françoise Beausoleil’s “Timestamped Migrations”, while it isn’t a plugin (yet), numbers your migrations like it says on the box (with timestamps).
http://blog.teksol.info/articles/search?q=migration
simultaneous migrations
Courtenay : December 7th, 2006
hacking at associations in role-based systems
Courtenay : November 19th, 2006
class UserTest < Test::Unit::TestCase
def test_user_project_association_exists
user = User.find(5)
assert user.respond_to?(:projects)
end
def test_project_count_per_role
project_count = Project.count
user = NormalUser.find :first
assert user.is_a?User
# normal users should not have access to the entire set of projects
assert_not_equal project_count, user.projects.size
# admin users should be able to view all projects
user = AdminUser.find :first
assert user.is_a?User
assert_equal project_count, user.projects.size
end
end
What does this test suite tell us? First, users all inherit from User; second, user/projects have a common interface, but the result count differs per-role. How do we achieve this simply?
class User < ActiveRecord::Base
end
class NormalUser < User # STI
has_many :projects
end
user.projects.find(:all, :limit => 5, :offset => params[:page])
class Admin < User
???
end
The easy solution is to define some finder sql and extend the association's methods with the splat.
class Admin < User
has_many :projects, :finder_sql => 'select * from projects' do
def find(*args)
Project.find(*args)
end
end
end
user = User.find(3)
user.projects # success!
For homework, create a proc that you just :extend, so it can be shared around like a cheap floozy.
some easy optimizations; or, reading the rails source for fun and profit
Courtenay : November 18th, 2006
| topic | last post | number of posts |
|------------------|-------------|-----------------|
| Monkeys are cool | #12 by Joe | 25 |
| I like bananas | #33 by Fred | 8 |
|------------------|-------------|-----------------|
The data model for this schema looks something like
class Topic < ActiveRecord::Base
has_many :posts
end
The above table is going to be relatively expensive in terms of database queries, since we need to perform a number of actions
# load all the topics
# load the posts, find the last one
# count the posts
Luckily there are a number of optimizations, some built into rails.
h2. the counter cache
You're probably familiar with the counter cache, but here's a refresher.
First, add a 'posts_count' integer field to Topic; then in the post model, we set up
# post.rb
class Post < ActiveRecord::Base
belongs_to :topic, :counter_cache => true
end
Counter cache is quite interesting, in that it saves the number (count) of associated records so you don't have to look it up.
Taking a look at the source to 'belongs_to' reveals some interesting metaprogramming. I've added some comments inline.
# File vendor/rails/activerecord/lib/active_record/associations.rb, line 707
707: if options[:counter_cache]
# Sets up the count column. You can specify your own, or have it automagic.
708: cache_column = options[:counter_cache] == true ?
709: "#{self.to_s.underscore.pluralize}_count" :
710: options[:counter_cache]
711:
# Create an after_create filter on the model that automatically increments the counter on the other table
712: module_eval(
713: "after_create '#{reflection.name}.class.increment_counter(\"#{cache_column}\", #{reflection.primary_key_name})" +
714: " unless #{reflection.name}.nil?'"
715: )
716:
# Set up a before_destroy filter so that we decrement the associated counter, too.
717: module_eval(
718: "before_destroy '#{reflection.name}.class.decrement_counter(\"#{cache_column}\", #{reflection.primary_key_name})" +
719: " unless #{reflection.name}.nil?'"
720: )
721: end
Remember, calling "before_destroy
class Post < ActiveRecord::Base
belongs_to :topic
after_create :increment_topic_counter
def after_create
Topic.increment_counter('topic_count', topic_id)
end
end
For kicks, look up the source to increment_counter and you'll find it does a quick update_all, which is my favorite way of executing some fast-running SQL.
# File vendor/rails/activerecord/lib/active_record/base.rb, line 525
525: def increment_counter(counter_name, id)
526: update_all "#{counter_name} = #{counter_name} + 1", "#{primary_key} = #{quote_value(id)}"
527: end
So, what does this mean? Firstly, the after_create / before_destroy callbacks are a powerful way to implement association-based caching. This is particularly useful when you are read-optimizing (more reads, less writes). It slows down the write process (one extra query on inserts) but speeds up the reads.
Secondly, metaprogramming with module_eval is a good way to hide common functionality.
h2. extending the counter cache idea
To retrieve the last post in a topic, as in the table above, there are a number of methods.
# Eager loading.
Topic.find(:all, :include => :posts)
This will load all of the post objects as well as topics. This is fine for a small forum, but definitely won't scale. You could write some custom sql, eager loading just the last post, but that will be difficult to maintain (who wants to get their mitts on someone else's sql!?)
# Load the posts and group-by.
Post.find(:all, :group => 'topic_id', :order => 'created_at desc', :include => 'post')
This is a fun query, since it'll probably only work on mysql, (hah! eat it postgres!) -- we're flipping around the semantics of the query, performing a find on posts instead of topics, with group-by on the foreign key, and eager loading. Empty topics won't be loaded. If you want to load topics too, you can start joining. (yuk, again)
# Extend counter cache
Add a 'last_post_id' to the topic model, and ..
class Post < ActiveRecord::Base
belongs_to :topic, :counter_cache => true
def after_create
topic.update_attribute(:last_post_id, id)
end
end
class Topic < ActiveRecord::Base
has_many :posts
has_one :last_post
end
Now when you Topic.find(:all, :include => :last_post) and you'll have that last post without any extra queries.
Note: before_destroy requires some trickiness, because you'll need to find the most recent post before the current one; and you'll also need to check whether the current post is the last one attached to the topic. acts_as_list is a good candidate here.
Sounds like a good idea for a plugin which extends has_many/belongs_to!
fixture hack gets pluginized, caboose plugin repository lunches!
Courtenay : November 1st, 2006
id: 1
user: :joe
name: stuff
in your fixtures.
Give it a twirl and tell me if it works..! Remember you have to handle dependencies yourself, so fixtures :users, :products, :monkeys if they all rely on each other
script/plugin install svn://caboo.se/plugins/court3nay/awesome_fixtures
This is all viewable on the caboose svn repo Are Foreign Keys Worth Your Time?
Liquid error: undefined method `login' for nil:NilClass : May 1st, 2006
Scaffolding on Crack (sortable, filtered scaffolds)
Courtenay : January 23rd, 2006
"your site doesn't look too good in (x) browser"
Courtenay : November 12th, 2005
class Array
def smoosh
self.flatten.compact.uniq
end
end
lets you do
@somevar = [@somevar, stuff, [other, stuff]]
in lots of different views so you can dynamically build a list of stylesheets or included javascripts then smoosh it all together later for a nice flat list.
And while we're extending ruby,
class Hash
def to_sql
sql = keys.sort {|a,b| a.to_s<=>b.to_s}.inject([[]]) do |arr, key|
arr[0] << "#{key} = ?"
arr << self[key]
end
[sql[0].join(' AND ')] + sql[1..-1]
end
end
conditions = { :somekey => 'value', :someotherkey => 'othervalue }
find(:all, :conditions => conditions.to_sql)
This lets you do all kinds of dynamic sql jiggery-pokery.
"your site doesn't look too good in (x) browser"
Courtenay : November 12th, 2005
class Array
def smoosh
self.flatten.compact.uniq
end
end
lets you do
@somevar = [@somevar, stuff, [other, stuff]]
in lots of different views so you can dynamically build a list of stylesheets or included javascripts then smoosh it all together later for a nice flat list.
And while we're extending ruby,
class Hash
def to_sql
sql = keys.sort {|a,b| a.to_s<=>b.to_s}.inject([[]]) do |arr, key|
arr[0] << "#{key} = ?"
arr << self[key]
end
[sql[0].join(' AND ')] + sql[1..-1]
end
end
conditions = { :somekey => 'value', :someotherkey => 'othervalue }
find(:all, :conditions => conditions.to_sql)
This lets you do all kinds of dynamic sql jiggery-pokery.