Looking for a JRuby expert

Courtenay : September 14th, 2009

We’ve hit a snag deploying a JRuby application on the OC4J app server. We’re having all sorts of issues getting our application running (it works fine, but there are strange errors deep within the stack).

I need a consultant to come onsite in San Francisco for a day or two to fix our deployment issues and help us fix or re-set up the production servers. Please contact me (courtenay @ entp.com) if you can help some time in the next 1-2 weeks.

Installing munin on EngineYard

Courtenay : August 24th, 2009

Munin is pretty much the standard system monitoring software for linux machines. If you have a server on EngineYard, they don't officially support munin, so it's up to you to get it installed and working. If you use a rails monitoring plugin like NewRelic, well, it's just like that, only older, open source, and uglier. We use and rely on New Relic on Tender and Lighthouse, and it's really fucking great. But for servers with no rails apps, and to get some nice aggregated graphs, I (as chief sysadmin) like using Munin.

There are two parts to munin - the server, and the client. It's sort of confusing; there's a daemon (munin-node) that runs on each of your machines and accepts connections from the remote server, and there's a cron job that runs on one specific machine which periodically fetches stats from each machine and generates the graphs. I'm going to assume you have munin itself set up somewhere.

First step, you need to get some ports forwarded, because your slice isn't directly accessible to the outside world. A helpful EY tech will forward your port 4949 (munin) from each slice, to something accessible on a public IP, like 22491, 22492, etc.

Now, since we're running gentoo, you'll need to install munin using the emerge package manager. On some OSes (like fedora) you install munin-node but on Gentoo it seems you get both munin and munin-node in one package.

# emerge munin

After about 15 minutes of compiling, we're good to go. You'll want to edit the /etc/munin/munin-node.conf file and add the IP address of your main munin machine as a regex. This will (hopefully) prevent bad guys from seeing your load charts.

allow ^74\.123\.111\.100$

Munin-node works by looking in a directory full of symlinks to executables. It runs the executables and records the results periodically. So, next (still as root) we want munin-node to discover what it should be running.

# munin-node-configure --shell

You'll get a dump of about 30 results that looks like

ln -s /usr/libexec/munin/plugins/cpu /etc/munin/plugins/cpu
ln -s /usr/libexec/munin/plugins/df /etc/munin/plugins/df

You can copy/paste those to create the necessary symlinks. There shouldn't be too much configuration required.

Start up munin-node

# /etc/init.d/munin-node start

And add munin-node to the scripts to be run on startup by symlinking it to the relevant place.

# cd /etc/runlevels/default
# ln -s /etc/init.d/munin-node ./

And you're done. Now, add the new nodes to the master config file and wait until the next cron runs to update them.
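For that last step, an entry in the master's munin.conf might look roughly like this. The host name and address are placeholders, and the port is whichever public port EY forwarded to that slice's 4949:

```
[slice1.example.com]
    address 203.0.113.10
    port 22491
    use_node_name yes
```

One entry per slice, each with its own forwarded port, should be enough for the cron job to pick them up.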

Rails 2.3.2: not ready for prime time

Courtenay : May 9th, 2009

I’d strongly advise against upgrading your application to Rails 2.3.2. I’ve hit quite a few bugs that have taken a big chunk out of several of my days, including:

  • “can’t dup nilclass”, as reported here: http://groups.google.com/group/rubyonrails-core/msg/787b561d166abf53
  • A strange bug in associations and rspec, http://groups.google.com/group/rspec/browse_thread/thread/7901db1b123eb93c
  • Lots of upgrade issues with rspec requiring a rewrite of routing specs and erroneous missing templates (reported in their README)
  • Anything that isn’t fixtures, but creates records in tests, will fail due to the Rails new way of doing nested transactions

Stick to 2.2 for a while longer – it’s not worth your time.

In our open source ticket tracker xtt, we have code that looks like this:

has_many :memberships do
  def contexts
    proxy_owner.memberships.sort.group_by &:context
  end
end

Basically, it lets people group their memberships to various projects by a context. So each user can have their own grouping for projects. If you don't have a context for a project, it shows that last, in an implicit 'etc' context. In the tests, we check for this behavior like

it "leaves nil context for last" do
  @contexts = @user.memberships.contexts
  @contexts.last.should == [nil, [memberships(:default)]]
end

However, since rails now returns aggregated association results (such as group_by or total with :group) as an ActiveSupport::OrderedHash, you don't get all those tasty array methods that Rails is so famous for (#last, #second etc).

ActiveSupport::OrderedHash is just some code that duplicates Ruby 1.9's hash functionality.

The easy solution here is to make your results into an array at some point, rewrite your tests, or send a patch to the rails team to mix in the Array monkeypatched code into OrderedHash.

it "leaves nil context for last" do
  @contexts = @user.memberships.contexts
  @contexts.to_a.last.should == [nil, [memberships(:default)]]
end
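To see why the to_a call helps, here is a minimal sketch using a plain Ruby Hash as a stand-in for the grouped result. The data is made up, and it assumes Ruby 1.9 or later, where hashes keep insertion order:

```ruby
# Hypothetical stand-in data: context name => memberships in that context.
grouped = { "work" => [:m1, :m2], nil => [:m3] }

# A Hash includes Enumerable, which has #first but no #last.
has_last = grouped.respond_to?(:last)

# Converting to an array of [key, value] pairs gives the Array methods back.
last_pair = grouped.to_a.last
```

The same idea applies to an OrderedHash: once it is an array of pairs, #last behaves the way the old test expected.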

Vegas Scam

admin : April 1st, 2009

Today we were trying to sort out running CabooseConf during Rails Conf in Vegas. As you may recall, last year we provided free lunch and beverages for you all to drink while you hack. This year, we're doing the same thing under the auspices of O'Reilly.

My favorite energy drink of choice right now is Function Alternative Energy. In fact, I love it so much that we go through a few cases every month. It has caffeine, Yerba Mate and guarana in a great mix that doesn't give you a crazy buzz -- just laser-like focus. This is probably a topic for another day, but the important point is that we got a sponsorship from Function (free drinks!) so that we could bring a couple hundred of these and hand them out to our programmer friends and get them hooked.

Now, as it turns out, you can't just do that. Not in Vegas. You have to pay everyone in line.

It costs me $1 per drink consumed, as corkage. Then, I have to pay the Chef's union another $0.60. Per drink. As the guy at the Hilton said, at least it's better than us providing you with water. That costs $3.75 per bottle.

Now, it's not about the money. I know that there are people ready to either sponsor our little anticonf or I can throw some of my own cash at it. That's totally not the point. I have a moral objection to giving these assholes any money at all.

Oh, and a bowl of pretzels? $30. For that, I hope they let you keep the fucking bowl.

How many of you are planning to come to Railsconf and just hang out at the free caboose conf?

CabooseConf '09

Courtenay : March 12th, 2009

Hey everyone! CabooseConf 09 is on again. For those of you who don’t know, we have been running a free anti-conference at the same time as the O’Reilly Railsconf behemoth. Last year, Chad and Gina approached me after the conference and asked if we would run it again this year, but under the banner of Railsconf itself. Keep your friends close, I guess ;)

I’m a big fan of going to conferences and attending the “hallway track”. So, this year it’s officially sanctioned. Come to Vegas, network, hack, code, and don’t pay a cent. We have conference rooms staked out at the Railsconf hotel, and we’re doing our best to organize similar perks to last year (free energy drinks, etc.)

You do have to sign up at the O’Reilly railsconf site (http://en.oreilly.com/rails2009/) and get a badge. Did I mention it’s free? One thing you need to do – as usual – go write some open source code! No, you can’t use something you did two months ago. No, you can’t use some code on some PHP project. This is railsconf, so go show it like you mean it, and contribute something to rails. We won’t let you in unless you do. Seriously. (*Merb also accepted)

We hope to be hosting all the hackfests.. so bring your open source ruby codes.

Allowing custom CSS in your app

Courtenay : December 31st, 2008

There are a number of good reasons why you don't want your users providing their own CSS (for example, when theming their site). These are: taste (see: myspace) and security.

The former is pretty much your users' problem. The pages don't have to look terrible -- and in fact Myspace charges a LOT of money to do those custom movie or band pages (it's part of the service when you buy their primo ad space).

The latter, well, as it turns out there are a bunch of security vulnerabilities exposed in CSS. These are mainly in IE and relate to expressions (you can run javascript from your CSS), which means that users can steal others' sessions. So, while there are some excellent perl libraries out there for this, there hasn't been one for ruby -- until now! (at least that I could find).

So, here's my first attempt.

css_file_sanitize (github)

I stole most of the tests from LiveJournal's css sanitizing library, and rewrote the implementation in Ruby. I'd love to hear your collective feedback. It's a really lazy plugin; in fact, while it does have tests, you're best to just include the module in your model. This is a case of "it works on my machine" so send your patches!


A plam for splam

Courtenay : November 24th, 2008

A few weeks ago I wrote a fun plugin to fight spammers everywhere; I call him, Splam. I thought I wrote this up somewhere, but I can’t seem to find the article. So, I must have dreamed it. I soft-launched it on Github, so those of you following my github profile will have seen some commits.

Splam is a “Simple, pluggable, easily customizable score-based spam filter plugin for Ruby-based applications”. I couldn’t find any other Ruby projects outside of Defensio and Akismet, both hosted services, so while you might say, “but those work perfectly well!”, you can run this locally and get instant feedback. Install it as a plugin, and include it into your ActiveRecord (or other PORO) like so:

class Comment < ActiveRecord::Base
  include Splam  
  splammable :body
end

Easy, right? Splam works by looking at the field, and applying a set of rules. Some of these rules are pretty simple; most forum/comment spam is pretty simple, too. For example, it looks for words like "porn" or "erotic" or "viagra", and gives 10 points for each of these. Then it looks in links in the body, and gives another 20 points each time a word appears in the link text. (Actually, I modified the code so it gives 10^ rather than 10*. That means that each time you use a banned word, it's exponentially more likely it's spam).

Some of the other rules target the idiots who try to spam your Ruby forum with bbcode: [url= or [b]. Then it gives you points for each Chinese character, and more points for Russian glyphs. It looks for bad HTML (a href=http://...) as well as extra long lines, sentences with lots of words and no punctuation, too many words with lots of letters, and so on.

Splam gives you a spam? boolean, a splammable_score score, and the splam_reasons why it marked something as spam, along with the points for each infraction.
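To make the scoring idea concrete, here is a much-simplified sketch of a rule-based scorer. It is not Splam's real code or API: the class name, the two rules and the bbcode point value are made up for illustration, and only the 10 points per banned word and the 100-point cutoff come from the description in this post.

```ruby
# A hypothetical, much-simplified scorer; not Splam's real rules or API.
class TinySpamScorer
  BAD_WORDS = %w[porn erotic viagra].freeze
  THRESHOLD = 100 # assumed cutoff: more than 100 points counts as spam

  attr_reader :score, :reasons

  def initialize(body)
    @body = body.to_s
    @score = 0
    @reasons = []
    run_rules
  end

  def spam?
    @score > THRESHOLD
  end

  private

  # Record points and the reason they were given; skip rules that found nothing.
  def add(points, reason)
    return if points.zero?
    @score += points
    @reasons << "#{reason}: #{points}"
  end

  def run_rules
    # 10 points for each occurrence of a banned word.
    hits = BAD_WORDS.sum { |word| @body.downcase.scan(/\b#{word}\b/).size }
    add(hits * 10, "banned words")

    # 20 points (an assumed value) for each bbcode tag.
    bbcode = @body.scan(/\[url=|\[b\]/i).size
    add(bbcode * 20, "bbcode")
  end
end

comment = TinySpamScorer.new("[url=http://x] cheap viagra viagra [b]")
```

Each rule only adds points and a reason, so new rules (or "good words" that subtract points) can be bolted on without touching the others.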

I originally wrote Splam for the new app, Tender Support, that we've been building at entp -- I've been tweaking it until it gets zero false positives against Defensio, Akismet AND our sturdy human spam checker, Will the Defender, against the complete set of support/help requests in the Lighthouse project. Interestingly enough, I had to add a set of "good words" which takes spam points away (things related to our business).

In this way, you can see splam as this horrible manual system with no training ability outside the code. It's an arms race, but I think we're not up against a particularly clever enemy[1]. I'd really like to add some clever bayesian magic to it, but since it works well enough for me right now, I'm gonna throw it down to you guys. I'd also like to make the points themselves a percentage rating (adding % chance that it's spam) rather than an absolute (>100 points, and it’s spam).

Splam has a test suite, so you can check it out, put some of your corpus in individual text files in test/fixtures/comment, and send me a diff or pull request with anything that gets incorrectly marked as spam or ham.

Splam is at http://github.com/courtenay/splam/tree/master.

[1] This isn’t really meant to be a challenge to spammers; interestingly enough, most of the spam we get is just mass-blasted crap that isn’t really targeted at all. We were talking in the company campfire about how you could really make a bunch of money out of “spam”; if you’re intent on selling shady drugs, peddling nootropics to programmers would work better than sex enhancers, HGH to competitive cyclists, and so on. You could probably build some clever markov chains to interact on forums, leading people back to your own site where you start the pitch.

new plugin: acts_as_git

Courtenay : November 14th, 2008

With the help of Jamie van Dyke at Parfait and Scott Chacon at GitHub, I'm pleased to announce Acts As Git (no, I don't like the name either). It's a simple plugin which stores all changes you make to a text field in a git repository. This is ideal for something like a git-backed wiki.

Look at it here: github or check it out from

git://github.com/courtenay/acts_like_git.git

From the README:

ALG automagically saves the history of a given text or string field. It sits over the top of an ActiveRecord model; after a value is committed to the database, the plugin writes the new value to a text file and commits it to a git repository. This way you get all the advantages of using Git as version-control.

Usage:

class Post < ActiveRecord::Base
  versioning(:title) do |version|
    version.repository = '/home/git/repositories/postal.git'
    version.message = lambda { |post| "Committed by #{post.author.name}" }
  end
end

To view the complete list of changes:

>> @post = Post.find 15
<Post:15>
>> @post.title
=> 'Freddy'
>> @post.history(:title)
=> ['Joe', 'Frank', 'Freddy']
>> @post.log
=> ['bfec2f69e270d2d02de4e8c7a4eb2bd0f132bdbb', '643deb45c12982dde75ba71657792a2dbdda83e6', 
'1ce6c7368219db7698f4acc3417e656510b4138d']
>> @post.revert_to '1ce6c7368219db7698f4acc3417e656510b4138d'
>> @post.title
=> 'Joe'

It uses the excellent Grit library, and doesn't actually have a checked-out repository. The latest version of your data is still stored in the database. You can actually clone this repo and view the changes; pushing back to it won't do anything useful.

Plugin configuration style?

Courtenay : November 10th, 2008

I’m putting the final touches on a super-sweet versioning plugin, and I’ve discovered that we’re using several different metaphors for configuring the plugin options. I’d like to get some opinions/feedback on your preferred style.

The DSL

Using a DSL and passing blocks in which get instance evalled. I’m normally very scathing of DSLs; I think that they’re Yet Another Language for people to learn to use – it’s usually your very own write-only syntax – but it’s been super-fun implementing the backend to this.

class Monkey < ActiveRecord::Base
  versioning do
    author do
      name { user.current.name }
      message { "Committed via #{name}" }
    end
    repository "Joe's DataStore" 
  end
end
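For anyone wondering what "instance evalled" means in practice, here is a minimal sketch of a backend for a block-style DSL. The class and method names are hypothetical and are not the plugin's real internals; the only real mechanism shown is Ruby's instance_eval.

```ruby
# Hypothetical builder; names are illustrative, not the plugin's real API.
class VersioningConfig
  attr_reader :settings

  def initialize
    @settings = {}
  end

  # Called as `repository "name"` inside the block.
  def repository(name)
    @settings[:repository] = name
  end

  # Called as `message { ... }`; the block is stored and run later.
  def message(&block)
    @settings[:message] = block
  end

  def self.build(&block)
    config = new
    config.instance_eval(&block) # self inside the block is the config object
    config
  end
end

config = VersioningConfig.build do
  repository "Joe's DataStore"
  message { "Committed via the DSL" }
end
```

The catch with this style is that self changes inside the block, so methods and instance variables from the calling class are not visible there.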

Hashes

This seems to be the Rails plugin default:

class Monkey < ActiveRecord::Base
  versioning :author => { :name => lambda{ |u| user.current.name } }, :repository => "Joe's DataStore" 
end

Class vars / methods

Easy to monkeypatch later

class Monkey < ActiveRecord::Base
   will_version
   @@version_repository = "Joe's DataStore" 
   def version_author
     current_name
   end
end

Are there others? Which do you prefer? Currently I’m using all three in this one plugin, and it’s very un-awesome.

Ripping out your mocks

Courtenay : November 6th, 2008

I sat down with David Chelimsky at Rubyconf today to talk about rSpec and an interesting topic came up.

In my mind, there are two reasons to use a mock object: first, when you’re developing TDD style, you physically don’t have the objects yet; and second, so that you can tightly focus your unit tests. Maybe, these two different purposes should use a different mechanism.

His question to me then was, “Do you replace your mocks with the real objects after you’ve implemented those objects?”. I guess I hadn’t thought about that before. Do you? If so, how do you handle the extra complexity, maintaining sane associations and valid data?

On hiring Rubyists and Railsers

Courtenay : November 4th, 2008

We’re launching a new service at work in the next week or so that involves me looking through a lot of job applications: resumes and sample code.

I’d like to tell people right now, upfront, if you’re applying for a Ruby or Rails job, for anyone, there are a few ways of ensuring you get called back. They’re probably fairly simple.

Send some sample code, maybe a link to a project on Github, or a snippet of work you’ve done. Make sure you send the tests for the code. Any tests would be good, and you get bonus points for good tests. If you don’t have any tests, write them.

Don’t worry too much about sending some crazy complex code. Maybe some polymorphic associations (models), some ajax (views), a knowledge of the whole stack (simple controllers), some nested resources. Write a simple todo list application.

It’s not just a silly philosophy. Writing tests – hell, submitting tests with your job application’s code – shows that you’ve actually thought about the code, and that it actually works. You’ve permutated and permeated through the logic, and actually thought about the various ramifications of the design decisions in the code itself.

Just the pure act of sending tests with your sample code will put you above 90% of applicants, I promise.

We've stopped using rSpec ...

Courtenay : November 3rd, 2008

...for new projects.

fail

We upgraded the gems for one of our client projects, and the auto-loading / config.gems managed to completely break all our other projects, requiring upgrades, which caused weird breakages in weird places in some of the specs.

The app would refuse to deploy (rake tmp:create failed, because lib/tasks/rspec.rake was being loaded, and spec wasn't installed on the server). The annoying thing was that just having whatever.11 installed (I don't know the exact version) broke older apps on whatever.4 or whatever.0.2, so those had to be upgraded too. We wasted a day or two (three, maybe four developers) which equates to several thousand dollars in wastage. It was also really infuriating -- the culmination of a few years of frustration with rSpec's weirdnesses.

After that, I found that some of the specs had never run (who knows why). It stopped reading spec.opts and started doing some weirdness with pending options. Finally, Rick just snapped, threw out rSpec and his Model Stubbing library, and now we're playing with a combination of rr, context, and matchy, trying to get a feel for a decent workflow again. It's sad and maybe a bit exciting to be on the edge.

What are you testing with?

A simple Rails slow-query logger

Courtenay : September 29th, 2008

A few years ago I wrote a simple addition to ActiveRecord that does two things: it chops out the eager loading "t1_t2 AS foo", and it shows the number of records returned for every query you run against the database. You can view the file here

Today I was profiling a site and wanted to quickly find the slow database queries, but didn't have access to mysql's config directly, so I patched that file above to record all queries over 500ms and save it to a log file. I'll warn you now, it ain't pretty, but it works pretty well.

Here's how it works: First, throw this in a file in config/initializers. I open up the rails abstract adapter

module ActiveRecord
  module ConnectionAdapters
    class AbstractAdapter

And add in a new logger.

        def slow_query; 0.5; end # number of seconds
        def slow_query_logger
          @slow_query_logger ||= Logger.new("log/slow_queries.log")
        end

Ideally of course this would all be configurable.

Next, I copy the logging code out of the latest ActiveRecord, and patch it to return the number of records. This is a bit of a hack, too, but we can either look at "num_rows" from the resultset or the actual size of an array.

              s = result && (result.respond_to?(:num_rows) ? result.num_rows : \
                 (result.respond_to?(:size) ? result.size : 0)) || 0

Finally, I rewrite the actual log method so that it checks the benchmark against our threshold

        def log_info(sql, name, runtime, result_size = 0)
          if runtime > slow_query && slow_query_logger
            slow_query_logger.debug "Slow query: (#{runtime}) [#{result_size}] #{sql}"
          end

And add the number of results to the regular rails log, while snipping out the annoying eager-loading code.

          if @logger && @logger.debug?
            if name =~ /Load Including Associations$/
              sql = sql.scan(/SELECT /).to_s + ' ...<snip>... ' + sql.scan(/(FROM .*)$/).to_s
            end

            name = "#{name.nil? ? "SQL" : name} (#{sprintf("%f", runtime)}) [#{result_size.to_i}]"
            @logger.debug format_log_entry(name, sql.squeeze(' '))
          end
        end

Here's the full file.

module ActiveRecord
  module ConnectionAdapters # :nodoc:
    class AbstractAdapter
      protected
        # todo: config this
        def slow_query; 0.5; end
        def slow_query_logger
          @slow_query_logger ||= Logger.new("log/slow_queries.log")
        end

        alias_method :old_log, :log

        def log(sql, name, &block)
          if block_given?
            #if @logger and @logger.level <= Logger::INFO
              result = nil
              seconds = Benchmark.realtime { result = yield }
              @runtime += seconds
              s = result && (result.respond_to?(:num_rows) ? result.num_rows : \
                 (result.respond_to?(:size) ? result.size : 0)) || 0 
              log_info(sql, name, seconds, s)
              return result
            #end
          else
            log_info(sql, name, 0, 0)
            nil
          end
          # old_log(sql, name) { yield }
        rescue Exception => e
          @last_verification = 0
          message = "#{e.class.name}: #{e.message}: #{sql}"
          log_info(message, name, 0)
          raise ActiveRecord::StatementInvalid, message
        end

        alias_method :old_log_info, :log_info
        def log_info(sql, name, runtime, result_size = 0)
          if runtime > slow_query && slow_query_logger
            slow_query_logger.debug "Slow query: (#{runtime}) [#{result_size}] #{sql}"
          end
          if @logger && @logger.debug?
            if name =~ /Load Including Associations$/
              sql = sql.scan(/SELECT /).to_s + ' ...<snip>... ' + sql.scan(/(FROM .*)$/).to_s
            end

            name = "#{name.nil? ? "SQL" : name} (#{sprintf("%f", runtime)}) [#{result_size.to_i}]"
            @logger.debug format_log_entry(name, sql.squeeze(' '))
          end
        end
    end
  end
end
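If you want to play with the threshold idea outside of ActiveRecord, here is a standalone sketch of the same technique: time a block with Benchmark.realtime and only log it when it is slower than a cutoff. The SlowCallLogger class is hypothetical and the StringIO is only there so the output can be inspected.

```ruby
require "benchmark"
require "logger"
require "stringio"

# Hypothetical helper, independent of ActiveRecord.
class SlowCallLogger
  def initialize(io, threshold = 0.5)
    @logger = Logger.new(io)
    @threshold = threshold # seconds
  end

  # Runs the block, logs it if it took longer than the threshold,
  # and returns the block's result unchanged.
  def measure(label)
    result = nil
    seconds = Benchmark.realtime { result = yield }
    size = result.respond_to?(:size) ? result.size : 0
    @logger.debug("Slow query: (#{seconds}) [#{size}] #{label}") if seconds > @threshold
    result
  end
end

out = StringIO.new
slow = SlowCallLogger.new(out, 0.01)
rows = slow.measure("SELECT sleep") { sleep 0.05; [1, 2, 3] }
```

Swapping the StringIO for a file path gives you roughly the log/slow_queries.log behavior from the patch above.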

Would this work as a plugin? As a patch to Rails itself? Or did somebody else already implement a cross-platform slow query logger?

The awesomest filter and sort ever

Courtenay : August 26th, 2008

Update 2: seems like only one or two people knew about what can_search does :) I hope we’re all a little better educated.

Update: yes, I’m using these named scopes throughout the app in other places – they aren’t used only in this one controller.

Often you have an index action where you want to sort records, filter by a parameter, and maybe join on some other tables to get a result. Let’s say you’re looking at a videos controller (where videos are acts_as_taggable) and you want to filter by user_id, filter by tag name, order by video title, or rating. Maybe later, you’ll add a roles (hm:t) association and need to only show videos viewable by a certain user. How complex!

To solve this, we’re going to play with some things you may know, and finish up with a bam! pow! that’ll take your breath away.

Rather than build up some form of frankenquery with all sorts of conditionals and cases, joins, and other messing about, let’s use a brand-new bleeding edge feature of Rails: named scopes.

First, build up individual named scopes for each axis on which you wish to filter. Make sure to put the table name in that query.



    named_scope :by_user, lambda { |user_id| 
      { :conditions => ['videos.user_id = ?', user_id] }
    }

    named_scope :tag_name, lambda { |tag_name|
      { :joins => { :taggable => :tag },
        :conditions => ['tags.name = ?', tag_name] }
    }

    named_scope :rating, lambda { |rating| 
      { :conditions => ['ratings_count > ?', rating] }
    }

OK, I cheated on the last one, but let’s assume you have a counter_cache on ratings count.

Now, if you have more than one scope with joins in it, you’ll need to apply this patch to your rails installation, or upgrade past 2.1.1. This will allow you to have as many joins as you like in your scopes.

Now, here’s where the magic happens: in the controller. Big shout out to protocool for this method. Let’s build up a set of all the possible scopes that we might want to use, in an array form like [ named_scope, argument ]

def index
  scopes = []
  scopes << [ :by_user, params[:user_id] ] if params[:user_id]
  scopes << [ :tag_name, params[:tag_name] ] if params[:tag_name]
  scopes << [ :rating, params[:rating] ] if params[:rating]
end

Easy, right? Very readable.

How about some ordering?

  order = { 'name' => 'videos.name ASC' }[params[:order]] || 'videos.id DESC'

Now, as you know, you can chain named scopes. So you could say Video.by_user(2).tag_name('monkeys'). Let's take advantage of this, building up a chain of scopes dynamically using 'inject', starting from Video, and adding each scope we added to the array above. This is really fun magic, because it doesn't run any of the queries until the whole thing is built. I don't even know how this works, but it does. Swimmingly.

  @videos = scopes.inject(Video) {|m,v| m.scopes[v[0]].call(m, v[1]) }.paginate(:all, :order => order, :page => params[:page])

The final method looks like this:

def index
  scopes = []
  scopes << [ :by_user, params[:user_id] ] if params[:user_id]
  scopes << [ :tag_name, params[:tag_name] ] if params[:tag_name]
  scopes << [ :rating, params[:rating] ] if params[:rating]

  order = { 'name' => 'videos.name ASC' }[params[:order]] || 'videos.id DESC'

  @videos = scopes.inject(Video) {|m,v| m.scopes[v[0]].call(m, v[1]) }.paginate(:all, :order => order, :page => params[:page])
end
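If the inject line looks like black magic, here is the same pattern in plain Ruby with no Rails involved. FakeQuery is a made-up stand-in for a model with chainable scopes, and it uses public_send to call each scope by name rather than the scopes hash used above:

```ruby
# Hypothetical stand-in for a model with chainable scopes. Each scope
# returns a new object, and nothing is "run" until #to_sql is called.
class FakeQuery
  attr_reader :conditions

  def initialize(conditions = [])
    @conditions = conditions
  end

  def by_user(user_id)
    FakeQuery.new(conditions + ["videos.user_id = #{Integer(user_id)}"])
  end

  def rating(min)
    FakeQuery.new(conditions + ["ratings_count > #{Integer(min)}"])
  end

  def to_sql
    where = conditions.empty? ? "" : " WHERE " + conditions.join(" AND ")
    "SELECT * FROM videos#{where}"
  end
end

params = { :user_id => "2", :rating => "3" } # pretend request params

scopes = []
scopes << [:by_user, params[:user_id]] if params[:user_id]
scopes << [:rating, params[:rating]] if params[:rating]

# Start from the base query and apply each [name, argument] pair in turn.
query = scopes.inject(FakeQuery.new) { |q, (name, arg)| q.public_send(name, arg) }
```

Each step hands the result of the previous scope to the next one, which is all the controller code is doing with Video.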

One final caveat. Sometimes :joins doesn’t know where to get the video id from, so if you’re using id in your app, you’ll need a slight workaround involving manually getting the pagination count, and forcing :select => 'distinct videos.*' in the paginate call.

If this works for you, it’s really easy to add new filtering, ordering, or even scoping to your query. For example, you can add some form of role hackery to your video


    named_scope :viewable_by, lambda { |user| 
      { :joins => { :permissions => :roles },
        :conditions => [ "roles.user_id = ? AND permissions.role = ?", user.id, "view" ] }
    }

In the controller, you replace the first scope definition with this

scopes = [ [ :viewable_by, current_user ] ]

Or, you modify the scope inject statement


    @videos = scopes.inject(Video.viewable_by(current_user)) { |m,v| ... }

If you consider this a giant hack, you’re probably at least partly right. However, the alternative in building up a complex query with many possible moving parts is just hideous. And consider this: you can unit test each part of the query on its own, in the model specs.