A plam for splam
Courtenay : November 24th, 2008
A few weeks ago I wrote a fun plugin to fight spammers everywhere; I call him Splam. I thought I had written this up somewhere, but I can’t seem to find the article, so I must have dreamed it. I soft-launched it on GitHub, so those of you following my GitHub profile will have seen some commits.
Splam is a “Simple, pluggable, easily customizable score-based spam filter plugin for Ruby-based applications”. I couldn’t find any other Ruby projects outside of Defensio and Akismet, both hosted services, so while you might say, “but those work perfectly well!”, you can run this locally and get instant feedback. Install it as a plugin, and include it in your ActiveRecord model (or any PORO) like so:
class Comment < ActiveRecord::Base
  include Splam
  splammable :body
end
Easy, right? Splam works by looking at the field and applying a set of rules. Some of these rules are pretty simple; most forum/comment spam is pretty simple, too. For example, it looks for words like "porn" or "erotic" or "viagra", and gives 10 points for each of these. Then it looks at links in the body, and gives another 20 points each time one of those words appears in the link text. (Actually, I modified the code so it uses 10^ rather than 10*, so the score grows exponentially with each banned word instead of linearly.)
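To make the 10* versus 10^ difference concrete, here is a minimal sketch. It is not Splam's actual code: the word list, the method names, and the choice to score zero hits as zero points are all assumptions for illustration.

```ruby
# Hypothetical banned-word scoring; names and values are illustrative only.
BAD_WORDS = %w[porn erotic viagra].freeze

# Counts how many words in the text are on the banned list (case-insensitive).
def bad_word_hits(text)
  words = text.to_s.downcase.scan(/[a-z]+/)
  words.count { |word| BAD_WORDS.include?(word) }
end

# Linear scoring: 10 points per banned word.
def linear_score(text)
  10 * bad_word_hits(text)
end

# Exponential scoring: 10 raised to the number of banned words.
# Assumption: a text with no banned words scores 0 rather than 10**0.
def exponential_score(text)
  hits = bad_word_hits(text)
  hits.zero? ? 0 : 10**hits
end
```

With two banned words the linear version gives 20 points and the exponential one gives 100, so repeat offenders cross any fixed cutoff much sooner.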
Some of the other rules target the idiots who try to spam your Ruby forum with bbcode: [url= or [b]. Then it gives you points for each Chinese character, and more points for Russian glyphs. It looks for bad HTML (a href=http://...) as well as extra-long lines, sentences with lots of words and no punctuation, too many words with lots of letters, and so on.
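Roughly, a few of these rules could look like the sketch below, with one small class per rule. The class names, point values, and the 300-character cutoff are made up for this example and are not taken from Splam's source.

```ruby
# Hypothetical rule classes, one per check. Each returns [reason, points] pairs.
class BbcodeRule
  # Flags forum markup such as [url= or [b].
  def self.check(text)
    hits = text.scan(/\[url=|\[b\]/i).size
    hits.zero? ? [] : [["#{hits} bbcode tags", hits * 10]]
  end
end

class ChineseRule
  # One point per Han character (point value is an assumption).
  def self.check(text)
    hits = text.scan(/\p{Han}/).size
    hits.zero? ? [] : [["#{hits} Chinese characters", hits]]
  end
end

class LineLengthRule
  MAX_LENGTH = 300 # assumed cutoff for an "extra long" line

  # Flags every line longer than MAX_LENGTH characters.
  def self.check(text)
    hits = text.lines.count { |line| line.chomp.length > MAX_LENGTH }
    hits.zero? ? [] : [["#{hits} extra long lines", hits * 5]]
  end
end

RULES = [BbcodeRule, ChineseRule, LineLengthRule].freeze

# Runs every rule and returns the total score plus the reasons behind it.
def run_rules(text)
  reasons = RULES.flat_map { |rule| rule.check(text.to_s) }
  [reasons.sum { |_reason, points| points }, reasons]
end
```

Keeping each rule in its own class means a project that expects Chinese or Russian text can simply drop that rule from the list.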
Splam gives you a spam? boolean, a splammable_score score, and the splam_reasons why it marked something as spam, along with the points for each infraction.
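To give a feel for how those three pieces fit together, here is a toy stand-in that exposes the same method names. It is not Splam itself: the class, the single link-counting rule, and its 60-point value are assumptions, and the "more than 100 points" cutoff is the one mentioned later in this post.

```ruby
# Toy stand-in mimicking the accessors described above. NOT the real plugin.
class ToyComment
  THRESHOLD = 100 # more than this many points counts as spam

  attr_reader :body, :splammable_score, :splam_reasons

  def initialize(body)
    @body = body.to_s # a nil body is treated as empty text
    @splam_reasons = []
    @splammable_score = 0
    score!
  end

  # True once the accumulated points exceed the cutoff.
  def spam?
    splammable_score > THRESHOLD
  end

  private

  # A single made-up rule: 60 points per link in the body.
  def score!
    links = body.scan(%r{https?://}).size
    return if links.zero?

    @splam_reasons << ["#{links} links", links * 60]
    @splammable_score += links * 60
  end
end

comment = ToyComment.new("see http://a.example and http://b.example")
comment.spam?            # true, because 120 points is over the cutoff
comment.splam_reasons    # the list of [reason, points] pairs
```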
I originally wrote Splam for the new app, Tender Support, that we've been building at entp -- I've been tweaking it until it gets zero false positives against Defensio, Akismet AND our sturdy human spam checker, Will the Defender, against the complete set of support/help requests in the Lighthouse project. Interestingly enough, I had to add a set of "good words" which take spam points away (things related to our business).
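The "good words" idea can be sketched in a few lines. The word list, the 5-point discount, and the floor at zero are assumptions for this example, not what Splam actually ships with.

```ruby
# Hypothetical "good words" rule: vocabulary from your own business earns negative points.
GOOD_WORDS = %w[ticket invoice deploy].freeze # made-up list
GOOD_WORD_POINTS = -5                         # assumed discount per good word

# Returns zero or a negative number of points for the given text.
def good_word_points(text)
  words = text.to_s.downcase.scan(/[a-z]+/)
  GOOD_WORD_POINTS * words.count { |word| GOOD_WORDS.include?(word) }
end

# Applies the discount to an existing spam score, never dropping below zero.
def adjusted_score(spam_points, text)
  [spam_points + good_word_points(text), 0].max
end
```

A support request that mentions a deploy and a ticket would lose 10 points here, which can be enough to pull a borderline message back under the cutoff.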
In this way, you can see Splam as this horrible manual system with no training ability outside the code. It's an arms race, but I think we're not up against a particularly clever enemy[1]. I'd really like to add some clever Bayesian magic to it, but since it works well enough for me right now, I'm gonna throw it down to you guys. I'd also like to make the points themselves a percentage rating (adding % chance that it's spam) rather than an absolute (>100 points, and it’s spam).
Splam has a test suite, so you can check it out, put some of your corpus in individual text files in test/fixtures/comment, and send me a diff or pull request with anything that gets incorrectly marked as spam or ham.
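If you want to check your own corpus before sending anything, a helper along these lines would do. It is a sketch under assumptions: it expects "spam" and "ham" subdirectories full of .txt files, which may not match the real layout under test/fixtures/comment, and the block you pass in stands for whatever spam check you are testing.

```ruby
# Hypothetical corpus check; the spam/ham directory layout is an assumption.
# Yields each file's text to the given block, which should return true for spam.
# Returns the paths of every file the block classified incorrectly.
def misclassified(fixture_dir)
  %w[spam ham].flat_map do |label|
    paths = Dir.glob(File.join(fixture_dir, label, "*.txt")).sort
    paths.reject { |path| yield(File.read(path)) == (label == "spam") }
  end
end

# Example call with a deliberately naive classifier:
# misclassified("test/fixtures/comment") { |text| text.include?("viagra") }
```

Anything the helper returns is a candidate for a new fixture and a diff.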
Splam is at http://github.com/courtenay/splam/tree/master.
[1] This isn’t really meant to be a challenge to spammers; interestingly enough, most of the spam we get is just mass-blasted crap that isn’t really targeted at all. We were talking in the company campfire about how you could really make a bunch of money out of “spam”; if you’re intent on selling shady drugs, peddling nootropics to programmers would work better than sex enhancers, HGH to competitive cyclists, and so on. You could probably build some clever Markov chains to interact on forums, leading people back to your own site where you start the pitch.
8 Responses to “A plam for splam”
November 24th, 2008 at 02:57 PM
Already been done by rsl:
http://github.com/rsl/acts_as_snook/tree/master
Maybe get ideas from it?
November 24th, 2008 at 09:57 PM
No, I saw this one.
I didn't like the fact that all the ruby is in one 288 line file. My plugin is modular (separate classes for each rule), and has tests which actually run against a real corpus of ham and spam.
November 25th, 2008 at 04:09 AM
I think you should return false to spam if body is nil. At the moment an error is raised. Sorry, couldn't be bothered to go through all that Git malarkey to get the patch to you.
November 25th, 2008 at 04:18 AM
You've also left a space in the Punctuation rule regex, it's not picking up full stops correctly.
November 25th, 2008 at 04:23 AM
When I first read "I couldn’t find any other Ruby projects outside of Defensio and Akismet" I thought I'd point out my acts_as_snook plugin but apparently you've seen and dismissed it as not having enough complexity and modularity and heft. In defense of the plugin [which is clearly slighted here] it's the simplest thing that could possibly work [last time I checked a good thing to Rubyists] and in practice it catches 99% of anything that's come its way that I've seen.
That being said I do like the modularity of your design and have been thinking of implementing something a little different for customizing my own code. It's good to see more solutions to this problem!
November 25th, 2008 at 10:39 AM
ack! Sorry, RSL, I was up late and didn't mean to trash your project like that. I'd forgotten that I had seen your plugin, and I get a bit pissed at people saying, "huh, already done". I specifically wanted something where I could just delete certain files (like russian.rb or chinese.rb) depending on the project.
Anyway, I personally don't like the "everything in one big file", but to each his own.
I like how your plugin makes use of the author's email, too. Might steal that one.
November 26th, 2008 at 08:03 AM
The sheer irony of a post talking about anti-spam plugins when the http://beast.caboo.se forum is INUNDATED with the stuff to the point that it is unusable is striking.
December 14th, 2008 at 08:25 PM
Great work! I guess it may make sense to make the project a web service available for all the ruby community!
Thanks, Dan