Spam Problem? What Spam Problem?

This site was recently added to Google’s index (I presume), and that has brought in everyone’s favorite type of spam: MT comment spam. Measures are now in place that should take care of the problem, at least for the time being.

The first spam comments seem to have appeared sometime this month; going back through the comment archives, everything posted last year was legitimate. Originally it was old posts, not recent ones, that received spam comments. Since I usually check for comments by visiting the front page instead of the admin backend, I didn’t notice the spam.

Until some asshat used a script to put a spam comment on every single post in the archive. That sort of thing tends to stand out, you know?

All the spam comments have now been expurgated from the site. I kept getting timeouts from the server while rebuilding all the files after removing the spam, so while I think everything is in order, there may be cases way back in the archives where the listed number of comments doesn’t match the actual number. No big deal.

To hopefully keep a lid on the spam problem in the future, I’m trying out a Bayesian spam filter for MovableType. At the very least, it provides a comment deletion feature that’s slightly less painful than the one built into MT. (Which, granted, really isn’t saying much.) Time will tell how well it works in practice for a low-traffic site like this.

9 Responses

  1. As an addendum (and a test to see how the filter handles new posts), I’ll add here that it’s my policy to *only* delete spam that makes it onto the site. I have no interest in wielding the delete key as a weapon of editorial control or anything like that.

  2. Bayesian is probably overkill. Did you see the quick and dirty MT hacks I posted about before I got MT and phpBB working together?

  3. Man, that is the most annoying thing I have ever seen. And the style of it makes me wonder what purpose they have in doing this. They make no money from this, since nobody in their right mind would buy prescription drugs because an ad in a blog told them to. If I ever meet a spammer who does this to make money, as opposed to being paid to, I would maim them.

  4. fluffy: I remember you having written about that sort of thing a while ago. I’ll remember to take a look at it again. And yeah, Bayesian is almost certainly overkill, but what’s the alternative, maintaining some sort of blacklist? Besides, at least the Bayesian plugin provides an interface for deleting more than one post at a time. I regret having installed it *after* taking care of Mr. Post-a-spam-in-every-single-article. I know I don’t need to tell you how many clicks that takes.

    Eric: Blog spam’s goal isn’t to advertise on a blog per se, but to do Google-bombing. Put a spam comment on each posting on hundreds of blogs, wait until Google crawls them again, and suddenly your page rank score shoots up. Far easier than setting up link farms, eh? A lot of the comment spams I cleaned out were masquerading as bland, mundane replies (e.g., “wow, great post!”), but the URL was something like your-discount-source-for-focusin.com.

  5. I read http://trikuare.cx/mt/archives/000410.php again, but the problem with that solution is that I’d need to change things around to use PHP instead of HTML pages and muck with the source here and there. Call me lazy, but that’s more work than I really feel like sinking into maintaining this site. I mean, I’ve never even bothered to change the layout from the MT default.

    Besides, if I wanted to move over to running things out of PHP, I’d rather just move all the way to some solution that doesn’t insist on generating “static” files for everything and instead generates everything on the fly. This setup isn’t particularly secure when you’re hosting things off a server a few tens of thousands of people have access to and all CGIs run as the apache user. Sadly, I remember all the alternatives I had looked at had the same problem, or looked to be unmaintained, or both.

  6. Just having a fixed hidden key (like, gabba="asdfasfd<$MTEntryID$>") would probably work pretty well for at least 99% of the comment spam though. As long as it’s a different key per page and the spammers don’t actually parse the HTML for each entry (which they don’t, they just randomly send POST requests once they find an mt-comments.cgi instance).

  7. If I start seeing a lot of spam comments turn up again, I’ll look into it. So far, I haven’t seen any since I posted this entry. Whether that’s the filter doing its job (probably not) or just that no one’s attacked in the past few days (most likely) is impossible to tell.

  8. Does the filter keep logs?

    I seriously doubt that a Bayesian filter will work for such small chunks of text.

  9. Doesn’t look like it.

Comments are closed.
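
For the curious, the per-entry hidden key suggested in comment 6 could be checked on the server along these lines. This is only a rough sketch in Python, not how mt-comments.cgi actually works: the `entry_id` field name, the `accept_comment` function, and the plain dict standing in for the POSTed form are all assumptions made up for illustration. Only the `gabba` field and the `asdfasfd` prefix come from the comment itself.

```python
import hmac

# Fixed prefix taken from the example in comment 6.
KEY_PREFIX = "asdfasfd"


def expected_key(entry_id: str) -> str:
    """Build the hidden-field value the entry page would have embedded."""
    return f"{KEY_PREFIX}{entry_id}"


def accept_comment(form: dict) -> bool:
    """Return True only if the form carries the key matching its entry.

    `form` is a hypothetical dict of POSTed fields; `entry_id` is an
    assumed field name, `gabba` is the hidden field from comment 6.
    """
    entry_id = form.get("entry_id", "")
    supplied = form.get("gabba", "")
    if not entry_id:
        return False
    # Constant-time comparison; overkill here, but harmless.
    return hmac.compare_digest(
        supplied.encode("utf-8"), expected_key(entry_id).encode("utf-8")
    )
```

A script that blindly sends POST requests to the comment CGI without ever fetching the entry page would not know the right value, so its comments would be dropped. A spammer who actually parses each page’s HTML would still get through.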