Protecting Forms from Spam ‘Bots

Posted April 30th, 2008 by Mike Cherim

Spam robots or spam ‘bots abuse comment forms, contact forms, and any forms they can. A true scourge of the web. They exploit insecure forms to send spam. Fortunately, there are individual methods of slowing them down, and when these methods are used jointly, ‘bots can actually be stopped. This post shares four scripted operations I’ve found particularly helpful to prevent illegitimate posting. In the following examples, at least one variable will be posted. For clarity it appears in this article as $posted_var (meaning it has been captured from $_POST['var']), but this variable can be whatever you need it to be: name, email, etc.

Also in this article, the comments in the script blocks say to “Return an error or give value to a feedback/output variable.” What you do depends on the overall build of your script. It can mean anything from using echo or die() to output a user error to simply assigning a value to a variable used to call an error from a database or flat file.
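
As a rough illustration of the “feedback/output variable” route, the sketch below collects messages in an array rather than stopping at the first problem. The add_error() helper, the field name, and the message text are all hypothetical, and the type hints assume PHP 7 or later.

```php
<?php
// Hypothetical helper: store one feedback message per field.
function add_error(array $errors, string $field, string $message): array
{
    $errors[$field] = $message;
    return $errors;
}

$errors = [];
$errors = add_error($errors, 'answer', 'Please answer the anti-spam question.');

// Only process the form when nothing was flagged.
$ok_to_send = (count($errors) === 0);
```

The same array can later be looped over to print the messages next to the form, or its keys can be used to look up longer messages stored elsewhere.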

Here are the methods.

The Anti-Spam Question

This is one that lots of people are familiar with. In fact, thanks to my v2, v2WP, and v3 contact form scripts, the question “Is fire hot or cold?” is something found in the furthest corners of the web (though people should change it). The premise here is that spam ‘bots are essentially stupid, lacking all cognitive powers, so we can take advantage of their lack of humanity by asking a question only a human can answer. The question needs to appear in the label element associated with the answer input. It can and should be extremely simple. The answer to the question is the $posted_var and it must match the preassigned $answer. It’s made case-insensitive by the script.

Here’s the scripting:

<?php
    // To mitigate the possibility of user errors, eliminate case-matching issues
    $posted_var = strtolower($posted_var);
    $answer     = strtolower($answer);
    // If $posted_var doesn’t match $answer (both are strings, so compare strictly)
    if ($posted_var !== $answer) {
        // Return an error or give value to a feedback/output variable
    }
?>

Notes: This error can be human-triggered so it’ll be best to present helpful information in your error message. Don’t be rude, instead offer an alternative or even the proper answer. This must be a required field. Although there are other versions of this method, randomized questions, image-based (and voice recorded) methods, etc., this is the simplest, independent (not requiring a session) technique, and it is effective.
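
Because a human can trip this check, here is one possible way to wrap it so the user gets a helpful message. This is only a sketch: check_spam_answer(), the messages, and the extra trim() step are assumptions added for illustration, not part of the original contact form scripts.

```php
<?php
// Hypothetical helper: returns an empty string on success,
// or a friendly message for the user on failure.
function check_spam_answer(string $posted, string $answer): string
{
    // Ignore case and stray whitespace so honest typists are not penalized.
    $posted = strtolower(trim($posted));
    $answer = strtolower(trim($answer));

    if ($posted === '') {
        return 'Please answer the anti-spam question. It is required.';
    }
    if ($posted !== $answer) {
        return 'That answer did not match. Hint: the answer is "' . $answer . '".';
    }
    return '';
}
```

A call might look like $error = check_spam_answer($_POST['answer'] ?? '', 'hot'); where the field name “answer” is again just an assumed example.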

Menu Option Matching

If your form offers a pulldown menu (a select with option elements), making it a required field and applying option matching is really quite effective. The premise is that ‘bots don’t use a form the way human users will. They won’t choose one of the options offered. In most cases ‘bots will assign any value to the menu’s $posted_var that they can, and if it’s not spam, they’ll often resort to inputting the number ‘1’ just to satisfy the requirement to post something. That’s where this measure shines: with this script, the $posted_var must be a legitimate offering, assuming, that is, that you feed your options by way of an array() rather than hardcoding them.

Here’s the scripting:

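<?php
    // If the $posted_var isn’t present as one of the options in the array.
    // The third argument makes the comparison strict; this assumes the options
    // are stored as strings, as posted values always are.
    if (!in_array($posted_var, $menu_options, true)) {
        // Return an error or give value to a feedback/output variable
    }
?>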

Notes: Only ‘bots can trigger this error so you should be able to openly speak your mind in your error message, if you feel like being colorful in your speech. This must be a required field.
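
To show what feeding the options “by way of an array” might look like, the sketch below builds the select from the same array that is later used for matching. The option labels and the render_menu() and is_valid_option() helpers are hypothetical names chosen for this example.

```php
<?php
// Assumed option list; the same array feeds both the markup and the check.
$menu_options = ['General question', 'Site feedback', 'Report a problem'];

// Build the <select> from the array so the markup and the list of
// accepted values cannot drift apart.
function render_menu(string $name, array $options): string
{
    $safe_name = htmlspecialchars($name);
    $html = '<select name="' . $safe_name . '" id="' . $safe_name . '">' . "\n";
    foreach ($options as $option) {
        $safe = htmlspecialchars($option);
        $html .= '  <option value="' . $safe . '">' . $safe . "</option>\n";
    }
    return $html . "</select>\n";
}

// True only when the posted value is exactly one of the offered options.
function is_valid_option($posted, array $options): bool
{
    return is_string($posted) && in_array($posted, $options, true);
}
```

With this arrangement a ‘bot that posts “1”, a link, or anything else not in $menu_options fails the check, while a human who picks from the menu cannot.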

The Honeypot Spam Trap

When I devised these traps, they no doubt existed before I was clever enough to think of them on my own. This one is a classic example because months after I discovered its value, I found the technique by another name from Ned Batchelder in his article called Stopping spambots with hashes and honeypots. I immediately adopted the name. The premise is rather simple. You offer a hidden field (hidden by the offset class, positioning it with a large negative left margin) that should remain empty. Since ‘bots will fill everything and anything to circumvent any required field hiccup, they’ll fill this tempting input, too. Problem is, for them, the script doesn’t want it filled in and will return an error if it is. It’s important to actually warn of this in the label.

Here’s the scripting:

<?php
    // If the $honeypot (named so for clarity in this example) isn’t empty
    if ($honeypot != "") { // Could also be if (!empty($honeypot)) {
        // Return an error or give value to a feedback/output variable
    }
?>

Notes: This error can be human-triggered so present helpful information in your error message. Don’t be rude, instead offer a method of clearing the input (most people will not be able to see it). Also, be sure to not identify this input with a typical name attribute because some autofill software will populate the field without the human user’s knowledge. Also, if using the offset class, send it left only so as to not torment keyboard users by sending them inexplicably back to the top. This must not be a required field.
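
The markup side of the trap could be sketched as follows. The field name, the label wording, and the CSS rule in the comment are assumptions made for this example; the only requirements taken from the article are a labelled text input, an offset class that moves it left, and a name that autofill tools are unlikely to recognize.

```php
<?php
// Hypothetical field name, chosen so autofill tools are unlikely to recognize it.
const HONEYPOT_FIELD = 'leave_blank_hp';

// Markup for the trap: a labelled text input that CSS moves off-screen to the left.
function render_honeypot(): string
{
    return '<div class="offset">' . "\n"
         . '  <label for="' . HONEYPOT_FIELD . '">Spam trap: please leave this field empty</label>' . "\n"
         . '  <input type="text" name="' . HONEYPOT_FIELD . '" id="' . HONEYPOT_FIELD . '" value="">' . "\n"
         . "</div>\n";
}

// Assumed CSS for the offset class (left only, as the notes recommend):
//   .offset { position: absolute; margin-left: -9999px; }

// True when the trap was filled in, whether by a 'bot or by autofill.
function honeypot_tripped(array $post): bool
{
    return isset($post[HONEYPOT_FIELD]) && $post[HONEYPOT_FIELD] !== '';
}
```

Calling honeypot_tripped($_POST) before any mail is sent gives a single yes or no answer that the rest of the script can act on.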

Regulating Input Lengths

Many forms will limit the number of characters the user is able to provide in any given input. For example, you should be able to safely place a limit of about sixty characters inputted into a “name” field. This number is defined and physically limited by the use of the maxlength attribute in the input. But ‘bots, not using the form in the same way as humans, will think nothing of trying to enter 200 spam links in the name field. After all, their only objective is to send spam, and they don’t respect the input elements unless forced to. Because of this, it is essential that you support the physical limitations of the input elements by script-enforcing the regulation. The premise is simple: if ‘bots input more characters than what’s allowed for human users, the script won’t process the submission and the form will return an error.

Here’s the scripting:

<?php
    // If the $posted_var is greater than sixty characters…
    if (strlen($posted_var) > 60) {
        // Return an error or give value to a feedback/output variable
    }
?>

Notes: Only ‘bots should be able to trigger this error so you can let ‘er rip, so to speak, in a colorfully worded error message, if that happens to bring you joy.
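
One way to keep the maxlength attribute and the script-side limit from drifting apart is to drive both from a single array, roughly as sketched here. The field names, the limits other than sixty for the name, and both helper functions are assumptions for illustration.

```php
<?php
// Assumed per-field limits; one array drives both maxlength and the script check.
$max_lengths = ['name' => 60, 'email' => 100, 'subject' => 120];

// Emit a text input whose maxlength comes from the shared limits.
function render_input(string $field, array $limits): string
{
    return '<input type="text" name="' . $field . '" id="' . $field
         . '" maxlength="' . (int) $limits[$field] . '">';
}

// Return the names of the fields whose posted value exceeds its limit.
function too_long_fields(array $post, array $limits): array
{
    $offenders = [];
    foreach ($limits as $field => $max) {
        $value = isset($post[$field]) && is_string($post[$field]) ? $post[$field] : '';
        // strlen() counts bytes, so multibyte text reaches the limit sooner;
        // mb_strlen() is an option where the mbstring extension is available.
        if (strlen($value) > $max) {
            $offenders[] = $field;
        }
    }
    return $offenders;
}
```

An empty array from too_long_fields($_POST, $max_lengths) means every field stayed within its limit.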

Are There Other Methods?

Yes, there are other worthy methods or variations of these methods — and there are also other security concerns, not just ‘bot posting. But these methods are my favorites. Solid, simple, and accessible. Some other methods limit accessibility. Others I have found problematic at times. Some, for example, will rely on server environmental or user variables, and that can lead to inconsistencies due to data deficiencies. I try to make my form scripts work under all conditions, but if building a contact form for your own site, you can get more creative in some areas.

That said, I have found the four methods I offered in this article, when used in concert, provide exceptional, unwavering ‘bot posting protection. So far so good, anyway.
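
For readers who want to see the four checks “in concert,” here is a minimal sketch that runs them in sequence and returns whatever errors were found. Everything specific in it is an assumption: the function name, the field keys (answer, topic, leave_blank_hp, name), the messages, and the use of PHP 7 syntax.

```php
<?php
// All names below (field keys, limits, messages) are assumptions for illustration.
function spam_checks(array $post, string $answer, array $menu_options, int $max_name = 60): array
{
    $errors = [];
    // Fetch a posted value as a trimmed string, or '' when absent or not a string.
    $get = function (string $key) use ($post): string {
        return isset($post[$key]) && is_string($post[$key]) ? trim($post[$key]) : '';
    };

    // 1. Anti-spam question (case-insensitive).
    if (strtolower($get('answer')) !== strtolower($answer)) {
        $errors['answer'] = 'Please answer the anti-spam question.';
    }
    // 2. Menu option matching.
    if (!in_array($get('topic'), $menu_options, true)) {
        $errors['topic'] = 'Please choose one of the listed topics.';
    }
    // 3. The honeypot must stay empty.
    if ($get('leave_blank_hp') !== '') {
        $errors['leave_blank_hp'] = 'Please clear the field marked as a spam trap.';
    }
    // 4. Input length.
    if (strlen($get('name')) > $max_name) {
        $errors['name'] = 'The name is too long.';
    }
    return $errors;
}
```

The form would be processed only when spam_checks() returns an empty array; a typical ‘bot submission tends to fail several checks at once, which is the point of stacking them.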

Sufficient for Form Security?

In a nutshell, no! The focus of this article is specifically on keeping ‘bots from posting. There are other concerns, such as illegal remote posting, header injection exploits, open relay attacks, and other somewhat more human-associated threats one needs to be concerned with. In other words, employing the four techniques in this article is not enough to fully secure a form.

The more popular and well-known the site becomes in time, the greater the number and frequency of malicious attempts, from both humans and ‘bots. This is one of the more unfortunate aspects of the Internet. It’d be nice if we could leave the house without locking our doors, but it’s not that kind of neighborhood. We have to protect ourselves.

Filed under All Things PHP, Security Matters

20 Responses to: “Protecting Forms from Spam ‘Bots”

James Dimick responds:
Posted: April 30th, 2008 at 11:56 am →

Great article! All very good methods. Thanks for sharing your knowledge on the subject. Can never have too much protection from spam…

Why isn’t spam punishable by death?
Jared Smith responds:
Posted: April 30th, 2008 at 2:28 pm →

As you note, there are MANY methods of eliminating (or at least reducing) bot spam that have minimal or no impact on users and accessibility. All of your methods do have potential, though negligible, impact on users. I’ve documented several additional ways to block bots using scripting alone at http://www.webaim.org/blog/spam_free_accessible_forms/.

I find that the implementation of inaccessible CAPTCHAs could be avoided the vast majority of the time if just a few of these methods are implemented.
Mike Cherim responds:
Posted: April 30th, 2008 at 10:38 pm →

All of your methods do have potential, though negligible, impact on users.

How do you mean that, Jared? I would say that the input maxlength enforcement via PHP and the menu option matching have zero potential impact on accessibility, but I’m not sure if that is the type of user impact you are referring to.
Anthony Brewitt responds:
Posted: May 7th, 2008 at 10:46 am →

I think Mike is right that they have zero impact on accessibility, and also that these methods are both simple to implement and, as always, “accessible.” Nice collection, and yes, I do hold you responsible for the hot or cold questions we all have to fill in now to leave a comment. One thing that does worry me is the number of “mail to” links you still see around company websites. If you want spam, that will give you infinite amounts. If you must do this, use an image or, better still, use a hot or cold question.
Anthony Brewitt responds:
Posted: June 13th, 2008 at 6:10 am →

I have seen some really bad anti-spam methods, including the popular CAPTCHA methods. I remember one where you have to count the cats over the dogs; in others you have to click on a WAV file to hear the numbers, and most show a horribly distorted image that I cannot read even though I have perfect vision. Truly inaccessible and unusable.
Mike Cherim responds:
Posted: June 13th, 2008 at 8:42 am →

Ah, the Catz method…. that is bad lol.
Chris Stewart responds:
Posted: June 30th, 2008 at 1:21 pm →

Have you ever thought of doing up a coldfusion version of this anti-spam method? That would be seriously cool.
Mike Cherim responds:
Posted: June 30th, 2008 at 6:51 pm →

I don’t “know” ColdFusion, Chris, but I’m guessing the syntax is very similar and the functions exist, so any of these could be applied to a CF form. Plans for my v3 script are to build system modules for it so it’ll be able to run on a variety of platforms such as WordPress. I haven’t had the time to take it to the next level though. As far as converting it to CF, that would be tricky as the engine is PHP. But the individual operations I described above could be converted into CF, I’m sure, and a form built from the ground up.
Chris Stewart responds:
Posted: July 1st, 2008 at 7:36 am →

I put something together yesterday in CF based on the ideas I read in this article. Let me sum up what happens here, and tell me if I am right. I have a form that contains the text box and the question “Is it hot or is it cold?” I pass whatever is entered as an answer to the page that processes the form, and I check to see if that form field is anything other than ‘hot’. If it is ‘hot’ I run the SQL to insert the comment into the DB. If not, I display the error message and stop all further actions.

Is there anything else happening behind the scenes that isn’t covered in this article?
Mike Cherim responds:
Posted: July 1st, 2008 at 8:20 am →

If the question is answered incorrectly you should just halt script processing and echo the error to the user. No DB involvement. The script does have other security such as regex matching the email, sanitizing the inputs to prevent header injection, and a few other things. The scripts in this article are just for the spam side of things.
Chris Stewart responds:
Posted: July 1st, 2008 at 8:44 am →

Yes, that is what I did. If the question is answered incorrectly I stop all processing and display an error message. I will look into some of the other security additions you mentioned.

Thanks a lot! Great site. Very interesting articles.
العاب responds:
Posted: August 5th, 2008 at 7:59 am →

This number is defined and physically limited by the use of the maxlength attribute in the input
Mike Cherim responds:
Posted: August 5th, 2008 at 9:07 am →

This number is defined and physically limited by the use of the maxlength attribute in the input

But that alone can be circumvented so backing up that maxlength via script is necessary for security.
John responds:
Posted: August 26th, 2008 at 11:22 am →

Great ideas! I’m new to PHP applications, but instead of writing your error in the PHP validation
“Only ‘bots should be able to trigger this error so you can let ‘er rip, so to speak, in a colorfully-word error message, if that happens to bring you joy”
perhaps it’s possible to send them off to remote parts of the web using {header("Location: http://xyz.pdq"); exit();}, rather than let them try again to have a successful form entry. I suppose the bots can hit the back button… a never-ending loop of headers might tie them up as well.
John responds:
Posted: August 26th, 2008 at 12:01 pm →

“A never-ending loop of headers might tie them up as well…”
Testing on my server: IE 6 goes into an endless loop, Firefox does not and just gives a message, and IE 7 goes through one round and then errors. It seems best to forward to a known site or a loop of bogus forms.
Mike Cherim responds:
Posted: August 26th, 2008 at 12:48 pm →

It’s not a bad idea, John. Might not be fair to the recipient web site though. Moreover, Location doesn’t work on all servers, so its uneven support can be problematic for a distributed script.
SunnyFrimley responds:
Posted: September 18th, 2008 at 4:01 am →

Doesn’t that honeypot approach cause problems for people with screen readers? I only ask because you seem keen on accessibility.
Mike Cherim responds:
Posted: September 18th, 2008 at 8:02 am →

That input, like the others, uses a proper label stating what it is, so to the blind it’s not an issue. For those who use a keyboard it can be slightly confusing, as the cursor disappears for one tab, but they get back on track quickly. In newer builds I suggest offsetting that to the left only so the keyboard user won’t jump back to the top, which makes it more confusing. I suppose there is a small trade-off.
Brian Moreau responds:
Posted: February 11th, 2009 at 5:22 pm →

Ok I have a few sites with web forms and most from time to time have suffered from spam bot attacks.
I assume the bots somehow edit the $to field even if it is preset in your mail function in PHP, thus:

$toEmail = "youremailaddress@yourdomain.com";
mail($toEmail, $Subject, $message, $headers);

I wonder if anyone can explain how this is done?
In the meantime I have adopted the honeypot method, and the trap is a hidden field as I did not want any extra fields on display.

I have written my checking script as below, and if an error is set the mail won’t send.
This is assuming the spam bot will attempt to fill in the hidden field.

if ($name != "") {
# this is a honeypot trap
$error = “SPAMMER BOT”;
}

Hope this works.
Mike Cherim responds:
Posted: February 11th, 2009 at 5:50 pm →

What usually happens is bots will send extra headers so the mail will send to more than the recipient. This is one type of header injection. The honeypot is one method of preventing the inputted form from completing the processing and sending mail. I am surprised you have $name as your honeypot input, but perhaps that was just not written as is. I’m sure you’ll find out soon enough whether it works for you or not.

One thing: make sure the input is hidden by CSS, not by using type="hidden", as the bots will ignore that kind of input. Also, you might want to make a friendlier, more revealing error so human users can understand what went wrong. Some traps can only be triggered by bots so you can be more direct, or even nasty, but this is not one of those trap types.


Beast-Blog.com Protecting Forms from Spam ‘Bots