Input Validation
One of the most effective security techniques, though time-consuming at first, is user input validation. Ensuring that the input being processed matches your expectations goes a long way toward preventing the exploitation of unnoticed security holes.
Overview
Verify that GET, POST, and cookie data has not been tampered with, and that it contains the kind of data you expect, within the range you expect. For example, when collecting an id number from a GET variable, the following line of code guarantees that the variable is an integer.
$id = (int)$_GET['id'];
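Note that the cast above only forces the type: a value such as "abc" silently becomes 0 and "12abc" becomes 12. Where bad input should be rejected instead, PHP's filter_var() with FILTER_VALIDATE_INT can check both the format and the range. The following is a minimal sketch, assuming PHP 7 or later; the function name validate_id and the default range are made up for illustration.

```php
<?php
// Hypothetical helper: returns the id as an int, or null if the input
// is not a plain integer within [$min, $max].
function validate_id($raw, int $min = 1, int $max = 100000): ?int
{
    $id = filter_var($raw, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => $min, 'max_range' => $max],
    ]);

    // filter_var() returns false when validation fails.
    return $id === false ? null : $id;
}
```

It could be called as validate_id($_GET['id'] ?? null), treating a null result as a rejected request.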
Verifying string data is more difficult and usually requires the use of regular expressions. The following statement guarantees that the string in $file is a file name that consists of lowercase letters followed by ".html". It uses preg_match(), because the older ereg() function was removed in PHP 7.
if ( preg_match( '/^[a-z]+\.html\z/', $file ) ) { echo "Good!"; } else { die( "Try hacking somebody else's site." ); }
More regular expression patterns can be found at http://www.regexplib.com/DisplayPatterns.aspx. It is also a good idea to perform input validation on the client side using a language such as JavaScript, which helps reduce server overhead. However, client-side validation is easily circumvented and cannot be trusted on its own, so every check must be repeated on the server.
Build a library for implementing input validation and add new patterns as you discover them. The more extensive your library becomes, the easier it is to verify input in the future.
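One possible shape for such a library is a table of named patterns plus a single checking function. This is only a sketch, assuming PHP 7 or later; the names validation_patterns and is_valid_input, and the example patterns themselves, are hypothetical and should be adapted to your own data.

```php
<?php
// Hypothetical pattern table; add new entries as new input types appear.
function validation_patterns(): array
{
    return [
        'html_file' => '/^[a-z]+\.html\z/',
        'username'  => '/^[A-Za-z0-9_]{3,20}\z/',
        'iso_date'  => '/^\d{4}-\d{2}-\d{2}\z/',
    ];
}

// Returns true only if $value is a string matching the named pattern.
// Unknown pattern names and non-string values are rejected.
function is_valid_input(string $type, $value): bool
{
    $pattern = validation_patterns()[$type] ?? null;
    if ($pattern === null || !is_string($value)) {
        return false;
    }
    return preg_match($pattern, $value) === 1;
}
```

The patterns end in \z rather than $ so that a trailing newline in the input does not slip through.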
Escaping Strings
All data should be escaped with the method appropriate to the resource it is being sent to. What matters is not where the data comes from (any source should be considered untrusted), but where it is going:
- If it goes to HTML, use htmlentities() or htmlspecialchars()
- If it goes into an HTML link as part of a URL, use urlencode() or rawurlencode()
- If it goes to MySQL, use mysqli_real_escape_string() or prepared statements (the older mysql_real_escape_string() was removed in PHP 7)
- If it goes to PostgreSQL, use pg_escape_string()
- If it goes to a system call, use escapeshellcmd() or escapeshellarg()
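As a rough illustration of the list above, the same untrusted value is escaped differently for each destination. The input string, the search.php URL, and the grep command are invented for the example.

```php
<?php
// Hypothetical untrusted value, standing in for request data.
$input = '<b>Tom & "Jerry"</b>';

// Destination: HTML text or an attribute value.
$forHtml = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// Destination: one query-string parameter of a URL.
$forUrl = 'search.php?q=' . rawurlencode($input);

// Destination: a single argument of a shell command.
// The exact quoting produced here depends on the platform.
$forShell = 'grep ' . escapeshellarg($input) . ' notes.txt';
```

If the URL is then written into an HTML page, it still has to pass through htmlspecialchars() as well, because the page is a second destination.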
If allowing HTML is important, restrict its use to well-formed markup and a small set of simple tags. PHP's strip_tags() function removes all tags except those you explicitly allow; the HtmlFilter library is a more flexible alternative.
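A minimal sketch of the strip_tags() approach follows; the function name allow_simple_tags and the choice of permitted tags are assumptions made for the example.

```php
<?php
// Hypothetical helper: keeps only <b>, <i> and <p>; every other tag is
// removed, but the text between removed tags is kept.
function allow_simple_tags(string $html): string
{
    return strip_tags($html, '<b><i><p>');
}
```

Be aware that strip_tags() does not touch the attributes of the tags it keeps, so an allowed tag can still carry something like an onmouseover handler. This is one reason a dedicated filtering library may be the better choice.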
If the destination resource has no escaping function associated, create one!
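For instance, the wildcard characters of an SQL LIKE pattern (% and _) are not covered by the database string-escaping functions, so a small helper can be written for them. The sketch below assumes that backslash is the LIKE escape character, which is the MySQL and PostgreSQL default; the name escape_like is hypothetical.

```php
<?php
// Hypothetical helper: makes %, _ and \ literal inside a LIKE pattern.
// The result must still be passed to the database as a bound parameter
// or through the usual string-escaping function.
function escape_like(string $value): string
{
    return strtr($value, [
        '\\' => '\\\\',
        '%'  => '\\%',
        '_'  => '\\_',
    ]);
}
```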
Libraries
A useful library for easy input validation is PEAR's HTML_QuickForm.
You must programmatically set up the form, with statements like:
$form->addElement("text", "Username", "Please enter your username here:");
A number of predefined filters and rules are available, which are checked on the server side as well as on the client using JavaScript. They can be added with statements like:
$form->addRule("Username", "Username is required", 'required', null, 'client');
When such a form is submitted, you usually don't read the result from $_GET, $_POST and friends, but instead call:
$form->exportValue("Username");
This has the advantage that if you populated a list and someone injected additional values, those will be ignored by the above statement. It's not necessary anymore to check whether the user has edited your HTML form offline and submitted illegal values.
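The underlying idea can be sketched in plain PHP, without the library: a submitted value is accepted only if it is one of the options the form actually offered. This is not HTML_QuickForm's implementation, only an illustration; the function name export_choice and the option list are hypothetical.

```php
<?php
// Hypothetical helper: returns the submitted value only if it is one of
// the values the form offered; anything else yields null.
function export_choice(array $allowed, $submitted): ?string
{
    if (is_string($submitted) && in_array($submitted, $allowed, true)) {
        return $submitted;
    }
    return null;
}
```

With the options ['red', 'green', 'blue'], a submitted 'green' is returned unchanged, while an injected 'purple' results in null.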