A Guide to Writing Secure Themes – Part 3: Sanitization

In this part, we’re going to look at another technique to ensure that input is secure before using it in your code.

The difference between validation and sanitization

In the second part of this series, we talked about validation. When validating data, you are looking for certain criteria in the data. Or simply put, you’re saying “I want the data to have this, this, and this”.

Sanitization is different, because it is about removing all the harmful elements from the data. In essence you’re saying “I don’t want the data to have this, this, and this”.

But the difference is more than just conceptual. With validation, we store the data once we have verified it’s valid. If not, we discard it.

With sanitization, we take the data, and remove everything we don’t want. This means that we might change the data during the sanitization process. So in the case of user input, it is not guaranteed that all the input is kept.

So it’s important to choose the sanitization function that matches the kind of data you expect, so that legitimate input is not altered more than necessary.
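
To make the difference concrete, here is a minimal sketch that uses PHP’s own filter functions rather than a WordPress API. The sample input and the variable names are made up for illustration.

```php
<?php
// Hypothetical user input: the space is not allowed in an email address.
$input = 'jane doe@example.com';

// Validation: the input either passes as a whole, or it is rejected.
$validated = filter_var( $input, FILTER_VALIDATE_EMAIL );
// $validated is false here, so the input would be discarded.

// Sanitization: the unwanted character is removed and the rest is kept.
$sanitized = filter_var( $input, FILTER_SANITIZE_EMAIL );
// $sanitized is 'janedoe@example.com', which differs from what the user typed.
?>
```

Note that the sanitized value is not what the user entered, which is exactly the trade-off described above.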

We’re going to look at seven often used sanitization functions provided by WordPress. For each function we’ll look at what it removes, as well as its use cases.

WordPress sanitization functions

sanitize_text_field()

The main use of the sanitize_text_field() function is to sanitize data submitted through text input fields in forms. But it’s also useful for sanitizing any kind of data that you want to be plain text.

sanitize_text_field() applies the following modifications to the data:

  • Removes all tags.
  • Removes whitespace from the start and end of the string.
  • Removes extra whitespace (more than a single space) between words.
  • Removes tabs and line breaks.
  • Converts single < characters into an HTML entity.
  • Checks for invalid UTF-8.
  • Removes percent-encoded octets (for example %20).

Data passed through sanitize_text_field() is safe for storage in the database. You can use it with any of the high-level WordPress functions for saving data to the database, such as update_post_meta().

<?php
if ( ! empty( $_POST['wptrt-meta-box-data'] ) ) {
    update_post_meta( $post_id, 'wptrt-meta-box-data', sanitize_text_field( $_POST['wptrt-meta-box-data'] ) );
}
?>

sanitize_text_field() can also clean arguments passed to WordPress or custom functions that expect plain text input. In this context, other validation steps might be needed, but making sure the data is valid plain text is a good first step.

absint()

absint() is a useful function for sanitizing IDs.

WordPress uses IDs to identify posts, terms, comments, users, etc. An ID needs to be an absolute integer, meaning a whole number that’s positive.

absint() is a wrapper function for two PHP functions: intval() turns the data into an integer, and abs() makes sure that it is an absolute value.

<?php
$post_id = abs( intval( $_POST['id'] ) ); // PHP functions.
$post_id = absint( $_POST['id'] );        // WordPress function that acts as a shortcut.
?>

Integers are safe to use in any context. When you pass invalid data, like a text string, to absint(), the result is usually 0. Because the function casts the data to an integer internally, PHP’s rules for integer casting apply: a string that starts with digits keeps those digits, and any other string becomes 0.
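
The casting rules are easiest to see with a few sample values. The sketch below uses the plain PHP equivalent abs( intval() ) described above, so it runs without WordPress. The input values are made up for illustration.

```php
<?php
// Sample inputs, as they might arrive in $_POST (always strings).
$samples = array( '42', '-7', '12abc', 'abc', '3.9', '' );

$results = array();
foreach ( $samples as $value ) {
    // The same two steps that absint() wraps: cast to integer, then take the absolute value.
    $results[ $value ] = abs( intval( $value ) );
}

// '42'    => 42
// '-7'    => 7   (the sign is dropped)
// '12abc' => 12  (leading digits are kept, the rest is ignored)
// 'abc'   => 0   (no leading digits)
// '3.9'   => 3   (the fractional part is cut off, not rounded)
// ''      => 0
?>
```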

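WordPress IDs come from auto-incrementing database columns, which start at 1, so 0 never identifies an existing post, term, comment, or user. If your code relies on the sanitized ID, you can check that the sanitized data does not equal 0 before proceeding.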

<?php
$post_id = absint( $_POST['id'] );

if ( 0 === $post_id ) {
    return;
}

// Use $post_id to retrieve a post, or do something else.
// […]
?>

If you need to sanitize an integer that can be negative or positive, use the PHP function intval(). It will only cast the data to an integer.

esc_url_raw()

esc_url_raw() sanitizes URLs for safe storage in a database by stripping undesired characters and verifying the URL protocol.

The function accepts two arguments: the URL to clean, as well as an optional array of allowed protocols. For URLs that don’t use one of the allowed protocols, the function returns an empty string.

So if you only want to save URLs that start with https://, you can call the function like this:

<?php $clean_url = esc_url_raw( $url, array( 'https' ) ); ?>

Keep in mind that relative URLs starting with a /, #, or ?, as well as file names ending with .php will not be discarded by esc_url_raw(). So if you need an absolute URL to a website, you need to put additional checks into place.
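
One possible additional check is sketched below. It rests on an assumption about what “absolute” should mean here (an https scheme plus a host), and the helper name wptrt_is_absolute_https_url() is hypothetical. The helper only relies on PHP’s parse_url() and is meant to be applied to the value that esc_url_raw() returned.

```php
<?php
/**
 * Hypothetical helper: checks that an already sanitized URL is absolute,
 * meaning it has an https scheme and a host name.
 */
function wptrt_is_absolute_https_url( $url ) {
    $parts = parse_url( $url );

    // parse_url() returns false for seriously malformed URLs.
    if ( false === $parts ) {
        return false;
    }

    return isset( $parts['scheme'], $parts['host'] )
        && 'https' === strtolower( $parts['scheme'] )
        && '' !== $parts['host'];
}

// Intended usage inside WordPress:
// $clean_url = esc_url_raw( $url, array( 'https' ) );
// if ( ! wptrt_is_absolute_https_url( $clean_url ) ) {
//     $clean_url = '';
// }
?>
```

Relative URLs such as /about/, #top, or ?page=2, and bare file names such as index.php, have no scheme or host, so this check rejects them.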

sanitize_email()

The sanitize_email() function performs a number of checks to detect invalid email address formats, and strips undesired characters.

It returns an empty string when the basic validity checks fail. If the email address has the right format, the sanitized address is returned.
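
Because a failed check yields an empty string, the return value can be tested before the address is used. The following is a minimal sketch that assumes WordPress is loaded. The function name wptrt_clean_email_or_null() is hypothetical, and returning null is just one way to signal a rejected address.

```php
<?php
/**
 * Hypothetical helper: returns the sanitized email address,
 * or null when sanitize_email() rejected the input.
 */
function wptrt_clean_email_or_null( $raw_email ) {
    $email = sanitize_email( $raw_email );

    // An empty string means the basic validity checks failed.
    if ( '' === $email ) {
        return null;
    }

    return $email;
}
?>
```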

sanitize_file_name()

The sanitize_file_name() function applies the following modifications to the data:

  • Removes special characters that are illegal in filenames on various operating systems.
  • Removes special characters that would require escaping when interacting with the file through the command line.
  • Replaces spaces and consecutive dashes with a single dash.
  • Removes periods, dashes, and underscores from the beginning and the end of the file name.
  • Adds an underscore to intermediate extensions that are not whitelisted.

sanitize_file_name() only handles sanitizing the name of the file.

It doesn’t make sure that the name is unique; for that, you would need to use wp_unique_filename().

While it handles intermediate extensions, it is not concerned with the main extension of the file. As an example, file.exe.exe will be transformed into file.exe_.exe, because .exe is not an allowed extension. file.exe will not be modified though.

You would need to use wp_check_filetype() to verify that the extension of the file is allowed on the system. The function returns an array with two keys: ext and type. Both will be set to false if the filetype is not part of the allowed MIME types.
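
Put together, the two steps might look like the sketch below. It assumes WordPress is loaded. The function name wptrt_prepare_upload_name() is hypothetical, and returning an empty string for disallowed file types is only one possible way to signal rejection.

```php
<?php
/**
 * Hypothetical helper: sanitizes a file name and rejects it
 * when its extension is not among the allowed MIME types.
 */
function wptrt_prepare_upload_name( $raw_name ) {
    $name = sanitize_file_name( $raw_name );

    // 'ext' and 'type' are both false when the file type is not allowed.
    $filetype = wp_check_filetype( $name );
    if ( false === $filetype['ext'] || false === $filetype['type'] ) {
        return '';
    }

    return $name;
}
?>
```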

sanitize_key()

The sanitize_key() function is useful for dealing with data that needs to be in slug form.

Slugs can only be composed of lowercase alphanumeric characters (characters from a to z and numbers from 0 to 9), dashes (-) and underscores (_). Slugs are safe to use in any context.

Imagine that you have a theme option that allows users to enter a tag used to display featured posts in a slider.

<?php
$tag = sanitize_key( $_POST['featured-tag'] );
?>

Users will enter the tag into a text field, so using sanitize_text_field() would also be correct. In this case, using sanitize_key() is preferred, because it removes more unwanted data.

In addition, you will most likely query for the posts displayed in the slider using the tag slug. With the right sanitization function, you ensure that the data is a valid argument to pass to WP_Query.
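
A query for the slider posts could then look roughly like this. The sketch assumes WordPress is loaded. The function name wptrt_get_featured_posts() and the limit of five posts are made up for illustration.

```php
<?php
/**
 * Hypothetical helper: queries the posts for the slider by tag slug.
 */
function wptrt_get_featured_posts( $raw_tag ) {
    $tag = sanitize_key( $raw_tag );

    // Nothing usable is left after sanitization, so skip the query.
    if ( '' === $tag ) {
        return null;
    }

    return new WP_Query(
        array(
            'tag'            => $tag, // Tag slug.
            'posts_per_page' => 5,    // Arbitrary limit for the slider.
        )
    );
}
?>
```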

sanitize_title()

The sanitize_title() function turns post titles into their slug form.

To do this, sanitize_title():

  • Removes PHP and HTML tags.
  • Removes accents.
  • Replaces spaces and periods with dashes.

This function is useful when you need to query for a post by name. You can safely pass the sanitized data to the name argument in WP_Query.
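
As a rough sketch, a lookup by name could be written as follows. It assumes WordPress is loaded, and the function name wptrt_find_post_by_title() is hypothetical.

```php
<?php
/**
 * Hypothetical helper: looks up a single post by the slug form of a title.
 */
function wptrt_find_post_by_title( $raw_title ) {
    $slug = sanitize_title( $raw_title );

    // Nothing usable is left after sanitization, so skip the query.
    if ( '' === $slug ) {
        return null;
    }

    $query = new WP_Query(
        array(
            'name'           => $slug,
            'post_type'      => 'post',
            'posts_per_page' => 1,
        )
    );

    return $query->have_posts() ? $query->posts[0] : null;
}
?>
```

Keep in mind that a post’s stored slug can differ from the slug form of its title, for example when the slug was edited by hand, so an empty result does not prove that no such post exists.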

This concludes our look at some of WordPress’ sanitization functions. In the next section, we’re briefly going to touch on the sanitization functions provided by the PHP language itself.

PHP sanitization functions

For sanitization, you can use the same functions that we have discussed in the second part on validation. They are:

  • filter_input(): Retrieves an external variable (from $_GET, $_POST, $_SERVER,…) and applies the specified filter.
  • filter_input_array(): Works the same as filter_input(), but allows multiple values to be retrieved with one call.
  • filter_var(): Filters the variable passed as an argument.

When using these functions, you need to indicate a filter to use. The sanitization filters can be combined with flags to achieve a specific behavior.

As always, make sure to read the documentation carefully, because only the right combination of filter, flags, and options ensures that all invalid data is removed.
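
A small example shows how much a flag can change the result. The sketch below uses filter_var() with the FILTER_SANITIZE_NUMBER_FLOAT filter. The price value is made up for illustration.

```php
<?php
// Hypothetical price field as a user might type it.
$input = '$19.99';

// Without a flag, the decimal point is removed along with the dollar sign.
$without_flag = filter_var( $input, FILTER_SANITIZE_NUMBER_FLOAT );
// $without_flag is '1999'.

// FILTER_FLAG_ALLOW_FRACTION keeps the decimal point.
$with_flag = filter_var( $input, FILTER_SANITIZE_NUMBER_FLOAT, FILTER_FLAG_ALLOW_FRACTION );
// $with_flag is '19.99'.
?>
```

Both calls return a string. The first one silently turns the price into a value a hundred times larger, which is why the choice of flags matters.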

Conclusion

In this part of our series, we have seen what sanitization is, and which WordPress and PHP functions you can use for it.

In the next part, we are going to see how to put all the things we have seen so far into practice when dealing with post meta data, custom widget settings, as well as Customizer and the Settings API.

If you have any specific use cases you’re wondering about, please let me know in the comments, and we may look at this during the next part.
