In this part, we’re going to look at another technique to ensure that input is secure before using it in your code.
The difference between validation and sanitization
In the second part of this series, we talked about validation. When validating data, you are looking for certain criteria in the data. Or simply put, you’re saying “I want the data to have this, this, and this”.
Sanitization is different, because it is about removing all the harmful elements from the data. In essence you’re saying “I don’t want the data to have this, this, and this”.
But the difference is more than just conceptual. With validation, we store the data once we have verified it’s valid. If not, we discard it.
With sanitization, we take the data, and remove everything we don’t want. This means that we might change the data during the sanitization process. So in the case of user input, it is not guaranteed that all the input is kept.
So it’s important that you choose the right sanitization functions, to keep the data intact.
We’re going to look at seven commonly used sanitization functions provided by WordPress. For each function we’ll look at what it removes, as well as its use cases.
WordPress sanitization functions
sanitize_text_field()
The main use of the sanitize_text_field() function is to sanitize data submitted through text input fields in forms. But it’s useful for sanitizing any kind of data that you want to be plain text.
sanitize_text_field() applies the following modifications to the data:
- Removes all tags.
- Removes whitespace from the start and end of the string.
- Removes extra whitespace (more than a single space) between words.
- Removes tabs and line breaks.
- Converts single < characters into HTML entities.
- Removes any invalid UTF-8 characters.
- Removes %-encoded octets.
Data passed through sanitize_text_field() is safe for storage in the database. You can use it with any of the high-level functions in WordPress for saving data to the database, such as update_post_meta().
<?php
if ( ! empty( $_POST['wptrt-meta-box-data'] ) ) {
	update_post_meta( $post_id, 'wptrt-meta-box-data', sanitize_text_field( $_POST['wptrt-meta-box-data'] ) );
}
?>
sanitize_text_field() can also clean arguments passed to WordPress or custom functions that expect plain text input. In this context, other validation steps might be needed, but making sure the data is valid plain text is a good first step.
absint()
absint() is a useful function for sanitizing IDs.
WordPress uses IDs to identify posts, terms, comments, users, etc. An ID needs to be an absolute integer, meaning a whole number that’s positive.
absint() is a wrapper function for two PHP functions: intval() turns the data into an integer, and abs() makes sure that it is an absolute value.
<?php
$post_id = abs( intval( $_POST['id'] ) ); // PHP functions.
$post_id = absint( $_POST['id'] ); // WordPress function that acts as a shortcut.
?>
Integers are safe to use in any context. When you pass invalid data, such as a text string, to absint(), it will most likely return 0. As the function internally converts the data into an integer, the rules of integer casting apply.
In MySQL, IDs start at 1. If your code relies on the sanitized ID, you can check that the sanitized data does not equal 0 before proceeding.
<?php
$post_id = absint( $_POST['id'] );
if ( 0 === $post_id ) {
	return;
}
// Use $post_id to retrieve a post, or do something else.
[…]
?>
If you need to sanitize an integer that can be negative or positive, use the PHP function intval(). It will only cast the data to an integer.
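The difference between the two casts can be seen in a short sketch. The $input array and its keys are hypothetical stand-ins for request data such as $_POST:

```php
<?php
// Hypothetical request data, standing in for $_POST.
$input = array(
	'offset' => '-15',
	'label'  => 'abc',
);

// intval() only casts, so the sign is preserved.
$offset = intval( $input['offset'] );

// A string without a leading number is cast to 0.
$label = intval( $input['label'] );

// Wrapping the cast in abs() drops the sign, which is what absint() does.
$distance = abs( intval( $input['offset'] ) );
```

Here $offset is -15, $label is 0, and $distance is 15.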
esc_url_raw()
esc_url_raw() sanitizes URLs for safe storage in a database by stripping undesired characters and verifying the URL protocol.
The function accepts two arguments: the URL to clean, and an optional array of allowed protocols. A URL that doesn’t use one of the allowed protocols is discarded, and the function returns an empty string.
So if you only want to save URLs that start with https://, you can call the function like this:
<?php $clean_url = esc_url_raw( $url, array( 'https' ) ); ?>
Keep in mind that relative URLs starting with a /, #, or ?, as well as file names ending with .php, will not be discarded by esc_url_raw(). So if you need an absolute URL to a website, you need to put additional checks in place.
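One possible additional check is to confirm that the URL has both a scheme and a host. The sketch below uses PHP’s parse_url() for this. The helper name wptrt_is_absolute_https_url() is hypothetical and not part of WordPress:

```php
<?php
/**
 * Checks whether a URL is absolute and uses the https protocol.
 *
 * Hypothetical helper for illustration; it is not a WordPress function.
 *
 * @param string $url The URL to check.
 * @return bool True for an absolute https URL, false otherwise.
 */
function wptrt_is_absolute_https_url( $url ) {
	$parts = parse_url( $url );

	// parse_url() returns false for seriously malformed URLs.
	if ( false === $parts ) {
		return false;
	}

	// Relative URLs such as /about/, #top or index.php have no scheme or host.
	return isset( $parts['scheme'], $parts['host'] )
		&& 'https' === strtolower( $parts['scheme'] );
}
```

In a theme, you would typically run this check on the value returned by esc_url_raw(), and discard the URL when the check returns false.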
sanitize_email()
The sanitize_email() function performs a number of checks to detect invalid email address formats, and strips undesired characters.
It returns an empty string when the basic validity checks fail. If the email address has the right format, the sanitized address is returned.
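Because an empty string signals that the checks failed, you can stop early in the same way as with absint(). This is a minimal sketch that assumes WordPress is loaded; the wptrt-email field name is hypothetical:

```php
<?php
// Assumes WordPress is loaded. 'wptrt-email' is a hypothetical form field name.
$email = isset( $_POST['wptrt-email'] ) ? sanitize_email( $_POST['wptrt-email'] ) : '';

// sanitize_email() returns an empty string when the basic validity checks fail.
if ( '' === $email ) {
	return;
}

// Use $email, for example to save it as an option or as user meta.
```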
sanitize_file_name()
The sanitize_file_name() function applies the following modifications to the data:
- Removes special characters that are illegal in filenames on various operating systems.
- Removes special characters that would require escaping when interacting with the file through the command line.
- Replaces spaces and consecutive dashes with a single dash.
- Removes periods, dashes, and underscores from the beginning and the end of the file name.
- Adds an underscore to intermediate extensions that are not whitelisted.
sanitize_file_name() only handles sanitizing the name of the file.
It doesn’t make sure that the name is unique; you would need to use wp_unique_filename() for that.
While it handles intermediate extensions, it is not concerned with the main extension of the file. As an example, file.exe.exe will be transformed into file.exe_.exe, because .exe is not an allowed extension. file.exe will not be modified though.
You would need to use wp_check_filetype() to verify that the extension of the file is allowed on the system. The function returns an array with two keys: ext and type. Both will be set to false if the filetype is not part of the allowed MIME types.
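The three functions can be combined as in the following sketch. It assumes WordPress is loaded, that the file is meant for the current uploads directory, and that the name arrives in a hypothetical wptrt-file-name field:

```php
<?php
// Assumes WordPress is loaded. 'wptrt-file-name' is a hypothetical form field name.
if ( ! empty( $_POST['wptrt-file-name'] ) ) {
	// Clean the name itself.
	$file_name = sanitize_file_name( $_POST['wptrt-file-name'] );

	// 'ext' and 'type' are both false when the file type is not allowed.
	$file_type = wp_check_filetype( $file_name );
	if ( false === $file_type['ext'] || false === $file_type['type'] ) {
		return;
	}

	// Make the name unique within the current uploads directory.
	$upload_dir = wp_upload_dir();
	$file_name  = wp_unique_filename( $upload_dir['path'], $file_name );
}
```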
sanitize_key()
The sanitize_key() function is useful to deal with data that needs to be in slug form.
Slugs can only be composed of lowercase alphanumeric characters (characters from a to z and numbers from 0 to 9), dashes (-) and underscores (_). Slugs are safe to use in any context.
Imagine that you have a theme option that allows users to enter a tag used to display featured posts in a slider.
<?php
$tag = sanitize_key( $_POST['featured-tag'] );
?>
Users will enter the tag into a text field, so using sanitize_text_field() would also be correct. In this case, using sanitize_key() is preferred, because it removes more unwanted data.
In addition, you will most likely query for the posts displayed in the slider using the tag slug. With the right sanitization function, you ensure that the data is a valid argument to pass to WP_Query.
sanitize_title()
The sanitize_title() function turns post titles into their slug form.
To do this, sanitize_title():
- Removes PHP and HTML tags.
- Removes accents.
- Replaces spaces and periods with dashes.
This function is useful when you need to query for a post by name. You can safely pass the sanitized data to the name argument in WP_Query.
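Such a query could look like the sketch below. It assumes WordPress is loaded; the wptrt-post-name request parameter is hypothetical, and the post type and post count are illustrative choices:

```php
<?php
// Assumes WordPress is loaded. 'wptrt-post-name' is a hypothetical request parameter.
$post_name = isset( $_GET['wptrt-post-name'] ) ? sanitize_title( $_GET['wptrt-post-name'] ) : '';

if ( '' !== $post_name ) {
	$query = new WP_Query(
		array(
			'name'           => $post_name, // Matches the post slug.
			'post_type'      => 'post',
			'posts_per_page' => 1,
		)
	);
}
```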
This concludes our look at some of WordPress’ sanitization functions. In the next section, we’re briefly going to touch on the sanitization functions provided by the PHP language itself.
PHP sanitization functions
For sanitization, you can use the same functions that we have discussed in the second part on validation. They are:
- filter_input(): Retrieves an external variable (from $_GET, $_POST, $_SERVER, …) and applies the specified filter.
- filter_input_array(): Works the same as filter_input(), but allows multiple values to be retrieved with one call.
- filter_var(): Filters the variable passed as an argument.
When using these functions, you need to indicate a filter to use. The sanitization filters can be combined with flags to achieve a specific behavior.
As always, make sure to read the documentation carefully, because only the right combination of filter, flags, and options ensures that all invalid data is removed.
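As a rough illustration, here is filter_var() with three of PHP’s sanitization filters, once with a flag and once without. The raw values are made up for this example:

```php
<?php
// Hypothetical raw values, standing in for user input.
$raw_number = 'abc-42def';
$raw_email  = 'john(.doe)@exa//mple.com';
$raw_price  = '1.5kg';

// Keeps only digits and the plus and minus signs. The result is a string.
$number = filter_var( $raw_number, FILTER_SANITIZE_NUMBER_INT );

// Removes characters that are not allowed in an email address.
$email = filter_var( $raw_email, FILTER_SANITIZE_EMAIL );

// Without a flag, the float filter also strips the decimal point.
$price_no_flag = filter_var( $raw_price, FILTER_SANITIZE_NUMBER_FLOAT );

// With FILTER_FLAG_ALLOW_FRACTION, the decimal point is kept.
$price = filter_var( $raw_price, FILTER_SANITIZE_NUMBER_FLOAT, FILTER_FLAG_ALLOW_FRACTION );
```

The results are '-42', 'john.doe@example.com', '15', and '1.5'. The two price values show how a single flag changes what the same filter removes.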
Conclusion
In this part of our series, we have seen what sanitization is, and what WordPress and PHP functions you can use.
In the next part, we are going to see how to put all the things we have seen so far into practice when dealing with post meta data, custom widget settings, as well as the Customizer and the Settings API.
If you have any specific use cases you’re wondering about, please let me know in the comments, and we may look at this during the next part.