A Guide to Writing Secure Themes – Part 4: Securing Post Meta

In the previous two parts, we got a high-level overview of how to validate and sanitize data.

In the next few parts of this series, we’ll look at how we can apply these concepts in specific contexts, starting with post meta.

Post meta database structure

When dealing with any kind of data, we need to first understand how it is stored in the database, and what the schema is.

Post meta is stored in the wp_postmeta table. It acts as a key/value storage for additional information about particular posts.

I encourage you to look at the structure of the table, because it provides us with several important pieces of information:

  • The post_id is the fastest way to look up data.
  • Queries against the meta_key are possible, but slower.
  • The meta keys need to be unique for the theme, which means we will need to use a prefix.
  • Meta values that are not scalars will be stored in serialized form.
  • Queries against meta values should be avoided. You cannot query against serialized data, and queries against LONGTEXT columns are super slow.
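To make the point about serialized data more concrete, here is a small sketch in plain PHP. It assumes an array similar to the ones we will store later in this part, and uses PHP’s serialize() to show the kind of string that ends up in the meta_value column for non-scalar values.

```php
<?php
// A non-scalar meta value, as a theme might pass it to update_post_meta().
$hidden_elements = array( 'date', 'author' );

// Arrays are stored in serialized form. PHP's serialize() produces
// the kind of string that is written to the meta_value column.
$stored = serialize( $hidden_elements );
echo $stored . PHP_EOL; // a:2:{i:0;s:4:"date";i:1;s:6:"author";}

// Reading the value back restores the original array.
$restored = unserialize( $stored, array( 'allowed_classes' => false ) );

// A lookup for the plain value 'author' never matches the stored string
// exactly, which is why queries against such values are unreliable.
$exact_match = ( 'author' === $stored );
var_dump( $exact_match ); // bool(false)
```

The only way to find 'author' in such a column is a pattern match against the serialized string, which is both fragile and slow.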

So post meta is meant for storing little additional bits of information related to individual posts. It shouldn’t be used to store custom CSS snippets, as a taxonomy replacement, or as a bucket for storing content from (rich) text editors.

So remember: just because the flexibility provided by WordPress allows you to do something, that doesn’t mean that it is a good idea to do so.

Now that we have seen how the underlying data storage works, let’s look at the user interface for entering post meta.

Post meta user interface

Post meta is entered via elements in the interface called meta boxes. You are probably familiar with the code for adding a meta box to the post editing interface:

<?php
function wptrt_add_meta_box() {
    add_meta_box( 'wptrt-sample-meta-box', esc_html__( 'WPTRT Sample Meta Box', 'wptrt' ), 'wptrt_print_meta_box', 'post' );
}
add_action( 'add_meta_boxes', 'wptrt_add_meta_box' );
?>

When you look at the code of the add_meta_box() function, you’ll see that it stores the passed arguments in the global $wp_meta_boxes array.

So our custom meta box gets added to the same array in which the WordPress Core meta boxes are stored. This array is then used by the do_meta_boxes() function. It calls our custom callback function, wptrt_print_meta_box(), to print the meta box onto the screen.

To save the data entered via the meta box, you have to write a custom saving function, and hook it to save_post.

<?php
function wptrt_save_meta_box_data( $post_id ) {
    // Handle saving here
}
add_action( 'save_post', 'wptrt_save_meta_box_data' );
?>

The save_post action gets fired after the post has been saved. At this point WordPress has done all the work it needs to save the post data that Core handles.

The hook passes in three variables: the ID of the saved post, the post object (an instance of WP_Post), and a boolean, indicating whether it’s an update or a new post.

By looking at the underlying architecture, we can deduce a number of things:

  1. In order to avoid conflicts with other meta boxes or meta data, we need to prefix everything.
  2. WordPress has no specific API for handling data submitted through custom meta boxes. We have to write our own data handling function.
  3. The save_post action is a generic action, that passes none of the data that was entered into the post edit form. This means that we’ll have to grab our data directly from the $_POST superglobal.
  4. No origin or capability checks have been performed for the HTTP request from which we are retrieving the data.

We’ll now see how we reflect these points in our code.

Dealing with autosaves

Autosaves are triggered automatically by WordPress at set intervals. They are done via an asynchronous request (AJAX request). For the duration of the processing of the request, the interface is “frozen”, to keep the user from entering more data.

The more work the database has to do, the longer this saving process is going to take. In order to reduce the time that users are stuck waiting on the autosave to finish, we’re going to put a check in place that avoids saving the post meta during this event.

<?php
if ( defined( 'DOING_AUTOSAVE' ) && DOING_AUTOSAVE ) {
    return;
}
?>

The DOING_AUTOSAVE constant is defined by WordPress in the wp_autosave() function. When it is defined, and when its value evaluates to true, we return.

Protecting against unwanted requests

As we have seen before, we retrieve the data to save directly from a $_POST request. In the current stage of our code, we cannot distinguish between valid requests, made intentionally by a user via the admin interface, and a forged request by a malicious third party.

To protect against this, we use a nonce, a number used once. We add this information to the form, so that it gets sent along with all the other data. We can then verify in the received $_POST data that the nonce is the same as the one we added to the form, and as such validate the request.

We will look at nonces in detail in a later part of this series. For now let’s add a nonce to the form with the wp_nonce_field() function.

<?php wp_nonce_field( 'wptrt-post-meta-box-save', 'wptrt-post-meta-box-nonce' ); ?>

The wp_nonce_field() function accepts four arguments, but we’re only using the first two: $action and $name.

$action refers to the context in which the data is generated. We’re going to use wptrt-post-meta-box-save here, to show that this data originates from the action of saving a meta box on the post screen.

The $name is the name of the hidden input field in the form that contains the nonce. We’re going to use this to retrieve the nonce later from the $_POST data.

As you can see, both the action and the name are prefixed, and unique to the situation. We’re going to see later why this is important, but please take note of this as a best practice.

Now let’s add a nonce check into our post meta saving function.

<?php
if ( ! isset( $_POST['wptrt-post-meta-box-nonce'] ) || ! wp_verify_nonce( $_POST['wptrt-post-meta-box-nonce'], 'wptrt-post-meta-box-save' ) ) {
    return;
}
?>

First we check whether the nonce data has been transmitted as part of the $_POST request data. Then we use wp_verify_nonce() to verify it.

If the nonce isn’t set, or if the wp_verify_nonce() function returns false, we return.

Verifying access rights

So far, we have completed two tasks: avoiding saving data on autosaves, and ensuring that the data we work with was issued via the admin of the site.

But we haven’t verified whether the user that has entered the data has the right to modify post meta data. To check this, we’re going to use the built-in roles and capabilities provided by WordPress.

We’re going to look at this in more detail later, but for now, let’s try and find out what capability we would need to check.

Our post meta box is located on the Edit Post screen of the admin. So users should only be able to see our meta box when they can edit posts on the site.

That’s a good start, but we want to make sure that the user can edit the specific post for which we are saving the data. This is because there are many instances in which users have different access rights to different posts. Therefore we need to check for the rights on this particular post.

<?php
if ( ! current_user_can( 'edit_post', $post_id ) ) {
        return;
}
?>

For checking the user’s capabilities, we use the current_user_can() function, and the edit_post capability. Since the capability refers to a single post, we pass in the $post_id variable, which we passed into our custom saving function, and which is provided by the save_post hook.
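The three checks we have seen so far (autosave, nonce, capability) always run in the same order, so it can help to see them side by side. Here is a minimal sketch of that order. The helper name wptrt_may_save_meta_box() and the two injected callables are assumptions made for this example, so that it can run outside of WordPress. In a theme, the callables would wrap wp_verify_nonce() with the wptrt-post-meta-box-save action and current_user_can( 'edit_post', $post_id ).

```php
<?php
/**
 * Decides whether the meta box data of a request may be saved.
 *
 * Hypothetical helper: the name and the injected callables are
 * assumptions of this sketch, not part of WordPress.
 */
function wptrt_may_save_meta_box( array $request, $is_autosave, callable $verify_nonce, callable $can_edit_post ) {
    // 1. Skip autosaves.
    if ( $is_autosave ) {
        return false;
    }

    // 2. The nonce must be present and valid.
    if ( ! isset( $request['wptrt-post-meta-box-nonce'] ) || ! $verify_nonce( $request['wptrt-post-meta-box-nonce'] ) ) {
        return false;
    }

    // 3. The current user must be allowed to edit this post.
    return (bool) $can_edit_post();
}

// Stand-ins for wp_verify_nonce() and current_user_can(), for demonstration only.
$verify_nonce  = function ( $nonce ) {
    return 'valid-nonce' === $nonce;
};
$can_edit_post = function () {
    return true;
};

// No nonce in the request: nothing may be saved.
var_dump( wptrt_may_save_meta_box( array(), false, $verify_nonce, $can_edit_post ) ); // bool(false)

// Valid nonce, no autosave, user may edit: saving may proceed.
$request = array( 'wptrt-post-meta-box-nonce' => 'valid-nonce' );
var_dump( wptrt_may_save_meta_box( $request, false, $verify_nonce, $can_edit_post ) ); // bool(true)
```

Note that a missing nonce and an invalid nonce both lead to the same result: we stop before touching the database.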

With this code in place, we can start the saving process. But before we get to that, let’s first look at how to work efficiently with post meta.

Working efficiently with post meta

There are a couple of things to keep in mind when working with post meta:

  1. Don’t save default values. Default values should be handled in the code, not via the database.
  2. Combine related data, separate individual data. When you repeatedly retrieve an array from post meta, only to use a single element, you are better off using an individual key. On the other hand, when your code is littered with calls to get_post_meta() for little bits, you’d better consolidate those calls by using a single key.
  3. Separate related data by type. When storing related data, make sure not to mix different data types in the same key. Doing so would make things unnecessarily difficult to manage.

After lots of preparatory work, let’s get down to business: saving meta data.

Individual checkboxes

Let’s start off by looking at the simplest case: individual checkboxes. These are checkboxes used for handling features that are not related. Each checkbox is therefore saved with an individual key, to make the data easier to retrieve and use in the theme.

Here is the code for adding a checkbox. We will place this code inside our wptrt_print_meta_box() function.

<p>
    <input type="checkbox" id="wptrt-individual-checkbox" name="wptrt-individual-checkbox" value="1" <?php checked( get_post_meta( get_the_ID(), 'wptrt-individual-checkbox', true ) ); ?> />
    <label for="wptrt-individual-checkbox"><?php echo esc_html__( 'Individual Checkbox', 'wptrt' ); ?></label>
</p>

Note the use of the checked() function, which is part of a series of functions for working with inputs.

The code for saving the checkbox is straightforward:

<?php
if ( ! isset( $_POST['wptrt-individual-checkbox'] ) && get_post_meta( $post_id, 'wptrt-individual-checkbox', true ) ) {
        delete_post_meta( $post_id, 'wptrt-individual-checkbox' );
} elseif ( isset( $_POST['wptrt-individual-checkbox'] ) ) {
        update_post_meta( $post_id, 'wptrt-individual-checkbox', 1 );
}
?>

Here are the steps we take:

  • We verify first whether the wptrt-individual-checkbox key exists in the $_POST data. This is because checkboxes don’t submit any data if they are not checked.
  • If the checkbox is not checked, and we have a saved post meta value that evaluates to true, we delete the post meta data.
  • If the key exists, we save a 1 to the database with update_post_meta(). The 1 is hardcoded, so no data from the request will be saved.
  • The 1 is used because it gets evaluated to true due to PHP’s type comparison. We use an integer, because the data gets stored as a string, which is not a good fit for booleans.

If we want to know in our theme whether the user has checked the checkbox or not, we verify whether the retrieved data evaluates to true.

<?php
if ( get_post_meta( get_the_ID(), 'wptrt-individual-checkbox', true ) ) {
    // Checkbox was checked
} else {
    // Checkbox was not checked
}
?>

So the saved 1 evaluates to true. If no post meta was saved, get_post_meta() will return an empty string, which evaluates to false.
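Here is a quick sketch of this evaluation in plain PHP. It assumes that the saved 1 comes back from the database as the string '1', and that a missing value comes back as an empty string, as described above.

```php
<?php
// A saved 1 typically comes back from the database as the string '1'.
$checked = '1';
// With no stored value, we get an empty string.
$unchecked = '';

var_dump( (bool) $checked );   // bool(true)
var_dump( (bool) $unchecked ); // bool(false)

// Pitfall: a stored '0' also evaluates to false...
var_dump( (bool) '0' );        // bool(false)
// ...while a stored 'false' is a non-empty string and evaluates to true.
var_dump( (bool) 'false' );    // bool(true)
```

The last two lines show why storing a 1 or nothing at all is easier to reason about than storing words like 'true' and 'false'.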

Individual text fields

Other individual fields, such as a single text field, can be dealt with in a similar manner.

Let’s add a text input to our post meta box:

<p>
        <label for="wptrt-individual-text-field"><?php echo esc_html__( 'Individual Text Field', 'wptrt' ); ?></label>
        <input type="text" id="wptrt-individual-text-field" name="wptrt-individual-text-field" value="<?php echo esc_attr( get_post_meta( get_the_ID(), 'wptrt-individual-text-field', true ) ); ?>" />
</p>

This is how we would save this data:

<?php
if ( empty( $_POST['wptrt-individual-text-field'] ) ) {
        if ( get_post_meta( $post_id, 'wptrt-individual-text-field', true ) ) {
            delete_post_meta( $post_id, 'wptrt-individual-text-field' );
        }
} else {
        update_post_meta( $post_id, 'wptrt-individual-text-field', sanitize_text_field( $_POST['wptrt-individual-text-field'] ) );
}
?>

As you can see, the code is similar to the checkbox example, with a few key differences:

  • empty() is used instead of isset(). This is because text fields submit an empty string when they don’t contain any data.
  • We separated the checks for the existence of submitted data and saved data into two different conditionals. This is because the text field might be empty and no data might be saved.
  • We save with update_post_meta(), which either adds post meta or updates existing post meta depending on the case.
  • Since we are using data from the request, we are using sanitization to make sure that the data is secure before saving.

Checkbox groups

Checkbox groups are checkboxes that are related to each other. A common example of such a group would be options that allow users to hide certain elements related to individual posts, such as the date, the author, and the categories.

First, let’s print the markup for these three checkboxes:

<?php $hide_elements = (array) get_post_meta( get_the_ID(), 'wptrt-hide-post-element', true ); ?>
<p>
        <input type="checkbox" id="wptrt-hide-post-date" name="wptrt-hide-post-element[]" value="date" <?php checked( in_array( 'date', $hide_elements, true ) ); ?> />
        <label for="wptrt-hide-post-date"><?php echo esc_html__( 'Hide Date', 'wptrt' ); ?></label>
</p>
<p>
        <input type="checkbox" id="wptrt-hide-post-author" name="wptrt-hide-post-element[]" value="author" <?php checked( in_array( 'author', $hide_elements, true ) ); ?> />
        <label for="wptrt-hide-post-author"><?php echo esc_html__( 'Hide Author', 'wptrt' ); ?></label>
</p>
<p>
        <input type="checkbox" id="wptrt-hide-post-categories" name="wptrt-hide-post-element[]" value="categories" <?php checked( in_array( 'categories', $hide_elements, true ) ); ?> />
        <label for="wptrt-hide-post-categories"><?php echo esc_html__( 'Hide Categories', 'wptrt' ); ?></label>
</p>

We will save the data from these checkboxes under a single key, in the form of an array.

As we have seen, get_post_meta() returns an empty string when no post meta data exists. We therefore use type casting to ensure that we are always dealing with an array.

As we are dealing with an array, we use in_array() to determine whether a checkbox needs to be checked or not. It will return true or false, which the checked() function then will use to print the correct markup.

Speaking of markup, all the input elements have the same name: wptrt-hide-post-element[]. This ensures that all the submitted data for these checkboxes will be provided as an array stored under the wptrt-hide-post-element key in the $_POST data.

Since the user can only select from the options we provide in the interface, we will use validation to secure the data.

<?php
if ( ! isset( $_POST['wptrt-hide-post-element'] ) ) {
        if ( get_post_meta( $post_id, 'wptrt-hide-post-element', true ) ) {
            delete_post_meta( $post_id, 'wptrt-hide-post-element' );
        }
} else {
        $safe_hide_post_element = array();

        foreach ( (array) $_POST['wptrt-hide-post-element'] as $element ) {
            if ( in_array( $element, array( 'date', 'author', 'categories' ), true ) ) {
                $safe_hide_post_element[] = $element;
            }
        }

        if ( ! empty( $safe_hide_post_element ) ) {
            update_post_meta( $post_id, 'wptrt-hide-post-element', $safe_hide_post_element );
        }
}
?>

We will not talk about the first conditional in the code, since it is the same as we used in the previous example.

Instead let’s look at what happens when $_POST['wptrt-hide-post-element'] is set, meaning the user has checked at least one checkbox:

  • First we create a temporary array called $safe_hide_post_element. This is where we will save all valid data. The naming is very explicit, to clarify that this variable only holds secure data.
  • Next we loop over the array contained in $_POST['wptrt-hide-post-element'] and compare each entry against a list of possible values. Valid entries are then stored inside the $safe_hide_post_element array.
  • Finally we check whether the $safe_hide_post_element array contains any entries. This is to avoid saving an empty array in case the $_POST data did not contain any valid options. This array is then saved.
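The steps above can also be sketched as a small reusable helper. The function name wptrt_filter_allowed_values() and its signature are assumptions made for this example. It only uses plain PHP, so you can try it outside of WordPress.

```php
<?php
/**
 * Keeps only the submitted values that appear in the list of allowed values.
 *
 * Hypothetical helper: name and signature are assumptions of this sketch.
 */
function wptrt_filter_allowed_values( $submitted, array $allowed ) {
    $safe = array();

    // The cast guards against a request that sends a string instead of an array.
    foreach ( (array) $submitted as $value ) {
        // Strict comparison, and no duplicates in the result.
        if ( in_array( $value, $allowed, true ) && ! in_array( $value, $safe, true ) ) {
            $safe[] = $value;
        }
    }

    return $safe;
}

$allowed = array( 'date', 'author', 'categories' );
$safe    = wptrt_filter_allowed_values( array( 'date', '<script>', 'author', 'date' ), $allowed );

print_r( $safe ); // Holds 'date' and 'author' only.
```

The unexpected '<script>' entry and the duplicate 'date' are dropped, so only known values would reach update_post_meta().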

In the theme, you can then retrieve the data, and use in_array() to determine whether to display an element of the post or not:

<?php
$hide_elements = (array) get_post_meta( get_the_ID(), 'wptrt-hide-post-element', true );

if ( ! in_array( 'date', $hide_elements, true ) ) {
        echo '<p>' . esc_html__( 'Date:', 'wptrt' ) . ' ' . esc_html( get_the_date() ) . '</p>';
}

if ( ! in_array( 'author', $hide_elements, true ) ) {
        echo '<p>' . esc_html__( 'Author:', 'wptrt' ) . ' ' . esc_html( get_the_author() ) . '</p>';
}

if ( ! in_array( 'categories', $hide_elements, true ) ) {
        echo '<p>' . esc_html__( 'Categories:', 'wptrt' ) . '</p>';
        the_category();
}
?>

Do not repeat yourself checkbox groups

The code we have so far works well, but it is a bit repetitive in some places. If you have more complex requirements, a little bit more abstraction would be helpful to reduce repetitive code.

Let’s implement a feature that allows users to choose their favorite colors. We will need a key and a text with the color name for each option. Let’s implement a function that returns the options:

<?php
function wptrt_get_favorite_color_options() {
    return array(
        'blue'   => __( 'Blue', 'wptrt' ),
        'red'    => __( 'Red', 'wptrt' ),
        'yellow' => __( 'Yellow', 'wptrt' ),
    );
}
?>

We can then use this function inside a loop to print the checkboxes.

<?php
$favorite_colors = (array) get_post_meta( get_the_ID(), 'wptrt-favorite-color', true );
foreach ( wptrt_get_favorite_color_options() as $option => $text ) :
?>
        <p>
            <input type="checkbox" id="wptrt-favorite-color-<?php echo esc_attr( $option ); ?>" name="wptrt-favorite-color[]" value="<?php echo esc_attr( $option ); ?>" <?php checked( in_array( $option, $favorite_colors, true ) ); ?> />
            <label for="wptrt-favorite-color-<?php echo esc_attr( $option ); ?>"><?php echo esc_html( $text ); ?></label>
        </p>
<?php endforeach; ?>

Next, we need to modify our saving code:

<?php
if ( ! isset( $_POST['wptrt-favorite-color'] ) ) {
        if ( get_post_meta( $post_id, 'wptrt-favorite-color', true ) ) {
            delete_post_meta( $post_id, 'wptrt-favorite-color' );
        }
} else {
        $safe_favorite_color = array();

        foreach ( (array) $_POST['wptrt-favorite-color'] as $color ) {
            if ( array_key_exists( $color, wptrt_get_favorite_color_options() ) ) {
                $safe_favorite_color[] = $color;
            }
        }

        if ( ! empty( $safe_favorite_color ) ) {
            update_post_meta( $post_id, 'wptrt-favorite-color', $safe_favorite_color );
        }
}
?>

With this setup, every time you add or remove an entry in the array returned by wptrt_get_favorite_color_options(), the display code and saving code will take this change into account.

Conclusion

By now, you should have a pretty good idea of how to safely store post meta. Feel free to copy the code samples and play around with them to gain more experience. You can find the entire code in this tutorial on GitHub.

In the next part, we’ll look at how to deal with custom widget settings.

#writing-secure-themes

A Guide to Writing Secure Themes – Part 3: Sanitization

In this part, we’re going to look at another technique to ensure that input is secure before using it in your code.

The difference between validation and sanitization

In the second part of this series, we talked about validation. When validating data, you are looking for certain criteria in the data. Or simply put, you’re saying “I want the data to have this, this, and this”.

Sanitization is different, because it is about removing all the harmful elements from the data. In essence you’re saying “I don’t want the data to have this, this, and this”.

But the difference is more than just conceptual. With validation, we store the data once we have verified it’s valid. If not, we discard it.

With sanitization, we take the data, and remove everything we don’t want. This means that we might change the data during the sanitization process. So in the case of user input, it is not guaranteed that all the input is kept.

So it’s important that you choose the right sanitization functions, to keep the data intact.

We’re going to look at seven often used sanitization functions provided by WordPress. For each function we’ll look at what it removes, as well as its use cases.

WordPress sanitization functions

sanitize_text_field()

The main use of the sanitize_text_field() function is to sanitize the data provided by text input fields in forms. But it’s useful for sanitizing any kind of data that you want to be plain text.

sanitize_text_field() applies the following modifications to the data:

  • Removes all tags.
  • Removes whitespace from the start and end of the string.
  • Removes extra whitespace (more than a single space) between words.
  • Removes tabs and line breaks.
  • Converts single < characters into an HTML entity.
  • Removes any invalid UTF-8 characters.
  • Removes percent-encoded octets.

Data passed through sanitize_text_field() is safe for storage in the database. You can use it with any of the high level functions in WordPress for saving data to the database, like for example update_post_meta().

<?php
if ( ! empty( $_POST['wptrt-meta-box-data'] ) ) {
    update_post_meta( $post_id, 'wptrt-meta-box-data', sanitize_text_field( $_POST['wptrt-meta-box-data'] ) );
}
?>

sanitize_text_field() can also clean arguments passed to WordPress or custom functions that expect plain text input. In this context, other validation steps might be needed, but making sure the data is valid plain text is a good first step.

absint()

absint() is a useful function for sanitizing IDs.

WordPress uses IDs to identify posts, terms, comments, users, etc. An ID needs to be an absolute integer, meaning a whole number that’s positive.

absint() is a wrapper function for two PHP functions: intval() turns the data into an integer, and abs() makes sure that it is an absolute value.

<?php
$post_id = abs( intval( $_POST['id'] ) ); // PHP functions.
$post_id = absint( $_POST['id'] );        // WordPress function that acts as a shortcut.
?>

Integers are safe to use in any context. When you pass invalid data, such as a text string, to absint(), the return value is most likely 0. As the function internally converts the data into an integer, the rules of integer casting apply.
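To see the rules of integer casting at work, here is a short sketch in plain PHP. The function wptrt_absint_sketch() is a stand-in written for this example, so that it runs without WordPress. It only combines the two PHP functions mentioned above.

```php
<?php
// Stand-in that combines abs() and intval(), as described above.
function wptrt_absint_sketch( $maybe_int ) {
    return abs( intval( $maybe_int ) );
}

var_dump( wptrt_absint_sketch( '42' ) );    // int(42)
var_dump( wptrt_absint_sketch( '-42' ) );   // int(42)
var_dump( wptrt_absint_sketch( '42abc' ) ); // int(42), leading digits are kept
var_dump( wptrt_absint_sketch( '3.9' ) );   // int(3), the fraction is cut off
var_dump( wptrt_absint_sketch( 'abc' ) );   // int(0)
var_dump( wptrt_absint_sketch( '' ) );      // int(0)
```

As you can see, a string such as '42abc' does not become 0. The leading digits survive the cast, so the result is a safe integer, but not necessarily the value the user meant to submit.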

In MySQL, IDs start at 1. If your code relies on the sanitized ID, you can check that the sanitized data does not equal 0 before proceeding.

<?php
$post_id = absint( $_POST['id'] );

if ( 0 === $post_id ) {
    return;
}

// Use $post_id to retrieve a post, or do something else.
// […]
?>

If you need to sanitize an integer that can be negative or positive, use the PHP function intval(). It will only cast the data to an integer.

esc_url_raw()

esc_url_raw() sanitizes URLs for safe storage in a database by stripping undesired characters and verifying the URL protocol.

The function accepts two arguments: the URL to clean, as well as an optional array of allowed protocols. URLs that don’t use the whitelisted protocol(s) will be discarded.

So if you only want to save URLs that start with https://, you can call the function like this:

<?php $clean_url = esc_url_raw( $url, array( 'https' ) ); ?>

Keep in mind that relative URLs starting with a /, #, or ?, as well as file names ending with .php will not be discarded by esc_url_raw(). So if you need an absolute URL to a website, you need to put additional checks into place.
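One possible additional check is sketched below. The helper wptrt_is_absolute_https_url() is hypothetical and only uses PHP’s parse_url(). It is meant to run in addition to esc_url_raw(), not as a replacement for it.

```php
<?php
/**
 * Checks that a URL is absolute and uses the https protocol.
 *
 * Hypothetical helper: name and behavior are assumptions of this sketch.
 */
function wptrt_is_absolute_https_url( $url ) {
    if ( ! is_string( $url ) ) {
        return false;
    }

    $parts = parse_url( $url );

    // parse_url() returns false for seriously malformed URLs.
    if ( false === $parts ) {
        return false;
    }

    // An absolute URL needs both a scheme and a host.
    return isset( $parts['scheme'], $parts['host'] )
        && 'https' === strtolower( $parts['scheme'] )
        && '' !== $parts['host'];
}

var_dump( wptrt_is_absolute_https_url( 'https://example.org/page' ) ); // bool(true)
var_dump( wptrt_is_absolute_https_url( '/relative/path' ) );           // bool(false)
var_dump( wptrt_is_absolute_https_url( '#anchor' ) );                  // bool(false)
var_dump( wptrt_is_absolute_https_url( 'file.php' ) );                 // bool(false)
```

Relative URLs and bare file names have no scheme and no host, which is why they fail this check.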

sanitize_email()

The sanitize_email() function performs a number of checks to detect invalid email address formats, and strips undesired characters.

It returns an empty string when the basic validity checks fail. If the email address has the right format, the sanitized address is returned.

sanitize_file_name()

The sanitize_file_name() function applies the following modifications to the data:

  • Removes special characters that are illegal in filenames on various operating systems.
  • Removes special characters that would require escaping when interacting with the file through the command line.
  • Replaces spaces and consecutive dashes with a single dash.
  • Removes periods, dashes, and underscores from the beginning and the end of the file name.
  • Adds an underscore to intermediate extensions that are not whitelisted.

sanitize_file_name() only handles sanitizing the name of the file.

It doesn’t make sure that the name is unique; you would need to use wp_unique_filename() for that.

While it handles intermediate extensions, it is not concerned with the main extension of the file. As an example, file.exe.exe will be transformed into file.exe_.exe, because .exe is not an allowed extension. file.exe will not be modified though.

You would need to use wp_check_filetype() to verify that the extension of the file is allowed on the system. The function returns an array with two keys: ext and type. Both will be set to false if the filetype is not part of the allowed MIME types.

sanitize_key()

The sanitize_key() function is useful to deal with data that needs to be in slug form.

Slugs can only be composed of lowercase alphanumeric characters (characters from a to z and numbers from 0 to 9), dashes (-) and underscores (_). Slugs are safe to use in any context.

Imagine that you have a theme option that allows users to enter a tag used to display featured posts in a slider.

<?php
$tag = sanitize_key( $_POST['featured-tag'] );
?>

Users will enter the tag into a text field, so using sanitize_text_field() would also be correct. In this case, using sanitize_key() is preferred, because it removes more unwanted data.

In addition, you will most likely query for the posts displayed in the slider using the tag slug. With the right sanitization function, you ensure that the data is a valid argument to pass to WP_Query.

sanitize_title()

The sanitize_title() function turns post titles into their slug form.

To do this, sanitize_title():

  • Removes PHP and HTML tags.
  • Removes accents.
  • Replaces spaces and periods with dashes.

This function is useful when you need to query for a post by name. You can safely pass the sanitized data to the name argument in WP_Query.

This concludes our look at some of WordPress’ sanitization functions. In the next section, we’re briefly going to touch on the sanitization functions provided by the PHP language itself.

PHP sanitization functions

For sanitization, you can use the same functions that we have discussed in the second part on validation. They are:

  • filter_input(): Retrieves an external variable (from $_GET, $_POST, $_SERVER,…) and applies the specified filter.
  • filter_input_array(): Works the same as filter_input(), but allows multiple values to be retrieved with one call.
  • filter_var(): Filters the variable passed as an argument.

When using these functions, you need to indicate a filter to use. The sanitization filters can be combined with flags to achieve a specific behavior.

As always, make sure to read the documentation carefully, because only the combination of the right filter, the right flags, and the right options makes sure that all invalid data is removed.
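To give you an idea of how filters and flags work together, here is a short sketch using filter_var() with three of PHP’s sanitization filters. The input strings are made up for this example.

```php
<?php
// Keeps digits and the plus and minus signs. Everything else is removed.
$int_like = filter_var( 'abc-12.5xyz', FILTER_SANITIZE_NUMBER_INT );
var_dump( $int_like ); // string(4) "-125"

// With the FILTER_FLAG_ALLOW_FRACTION flag, the decimal point is kept as well.
$float_like = filter_var( '1,234.50 px', FILTER_SANITIZE_NUMBER_FLOAT, FILTER_FLAG_ALLOW_FRACTION );
var_dump( $float_like ); // string(7) "1234.50"

// Removes characters that are not allowed in an email address.
$email = filter_var( 'john(.doe)@exa//mple.com', FILTER_SANITIZE_EMAIL );
var_dump( $email ); // string(20) "john.doe@example.com"
```

Note how the first call turns 12.5 into -125, because the decimal point is removed without the flag. The sanitized data is “clean”, but it is no longer the number that was entered.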

Conclusion

In this part of our series, we have seen what sanitization is, and what WordPress and PHP functions you can use.

In the next part, we are going to see how to put all the things we have seen so far into practice when dealing with post meta data, custom widget settings, as well as Customizer and the Settings API.

If you have any specific use cases you’re wondering about, please let me know in the comments, and we may look at this during the next part.


A Guide to Writing Secure Themes – Part 2: Validation

Validation is a technique to ensure that input is secure before using it in your code.

When validating data, you are verifying that it corresponds to what the program needs. This only works if you have a list of criteria that you can check to determine that the data is valid.

Whitelisting

The simplest validation method is whitelisting. This only works when there is a precise set of possible values that the data can have.

Let’s look at how whitelisting can be used for validating a theme option controlling the position of the sidebar.

Here is the code that we used to create the setting and the control:

$wp_customize->add_setting( 'sidebar-position', array(
    'default'           => 'left',
    'sanitize_callback' => 'wptrt_validate_sidebar_position',
) );

$wp_customize->add_control( 'sidebar-position-control', array(
    'label'    => esc_html__( 'Sidebar Position', 'wptrt' ),
    'section'  => 'theme',
    'settings' => 'sidebar-position',
    'type'     => 'radio',
    'choices'  => array(
        'left'  => esc_html__( 'Left', 'wptrt' ),
        'right' => esc_html__( 'Right', 'wptrt' ),
) ) );

The user only has two choices: left or right. This means that in the wptrt_validate_sidebar_position() function, we can determine whether the submitted option is one of the two possible values.

function wptrt_validate_sidebar_position( $sidebar_position ) {
    if ( in_array( $sidebar_position, array( 'left', 'right' ), true ) ) {
        return $sidebar_position;
    }
}

To do this, we use the in_array() PHP function. This function returns true when the needle, the submitted value for the position of the sidebar, is in the haystack, the list of possible positions.

The third parameter of the in_array() function is to enable strict type comparison. We pass true as an argument, to enable the strict checking. This is important, because in PHP loose type comparison can lead to unexpected results.
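Here is a small sketch of such unexpected results. The values are made up for this example, and the code runs in plain PHP.

```php
<?php
$positions = array( 'left', 'right' );

// Loose comparison: true == 'left' holds, because a non-empty string is truthy.
var_dump( in_array( true, $positions ) );       // bool(true)
// Strict comparison also compares the types, so the boolean is rejected.
var_dump( in_array( true, $positions, true ) ); // bool(false)

// Numeric strings are compared as numbers when the check is loose.
var_dump( in_array( '1e1', array( '10' ) ) );       // bool(true)
var_dump( in_array( '1e1', array( '10' ), true ) ); // bool(false)
```

With strict checking, only a value that has the same type and the same content as an entry in the list is accepted.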

So whitelisting simply means that we compare the submitted data against a list of acceptable values. This works well for controls such as checkboxes, radio buttons, selects, and dropdowns.

But how can we validate data for which we don’t know the possible values? Let’s have a look at validating data according to a set of qualifications.

Qualifying data

When qualifying data, we try to find out whether it meets a precise set of criteria. Let’s look at an example of validating data.

Imagine that you have a meta box that allows users to enter a value for the width (in pixels) of the content area of a particular post. While not being a super useful feature in a theme, this example allows us to demonstrate the use of filter_input().

The filter_input() PHP function gets a variable and validates it. The function accepts four arguments: the type of input, the name of the variable to get, the filter (validation) to apply, and an optional array of options.

<?php
$content_area_width = filter_input(
    INPUT_POST,
    'content_area_width',
    FILTER_VALIDATE_INT,
    array(
        'options' => array(
            'default'   => 500,
            'min_range' => 100,
            'max_range' => 1000,
        ),
    )
);
?>

Although the code for this function might seem verbose, it’s much shorter and clearer than writing it all out:

<?php
// Warning: This code does not work correctly.
if ( isset( $_POST['content_area_width'] ) && is_int( $_POST['content_area_width'] ) && $_POST['content_area_width'] >= 100 && $_POST['content_area_width'] <= 1000 ) {
    $content_area_width = $_POST['content_area_width'];
} else {
    $content_area_width = 500;
}
?>

You might wonder why there is a warning about this code not working. It looks good, right? The problem is that is_int( $_POST['content_area_width'] ) will always return false, so $content_area_width will always end up as 500.

This is because data retrieved from the $_GET and $_POST super globals is always of the type string. Using the filter_input() function allows us to get around this limitation of the PHP language.
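You can try this out without a form. The sketch below uses filter_var(), which applies the same filter to a variable instead of to request data. The submitted values are made up for this example.

```php
<?php
// Request data arrives as strings, even when it looks like a number.
$submitted = '500';
var_dump( is_int( $submitted ) ); // bool(false)

$options = array(
    'options' => array(
        'default'   => 500,
        'min_range' => 100,
        'max_range' => 1000,
    ),
);

// A numeric string inside the range is returned as an integer.
var_dump( filter_var( '750', FILTER_VALIDATE_INT, $options ) );  // int(750)
// Out of range: the default is returned.
var_dump( filter_var( '5000', FILTER_VALIDATE_INT, $options ) ); // int(500)
// Not an integer at all: the default is returned.
var_dump( filter_var( 'wide', FILTER_VALIDATE_INT, $options ) ); // int(500)
```

So the filter does two jobs at once: it validates the string, and it hands back a real integer that we can work with.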

Choosing the right qualifications

When validating data, it’s crucial that you choose the right set of qualifications, and express this correctly in the code.

Imagine that you have a Customizer setting in your theme for entering a link to a Twitter profile. You want to have a valid URL for this setting, so you use the filter_var() PHP function with the FILTER_VALIDATE_URL filter.

<?php
// Warning: Insecure code!
function wptrt_validate_twitter_profile_url( $url ) {
    return filter_var( $url, FILTER_VALIDATE_URL );
}
?>

The next thing you do is output the validated URL in your theme:

<?php
// Warning: Insecure code!
echo '<a href="' . $twitter_url . '">' . esc_html__( 'Twitter', 'wptrt' ) . '</a>';
?>

In the four lines of code that we have seen so far, we have made two crucial mistakes:

  1. We trusted the filter_var() function to validate the URL to the Twitter profile.
  2. We didn’t escape the URL on output.

We are going to look at escaping in a later part of this series. For now let’s look at why the validation was too weak to be secure.

The problem is that javascript://test%0Aalert(321) is a valid URL as far as the filter is concerned. As soon as a user clicks on the Twitter link on the front end of the site, a JavaScript dialog appears.

We need to add additional checks to our function:

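<?php
function wptrt_validate_twitter_profile_url( $url ) {
    // Reject anything that is not a link to twitter.com over HTTPS.
    if ( 0 !== strpos( $url, 'https://twitter.com/' ) ) {
        return;
    }

    // Returns the URL if it is valid and has a path, false otherwise.
    return filter_var( $url, FILTER_VALIDATE_URL, FILTER_FLAG_PATH_REQUIRED );
}
?>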

This function now verifies that the data meets three qualifications:

  1. The URL starts with https://twitter.com/.
  2. The URL is valid according to the RFC 2396 standard.
  3. The URL has a path component (as in http://example.org/path).
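To see how these qualifications play out, here is a short sketch that runs a few sample values through the function. The function is repeated so that the snippet runs on its own, and the sample URLs and variable names are made up for illustration.

```php
<?php
// Same checks as the function above, repeated so the sketch is self-contained.
function wptrt_validate_twitter_profile_url( $url ) {
    if ( 0 !== strpos( $url, 'https://twitter.com/' ) ) {
        return;
    }

    return filter_var( $url, FILTER_VALIDATE_URL, FILTER_FLAG_PATH_REQUIRED );
}

// A profile link on the expected host passes and is returned unchanged.
$profile = wptrt_validate_twitter_profile_url( 'https://twitter.com/WordPress' );
var_dump( $profile ); // string(29) "https://twitter.com/WordPress"

// The javascript: URL from earlier fails the prefix check.
$script = wptrt_validate_twitter_profile_url( 'javascript://test%0Aalert(321)' );
var_dump( $script ); // NULL

// So does a link that does not use HTTPS.
$plain = wptrt_validate_twitter_profile_url( 'http://twitter.com/WordPress' );
var_dump( $plain ); // NULL
```

Because the function returns either null or false for invalid data, the calling code should treat any value that is not a string as a failure.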

Validation functions

WordPress validation functions

WordPress only has a handful of validation functions.

  • is_email(): Checks whether the data is a valid email address. The validation done by the function does not comply with the RFC 822 standard, and does not work with internationalized domain names.
  • wp_validate_boolean(): Despite the name, this function not only validates, but also sanitizes the data passed to it. So the return value will always be a boolean. You can use filter_var( $var, FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE ) as an alternative, as it returns NULL when the passed data is not valid.
  • sanitize_hex_color(): This is actually a validation function, as it returns null if the color code isn’t valid. It is only available in the Customizer context, but it’s a small function so you can copy the code to your own validation function if needed.
  • sanitize_hex_color_no_hash(): The same as sanitize_hex_color() but for values without a leading #.
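The filter_var() alternative mentioned for wp_validate_boolean() can be sketched in plain PHP as follows. The sample values and variable names are arbitrary; the point is that invalid data is reported as null instead of being silently turned into false.

```php
<?php
// Values such as 'yes', 'on', 'true', and '1' validate as true.
$yes = filter_var( 'yes', FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE );
var_dump( $yes ); // bool(true)

// Values such as 'no', 'off', 'false', and '0' validate as false.
$off = filter_var( 'off', FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE );
var_dump( $off ); // bool(false)

// Anything else is not a boolean, so the flag makes the filter return null.
$maybe = filter_var( 'maybe', FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE );
var_dump( $maybe ); // NULL

// Without the flag, invalid data cannot be told apart from a valid false.
$unflagged = filter_var( 'maybe', FILTER_VALIDATE_BOOLEAN );
var_dump( $unflagged ); // bool(false)
```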

PHP validation functions

PHP offers a number of validation functions. As we have seen previously, using them can be a bit tricky. So make sure to read the documentation carefully, including the notes.

  • is_bool(): Returns true if the passed variable is of the type boolean.
  • is_float(): Returns true if the passed variable is of the type float.
  • is_int(): Returns true if the passed variable is of the type integer.
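  • is_numeric(): Returns true if the passed variable is a number or a numeric string. Keep in mind that this covers more than plain integers: signed values, decimals, and exponential notation such as 1e5 are all valid. Hexadecimal strings such as 0x1A were also accepted before PHP 7, but no longer are.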
  • strtotime(): Not a validation function strictly speaking, but can be used as such to validate dates. The function returns false if the passed data cannot be converted into a timestamp.
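A quick sketch of how some of these functions behave with the kind of string data a form submits. The sample values are arbitrary and only meant to show the difference between checking a type and checking content.

```php
<?php
// is_int() only looks at the type, so a numeric string is rejected.
var_dump( is_int( '42' ) );       // bool(false)

// is_numeric() looks at the content, and accepts more than plain integers.
var_dump( is_numeric( '42' ) );   // bool(true)
var_dump( is_numeric( '-4.2' ) ); // bool(true)
var_dump( is_numeric( '1e5' ) );  // bool(true)
var_dump( is_numeric( '42px' ) ); // bool(false)

// strtotime() returns false when the string cannot be read as a date.
var_dump( strtotime( 'banana' ) );               // bool(false)
var_dump( false !== strtotime( '2016-02-29' ) ); // bool(true)
```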

Next we have a family of functions that have been specifically designed to validate data.

  • filter_input(): Retrieves an external variable (from $_GET, $_POST, $_SERVER,…) and applies the specified filter.
  • filter_input_array(): Works the same as filter_input(), but allows multiple values to be retrieved with one call.
  • filter_var(): Filters the variable passed as an argument.

When using these functions, you need to indicate a filter to use. The validation filters can be combined with flags to achieve a specific behavior. Some filters also accept additional options.

It’s the combination of the right filter, with the right flags, and the right options that makes these functions do their work correctly.
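As an illustration of how filter, flags, and options work together, the sketch below validates an integer that may be written in hexadecimal and must fit into one byte. The scenario and the $args variable are made up for the example; the constants are part of PHP's filter extension.

```php
<?php
// Flags and options are passed together in one array.
$args = array(
    'flags'   => FILTER_FLAG_ALLOW_HEX,
    'options' => array(
        'min_range' => 0,
        'max_range' => 255,
    ),
);

// With the flag, hexadecimal notation is accepted and converted.
$with_flag = filter_var( '0x1A', FILTER_VALIDATE_INT, $args );
var_dump( $with_flag ); // int(26)

// Without the flag, the same string is not a valid integer.
$without_flag = filter_var( '0x1A', FILTER_VALIDATE_INT );
var_dump( $without_flag ); // bool(false)

// The range options still apply to the converted value (0x1FF is 511).
$out_of_range = filter_var( '0x1FF', FILTER_VALIDATE_INT, $args );
var_dump( $out_of_range ); // bool(false)
```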

Conclusion

Now that we have a solid grasp on how validating data works, we’ll look at sanitization in the next part of this series.


A Guide to Writing Secure Themes – Part 1: Introduction

As a developer, keeping your users secure should be your top priority.

Having a theme available on WordPress.org is a huge responsibility, because security issues make every site running the theme potentially vulnerable.

This guide will give you an introduction to the techniques you can apply to write secure code.

The guide is broken up into parts to make it easier to read and apply. It contains everything I learned over the past three years while reviewing themes for WordPress.org, premium themes for WordPress.com, as well as themes and plugins for WordPress.com VIP.

Before we get to the techniques, let’s have a look at the principles of secure code.

Principles of secure code

Writing secure code is not about using a particular function, tool, or workflow. Those things change over time, with new development techniques emerging and new security issues arising.

The common element that connects all these things together is the state of mind of the developer. This mindset is based on three principles:

  1. Don’t assume anything. Only act on what you know for sure.
  2. Don’t trust any data. Consider data invalid and insecure until proven valid and secure.
  3. Don’t become complacent. Web technologies evolve, and so do best practices.

With these principles in mind, let’s clarify the meaning of a few terms we’re going to use in this series.

Commonly used terms

Input and output

When we talk about input, this designates all the data that is given to our code.

The most prevalent use case is information entered by the user, for example into a form field or the browser address bar. But it also encompasses data retrieved from stored cookies or from external services, like the Twitter API.

Themes deal with this data in various ways. They might store it into the database, use it to retrieve data from the database, or display it to the user.

When we talk about displaying information, we use the word output. But output is not just what we see on the screen, it’s all the data provided by our code.

Imagine a PHP script that passes data to a JavaScript script, for example the data used to initialize a slider. In this case, the PHP outputs the data that is then used as input by the JavaScript.

If your code connects to a REST API, the JSON data returned by the API is the API’s output, which your code then uses as input.

Dynamic and static data

When we talk about dynamic or static data, this is not to be confused with the static keyword in PHP.

When we talk about static data, we designate data that cannot be changed except by changing the code. Here is an example:

<?php echo 'Hello World'; ?>

So when you read static, think of static HTML pages. These documents cannot return information that is not present in their source code.

Dynamic data, on the other hand, can be modified in different ways. For example:

<?php echo __( 'Hello World', 'wptrt' ); ?>

In this code sample, the __() translation function returns data. This data can be filtered, or modified by loading a translation.

What we are outputting is the return value of the function. Let’s look at this in more detail.

Return values

A return value is the data provided by a function. In PHP you often see these return statements in functions:

<?php
function wptrt_add_numbers( $a, $b ) {
    return $a + $b;
}
?>

Functions can return all kinds of data. Before PHP 7 there was no way to force a function to return a certain type of data, and code that has to support older PHP versions cannot rely on return type declarations.

This is important to keep in mind, because a lot of WordPress functions contain filters. So you can never be sure about the data that a certain function returns.

Now that we have seen the vocabulary, we’ll look at common attacks.

Common attacks

In order for you to secure your code, you need to understand how attacks work.

A good starting point is to read through the list of the Top 10 attacks in 2013, published by the Open Web Application Security Project (OWASP).

Google Application Security also has a very good introduction to cross-site scripting (XSS) attacks. You can actually try out these attacks in the browser.

If you are interested in specific attacks for WordPress, I recommend reading the Sucuri Blog.

Conclusion

This part should have provided you with a good overview of what security is, the related terminology, and the type of attacks encountered.

In the next part, we’re going to look at how you can protect against some of these attacks by validating data before use.
