Shortcode Roadmap Draft Two

This is the second draft of the Shortcode API Roadmap. It describes in broad terms the changes that might take place in versions 4.4 through 4.7. This roadmap gives notice to plugin developers that significant changes in plugin design may be needed for compatibility with future versions of the Shortcode API. It also identifies the steps we are taking to minimize the impact on plugin developers, so that most plugins can keep working with only small changes.

This roadmap is based on decisions made at the meeting in #feature-shortcode, on feedback from previous posts, and on consultation with the core developers.

We need a roadmap because of specific problems in the old code. Shortcode parsing has performance problems, and we need to fix them with backward compatibility in mind. Recent security patches showed what happens when we are not proactive about security hardening. Bloat in the content filters is another big problem in its own right.

Please comment on this new draft. We will hold another meeting next Wednesday, 2015-10-07, at 17:00 UTC. Trac tickets that were nominated for closure at the last meeting will be closed today, with references to this draft.

4.4 – Introduce Multiple Enclosures

The two items on the 4.4 Milestone move us toward our goal of security hardening without breaking websites. A new delimiter in the shortcode syntax will let plugin authors and users always place their HTML between shortcode tags rather than inside them. At the same time, the names of registered shortcodes will be slightly restricted by disallowing angle braces in shortcode names. Both changes should be possible to implement immediately, with no impact on existing content or plugins.

New Delimiter

The 4.4 development cycle will add a new delimiter to the shortcode syntax, along with documentation of how it works. Its purpose is to allow more than one HTML enclosure in a single shortcode. Using the new delimiter is optional; authors can use it where they need it.

Enclosure Delimiter:  [parent:attr=]

Usage:  [shortcode] Content HTML [shortcode:name=] Attribute HTML [/shortcode]

Example:  [photo link_to="twentysixteen/"] Here is <b>my</b> caption. [photo:media=] <img src="00.twentysixteen-260x300.png" width="260" height="300" /> [/photo]

Formally:
    delimiter   =  begin code-name middle attr-name end
    begin       =  "["
    code-name   =  1*( ALPHA / DIGIT / unreserved / %x80-FD )
    middle      =  ":"
    attr-name   =  *( ALPHA / DIGIT / "-" / "_" )
    end         =  "=]"
    unreserved  =  "!" / "#" / "$" / "%" / "(" / ")" / "*" /
                   "+" / "," / "-" / "." / ";" / "?" / "@" / 
                   "^" / "_" / "{" / "|" / "}" / "~"

In the examples above, the “name” part of the new delimiter works the same way as an attribute name. The main difference with this new style of attribute (the enclosure) is better support for HTML inside the value. It works better because our HTML filters no longer need to look inside individual shortcode tags: all of the HTML sits between shortcode tags, so shortcodes and HTML can be processed separately.

Note that the attribute name can still be blank, as in [parent:=], and we do not expect this usage to behave any differently.
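To make the grammar concrete, here is a minimal Python sketch (an illustration, not the planned WordPress implementation) that matches the delimiter exactly as the grammar defines it and uses it to split the body of an enclosing shortcode into its main content and its named enclosures. The helper name split_enclosures() is hypothetical, and trimming the whitespace around each part is an assumption this draft does not specify. Because the grammar's %x80-FD range is defined on bytes, the sketch matches on the UTF-8 encoding of the text. It does not try to resolve nested shortcodes that share the same name, which is the conflict discussed below.

```python
import re

# The delimiter grammar above, matched on UTF-8 bytes so that the
# %x80-FD range of code-name can be written exactly as specified.
DELIMITER_RE = re.compile(
    rb"\[([A-Za-z0-9!#$%()*+,\-.;?@^_{|}~\x80-\xFD]+)"  # begin code-name
    rb":([A-Za-z0-9_-]*)=\]"                              # middle attr-name end
)


def split_enclosures(shortcode, body):
    """Hypothetical helper: split the text between [shortcode] and
    [/shortcode] into (main_content, {enclosure_name: html}).

    Only delimiters naming this shortcode split the body; delimiters that
    belong to other shortcodes are left in place as ordinary text.
    """
    raw = body.encode("utf-8")
    segments = []              # (name, bytes); name None means main content
    name, start = None, 0
    for match in DELIMITER_RE.finditer(raw):
        if match.group(1).decode("utf-8") != shortcode:
            continue
        segments.append((name, raw[start:match.start()]))
        name, start = match.group(2).decode("ascii"), match.end()
    segments.append((name, raw[start:]))

    content, enclosures = "", {}
    for seg_name, seg in segments:
        text = seg.decode("utf-8").strip()  # assumption: outer whitespace is trimmed
        if seg_name is None:
            content = text
        else:
            enclosures[seg_name] = text     # a repeated name keeps the last value
    return content, enclosures


body = (' Here is <b>my</b> caption. [photo:media=] '
        '<img src="00.twentysixteen-260x300.png" width="260" height="300" /> ')
print(split_enclosures("photo", body))
```

Applied to the photo example, the caption becomes the main content and the image markup is returned under the key media. A blank name such as [parent:=] simply produces an enclosure keyed by the empty string.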

One limitation we have to consider, and avoid, is a conflict among nested shortcodes. We expect this to be rare, because core never supported nesting shortcodes inside shortcode attributes. However, HTML may still appear in the attributes of inner (nested) shortcodes that plugin callbacks process themselves. For example:

Old: [form-plugin][form-field1 start="<html>"][/form-plugin]

New: [form-plugin][form-field1][form-field1:start=]<html>[/form-field1][/form-plugin]

We can think of many better ways to write these shortcodes, but the main concern now is backward compatibility. Any new suggestions for this roadmap need to account for the example above. Please bring to our attention any existing situations where “old” attributes could have problems when converted to the new enclosure syntax.

New Restriction on Shortcode Names

The < and > characters will no longer be allowed in the $tag parameter of add_shortcode(). The Codex already describes some implicit restrictions. However, the API has historically accepted any character in the names of registered shortcodes, even when the shortcode parser could not recognize them. Starting in 4.4, attempting to register an invalid shortcode name will result in a helpful error message.

Formally:
    reserved     =  CTL / SP / short-delim / html-delim
    short-delim  =  "[" / "]" / "/"
    html-delim   =  "<" / ">" / "&"

Please note that these limitations were documented over a year ago and should have no impact on existing plugins.
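As a rough illustration of the 4.4 check, the following Python sketch tests a proposed shortcode name against the reserved set above, reading CTL with its usual ABNF meaning (%x00-1F and %x7F) and SP as the space character. The function name and the wording of the message are hypothetical; in WordPress the check would live inside add_shortcode() itself.

```python
# Reserved characters from the grammar above.
CTL = {chr(code) for code in range(0x20)} | {"\x7f"}
SP = {" "}
SHORT_DELIM = {"[", "]", "/"}
HTML_DELIM = {"<", ">", "&"}
RESERVED = CTL | SP | SHORT_DELIM | HTML_DELIM


def shortcode_name_error(tag):
    """Return an error message if tag contains reserved characters, else None."""
    bad = []
    for ch in tag:
        if ch in RESERVED and ch not in bad:
            bad.append(ch)  # keep first-seen order, without duplicates
    if not bad:
        return None
    listed = " ".join(repr(ch) for ch in bad)
    return (f"Invalid shortcode name {tag!r}: do not use spaces, control "
            f"characters, or any of [ ] / < > &. Found: {listed}")


for name in ["gallery", "my-code", "<b>", "two words"]:
    print(name, "->", shortcode_name_error(name) or "ok")
```

Reporting the offending characters, rather than only rejecting the name, is one way to make the error message helpful to plugin authors.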

One of the questions raised at the last meeting was whether we should place even more restrictions on shortcode names in later versions. This needs further discussion.

4.5 – Change the Filter Priorities

The two items on the 4.5 Milestone help us move toward our goals of streamlining the shortcode system.  Changing the content filter priorities to handle shortcodes before other filters will greatly reduce the complexity of those other filters.  Also, adding new functions for the conversion of some shortcode attributes into the new multiple enclosure format will give plugin authors a standardized way to work on compatibility for future versions.

Shortcodes Before Curly Quotes

Reducing the complexity of content filters is a major goal for this roadmap. In reviewing the current API, we find whole extra functions and highly complex regex patterns that are needed only because several filters run before shortcodes are processed.

A major concern with this change is how those other filters will affect shortcode output once shortcodes are processed first. Development of the 4.5 API will follow several objectives to make the transition as seamless as possible.

Curly quotes, produced by the wptexturize() function, will avoid shortcode output by default so that we do not cause unexpected changes for plugins. This should be straightforward, because wptexturize() already includes logic for avoiding HTML elements. We will internally create unique HTML tags or a similar placeholder system so that shortcode output remains unaffected after the filter priorities change. At the same time, we will be able to remove the extra code that was needed to look for shortcode tags. New options will be added for shortcode registration; ideally, the curly quotes feature can be turned on or off for each registered shortcode name.
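The placeholder idea can be sketched in a few lines of Python. This is not WordPress code: toy_texturize() is a stand-in that only curls paired double quotes outside HTML tags, and both the <wp-sc-...> placeholder format and the per-shortcode allow_texturize flag (modeling the proposed registration option) are hypothetical. Shortcodes run first; output from shortcodes that opt out of curly quotes is swapped for a unique placeholder tag, which the texturizer skips like any other HTML tag, and the output is restored afterwards.

```python
import re
import uuid

HTML_TAG = re.compile(r"(<[^<>]*>)")
SHORTCODE = re.compile(r"\[(\w+)\]")  # simplified: no attributes, no enclosures


def toy_texturize(text):
    """Stand-in for wptexturize(): curl paired double quotes, skipping HTML tags."""
    pieces = HTML_TAG.split(text)
    for i in range(0, len(pieces), 2):  # even indices are text between tags
        pieces[i] = re.sub(r'"([^"]*)"', "\u201c\\1\u201d", pieces[i])
    return "".join(pieces)


def shortcodes_then_texturize(text, registry):
    """registry maps name -> (callback, allow_texturize); the flag is hypothetical."""
    protected = {}

    def expand(match):
        entry = registry.get(match.group(1))
        if entry is None:
            return match.group(0)  # unregistered shortcodes are left as typed
        callback, allow_texturize = entry
        output = callback()
        if allow_texturize:
            return output
        placeholder = f"<wp-sc-{uuid.uuid4().hex}>"  # unique placeholder tag
        protected[placeholder] = output
        return placeholder

    text = SHORTCODE.sub(expand, text)  # shortcodes now run before curly quotes
    text = toy_texturize(text)
    for placeholder, output in protected.items():
        text = text.replace(placeholder, output)
    return text


registry = {
    "code": (lambda: '<code>print("hi")</code>', False),
    "greeting": (lambda: '"Hello"', True),
}
print(shortcodes_then_texturize('She said "hi" [code] and [greeting]', registry))
```

The quotes in the post text and in the [greeting] output are curled, while the [code] output keeps its straight quotes because it was hidden behind a placeholder while the texturizer ran.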

Automatic paragraphs, produced by the wpautop() function, will also avoid shortcode output by default, for the same reasons. The big difference is that wpautop() already has several known bugs related to code avoidance and does not yet have the needed logic, so this will be the larger challenge of the 4.5 Milestone. It also gives us a good opportunity to revisit some old tickets and improve this core code along with the shortcode system. Shortcodes will be able to turn the paragraphs feature on or off for each registered shortcode name.

Yet to be decided: Should we decrease the priority number of the shortcode filter, or increase the priority numbers of the other filters? Should we implement a security context in the API, but for post content only?
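Either answer to the first question changes only the relative order of callbacks. WordPress runs the callbacks attached to a filter in ascending priority order, and current releases attach do_shortcode() to the_content at priority 11, after wptexturize() and wpautop() at the default priority 10. The toy Python model below (the FilterChain class is illustrative, not a port of the WordPress hooks API) shows that lowering the shortcode priority or raising the other priorities gives the same order for these three filters.

```python
from collections import defaultdict


class FilterChain:
    """Toy model of one filter hook: lower priority numbers run first and
    callbacks sharing a priority run in the order they were added."""

    def __init__(self):
        self._callbacks = defaultdict(list)

    def add_filter(self, callback, priority=10):
        self._callbacks[priority].append(callback)

    def apply(self, value):
        for priority in sorted(self._callbacks):
            for callback in self._callbacks[priority]:
                value = callback(value)
        return value


def run_order(shortcode_priority, other_priority=10):
    """Return the order in which the three content filters would run."""
    log = []

    def tracer(name):
        def callback(value):
            log.append(name)
            return value
        return callback

    chain = FilterChain()
    chain.add_filter(tracer("wptexturize"), other_priority)
    chain.add_filter(tracer("wpautop"), other_priority)
    chain.add_filter(tracer("do_shortcode"), shortcode_priority)
    chain.apply("post content")
    return log


print(run_order(11))                     # current order
print(run_order(9))                      # option 1: lower the shortcode priority
print(run_order(11, other_priority=12))  # option 2: raise the other priorities
```

Where the two options differ is in how they interact with third-party callbacks already hooked at nearby priorities, which this model deliberately leaves out.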

New Conversion Functions

One pain point for shortcode plugins is the planned restriction or elimination of HTML inside old shortcodes. To help plugins switch shortcode attributes to the multiple enclosure syntax, we will add some standard conversion functions. Here is a rough outline of the conversion features we plan to offer.

convert shortcode attributes ( $text, $shortcode, $attributes ).  This will allow a plugin to specify one shortcode name and one or more attributes that need to be converted from the input text.

convert many shortcodes ( $text, $array ). This will help plugins handle multiple shortcodes by accepting an array of shortcode names and attributes instead of one shortcode at a time.

register posts conversion ( $array ). This will allow plugins to easily schedule a WordPress cron job that scans all posts in small batches and converts the shortcodes in old content. Related functions will be included.
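The first two functions can be sketched in Python as follows. The names mirror the rough outline above, but the signatures, behavior, and attribute parsing are assumptions for illustration only. The sketch handles only self-closing shortcodes with name=value attributes: each listed attribute is moved out of the opening tag into a named enclosure, and a closing tag is added. It reproduces the form-plugin example from the 4.4 section.

```python
import re

# name=value pairs: double-quoted, single-quoted, or bare values.
ATTR_RE = re.compile(r"""([\w-]+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s'"\]]+))""")


def convert_shortcode_attributes(text, shortcode, attributes):
    """Hypothetical sketch: rewrite [name a="<b>"] as [name][name:a=]<b>[/name]
    for each attribute listed in `attributes`. Assumes self-closing tags."""
    tag_re = re.compile(
        r"\[" + re.escape(shortcode)
        + r"""((?:\s(?:"[^"]*"|'[^']*'|[^\]"'])*)?)\]"""
    )

    def rewrite(match):
        kept, enclosures = [], []
        for attr in ATTR_RE.finditer(match.group(1)):
            name = attr.group(1)
            value = next(v for v in attr.group(2, 3, 4) if v is not None)
            if name in attributes:
                enclosures.append(f"[{shortcode}:{name}=]{value}")
            else:
                kept.append(attr.group(0))  # keep other attributes as written
        if not enclosures:
            return match.group(0)  # nothing to convert in this tag
        opening = "[" + " ".join([shortcode] + kept) + "]"
        return opening + "".join(enclosures) + f"[/{shortcode}]"

    return tag_re.sub(rewrite, text)


def convert_many_shortcodes(text, conversions):
    """Hypothetical sketch: `conversions` maps shortcode names to attribute lists."""
    for shortcode, attributes in conversions.items():
        text = convert_shortcode_attributes(text, shortcode, attributes)
    return text


old = '[form-plugin][form-field1 start="<html>"][/form-plugin]'
print(convert_shortcode_attributes(old, "form-field1", ["start"]))
```

Shortcodes that already enclose content, and positional attributes without a name, would need extra handling that this sketch leaves out.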

Plugin authors have voiced some concerns about dealing with fields outside of posts. We hope the basic conversion functions will help convert any content. If a specific feature seems to be missing here, please let us know. Plugin authors will be expected to implement conversions for other fields as needed.

4.6 – Stop Saving HTML in Shortcode Attributes

The single item on the 4.6 Milestone helps us move toward our goals of hardening the shortcode system while avoiding urgent updates that would be even more problematic.

This is a significant change in the way shortcodes are processed when a post is saved. A new core filter will scan the shortcodes in each post. If a registered shortcode tag is found to contain angle braces, the angle braces will be replaced with spaces automatically. This will happen whenever any post is saved or edited.

Examples

Input 1:  [shortcode attr="Hello & it is <b>my</b> title"]
Output 1: [shortcode attr="Hello & it is  b my /b  title"]

Input 2:  [shortcode attr="<a href='https://wordpress.org/'>hi</a>"]
Output 2: [shortcode attr=" a href='https://wordpress.org/' hi /a "]

Input 3:  [shortcode compare=">"]
Output 3: [shortcode compare=" "]

Input 4:  [shortcode attr='<img title="&lt;">']
Output 4: [shortcode attr=' img title="&lt;" ']
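The replacement rule in these examples can be sketched in Python as a pass over the opening tags of registered shortcodes. This is an illustration under stated assumptions, not the planned core filter: the function name is hypothetical, and the tag pattern is simplified (it lets quoted attribute values contain ] but does not cover every edge case of the real shortcode parser). Only characters inside a registered shortcode's tag are touched; HTML between tags and unregistered shortcodes are left alone, and & is not changed.

```python
import re


def neutralize_angle_brackets(text, registered):
    """Replace < and > with spaces inside opening tags of registered shortcodes."""
    if not registered:
        return text
    names = "|".join(re.escape(n) for n in sorted(registered, key=len, reverse=True))
    tag_re = re.compile(
        r"\[(?:" + names + r")(?=[\s\]/:])"       # "[" followed by a registered name
        + r"""(?:"[^"]*"|'[^']*'|[^\]"'])*\]"""   # attributes up to the closing "]"
    )

    def scrub(match):
        return match.group(0).replace("<", " ").replace(">", " ")

    return tag_re.sub(scrub, text)


examples = [
    '[shortcode attr="Hello & it is <b>my</b> title"]',
    '''[shortcode attr="<a href='https://wordpress.org/'>hi</a>"]''',
    '[shortcode compare=">"]',
    """[shortcode attr='<img title="&lt;">']""",
]
for before in examples:
    print(neutralize_angle_brackets(before, {"shortcode"}))
```

Running it over the four inputs above yields the four outputs listed, including the doubled spaces in Output 1.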

Why spaces? This was a tough decision based on several problems. We need to stop saving angle braces inside shortcodes because our HTML filters do not handle them correctly. Simply encoding the angle braces is not good, because it encourages plugins to decode content into HTML that was never filtered for safety. We also want to indicate that some content has changed, and simply stripping out the angle braces gives very ugly results. Replacing angle braces with spaces is therefore the least problematic strategy.

The expected impact on plugins is limited to those that have not followed our roadmap for the 4.4 Milestone and that attempt to save or insert any HTML inside of shortcode attributes. Plugins will not be affected if they do not save HTML in this way, or if they have updated to use the multiple enclosure syntax in new posts.

Plugins that have not followed our roadmap for the 4.5 Milestone could be affected when a user edits an old post containing HTML inside shortcode attributes. The planned solution is to avoid this problem by using the 4.5 conversion functions. By 4.6, plugins that still allow HTML in saved shortcode attributes will be deemed insecure. Core will not offer solutions for editing shortcodes in old posts that rely on insecure plugins.

4.7 – Disallow HTML/Shortcode Attribute Mixing

The two items on the 4.7 Milestone help us move toward our goals of hardening the Shortcode API and avoiding unplanned security updates. Blocking HTML in shortcode attributes means old content must either be converted, or it will have to function without HTML. Blocking shortcodes in HTML attributes means old content must be converted or will stop working because these shortcodes are no longer recognized.

HTML in Shortcode Attributes

To fully enforce the separation of shortcodes and HTML codes, the shortcode API of version 4.7 would delete any shortcode tag or attribute that contains HTML elements. Such tags would be treated as if the registered callback returned nothing, so nothing would be displayed where the shortcode was found. Here are some examples of how this works:

==API 4.6==
Input:  [shortcode insert="fun"]
Output: My fun link is here.

Input:  [shortcode insert="<a>"]
Output: My <a> link is here.

Input:  [shortcode][shortcode:insert=]<a>[/shortcode]
Output: My <a> link is here.

==API 4.7==
Input:  [shortcode insert="fun"]
Output: My fun link is here.

Input:  [shortcode insert="<a>"]
Output:

Input:  [shortcode][shortcode:insert=]<a>[/shortcode]
Output: My <a> link is here.

We have the ability to do this gracefully because the HTML is assumed to be valid. Anyone still using unpaired angle braces inside of shortcode attributes can expect the shortcode to be ignored and displayed as HTML rather than deleted and replaced with nothing.
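One possible reading of these two 4.7 outcomes can be sketched in Python. The assumptions are ours, not the roadmap's: an "HTML element" in an attribute is anything matching a simple start or end tag pattern, a tag whose angle brackets are all accounted for by such elements produces no output, and a tag with any leftover (unpaired) angle brackets is ignored and displayed as typed. The sketch handles only self-closing tags, and the callback signature is hypothetical.

```python
import re

TAG_RE = re.compile(r"""\[([\w-]+)((?:\s(?:"[^"]*"|'[^']*'|[^\]"'])*)?)\]""")
ATTR_RE = re.compile(r"""([\w-]+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s'"\]]+))""")
HTML_ELEMENT_RE = re.compile(r"</?[A-Za-z][^<>]*>")  # assumed meaning of "HTML element"


def do_shortcodes_47(text, callbacks):
    """callbacks maps shortcode names to functions taking an attribute dict."""

    def handle(match):
        name, attr_text = match.group(1), match.group(2)
        if name not in callbacks:
            return match.group(0)        # unregistered: displayed as typed
        if "<" in attr_text or ">" in attr_text:
            leftover = HTML_ELEMENT_RE.sub("", attr_text)
            if "<" in leftover or ">" in leftover:
                return match.group(0)    # unpaired angle brackets: ignored
            return ""                    # HTML elements: as if the callback returned nothing
        atts = {}
        for attr in ATTR_RE.finditer(attr_text):
            atts[attr.group(1)] = next(v for v in attr.group(2, 3, 4) if v is not None)
        return callbacks[name](atts)

    return TAG_RE.sub(handle, text)


callbacks = {"shortcode": lambda atts: f"My {atts.get('insert', '')} link is here."}
for sample in ['[shortcode insert="fun"]', '[shortcode insert="<a>"]', '[shortcode compare=">"]']:
    print(repr(do_shortcodes_47(sample, callbacks)))
```

With these assumptions, [shortcode insert="fun"] renders normally, [shortcode insert="<a>"] renders as nothing, and [shortcode compare=">"] is left in the output exactly as typed.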

Shortcodes in HTML Attributes

To fully enforce the separation of shortcodes and HTML codes, the shortcode API of version 4.7 would also ignore any shortcode tag or attribute that exists inside an HTML attribute. Such tags would be treated the same way as unregistered shortcodes, so the raw code would be displayed as if it were normal HTML. Here are some examples of how this works:

==API 4.6==
Input:  <a title='[shortcode insert="fun"]'>
Output: <a title='My fun link is here.'>

==API 4.7==
Input:  <a title='[shortcode insert="fun"]'>
Output: <a title='[shortcode insert="fun"]'>
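This rule amounts to running the shortcode parser only on the text between HTML tags. The Python sketch below shows the idea with a deliberately naive tokenizer that treats everything from < to the next > as a tag; a real implementation would need a proper HTML tokenizer (for example, one that copes with comments and with > inside quoted attribute values). The function names and the toy expand_shortcodes() callback are hypothetical.

```python
import re

HTML_TAG = re.compile(r"(<[^<>]*>)")  # naive: a tag runs from "<" to the next ">"


def do_shortcodes_outside_html(text, expand):
    """Apply `expand` only to text between HTML tags, never inside a tag."""
    pieces = HTML_TAG.split(text)          # text, tag, text, tag, ..., text
    for i in range(0, len(pieces), 2):     # even indices are the text runs
        pieces[i] = expand(pieces[i])
    return "".join(pieces)


def expand_shortcodes(text):
    """Toy expander for the example shortcode used in this post."""
    return re.sub(r'\[shortcode insert="([^"]*)"\]', r"My \1 link is here.", text)


print(do_shortcodes_outside_html("""<a title='[shortcode insert="fun"]'>""", expand_shortcodes))
print(do_shortcodes_outside_html('<p>[shortcode insert="fun"]</p>', expand_shortcodes))
```

The shortcode inside the title attribute is left as raw code, matching the API 4.7 example, while the same shortcode between <p> tags is still expanded.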

We expect this to be the most controversial change on the roadmap. There is very little we can do to ease the pain of developing and converting to a better syntax in this situation, because the plugin was not providing the HTML in question. What we need to know is whether the plugins affected by this change can be ready and compatible by the 4.7 Milestone. Is there anything more we can do to help plugin authors with this change?

Lastly, I would like to thank the many community members who commented on the first draft or participated in the first meeting. This second draft was truly a group effort.

#meeting, #roadmaps, #shortcodes