Migration update: brief hiatus

I’m glad to see the excitement that the recent 3.7/3.8 discussions are generating. I’ll only write a short update today on the importer.

Since I’ve been moving back to campus and am in the midst of a busy staff training period (as disclosed in my application and timeline), I haven’t been able to get much coding in, and won’t have the time to code new features until next week.

I’ve received some very helpful feedback that led to resolutions, and will continue fixing bugs in the little time I have this week:

  • Undefined index notices (#327 comment:22)
  • Remove UIUI User interface reference to ‘adminadmin (and super admin)’ user (#350, core #24729)
  • FilterFilter Filters are one of the two types of Hooks https://codex.wordpress.org/Plugin_API/Hooks. They provide a way for functions to modify data of other functions. They are the counterpart to Actions. Unlike Actions, filters are meant to work in an isolated manner, and should never have side effects such as affecting global variables and output. to exclude menu items from import (#349)

(I periodically check old and open tickets on CoreCore Core is the set of software required to run WordPress. The Core Development Team builds WordPress. TracTrac An open source project by Edgewall Software that serves as a bug tracker and project management tool for WordPress. for the importer/exporter to see what I can handle within the scope of this project.)

Meanwhile, I’m also examining the Relocate tool‘s handling of large sites — out-of-memory issues — which will eventually be resolved in the bigger picture by integrating it into the importer or attaching it to WP_Importer_Cron so that it, too, can work on a schedule.

Next Monday, a new betaBeta A pre-release of software that is given out to a large group of users to trial under real conditions. Beta versions have gone through alpha testing in-house and are generally fairly close in look, feel and function to the final product; however, design changes often occur as part of the process. of the cron-based importer will be tagged and released for download.

#migration-portability, #weekly-update

Migration update: try this importer

Hey everyone,

The importer is largely unchanged from last week, with the exception of a few UIUI User interface changes:

  • #341: Progress for posts/attachments/menu items is now shown correctly (in %d of %d format)
  • #342: The debug view (showing the raw data) now uses print_r through a special chars filterFilter Filters are one of the two types of Hooks https://codex.wordpress.org/Plugin_API/Hooks. They provide a way for functions to modify data of other functions. They are the counterpart to Actions. Unlike Actions, filters are meant to work in an isolated manner, and should never have side effects such as affecting global variables and output.
  • #340: UI now has full-sentence strings to communicate how the process works and when the import is done, and Refresh/Abort buttons are shown above and below the progress.

An import of a WordPress WXR file in progress

A completed WordPress import

I’ve also had the chance to run it against a large number of import files, including ones sent to me by generous volunteers who read some of my previous weekly updates (props @tieptoep). No catastrophes, yet!

Obviously, it’s still a work in progress, but I’m now willing to risk a public betaBeta A pre-release of software that is given out to a large group of users to trial under real conditions. Beta versions have gone through alpha testing in-house and are generally fairly close in look, feel and function to the final product; however, design changes often occur as part of the process.. The usual disclaimers (please don’t use this on a production siteProduction Site A production site is a live site online meant to be viewed by your visitors, as opposed to a site that is staged for development or testing.) apply.

Although I’m not aware of any other plugins that build on the WXR importer through its APIAPI An API or Application Programming Interface is a software intermediary that allows programs to interact with each other and share data in limited, clearly defined ways., I nevertheless generated some PHPDoc API documentation using phpDocumentor 2, which might be handy if you decide to hook into this or reuse its components.

I’d love to hear your feedback on the interface, on the general experience using this importer, and any errors or warnings that you encounter. Thanks!

#importers, #migration-portability, #weekly-update

Migration update: cron importer part 2

Hey everybody — I have good news and bad news.

Good news: I’ve finished porting all the individual import steps to the cron model and now have a mostly working frontend UIUI User interface (largely unchanged from the previous iteration of the importer) that utilizes it.

As of this evening, the cron model is able to parse, process, and finish importing two test XML files from the coreCore Core is the set of software required to run WordPress. The Core Development Team builds WordPress. unit tests (valid-wxr-1.1.xml and small-export.xml). The test case, which uses exactly the same assertions as the core unit testunit test Code written to test a small piece of code or functionality within a larger application. Everything from themes to WordPress core have a series of unit tests. Also see regression., passes all 193 assertions. (Update: an errorless import of the wptest.io sample data has been done.)

WordPress import in progress

WordPress cron import in progress

A completed cron import

A completed cron import

Bad news: I wanted to tagtag A directory in Subversion. WordPress uses tags to store a single snapshot of a version (3.6, 3.6.1, etc.), the common convention of tags in version control systems. (Not to be confused with post tags.) a version and release a download today, but I’ve decided not to do so due to the unconfirmed stability of the importer. As some astute observers noted last week, storing the temporary data in the options table can blow out caches. Although I’ve attempted to mitigate this (see [2180] and this reference from a few years back on Core Trac), I still need to test this against some real systems before I release it and break your server.

Those who are very interested can always check out the source in Subversion. I will post a comment under this post if a download is made available before my next weekly update.

Although an overhaul of the XML parser, as suggested in the comments on last week’s post, is probably necessary to avoid memory and caching issues, my first priority was to finish the migrationMigration Moving the code, database and media files for a website site from one server to another. Most typically done when changing hosting companies. of processes to the cron tasks. As soon as I can post a working importer, I will immediately turn my attention to the XML parsing step.

#importers, #migration-portability, #weekly-update

Migration update: cron importer

Following last week’s update about the WP_Importer_Cron approach to writing importers and running import jobs, I’ve been steadily transitioning code from the current non-stateful, single-execution pluginPlugin A plugin is a piece of software containing a group of functions that can be added to a WordPress website. They can extend functionality or add new features to your WordPress websites. WordPress plugins are written in the PHP programming language and integrate seamlessly with WordPress. These can be free in the WordPress.org Plugin Directory https://wordpress.org/plugins/ or can be cost-based plugin from a third-party to a stateful, step-wise process (#327).

At the same time, I needed to separate presentation from logic/backend processing (#331) — something that @otto42 also recommended — in two ways:

  • Removing direct printf(), echo statements that were used by the WXR importer (example)
    and changing them to WP_Error objects (example of fatal error; of non-fatal warning)
  • Handling uploads and UIUI User interface choices in a separate class

Why must this be done now? Well, asynchronous tasks differ from PHPPHP The web scripting language in which WordPress is primarily architected. WordPress requires PHP 5.6.20 or higher scripts directly responding to a browser request — we can’t depend on having access to submitted $_POST data, nor can we directly pipe output to the user. This change would also make it easier to understand what the code is doing from reading it, and to test programmatically.

One dilemma I’ve encountered: how best to store the parsed import XML file. Since each step of the import (users, categories, plugins, etc) runs separately, we must…

  1. store all of the parsed data in variables, which are serialized into an option between runs
    (obviously, a huge amount of data for which this may not be the most robust or efficient method);
  2. re-parse the XML on each run
    (currently, parsers handle all parts of the XML at once, which means unnecessarily duplicated effort and time);
  3. modify the parsers to parse only part of the XML at a time; or
  4. split the XML file into chunks based on their contents (authors, categories, etc) and then feed only partial chunks to the parser at a time.

Any thoughts? Solving this problem could also help the plugin deal with large XML files that we used to need to break up by hand before importing. (The Tumblr importer doesn’t have the same problem because there is no massive amount of data being uploaded at the beginning.)

I haven’t yet finished transitioning all the steps; I’m afraid it won’t be possible to use this just yet. Before next Monday, I should have a downloadable plugin that’s safe to try.

#importers, #migration-portability, #weekly-update

Migration update: WXR importer

This week, I began work on the next phase of my project: fixing up the WXR importer plugin. A number of developers, including Jon Cave, Peter Westwood, and Andrew Nacin have been maintaining this pluginPlugin A plugin is a piece of software containing a group of functions that can be added to a WordPress website. They can extend functionality or add new features to your WordPress websites. WordPress plugins are written in the PHP programming language and integrate seamlessly with WordPress. These can be free in the WordPress.org Plugin Directory https://wordpress.org/plugins/ or can be cost-based plugin from a third-party since at least May 2010.

I forked the code from the plugins repository into my GSoC Subversion directory in preparation. It’s taken a while to test this (manually) against XML files from existing sites, so that I can see under what circumstances it fails to complete the import or perform to expectations, and what can be done. Trac tickets and forum posts have been informative as well. (See the linked posts for the results and observations.)

I’ve also run the unit tests that apply to the importer plugin; however, the test cases are generally small (indeed, the biggest XML test case is 26 KB, titled small-export.xml) and don’t trigger the kinds of issues that importing dozens or hundreds of posts and attachments—with WXR files of megabytes in size—does.

So, the first task at hand is breaking up the process—which currently executes in one step with little indication of progress—into discrete chunks that can run in separate, stateful (stepwise) requests.

A chat with my mentors has pointed me in the direction of WP_Importer_Cron, which was first developed for other importers that need to make external APIAPI An API or Application Programming Interface is a software intermediary that allows programs to interact with each other and share data in limited, clearly defined ways. calls (e.g. Tumblr Importer) potentially subject to rate constraints. There are some parallels between “external API calls” and “remote attachment fetching”, which is why this can be a suitable approach for fixing the timeout issues that present with the current WordPress importer. After the process is discretized, showing progress (an enhancement long overdue) will be easier.

#migration-portability, #weekly-update

Migration update: WP Relocate

This past week, I’ve focused primarily on bringing a usable interface to the WP_Relocate class that I posted about last week. While the WP_Relocate class file itself was meant to be something that could be bundled into another suite and reused, the fruits of this week’s labour come in the form of an installable pluginPlugin A plugin is a piece of software containing a group of functions that can be added to a WordPress website. They can extend functionality or add new features to your WordPress websites. WordPress plugins are written in the PHP programming language and integrate seamlessly with WordPress. These can be free in the WordPress.org Plugin Directory https://wordpress.org/plugins/ or can be cost-based plugin from a third-party.

In actuality, the plugin hooksHooks In WordPress theme and development, hooks are functions that can be applied to an action or a Filter in WordPress. Actions are functions performed when a certain event occurs in WordPress. Filters allow you to modify certain functions. Arguments used to hook both filters and actions look the same. are very shallow — merely used to add a menu link (for now). The most important part is the practically-standalone interface.

WP Relocate UI

The design intent was to make this look and feel just like the installer — or the upgrade.php script for database schema changes.

This has been tested against 3.5.2 and 3.6 only. Despite differences in how these versions handle revisions, the changes made to post content are still revertible in either version:

Revisions tracking search and replace

I can test this in code to my heart’s content (except for the difficulties using the current unit-tests framework against older versions of coreCore Core is the set of software required to run WordPress. The Core Development Team builds WordPress.) but nothing beats running it on live site data, especially sites that have posts and uploads from earlier versions of WordPress. As you might imagine, live data is more difficult to generate and test against. I don’t yet know how well this process works with hundreds or thousands of posts, in terms of the time it takes and the level of verbosity communicated to the site administrator.

If you’re interested in trying this out (please don’t use it on a production siteProduction Site A production site is a live site online meant to be viewed by your visitors, as opposed to a site that is staged for development or testing. — it’s probably not ready for that), feel free to:

It installs into wp-content/plugins. The included readme.html file contains more detailed instructions. If anyone tries it out, I’d love to hear if it broke your site. 🙂

What’s next? This week, I am forking the importer plugin that works with WXR files, to examine and fix its issues with fetching attachments from the source. I hope to add some ability to replace URLs using the WP_Relocate class to smooth out the process of copying content from another installation.

Edit (2013-07-09): updated to 1.0.1 with fix for PHPPHP The web scripting language in which WordPress is primarily architected. WordPress requires PHP 5.6.20 or higher versions before 5.4.0.

#migration-portability, #weekly-update

Migration project update

I’ve spent the past few days building functionality for what I’ve termed the “relocate” component — the class that handles site-wide replacement of old URLs with new. A pluginPlugin A plugin is a piece of software containing a group of functions that can be added to a WordPress website. They can extend functionality or add new features to your WordPress websites. WordPress plugins are written in the PHP programming language and integrate seamlessly with WordPress. These can be free in the WordPress.org Plugin Directory https://wordpress.org/plugins/ or can be cost-based plugin from a third-party UIUI User interface is in the work.

Two discoveries were made in the process of coding the relocate component.

  1. Attachment locations have been stored as relative paths since 2.7, which helps with keeping them portable. However, I have seen full paths in the database tables of sites that started up pre-2.7; transitioning those to portable relative paths will be part of what I’m building.
  2. By using WordPress queries and post functions (as opposed to SQL queries) to replace within post content, we gain the advantage of storing those changes in revisionsRevisions The WordPress revisions system stores a record of each saved draft or published update. The revision system allows you to see what changes were made in each revision by dragging a slider (or using the Next/Previous buttons). The display indicates what has changed in each revision., rather than indiscriminately touching all entries in the posts table in the way that was previously documented on the Codex.

With the guidance of WP CLI’s explanation for how to set up a plugin for testing, I’ve also coded unit tests to verify that things are working as they should. One hurdle I faced in unit testing was that update_option('siteurl') doesn’t affect the value of the WP_CONTENT_URL constant until the next execution of the script, by which time the unit testunit test Code written to test a small piece of code or functionality within a larger application. Everything from themes to WordPress core have a series of unit tests. Also see regression. framework will have rolled back the transaction. This minor quirk won’t translate to real world use, although I’m not sure if I’m satisfied with the way I overcame it.

Finishing the UI for this component is my timeline’s task for this week; it will facilitate a user’s decision to change the site URLURL A specific web address of a website or web page on the Internet, such as a website’s URL www.wordpress.org to a different domain. I’ll make sure installation/usage instructions are shown by next week.

Happy Canada Day!

#migration-portability, #weekly-update

Weekly migration project update

Just a quick update today, as I’ve been a little sick and stayed offline for a few days.

Per my timeline, I worked on a light component last week that focuses on the URLURL A specific web address of a website or web page on the Internet, such as a website’s URL www.wordpress.org replacement part of a migrationMigration Moving the code, database and media files for a website site from one server to another. Most typically done when changing hosting companies. operation (e.g. from http://www.example.com to https://wp.example.com/wordpress). Hints were taken from the way search-and-replace was implemented in WP-CLIWP-CLI WP-CLI is the Command Line Interface for WordPress, used to do administrative and development tasks in a programmatic way. The project page is http://wp-cli.org/ https://make.wordpress.org/cli/. My goal was to make this usable in whatever UIUI User interface implementation is built later. I also need to get a grasp on how unit tests are actually done with coreCore Core is the set of software required to run WordPress. The Core Development Team builds WordPress., because dependencies on functions like get_option() make systematic testing more difficult without an accessible database and tables.

In terms of what to expand on with the importer, an important question that remains is just how much should be moved with the importer.

At the moment, despite its deficiencies, its scope can be simply delineated: it moves content. Just content. Not plugins, not site options, not themes. If the importer were expanded to use WordPress’s XML-RPC APIAPI An API or Application Programming Interface is a software intermediary that allows programs to interact with each other and share data in limited, clearly defined ways. to transfer more — to copy over site options, to install plugins based on what was previously active — then it’s much harder to draw the line at what the importer will and will not do. I think this is an important decision question that would benefit from some community input.

This week, I’ll be coding more actively. To be done: a usable UI that can be used to test the work done thus far, and a fork of the existing importer pluginPlugin A plugin is a piece of software containing a group of functions that can be added to a WordPress website. They can extend functionality or add new features to your WordPress websites. WordPress plugins are written in the PHP programming language and integrate seamlessly with WordPress. These can be free in the WordPress.org Plugin Directory https://wordpress.org/plugins/ or can be cost-based plugin from a third-party in which I will make my improvements.

#migration-portability, #weekly-update

Migration project for GSoC

Hello world! My name is Frederick, one of the GSoC interns who will be contributing to WordPress migrationMigration Moving the code, database and media files for a website site from one server to another. Most typically done when changing hosting companies. features this summer.

A proud Canadian who grew up in Toronto and its suburbs, I am currently a bioengineering undergraduate at the University of Pennsylvania in Philadelphia, with hopes of working in the clinical and public health roles of a physician. The connection to coding might seem tenuous, but I am a firm believer in pursuing passions, despite how incongruous they may seem. As I wrote in my application, WordPress has offered me much in the way of community and inspiration, and I hope to gain better insight into my own aspirations through this internship.

Like many in the community, my involvement with WordPress has included some plugins, and sites developed for work and student organizations. Although I’ve worked on two separate open sourceOpen Source Open Source denotes software for which the original source code is made freely available and may be redistributed and modified. Open Source **must be** delivered via a licensing model, see GPL. PHPPHP The web scripting language in which WordPress is primarily architected. WordPress requires PHP 5.6.20 or higher projects, this is the first opportunity I’ve had to contribute to something that can reach so many people; indeed, the past Ten Good Years have yielded not only a collection of lines of codeLines of Code Lines of code. This is sometimes used as a poor metric for developer productivity, but can also have other uses., but a huge and intensely active ecosystem of developers, designers, and users. To have the chance, even for 3 short months, to be a part of it, is both exhilarating and terrifying at the same time!

My project is to improve the migration experience and the portability of WordPress. Just the thought of moving WordPress elicits headaches because of all the things that can go wrong, as one stunningly recent discussion in the community reminded me.

For this project, I’ll be treading across both familiar and foreign territory. By current plans, I’d like to bring domain/URLURL A specific web address of a website or web page on the Internet, such as a website’s URL www.wordpress.org renames to the backend and WP CLICLI Command Line Interface. Terminal (Bash) in Mac, Command Prompt in Windows, or WP-CLI for WordPress., improve media handling and progress feedback in the WordPress-to-WordPress importer, and build in some semblance of pluginPlugin A plugin is a piece of software containing a group of functions that can be added to a WordPress website. They can extend functionality or add new features to your WordPress websites. WordPress plugins are written in the PHP programming language and integrate seamlessly with WordPress. These can be free in the WordPress.org Plugin Directory https://wordpress.org/plugins/ or can be cost-based plugin from a third-party & option migration to the export/import workflow. (Subject to further change with notice.) More details will come in the days ahead.

I’m really thrilled to be working with all of you! In addition to my weekly updates here, my notes-to-self and handy links to TracTrac An open source project by Edgewall Software that serves as a bug tracker and project management tool for WordPress./source can be found on my project site. I’d love to hear your feedback here and throughout the project.

#migration-portability, #weekly-update