Migration update: brief hiatus

I’m glad to see the excitement that the recent 3.7/3.8 discussions are generating. I’ll only write a short update today on the importer.

Since I’ve been moving back to campus and am in the midst of a busy staff training period (as disclosed in my application and timeline), I haven’t been able to get much coding in, and won’t have the time to code new features until next week.

I’ve received some very helpful feedback that led to resolutions, and will continue fixing bugs in the little time I have this week.

(I periodically check old and open tickets on Core Trac for the importer/exporter to see what I can handle within the scope of this project.)

Meanwhile, I’m also examining the Relocate tool’s handling of large sites, specifically out-of-memory issues, which will eventually be resolved in the bigger picture by integrating it into the importer or attaching it to WP_Importer_Cron so that it, too, can work on a schedule.

Next Monday, a new beta of the cron-based importer will be tagged and released for download.

#migration-portability, #weekly-update

Migration update: try this importer

Hey everyone,

The importer is largely unchanged from last week, with the exception of a few UI changes:

  • #341: Progress for posts/attachments/menu items is now shown correctly (in %d of %d format)
  • #342: The debug view (showing the raw data) now uses print_r through a special chars filter
  • #340: UI now has full-sentence strings to communicate how the process works and when the import is done, and Refresh/Abort buttons are shown above and below the progress.
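
As a rough illustration of the first two changes, here is a minimal Python sketch (the plugin itself is PHP, and the function names `format_progress` and `render_debug` are hypothetical). It shows a "%d of %d" progress string and a debug dump that is escaped before being placed in a page, so that markup inside the imported data is displayed rather than rendered.

```python
import html
import pprint


def format_progress(done: int, total: int, label: str) -> str:
    """Render progress in "%d of %d" form, followed by a label."""
    return "%d of %d %s" % (done, total, label)


def render_debug(data: object) -> str:
    """Pretty-print raw data, then escape it so any markup in it is shown as text."""
    dumped = pprint.pformat(data)          # readable dump, similar in spirit to print_r
    return "<pre>" + html.escape(dumped) + "</pre>"


if __name__ == "__main__":
    print(format_progress(3, 10, "posts"))
    print(render_debug({"title": "<script>alert(1)</script>"}))
```

Escaping the dump as a whole, rather than each field, keeps the debug view safe even when the raw data contains unexpected HTML.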

An import of a WordPress WXR file in progress

A completed WordPress import

I’ve also had the chance to run it against a large number of import files, including ones sent to me by generous volunteers who read some of my previous weekly updates (props @tieptoep). No catastrophes, yet!

Obviously, it’s still a work in progress, but I’m now willing to risk a public beta. The usual disclaimers (please don’t use this on a production site) apply.

Although I’m not aware of any other plugins that build on the WXR importer through its API, I nevertheless generated some PHPDoc API documentation using phpDocumentor 2, which might be handy if you decide to hook into this or reuse its components.

I’d love to hear your feedback on the interface, on the general experience using this importer, and any errors or warnings that you encounter. Thanks!

#importers, #migration-portability, #weekly-update

Migration update: cron importer part 2

Hey everybody — I have good news and bad news.

Good news: I’ve finished porting all the individual import steps to the cron model and now have a mostly working frontend UI (largely unchanged from the previous iteration of the importer) that utilizes it.

As of this evening, the cron model is able to parse, process, and finish importing two test XML files from the core unit tests (valid-wxr-1.1.xml and small-export.xml). The test case, which uses exactly the same assertions as the core unit test, passes all 193 assertions. (Update: an errorless import of the wptest.io sample data has been done.)

WordPress cron import in progress

A completed cron import

Bad news: I wanted to tag a version and release a download today, but I’ve decided not to do so due to the unconfirmed stability of the importer. As some astute observers noted last week, storing the temporary data in the options table can blow out caches. Although I’ve attempted to mitigate this (see [2180] and this reference from a few years back on Core Trac), I still need to test this against some real systems before I release it and break your server.

Those who are very interested can always check out the source in Subversion. I will post a comment under this post if a download is made available before my next weekly update.

Although an overhaul of the XML parser, as suggested in the comments on last week’s post, is probably necessary to avoid memory and caching issues, my first priority was to finish the migration of processes to the cron tasks. As soon as I can post a working importer, I will immediately turn my attention to the XML parsing step.

#importers, #migration-portability, #weekly-update

Migration update: cron importer

Following last week’s update about the WP_Importer_Cron approach to writing importers and running import jobs, I’ve been steadily transitioning code from the current non-stateful, single-execution plugin to a stateful, step-wise process (#327).

At the same time, I needed to separate presentation from logic/backend processing (#331) — something that @otto42 also recommended — in two ways:

  • Removing the direct printf() and echo statements that were used by the WXR importer
    and changing them to WP_Error objects (for fatal errors and non-fatal warnings alike)
  • Handling uploads and UI choices in a separate class

Why must this be done now? Well, asynchronous tasks differ from PHP scripts directly responding to a browser request — we can’t depend on having access to submitted $_POST data, nor can we directly pipe output to the user. This change would also make it easier to understand what the code is doing from reading it, and to test programmatically.
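
To make the idea concrete, here is a small Python sketch of a step that collects problems as data instead of printing them. It is only an analogy for the WP_Error approach: the names `ImportIssue`, `StepResult`, and `import_authors` are hypothetical and do not come from the plugin.

```python
from dataclasses import dataclass, field


@dataclass
class ImportIssue:
    """One problem found during a step; plays the role an error object would."""
    code: str
    message: str
    fatal: bool = False


@dataclass
class StepResult:
    """What a step hands back to whichever layer decides how to display it."""
    imported: int = 0
    issues: list = field(default_factory=list)

    @property
    def ok(self) -> bool:
        # The step succeeded as long as nothing fatal was recorded.
        return not any(issue.fatal for issue in self.issues)


def import_authors(authors: list) -> StepResult:
    """Process author records without writing anything to the output stream."""
    result = StepResult()
    for author in authors:
        if not author.get("login"):
            # Non-fatal warning: record it and keep going.
            result.issues.append(
                ImportIssue("missing_login", "Author entry has no login; skipped.")
            )
            continue
        result.imported += 1
    return result


if __name__ == "__main__":
    outcome = import_authors([{"login": "admin"}, {}])
    print(outcome.imported, outcome.ok, [i.code for i in outcome.issues])
```

Because the step returns a value, a scheduled task can store the issues for later display, and a test can assert on them directly.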

One dilemma I’ve encountered: how best to store the parsed import XML file. Since each step of the import (users, categories, plugins, etc.) runs separately, we must do one of the following:

  1. store all of the parsed data in variables, which are serialized into an option between runs
    (obviously, a huge amount of data for which this may not be the most robust or efficient method);
  2. re-parse the XML on each run
    (currently, parsers handle all parts of the XML at once, which means unnecessarily duplicated effort and time);
  3. modify the parsers to parse only part of the XML at a time; or
  4. split the XML file into chunks based on their contents (authors, categories, etc) and then feed only partial chunks to the parser at a time.

Any thoughts? Solving this problem could also help the plugin deal with large XML files that we used to need to break up by hand before importing. (The Tumblr importer doesn’t have the same problem because there is no massive amount of data being uploaded at the beginning.)
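
Option 3 above (parsing only part of the XML at a time) could look roughly like the following Python sketch. It uses the standard library's streaming `iterparse` on a simplified, namespace-free stand-in for a WXR file; the helper name `iter_elements` and the sample document are assumptions for illustration, not the importer's parser.

```python
import io
import xml.etree.ElementTree as ET

# Simplified stand-in for an export file (real WXR uses namespaced tags).
SAMPLE = b"""<rss><channel>
<author><login>admin</login></author>
<item><title>Hello</title></item>
<item><title>World</title></item>
</channel></rss>"""


def iter_elements(source, wanted_tag):
    """Yield each finished element named `wanted_tag`, one at a time."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == wanted_tag:
            yield elem
            # The caller is done with it once we resume; drop its children
            # and text so memory use stays roughly flat on large files.
            elem.clear()


def titles(source):
    """Collect only the item titles, ignoring every other part of the file."""
    return [item.findtext("title") for item in iter_elements(source, "item")]


if __name__ == "__main__":
    print(titles(io.BytesIO(SAMPLE)))
```

Each import step would ask only for the tag it needs, so the "authors" run never has to hold the posts in memory. The trade-off is that the file is still read from the start on every run.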

I haven’t yet finished transitioning all the steps; I’m afraid it won’t be possible to use this just yet. Before next Monday, I should have a downloadable plugin that’s safe to try.

#importers, #migration-portability, #weekly-update

Migration update: WXR importer

This week, I began work on the next phase of my project: fixing up the WXR importer plugin. A number of developers, including Jon Cave, Peter Westwood, and Andrew Nacin, have been maintaining this plugin since at least May 2010.

I forked the code from the plugins repository into my GSoC Subversion directory in preparation. It’s taken a while to test this (manually) against XML files from existing sites, so that I can see under what circumstances it fails to complete the import or perform to expectations, and what can be done. Trac tickets and forum posts have been informative as well. (See the linked posts for the results and observations.)

I’ve also run the unit tests that apply to the importer plugin; however, the test cases are generally small (indeed, the biggest XML test case is 26 KB, titled small-export.xml) and don’t trigger the kinds of issues that importing dozens or hundreds of posts and attachments—with WXR files of megabytes in size—does.

So, the first task at hand is breaking up the process—which currently executes in one step with little indication of progress—into discrete chunks that can run in separate, stateful (stepwise) requests.

A chat with my mentors has pointed me in the direction of WP_Importer_Cron, which was first developed for other importers that need to make external API calls (e.g. Tumblr Importer) potentially subject to rate constraints. There are some parallels between “external API calls” and “remote attachment fetching”, which is why this can be a suitable approach for fixing the timeout issues that present with the current WordPress importer. After the process is discretized, showing progress (an enhancement long overdue) will be easier.
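
The stepwise idea can be sketched as a small resumable job. This is a Python illustration under stated assumptions, not WP_Importer_Cron itself: the step names, the batch size, and the JSON round-trip (standing in for saving state between scheduled runs) are all hypothetical.

```python
import json

# Hypothetical step order for illustration only.
STEPS = ["authors", "categories", "posts", "attachments"]


def new_state(counts):
    """Initial state: which step we are on, how far into it, and item totals."""
    return {"step": 0, "offset": 0, "counts": counts}


def is_done(state):
    return state["step"] >= len(STEPS)


def run_once(state, batch_size=2):
    """Advance the job by at most `batch_size` items, then return the new state."""
    if is_done(state):
        return state
    name = STEPS[state["step"]]
    total = state["counts"].get(name, 0)
    state["offset"] = min(state["offset"] + batch_size, total)
    if state["offset"] >= total:
        # Current step finished; the next run starts the following step.
        state["step"] += 1
        state["offset"] = 0
    return state


def run_to_completion(state, batch_size=2):
    """Call run_once repeatedly, persisting state between calls; return run count."""
    runs = 0
    while not is_done(state):
        # Serialize and reload to mimic storing state between separate requests.
        state = json.loads(json.dumps(run_once(state, batch_size)))
        runs += 1
    return runs


if __name__ == "__main__":
    job = new_state({"authors": 1, "categories": 0, "posts": 5, "attachments": 3})
    print(run_to_completion(job))
```

Because every run is short and the state records the current step and offset, a progress display only needs to read the saved state.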

#migration-portability, #weekly-update

Migration update: WP Relocate

This past week, I’ve focused primarily on bringing a usable interface to the WP_Relocate class that I posted about last week. While the WP_Relocate class file itself was meant to be something that could be bundled into another suite and reused, the fruits of this week’s labour come in the form of an installable plugin.

In actuality, the plugin hooks are very shallow — merely used to add a menu link (for now). The most important part is the practically-standalone interface.

WP Relocate UI

The design intent was to make this look and feel just like the installer — or the upgrade.php script for database schema changes.

This has been tested against 3.5.2 and 3.6 only. Despite differences in how these versions handle revisions, the changes made to post content are still revertible in either version:

Revisions tracking search and replace

I can test this in code to my heart’s content (except for the difficulties using the current unit-tests framework against older versions of core) but nothing beats running it on live site data, especially sites that have posts and uploads from earlier versions of WordPress. As you might imagine, live data is more difficult to generate and test against. I don’t yet know how well this process works with hundreds or thousands of posts, in terms of the time it takes and the level of verbosity communicated to the site administrator.

If you’re interested in trying this out, feel free to do so, but please don’t use it on a production site; it’s probably not ready for that.

It installs into wp-content/plugins. The included readme.html file contains more detailed instructions. If anyone tries it out, I’d love to hear if it broke your site. 🙂

What’s next? This week, I am forking the importer plugin that works with WXR files, to examine and fix its issues with fetching attachments from the source. I hope to add some ability to replace URLs using the WP_Relocate class to smooth out the process of copying content from another installation.

Edit (2013-07-09): updated to 1.0.1 with fix for PHP versions before 5.4.0.

#migration-portability, #weekly-update

Migration project update

I’ve spent the past few days building functionality for what I’ve termed the “relocate” component: the class that handles site-wide replacement of old URLs with new. A plugin UI is in the works.

Two discoveries were made in the process of coding the relocate component.

  1. Attachment locations have been stored as relative paths since 2.7, which helps with keeping them portable. However, I have seen full paths in the database tables of sites that started up pre-2.7; transitioning those to portable relative paths will be part of what I’m building.
  2. By using WordPress queries and post functions (as opposed to SQL queries) to replace within post content, we gain the advantage of storing those changes in revisions, rather than indiscriminately touching all entries in the posts table in the way that was previously documented on the Codex.
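
For the first point, the conversion from a full path to a portable relative path might be sketched like this in Python. The helper name `to_relative_upload_path` and the example directory are hypothetical; the component itself is PHP and may handle more cases.

```python
def to_relative_upload_path(stored: str, uploads_dir: str) -> str:
    """Strip a leading uploads directory from a stored path.

    Paths that are already relative (or live elsewhere) are returned unchanged.
    """
    base = uploads_dir.rstrip("/") + "/"
    if stored.startswith(base):
        return stored[len(base):]
    return stored


if __name__ == "__main__":
    uploads = "/var/www/wp-content/uploads"  # example value, not a real site
    print(to_relative_upload_path(uploads + "/2008/05/photo.jpg", uploads))
    print(to_relative_upload_path("2013/06/photo.jpg", uploads))
```

Leaving already-relative values untouched makes the conversion safe to run more than once.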

With the guidance of WP CLI’s explanation for how to set up a plugin for testing, I’ve also coded unit tests to verify that things are working as they should. One hurdle I faced in unit testing was that update_option('siteurl') doesn’t affect the value of the WP_CONTENT_URL constant until the next execution of the script, by which time the unit test framework will have rolled back the transaction. This minor quirk won’t translate to real-world use, although I’m not sure if I’m satisfied with the way I overcame it.

Finishing the UI for this component is my timeline’s task for this week; it will facilitate a user’s decision to change the site URL to a different domain. I’ll make sure installation/usage instructions are shown by next week.

Happy Canada Day!

#migration-portability, #weekly-update

Weekly migration project update

Just a quick update today, as I’ve been a little sick and stayed offline for a few days.

Per my timeline, I worked on a light component last week that focuses on the URL replacement part of a migration operation (e.g. from http://www.example.com to https://wp.example.com/wordpress). Hints were taken from the way search-and-replace was implemented in WP-CLI. My goal was to make this usable in whatever UI implementation is built later. I also need to get a grasp on how unit tests are actually done with core, because dependencies on functions like get_option() make systematic testing more difficult without an accessible database and tables.
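
A search-and-replace of this kind has to reach URLs nested inside structured values, not just plain strings. The following Python sketch shows the recursive idea only, using the example URLs above; the function name `replace_url` is hypothetical, and it does not deal with PHP-serialized data as a real implementation would need to.

```python
def replace_url(value, old, new):
    """Return a copy of `value` with `old` replaced by `new` in every string."""
    if isinstance(value, str):
        return value.replace(old, new)
    if isinstance(value, dict):
        return {key: replace_url(item, old, new) for key, item in value.items()}
    if isinstance(value, list):
        return [replace_url(item, old, new) for item in value]
    # Numbers, booleans, None, and other types are passed through untouched.
    return value


if __name__ == "__main__":
    option = {
        "home": "http://www.example.com",
        "widgets": [{"text": "Visit http://www.example.com/about"}, 42],
    }
    print(replace_url(option, "http://www.example.com", "https://wp.example.com/wordpress"))
```

Returning a new structure rather than editing in place makes it easy to compare before and after, which helps when deciding whether anything needs to be saved.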

As for expanding the importer, an important question remains: just how much should it move?

At the moment, despite its deficiencies, its scope can be simply delineated: it moves content. Just content. Not plugins, not site options, not themes. If the importer were expanded to use WordPress’s XML-RPC API to transfer more (to copy over site options, to install plugins based on what was previously active), then it would be much harder to draw the line at what the importer will and will not do. I think this is an important decision that would benefit from some community input.

This week, I’ll be coding more actively. To be done: a usable UI for testing the work done thus far, and a fork of the existing importer plugin in which I will make my improvements.

#migration-portability, #weekly-update

Migration project for GSoC

Hello world! My name is Frederick, one of the GSoC interns who will be contributing to WordPress migration features this summer.

A proud Canadian who grew up in Toronto and its suburbs, I am currently a bioengineering undergraduate at the University of Pennsylvania in Philadelphia, with hopes of working in the clinical and public health roles of a physician. The connection to coding might seem tenuous, but I am a firm believer in pursuing passions, despite how incongruous they may seem. As I wrote in my application, WordPress has offered me much in the way of community and inspiration, and I hope to gain better insight into my own aspirations through this internship.

Like many in the community, my involvement with WordPress has included some plugins, and sites developed for work and student organizations. Although I’ve worked on two separate open source PHP projects, this is the first opportunity I’ve had to contribute to something that can reach so many people; indeed, the past Ten Good Years have yielded not only a collection of lines of code, but a huge and intensely active ecosystem of developers, designers, and users. To have the chance, even for 3 short months, to be a part of it, is both exhilarating and terrifying at the same time!

My project is to improve the migration experience and the portability of WordPress. Just the thought of moving WordPress elicits headaches because of all the things that can go wrong, as one stunningly recent discussion in the community reminded me.

For this project, I’ll be treading across both familiar and foreign territory. By current plans, I’d like to bring domain/URL renames to the backend and WP CLI, improve media handling and progress feedback in the WordPress-to-WordPress importer, and build in some semblance of plugin & option migration to the export/import workflow. (Subject to further change with notice.) More details will come in the days ahead.

I’m really thrilled to be working with all of you! In addition to my weekly updates here, my notes-to-self and handy links to Trac/source can be found on my project site. I’d love to hear your feedback here and throughout the project.

#migration-portability, #weekly-update