This week, I began work on the next phase of my project: fixing up the WXR importer plugin. A number of developers, including Jon Cave, Peter Westwood, and Andrew Nacin, have been maintaining this plugin since at least May 2010.
I forked the code from the plugins repository into my GSoC Subversion directory in preparation. Testing it manually against XML files from existing sites has taken a while, but it lets me see under what circumstances the import fails to complete or falls short of expectations, and what can be done about it. Trac tickets and forum posts have been informative as well. (See the linked posts for the results and observations.)
I’ve also run the unit tests that apply to the importer plugin; however, the test cases are generally small (the biggest XML test case, small-export.xml, is only 26 KB) and don’t trigger the kinds of issues that arise when importing dozens or hundreds of posts and attachments from WXR files that are megabytes in size.
So, the first task at hand is breaking up the process—which currently executes in one step with little indication of progress—into discrete chunks that can run in separate, stateful (stepwise) requests.
A chat with my mentors has pointed me in the direction of WP_Importer_Cron, which was first developed for other importers (e.g. Tumblr Importer) that need to make external API calls potentially subject to rate limits. There are some parallels between “external API calls” and “remote attachment fetching”, which is why this can be a suitable approach for fixing the timeout issues that occur with the current WordPress importer. Once the process is broken into discrete steps, showing progress (an enhancement long overdue) will be easier.
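
To make the idea of discrete, stateful steps a little more concrete, here is a minimal sketch in Python. It is not the plugin's PHP code and it does not use or imitate the WP_Importer_Cron API; the names run_step, load_state, save_state and BATCH_SIZE, as well as the JSON state file, are all hypothetical. It only illustrates the general pattern: each request handles a small batch, saves how far it got, and reports progress.

```python
import json
import os
import tempfile

BATCH_SIZE = 3  # hypothetical number of items handled per request


def load_state(path):
    """Return saved progress, or a fresh state if no import is under way."""
    if os.path.exists(path):
        with open(path) as fh:
            return json.load(fh)
    return {"offset": 0, "done": False}


def save_state(path, state):
    """Persist progress so the next request can pick up where this one stopped."""
    with open(path, "w") as fh:
        json.dump(state, fh)


def run_step(items, state_path, handle_item, batch_size=BATCH_SIZE):
    """Process one batch of items, save progress, and return the new state."""
    state = load_state(state_path)
    if state["done"]:
        return state  # nothing left to do; repeated calls are harmless
    start = state["offset"]
    for item in items[start:start + batch_size]:
        handle_item(item)  # e.g. insert a post or fetch an attachment
    state["offset"] = min(start + batch_size, len(items))
    state["done"] = state["offset"] >= len(items)
    save_state(state_path, state)
    return state


if __name__ == "__main__":
    posts = [f"post-{n}" for n in range(8)]  # stand-in for parsed WXR items
    imported = []
    state_file = os.path.join(tempfile.mkdtemp(), "import-state.json")
    # Each loop iteration stands in for one separate request.
    while True:
        state = run_step(posts, state_file, imported.append)
        print(f"{state['offset']}/{len(posts)} items imported")
        if state["done"]:
            break
```

Because the offset is stored outside the running process, a step that times out only loses its own batch, and the saved offset doubles as a progress indicator. Note that in this sketch a batch interrupted midway would be re-run from its start, so a real importer would also need to skip items that were already imported.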