Hi, I’m Ryan McCue. You may remember me from such projects as the REST API.
I’m here today to talk about something a bit different: the WordPress Importer. The WordPress Importer is key to a tonne of different workflows, and is one of the most used plugins on the repo.
Unfortunately, the Importer is also a bit unloved. After getting immensely frustrated at the Importer, I figured it was probably time we throw some attention at it. I’ve been working on fixing this with a new and improved Importer!
If you’re interested in checking out this new version of the Importer, grab it from GitHub. It’s still some way from release, but the vast majority of functionality is already in place. The plan is to eventually replace the existing Importer with this new version.
The key to these Importer improvements is rewriting the core processing, taking experience with the current Importer and building to fix those specific problems. This means fixing and improving a whole raft of problems:
- Way less memory usage: Testing shows memory usage to import a 41MB WXR file is down from 132MB to 19MB (less than half the actual file size!). This means no more splitting files just to get them to import!
- Faster parser: By using a streaming XML parser, we process data as we go, which is much more scalable than the current approach. Content can begin being imported as soon as the file is read, rather than waiting for pre-processing.
- Resumable parsing: By storing more in the database instead of variables, we can quit and resume imports on-the-go.
- Partial imports: Rethinking the deduplication approach allows better partial imports, such as when you’re updating a production site from staging.
- Better CLI: Treating the CLI as a first-class citizen means a better experience for those doing imports on a daily basis, and better code quality and reusability.
Curious as to how all of this is done? Read on!