Make WordPress Core

Updates from Frederick Ding

  • Frederick Ding 12:00 pm on August 12, 2013 Permalink

    Migration update: brief hiatus 

    I’m glad to see the excitement that the recent 3.7/3.8 discussions are generating. I’ll only write a short update today on the importer.

    Since I’ve been moving back to campus and am in the midst of a busy staff training period (as disclosed in my application and timeline), I haven’t been able to get much coding in, and won’t have the time to code new features until next week.

    I’ve received some very helpful feedback that led to resolutions, and I will continue fixing bugs in the little time I have this week.

    (I periodically check old and open tickets on Core Trac for the importer/exporter to see what I can handle within the scope of this project.)

    Meanwhile, I’m also examining the Relocate tool’s handling of large sites — out-of-memory issues — which will eventually be resolved in the bigger picture by integrating it into the importer or attaching it to WP_Importer_Cron so that it, too, can work on a schedule.

    Next Monday, a new beta of the cron-based importer will be tagged and released for download.

     
  • Frederick Ding 5:16 am on August 6, 2013 Permalink

    Migration update: try this importer 

    Hey everyone,

    The importer is largely unchanged from last week, with the exception of a few UI changes:

    • #341: Progress for posts/attachments/menu items is now shown correctly (in %d of %d format)
    • #342: The debug view (showing the raw data) now uses print_r through a special chars filter
    • #340: UI now has full-sentence strings to communicate how the process works and when the import is done, and Refresh/Abort buttons are shown above and below the progress.
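    For reference, the #342 change boils down to something like this (a sketch only; I’m assuming esc_html() as the special-chars filter, and $import_data is a hypothetical variable holding the parsed data):

    ```php
    // print_r( $data, true ) returns the dump as a string instead of
    // echoing it, and esc_html() escapes it safely for the debug view.
    echo '<pre>' . esc_html( print_r( $import_data, true ) ) . '</pre>';
    ```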

    An import of a WordPress WXR file in progress

    A completed WordPress import

    I’ve also had the chance to run it against a large number of import files, including ones sent to me by generous volunteers who read some of my previous weekly updates (props @tieptoep). No catastrophes, yet!

    Obviously, it’s still a work in progress, but I’m now willing to risk a public beta. The usual disclaimers (please don’t use this on a production site) apply.

    Although I’m not aware of any other plugins that build on the WXR importer through its API, I nevertheless generated some PHPDoc API documentation using phpDocumentor 2, which might be handy if you decide to hook into this or reuse its components.

    I’d love to hear your feedback on the interface, on the general experience using this importer, and any errors or warnings that you encounter. Thanks!

     
    • thisislawatts 9:45 am on August 7, 2013 Permalink | Log in to Reply

      Really excited about this plugin; its development seems to coincide with my needing to transfer 6 WordPress blogs (~600 posts each) into 2. So you can only imagine my joy!

      I have downloaded the beta above, but I don’t seem to get anything beyond the initial ‘Import Status’, which shows how many posts it’s got to import; then, clicking ‘Refresh’, I get -> http://thisis.la/pix/of-course.png. I am not sure what it’s doing here. I am running this on XAMPP, so could it be an issue with the Cron not running? If you could point me towards a useful debugging point I will tell you everything.

      A second thought on the UI: these blogs that I am importing have a stack of users, which typically results in -> http://thisis.la/pix/wordpress-import-assign.png. That is super boring to fill in, so I created a quick JS snippet to go through and autofill those values: https://gist.github.com/thisislawatts/6163831. It would be great if those usernames wrapped in ()’s were also wrapped in a span, to spare my messy regex.

      Thanks

      • Frederick Ding 7:13 pm on August 7, 2013 Permalink | Log in to Reply

        Thanks for trying it out! I expected there to be some problems, so let’s see if we can figure out what’s going on :)

        Since you mentioned using XAMPP, I imagine this is the scenario: on local installations (especially with hostnames that do not resolve in DNS), cron theoretically falls back to “alternative cron” mode. I haven’t yet tested with alternative cron; most likely it’s redirecting the admin user right after clicking “Refresh”, and that fails to pass on the nonce. Let me try it out on a local install and see what happens.

        The JavaScript for prefilling users is a good idea, although it can also be done server-side. (It currently uses wp_dropdown_users(), which can pre-select a user.) Aside from this, I’d be fine with including the usernames in a data-* attribute; it’d make this a lot easier.
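        Roughly what the server-side route could look like (a sketch; the field name and variables are hypothetical):

        ```php
        // Pre-select an existing user whose login matches the author from
        // the WXR file, rather than fixing the dropdown up in JavaScript.
        $existing = get_user_by( 'login', $imported_login ); // $imported_login: from the WXR file
        wp_dropdown_users( array(
            'name'     => 'user_map[' . $author_id . ']', // hypothetical form field
            'selected' => $existing ? $existing->ID : 0,
        ) );
        ```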

        Edit: I’ve added a new Trac ticket for this enhancement in GSoC #346; it’s related to previous requests that were punted (#8455, #16148).

    • TCBarrett 2:21 pm on August 14, 2013 Permalink | Log in to Reply

      This looks great. One of the problems with exporting is that you cannot export a single post type with attachments, it has to be all or nothing. Does this solve that problem? Or is there an equivalent project?

  • Frederick Ding 3:33 am on July 30, 2013 Permalink

    Migration update: cron importer part 2 

    Hey everybody — I have good news and bad news.

    Good news: I’ve finished porting all the individual import steps to the cron model and now have a mostly working frontend UI (largely unchanged from the previous iteration of the importer) that utilizes it.

    As of this evening, the cron model is able to parse, process, and finish importing two test XML files from the core unit tests (valid-wxr-1.1.xml and small-export.xml). The test case, which uses exactly the same assertions as the core unit test, passes all 193 assertions. (Update: an errorless import of the wptest.io sample data has been done.)

    WordPress import in progress

    WordPress cron import in progress

    A completed cron import

    Bad news: I wanted to tag a version and release a download today, but I’ve decided not to do so due to the unconfirmed stability of the importer. As some astute observers noted last week, storing the temporary data in the options table can blow out caches. Although I’ve attempted to mitigate this (see [2180] and this reference from a few years back on Core Trac), I still need to test this against some real systems before I release it and break your server.
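    One mitigation of that flavour, sketched below (not necessarily what [2180] does, and the option name is made up), is to keep the bulky interim data out of the autoloaded alloptions cache:

    ```php
    // Create the option with autoload disabled so it never enters the
    // alloptions cache; update_option() preserves that flag afterwards.
    add_option( 'wxr_import_state', $state, '', 'no' ); // 'no' = do not autoload
    update_option( 'wxr_import_state', $new_state );    // subsequent runs
    delete_option( 'wxr_import_state' );                // cleanup once the import finishes
    ```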

    Those who are very interested can always check out the source in Subversion. I will post a comment under this post if a download is made available before my next weekly update.

    Although an overhaul of the XML parser, as suggested in the comments on last week’s post, is probably necessary to avoid memory and caching issues, my first priority was to finish the migration of processes to the cron tasks. As soon as I can post a working importer, I will immediately turn my attention to the XML parsing step.

     
  • Frederick Ding 12:03 am on July 23, 2013 Permalink

    Migration update: cron importer 

    Following last week’s update about the WP_Importer_Cron approach to writing importers and running import jobs, I’ve been steadily transitioning code from the current non-stateful, single-execution plugin to a stateful, step-wise process (#327).

    At the same time, I needed to separate presentation from logic/backend processing (#331) — something that @otto42 also recommended — in two ways:

    • Removing direct printf(), echo statements that were used by the WXR importer (example)
      and changing them to WP_Error objects (example of fatal error; of non-fatal warning)
    • Handling uploads and UI choices in a separate class

    Why must this be done now? Well, asynchronous tasks differ from PHP scripts directly responding to a browser request — we can’t depend on having access to submitted $_POST data, nor can we directly pipe output to the user. This change would also make it easier to understand what the code is doing from reading it, and to test programmatically.
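    Schematically, the conversion looks like this (the error code and strings are invented for illustration):

    ```php
    // Before: the importer printed the message straight to the browser.
    printf( __( 'Failed to import author %s.', 'wordpress-importer' ), esc_html( $author ) );

    // After: return a WP_Error and let the UI layer decide how to present it.
    return new WP_Error(
        'author_import_failed', // hypothetical error code
        sprintf( __( 'Failed to import author %s.', 'wordpress-importer' ), esc_html( $author ) )
    );
    ```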

    One dilemma I’ve encountered: how best to store the parsed import XML file. Since each step of the import (users, categories, plugins, etc) runs separately, we must…

    1. store all of the parsed data in variables, which are serialized into an option between runs
      (obviously, a huge amount of data for which this may not be the most robust or efficient method);
    2. re-parse the XML on each run
      (currently, parsers handle all parts of the XML at once, which means unnecessarily duplicated effort and time);
    3. modify the parsers to parse only part of the XML at a time; or
    4. split the XML file into chunks based on their contents (authors, categories, etc) and then feed only partial chunks to the parser at a time.

    Any thoughts? Solving this problem could also help the plugin deal with large XML files that we used to need to break up by hand before importing. (The Tumblr importer doesn’t have the same problem because there is no massive amount of data being uploaded at the beginning.)

    I haven’t yet finished transitioning all the steps; I’m afraid it won’t be possible to use this just yet. Before next Monday, I should have a downloadable plugin that’s safe to try.

     
    • dllh 1:34 am on July 23, 2013 Permalink | Log in to Reply

      You’ll want to be careful about trying to store data in options. For example, in an environment that’s using memcached, which has a size limit, you can blow up sites by trying to store too much data in options, so it’s not necessarily a matter only of efficiency.

      Also, if you use options, I imagine you also have to use something like an extra tracking option to know which parts of the import you’ve handled. This is just begging for race conditions.

      I wonder if there’s anything you can do using FormData to split the file into chunks client-side and assemble them server-side. I’m not sure what browser support is like, and I worry it’d be pretty brittle. Just thinking aloud.

      • Frederick Ding 1:44 am on July 23, 2013 Permalink | Log in to Reply

        You’re completely right about the caching implications — I’ve all but eliminated possibility #1.

        I was thinking about the possibilities of client-side file splitting, too, even though it’s not the most robust or reliable way. Last week, I looked briefly at https://github.com/blueimp/jQuery-File-Upload which supports chunking/multipart — but that’s not quite what we’d need, is it? The browser would need to operate along XML node boundaries, not just file size. (I’d be intrigued if it’s possible to utilize a browser’s native DOM engine to do that…)

    • Ryan McCue 1:50 am on July 23, 2013 Permalink | Log in to Reply

      So, with regards to XML parsing options: splitting the XML file is something that absolutely should not be done. You can do it, but only if you use proper XML serialization/deserialization, which is likely going to take up a chunk of time.

      Reparsing is a bit of a pain too, since the XML parsing usually takes up the largest amount of time there. The best bet with regards to memory is to use a SAX-style parser which streams the XML, whereas a DOM-style parser will read it all at once. SimpleXML is a DOM-style parser, so you should avoid that for performance, whereas xml_parse is SAX-based.
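      (For illustration, a minimal SAX-style loop with `xml_parse`; this is a sketch, not the importer’s actual code. Only one chunk of the file is in memory at a time, and the handlers see elements as they stream past.)

      ```php
      // Stream-parse a WXR file with the SAX-based ext/xml API.
      $parser = xml_parser_create( 'UTF-8' );
      xml_set_element_handler(
          $parser,
          function ( $parser, $tag, $attrs ) { /* opening tag: e.g. note that we entered an <item> */ },
          function ( $parser, $tag ) { /* closing tag: e.g. flush a completed <item> */ }
      );
      $handle = fopen( $file, 'rb' );
      while ( ! feof( $handle ) ) {
          xml_parse( $parser, fread( $handle, 8192 ), feof( $handle ) );
      }
      fclose( $handle );
      xml_parser_free( $parser );
      ```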

      I’m biased as the lead developer of it, but you could use SimplePie, which a) is built into core, b) has built-in caching (via the transients API, but you’d probably want file-based caching here) and c) natively parses RSS (given that’s its job), which WXR is based on. It handles picking a parser all internally, although it doesn’t use a stream parser due to internal implementation details, so you may want to steer away from it for that reason. I’m relatively certain I could write a parser (in `WXR_Parser_*` form) in a few hours, so it should be easy to do.

      At the least, I’d take a look into how SimplePie handles caching. Basically, it’s a giant serialized array, usually saved to a file (although in WP, this uses transients).

      (If you do want to take the full SimplePie route, I’d love to help, most likely in a consulting role. It’d simplify the `WXR_Parser` classes significantly by moving the XML parsing out of those.)

      I’ve spent a fair bit of time messing with XML parsers, so feel free to pick my brain on this! :)

      • Ryan McCue 1:55 am on July 23, 2013 Permalink | Log in to Reply

        (What you really want here is a resumable SAX/pull parser, with some way to persist state across processes. With `xml_parse`, you should be able to feed in data as you stream it in, but you’ll still need to keep the data in memory since I don’t know of any streamable serialization in PHP. Please note that if you do use your own code here, there are security considerations that will need to be discussed. Your mentors should be able to help with that.)

      • Frederick Ding 2:51 am on July 23, 2013 Permalink | Log in to Reply

        I should have remembered that you’re an expert on XML!

        I’m not familiar with the differences between SAX/pull/DOM, but if I’m reading this right, then pull — with a way to persist state — would be the most appropriate model to follow. (In PHP, XMLReader appears to be a pull parser that’s enabled by default since PHP 5.1.2 — so I’d just need to write a WXR_Parser_* to take advantage of it.)
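        Roughly the loop I’m picturing (a sketch only; the item-level handler is left out):

        ```php
        // Pull-parse the WXR stream node by node; each <item> subtree is
        // extracted and discarded in turn, so the file never sits wholly in memory.
        $reader = new XMLReader();
        $reader->open( $file );
        while ( $reader->read() ) {
            if ( XMLReader::ELEMENT === $reader->nodeType && 'item' === $reader->name ) {
                $item_xml = $reader->readOuterXml(); // just this <item>...</item>
                // hand $item_xml to an item-level parser, then keep reading
            }
        }
        $reader->close();
        ```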

        Thanks for pointing me in this direction!

        Edit: Actually, scratch my eagerness to use XMLReader. It appears to be weakly documented and most of what I can see involves using SimpleXML/DOM on nodes…

        • Ryan McCue 6:43 am on July 23, 2013 Permalink | Log in to Reply

          XMLReader or SAX are the ones I’d go for. There’s an IBM DeveloperWorks article that might help with that.

          The way I’d handle it is to keep state of where you currently are in the stack. WXR_Parser_XML uses this concept to handle it, but uses the SAX parser instead. Conceptually, the way it keeps track of the position is the way you’d want to do it (although there’s a bunch of cases it doesn’t handle).

          WXR_Parser_XML isn’t the best though, since all this data is loaded into memory and then pushed to the database later. Although it means you have cross dependencies, I’d consider inserting posts/etc as you go, rather than bunching them all up. This is a pretty fundamental rework of the internal parsing API, but it’s one that you’ll need for this sort of thing.

          Personally, I’d create two objects (a parser and a data handler) and use dependency injection to ensure that you still have a weak coupling between the two.

          (As a note, SimplePie also uses the SAX parser, but loads all the data into memory because it has to. This is a case where it’s a much better idea to use XMLReader directly.)

          Regarding XMLReader documentation, are there any specific examples? I’m happy to assist with specifics here, ping me at my email (see footer) if you’d like.

        • Ryan McCue 6:51 am on July 23, 2013 Permalink | Log in to Reply

          Also, the reason people are using XMLReader with SimpleXML/DOM is that they prefer the SimpleXML/DOM API, and the performance of the hybrid is better than straight SimpleXML. Personally, I’d stick with one rather than switching between them, because there are some performance concerns with that.

  • Frederick Ding 12:00 am on July 16, 2013 Permalink

    Migration update: WXR importer 

    This week, I began work on the next phase of my project: fixing up the WXR importer plugin. A number of developers, including Jon Cave, Peter Westwood, and Andrew Nacin have been maintaining this plugin since at least May 2010.

    I forked the code from the plugins repository into my GSoC Subversion directory in preparation. It’s taken a while to test this (manually) against XML files from existing sites, so that I can see under what circumstances it fails to complete the import or perform to expectations, and what can be done. Trac tickets and forum posts have been informative as well. (See the linked posts for the results and observations.)

    I’ve also run the unit tests that apply to the importer plugin; however, the test cases are generally small (indeed, the biggest XML test case is 26 KB, titled small-export.xml) and don’t trigger the kinds of issues that importing dozens or hundreds of posts and attachments—with WXR files of megabytes in size—does.

    So, the first task at hand is breaking up the process—which currently executes in one step with little indication of progress—into discrete chunks that can run in separate, stateful (stepwise) requests.

    A chat with my mentors has pointed me in the direction of WP_Importer_Cron, which was first developed for other importers that need to make external API calls (e.g. Tumblr Importer) potentially subject to rate constraints. There are some parallels between “external API calls” and “remote attachment fetching”, which is why this can be a suitable approach for fixing the timeout issues that present with the current WordPress importer. After the process is discretized, showing progress (an enhancement long overdue) will be easier.

     
    • dllh 1:07 pm on July 16, 2013 Permalink | Log in to Reply

      I’ve worked quite a lot with imports and am glad to see you doing some work on this front. One of the problems I think you’ll wind up running into with attachment handling is backfilling. You have to maintain a list of attachment ids and their parents, and the ids can change as the import process goes on and would-be id collisions are averted. Further, there are sometimes images that are referenced in posts but not attached to them. So a backfill process that takes these things into account is important.

      So farming the fetching out to cron isn’t going to solve the problem by itself, though it may help with timeouts. Some issues I foresee:

      • On sites with little traffic, the images will basically never import, or people will perpetually file bug reports that it’s not finishing, when in fact, they just need a few hundred visits to cycle through fetching all their images.
      • There’s no built-in way to track status of a given image. If you try to use something like options to track status, you’ll wind up with too-big options (potentially bad for sites using memcached) or find that race conditions prevent proper saving of the data.
      • What happens if images start to be processed before the import itself is done? This becomes an issue when an attachment that appears near the top of the export is fetched before a post near the bottom of the export that references it is fetched. A backfill process that’s timed with the image fetch won’t yet have that late post to check for a reference to fix up.
      • What happens if the import is halted partway through? This wreaks havoc with backfilling in particular.

      It occurs to me (I’m shooting from the hip here, so it’s not terribly well thought-out and may be stupid) that you could possibly register a custom post type to get around some of these issues. The process would look something like this:

      1. When the import starts, a unique taxonomy term in a taxonomy registered to the new CPT is created.
      2. Every time the importer encounters an image, it creates a new post of type CPT containing relevant data and with the unique taxonomy term applied. No fetching is done yet. (Or, fetching is done in the background and a post meta is toggled to represent the state of the image’s progress: on initial creation it’s “pending,” while the fetch is in progress it’s “downloading,” then it becomes “pending_backfill,” then “backfilling,” then “finished,” or “failed” when applicable. The failure state could be useful in diagnosing and fixing issues with the import after the fact; since the CPT data will include the post parent, a nice little list of problematic posts could be generated so that the user could fix things up rather than just hoping for the best.)
      3. When the importer is finished, it queues a cron job to check for posts of type CPT in the given taxonomy with the unique term.
      4. That job fetches N CPTs at a time and fetches/backfills them (after changing their status in meta so that future jobs don’t scoop them up). If there are any left, it fires another job to get the next batch. Maybe there’s an option stored to prevent doubling up on jobs (or to allow multiple jobs without collision).
      5. When there are no more CPTs with the unique taxonomy term, the import is finished, and the status screen can be updated and an email sent.
      6. If fetch/backfill jobs seem to be languishing, we can report it on the status screen, and perhaps, if they seem to have been stalled for a while, we can have a button you can push to manually fire the job off and kickstart the process.
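      In rough code, steps 1 and 2 might look like this (every name here is hypothetical):

      ```php
      // A private CPT to queue each pending attachment, plus a taxonomy
      // whose unique term identifies this particular import run.
      register_post_type( 'import_attachment', array( 'public' => false ) );
      register_taxonomy( 'import_run', 'import_attachment', array( 'public' => false ) );

      // Step 2: queue an image instead of fetching it immediately.
      $queued = wp_insert_post( array(
          'post_type'   => 'import_attachment',
          'post_status' => 'publish',
          'post_title'  => $attachment_url, // the URL to fetch later
      ) );
      wp_set_object_terms( $queued, $run_term, 'import_run' );  // $run_term: this run's unique term
      update_post_meta( $queued, '_fetch_status', 'pending' );  // pending -> downloading -> ... -> finished
      ```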

      I think something along these lines would allow for better display of data about the status of the import as well as less brittle handling of attachments. It won’t necessarily be the fastest process in the world, and the implementation of a CPT for these purposes feels a little weird to me, but otherwise, I’m not sure there’s a great way to track the status of attachments without running into issues with either option sizes or race conditions (trust me — I’ve dealt with this).

      For what it’s worth, I’m not a fan of javascript-driven approaches. JS is fine for polling the status and updating the UI, but as the thing that triggers and tends the process, it seems weak. What if you close the page and the js hasn’t managed to store proper state remotely, for example? Better to have something that runs properly whether you’ve got the import page loaded or not.

      Well, that was a mouthful. Even if my napkin proposal is a crackpot idea, maybe it’ll spark some useful thoughts. Good luck with it. :)

      • Samuel Wood (Otto) 10:48 pm on July 16, 2013 Permalink | Log in to Reply

        WP_Importer_Cron handles a fair amount of the cron work by making the entire import happen as a cron process. Essentially, I first wrote it not because of rate limits, but because a) you run into the time limit on PHP processes when you have a lot of work to do and b) Tumblr is slow as heck in responding to API requests.

        The way you use it is to make your importer class extend WP_Importer_Cron. It gives you some functions to help with this sort of thing.

        • First is save_vars(), which essentially saves your class’s variables to the options table for usage on the later instantiations. Thus, you can save your progress as class variables, then just call save_vars after each step (like after you import a post, for example). This keeps track of your progress. On later runs, your vars are automatically loaded into the class instance for you, putting you back where you were.
        • Next is have_time(), which simply says whether you’re out of time or not. If it returns false, then you’ve exceeded 30 seconds (default) and you need to stop doing things for now.
        • There’s also schedule_import_job( $callback, $args ), which you can call when you’re ready to begin the import. It schedules a job to run every minute, and it will keep running once a minute (or so) until the job is done. How does it know you’re done? Well, you give it a function callback to do the actual job of importing. When that function returns true, the job is considered complete and the cron schedule is removed.

        So what WP_Importer_Cron gives is a way to start a job to run once a minute, a way to check for current run time expiration, and a way to save internal class variables persistently from one run to the next. Plenty enough to build a long-running process on for importing things slowly. You can use a separate importer class from the user-data-gathering pieces, if you like, for simplicity of purpose. Or you can combine the two like the Tumblr Importer does. I’d recommend the first; the Tumblr Importer is a bit confusing in that respect.
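        Put together, a skeleton built on those three functions might look like this (the class and the import_one() helper are made up; the Tumblr Importer’s real structure differs):

        ```php
        class WXR_Cron_Importer extends WP_Importer_Cron {
            public $queue = array(); // persisted across runs by save_vars()

            public function begin( $items ) {
                $this->queue = $items;
                $this->save_vars();
                // Runs the callback about once a minute until it returns true.
                $this->schedule_import_job( array( $this, 'do_import' ), array() );
            }

            public function do_import() {
                while ( $this->queue && $this->have_time() ) {
                    $this->import_one( array_shift( $this->queue ) ); // hypothetical helper
                    $this->save_vars(); // checkpoint progress after each item
                }
                return empty( $this->queue ); // true = done; the cron schedule is removed
            }
        }
        ```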

        • Frederick Ding 11:20 pm on July 16, 2013 Permalink | Log in to Reply

          Hey @otto42, thanks for chiming in here!

          Your clarifications for how (and why) WP_Importer_Cron works are really helpful — I wrote some notes for myself earlier today on what I’d figured out by reading its source.

          If you don’t mind, I’ll also write some PHPDoc for the class and its methods building on what I now know, as I start separating pieces and extending WP_Importer_Cron.

          • Samuel Wood (Otto) 6:18 pm on July 17, 2013 Permalink | Log in to Reply

            Don’t look too awful much at how the Tumblr Importer works. It’s not great in that respect. Instead, think about how your process should work and then just use those three functions from the class to run periodically. You’ll probably have better luck that way.

      • Frederick Ding 11:16 pm on July 16, 2013 Permalink | Log in to Reply

        @dllh Wow, thank you for an incredibly thought-out suggestion — there’s a lot to digest here.

        From what I’ve digested of the Tumblr Importer and WP_Importer_Cron source, it stores a counter of the number of posts and other objects imported thus far and the number in total — in an option. I suppose this isn’t as comprehensive as what you’ve proposed with the custom post types — but I’m also not comfortable creating so many transient objects… definitely something to think about.

        As for backfilling: from what I can tell, in WXR files where everything was exported, attachments come first (in another case, attachments and menu items were mixed together but all before posts). I’m not sure whether this is just in the case of the two exports I looked at, or if there is a rationale for this order.

        Certainly the exporter takes great care to make sure that categories and taxonomies are exported with parents before children — I’m not quite certain why attachments would necessarily come before posts, especially in the import process, when posts are necessarily their parents.

        Thanks to WP_Importer_Cron, which suggests a way to hold progress on the server side, the JavaScript would only be for polling the status (and making at least enough requests for cron jobs to be triggered). It wouldn’t be responsible for holding state at all. I need to write a separate response to @otto42, whose comment was just posted as I was writing this.

        Thanks for your input! There’s a lot to be considered.

    • Shea Bunge 11:39 pm on July 16, 2013 Permalink | Log in to Reply

      You should definitely check your importer with the http://wptest.io test data – it’s pretty much the most complete set of test data available.

      • Frederick Ding 3:43 am on July 17, 2013 Permalink | Log in to Reply

        Great resource! Definitely will. I’m assuming that all of the attachment URLs referenced are directly accessible (for attachment fetching)?

  • Frederick Ding 12:57 am on July 9, 2013 Permalink

    Migration update: WP Relocate 

    This past week, I’ve focused primarily on bringing a usable interface to the WP_Relocate class that I posted about last week. While the WP_Relocate class file itself was meant to be something that could be bundled into another suite and reused, the fruits of this week’s labour come in the form of an installable plugin.

    In actuality, the plugin hooks are very shallow — merely used to add a menu link (for now). The most important part is the practically-standalone interface.

    WP Relocate UI

    The design intent was to make this look and feel just like the installer — or the upgrade.php script for database schema changes.

    This has been tested against 3.5.2 and 3.6 only. Despite differences in how these versions handle revisions, the changes made to post content are still revertible in either version:

    Revisions tracking search and replace

    I can test this in code to my heart’s content (except for the difficulties using the current unit-tests framework against older versions of core) but nothing beats running it on live site data, especially sites that have posts and uploads from earlier versions of WordPress. As you might imagine, live data is more difficult to generate and test against. I don’t yet know how well this process works with hundreds or thousands of posts, in terms of the time it takes and the level of verbosity communicated to the site administrator.

    If you’re interested in trying this out (please don’t use it on a production site — it’s probably not ready for that), feel free to give it a try.

    It installs into wp-content/plugins. The included readme.html file contains more detailed instructions. If anyone tries it out, I’d love to hear if it broke your site. :)

    What’s next? This week, I am forking the importer plugin that works with WXR files, to examine and fix its issues with fetching attachments from the source. I hope to add some ability to replace URLs using the WP_Relocate class to smooth out the process of copying content from another installation.

    Edit (2013-07-09): updated to 1.0.1 with fix for PHP versions before 5.4.0.

     
    • Brad Touesnard 1:25 am on July 9, 2013 Permalink | Log in to Reply

      Hi Frederick, I’m curious, have you taken a look at http://wordpress.org/plugins/wp-migrate-db/? Or the pro version for that matter? I haven’t dug into your code yet, but it sounds like you’re doing some great work here. Keep it up!

      • Frederick Ding 1:31 am on July 9, 2013 Permalink | Log in to Reply

        Hi Brad, I had come across the site for WP Migrate DB pro version while researching, but hadn’t seen the free edition. It looks like your plugin is very useful for producing a database dump for export!

        I think I have a slightly different use case in mind. The utility of this plugin is that it’s “in-place”. But in reality, all of this is merely a preface to later improvements that I’ve planned for the WXR-based importer/exporter. The plugin UI helps me to do all of the testing that unit testing won’t cover!

        Thank you for your encouragement!

    • Ryan McCue 1:31 am on July 9, 2013 Permalink | Log in to Reply

      This looks fantastic! I’m a big fan of the decision to use revisions; have you given thought to creating an undo functionality? (This would probably mean also storing copies of option data, which could be expensive.)

      • Frederick Ding 1:34 am on July 9, 2013 Permalink | Log in to Reply

        Thanks! Indeed, I’ve thought about a rollback ability (after all, things can go wrong when it’s iterating over all the posts and options).

        It might not be as expensive as one might imagine: one would only need to store copies of options that have changed, and we could probably clear all transients when running the process in either direction. (Darn, I wish I had thought of skipping transients earlier.)

        • Ryan McCue 2:46 am on July 9, 2013 Permalink | Log in to Reply

          “It might not be as expensive as you might imagine: one would only need to store copies of options that have changed, and we could probably clear all transients when running the process in either direction. (Darn, I wish I had thought of skipping transients earlier.)”

          I hadn’t thought about that either; great idea!

    • JakePT 5:36 am on July 9, 2013 Permalink | Log in to Reply

      This looks great! But can I use it to move a site from, say, http://website.com/old to http://website.com/new?

      I just tried something similar and got this error:
      “Error: The old site URL provided cannot be used in the replacement process”

      Should also note the accompanying PHP error:
      Warning: preg_match_all() expects at least 3 parameters, 2 given in /home/demomagi/public_html/package/wp-content/plugins/relocate/class-wp-relocate.php on line 299

      • Frederick Ding 6:20 am on July 9, 2013 Permalink | Log in to Reply

        Hey Jake,

        Thanks for your feedback. My bad! I was following the PHP documentation for preg_match_all, not realizing that the 3rd parameter didn’t become optional until 5.4.0, hence the error on older versions of PHP. And yes, WordPress needs to work with PHP as old as 5.2.4 (that was from 2007!). I’ll make this correction and update the download immediately.
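        For reference, the compatible form just passes the matches array explicitly (variable names here are illustrative):

        ```php
        // Pre-5.4 compatible: the third $matches parameter is required.
        preg_match_all( $url_pattern, $post_content, $matches );
        $found = $matches[0]; // every match of the old URL
        ```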

        But to answer your first question, yes, you absolutely can use it for that replacement. It was just the PHP preg_match_all compatibility issue that threw off the URL validation.

    • Shea Bunge 5:36 am on July 9, 2013 Permalink | Log in to Reply

      A word of warning: be careful when getting the user to directly access a .php file from the wp-content directory – lots of users block direct access to these files. It might be better to hook to template_redirect and check for a custom query var, and load your interface there instead.
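      (Schematically, that approach could look like the following; the query var and file name are hypothetical.)

      ```php
      // Register a custom query var so WordPress preserves it.
      function relocate_query_vars( $vars ) {
          $vars[] = 'wp_relocate';
          return $vars;
      }
      add_filter( 'query_vars', 'relocate_query_vars' );

      // Serve the tool's UI via template_redirect instead of direct file access.
      function relocate_maybe_render() {
          if ( get_query_var( 'wp_relocate' ) ) {
              include plugin_dir_path( __FILE__ ) . 'relocate-ui.php';
              exit;
          }
      }
      add_action( 'template_redirect', 'relocate_maybe_render' );
      ```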

      • Frederick Ding 6:12 am on July 9, 2013 Permalink | Log in to Reply

        You’re right about the issues with that — disabling direct access to .php files in wp-content is a sensible security approach (although plugins and themes do sometimes use direct access — such as in the case of timthumb and CSS minifying plugins).

        The issue with hooking into something like template_redirect is that I’m not sure it would load early enough. The script somewhat needs to operate independently of WordPress because it can’t rely on the “old” site URL to work — just as upgrade.php and install.php do — and the other possible alternative is to ask that users drop this file elsewhere (as Ryan suggests when using the JSON API as part of core).

    • mattyrob 1:01 pm on July 11, 2013 Permalink | Log in to Reply

      Hi Frederick. This looks like a very worthwhile project to me. I’ve just repaired a whole bunch of corrupt serialized data for attachments in the postmeta table for two of my sites, which probably came from a previous version of WordPress (I can’t think it was a plugin, as I’ve never used a media plugin).

      Anyway, I needed to use wp_update_attachment_metadata() and wp_generate_attachment_metadata() to apply the fixes quickly and effectively.

      I’m replying because I noticed that in 1.0.1 of your code you are using wp_insert_attachment() to update some of the postmeta, but the Codex for that function indicates that the above 2 functions should be called as well. I thought you might want to look at that for rebuilding the attachment postmeta while you fix up the other table entries.

  • Frederick Ding 10:19 pm on July 1, 2013 Permalink

    Migration project update 

    I’ve spent the past few days building functionality for what I’ve termed the “relocate” component — the class that handles site-wide replacement of old URLs with new. A plugin UI is in the works.

    I made two discoveries in the process of coding the relocate component:

    1. Attachment locations have been stored as relative paths since 2.7, which helps with keeping them portable. However, I have seen full paths in the database tables of sites that started up pre-2.7; transitioning those to portable relative paths will be part of what I’m building.
    2. By using WordPress queries and post functions (as opposed to SQL queries) to replace within post content, we gain the advantage of storing those changes in revisions, rather than indiscriminately touching all entries in the posts table in the way that was previously documented on the Codex.
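    A minimal sketch of the second point (simplified; not the actual WP_Relocate code):

    ```php
    // Replace through the posts API so the revisions system snapshots
    // each post's previous content before it changes.
    $posts = get_posts( array( 'post_type' => 'any', 'posts_per_page' => -1 ) );
    foreach ( $posts as $post ) {
        $updated = str_replace( $old_url, $new_url, $post->post_content );
        if ( $updated !== $post->post_content ) {
            wp_update_post( array(
                'ID'           => $post->ID,
                'post_content' => $updated,
            ) );
        }
    }
    ```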

    With the guidance of WP-CLI’s explanation of how to set up a plugin for testing, I’ve also coded unit tests to verify that things are working as they should. One hurdle I faced in unit testing was that update_option('siteurl') doesn’t affect the value of the WP_CONTENT_URL constant until the next execution of the script, by which time the unit test framework will have rolled back the transaction. This quirk won’t come up in real-world use, although I’m not sure I’m satisfied with the way I overcame it.

    Finishing the UI for this component is my timeline’s task for this week; it will let a user change the site URL to a different domain. I’ll make sure installation/usage instructions are posted by next week.

    Happy Canada Day!

     
    • Andrew Nacin 9:17 pm on July 2, 2013 Permalink | Log in to Reply

      I really like that you identified revisions as a bonus here.

      Obviously constants are not testable, but I think your workaround is a strong attempt for now.

  • Frederick Ding 8:36 pm on June 24, 2013 Permalink

    Weekly migration project update 

    Just a quick update today, as I’ve been a little sick and stayed offline for a few days.

    Per my timeline, I worked on a light component last week that focuses on the URL replacement part of a migration operation (e.g. from http://www.example.com to https://wp.example.com/wordpress). Hints were taken from the way search-and-replace was implemented in WP-CLI. My goal was to make this usable in whatever UI implementation is built later. I also need to get a grasp on how unit tests are actually done with core, because dependencies on functions like get_option() make systematic testing more difficult without an accessible database and tables.

    In terms of what to expand on with the importer, an important question that remains is just how much should be moved with the importer.

    At the moment, despite its deficiencies, its scope can be simply delineated: it moves content. Just content. Not plugins, not site options, not themes. If the importer were expanded to use WordPress’s XML-RPC API to transfer more — to copy over site options, to install plugins based on what was previously active — then it’s much harder to draw the line at what the importer will and will not do. I think this is an important decision that would benefit from some community input.

    This week, I’ll be coding more actively. To be done: a usable UI to test the work done thus far, and a fork of the existing importer plugin in which I will make my improvements.

     
  • Frederick Ding 10:00 am on June 17, 2013 Permalink

    Migration project for GSoC 

    Hello world! My name is Frederick, one of the GSoC interns who will be contributing to WordPress migration features this summer.

    A proud Canadian who grew up in Toronto and its suburbs, I am currently a bioengineering undergraduate at the University of Pennsylvania in Philadelphia, with hopes of working in the clinical and public health roles of a physician. The connection to coding might seem tenuous, but I am a firm believer in pursuing passions, despite how incongruous they may seem. As I wrote in my application, WordPress has offered me much in the way of community and inspiration, and I hope to gain better insight into my own aspirations through this internship.

    Like many in the community, my involvement with WordPress has included some plugins, and sites developed for work and student organizations. Although I’ve worked on two separate open source PHP projects, this is the first opportunity I’ve had to contribute to something that can reach so many people; indeed, the past Ten Good Years have yielded not only a collection of lines of code, but a huge and intensely active ecosystem of developers, designers, and users. To have the chance, even for 3 short months, to be a part of it, is both exhilarating and terrifying at the same time!

    My project is to improve the migration experience and the portability of WordPress. Just the thought of moving WordPress elicits headaches because of all the things that can go wrong, as one stunningly recent discussion in the community reminded me.

    For this project, I’ll be treading across both familiar and foreign territory. By current plans, I’d like to bring domain/URL renames to the backend and WP CLI, improve media handling and progress feedback in the WordPress-to-WordPress importer, and build in some semblance of plugin & option migration to the export/import workflow. (Subject to further change with notice.) More details will come in the days ahead.

    I’m really thrilled to be working with all of you! In addition to my weekly updates here, my notes-to-self and handy links to Trac/source can be found on my project site. I’d love to hear your feedback here and throughout the project.

     