Antsy for 3.6 to start and need a project? Who wants to make an official importer for the new Twitter archives? Would think we’ll want to add that into the importers list. Would suggest importer auto-assign “status” or “aside” post format (or make it an option in the plugin to choose format). Who’s in? I volunteer to ux review and test.
Jen Mylo
Aaron Brazell 11:05 pm on December 16, 2012 Permalink
I was already planning on doing this as a plugin, and I’ve been quiet for a while. I can do this. But… I need to have the archives available to me, and my account doesn’t have it yet.
Jane Wells 11:26 pm on December 16, 2012 Permalink
Mine either, but I’ll see if I can wrangle one we can use.
Ryan Duff 11:26 pm on December 16, 2012 Permalink
Or as soon as someone gets and volunteers a copy of their archive. From the post it doesn’t seem there’s an api, but a set of html pages + xml, or json files (pick your poison)
Also, from what I’ve heard the files are monthly, so if you’ve been on Twitter for 4 years you’d be looking at 48 json files to handle w/ an importer.
Of course that’s all based off what I’ve read. I don’t have it yet. Just something to chew on.
Aaron Brazell 11:32 pm on December 16, 2012 Permalink
Yeah it’ll be an interesting challenge but I wanna see what the data looks like first. If they turn it on for me, I’ve got 6 years of archives which would be a good stress test too.
Andrew Nacin 11:33 pm on December 16, 2012 Permalink
Could handle the zip that they send you as-is. In fact, I imagine that would be the best approach for users.
Aaron Brazell 2:54 pm on December 17, 2012 Permalink
I feel like we need to be able to parse the HTML, CSV and JSON in any zip file if we approach it that way. From the user perspective, I think you’re right but I sure hope the CSV and HTML are decent enough.
Samuel Wood (Otto) 11:38 pm on December 16, 2012 Permalink
It’ll just be a matter of iterating the json. Simple stupid, for the most part.
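To make the "handle the zip as-is and iterate the json" idea concrete, here is a minimal sketch in Python. It assumes one JSON file per month, each holding a plain array of tweet objects; the file names, the `.json` extension, and the `id_str`/`text` fields are all hypothetical, since nobody in the thread has confirmed the real layout yet.

```python
import io
import json
import zipfile


def iter_tweets(zip_file):
    """Yield every tweet found in the .json members of an archive zip.

    Assumption: each .json member is a JSON array of tweet objects.
    """
    with zipfile.ZipFile(zip_file) as archive:
        # Sorting the names keeps the monthly files in a stable order.
        for name in sorted(archive.namelist()):
            if not name.endswith(".json"):
                continue  # skip the HTML, CSV and anything else in the zip
            with archive.open(name) as handle:
                for tweet in json.load(handle):
                    yield tweet


def build_sample_archive():
    """Build a tiny in-memory zip with a made-up layout, for demonstration."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("tweets/2012_11.json",
                         json.dumps([{"id_str": "1", "text": "first"}]))
        archive.writestr("tweets/2012_12.json",
                         json.dumps([{"id_str": "2", "text": "second"}]))
        archive.writestr("index.html", "<html></html>")
    buffer.seek(0)
    return buffer


for tweet in iter_tweets(build_sample_archive()):
    print(tweet["id_str"], tweet["text"])
```

Because the generator reads one member at a time, several years of monthly files would not all need to be held in memory at once.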
Samuel Wood (Otto) 11:06 pm on December 16, 2012 Permalink
If they actually roll it out to people unchanged, then it should be fairly trivial. When I get it on my account, I’ll let you know.
Phil Erb 11:25 pm on December 16, 2012 Permalink
When archives are available to me, I’d love to help test.
Andrew Nacin 11:36 pm on December 16, 2012 Permalink
We now have an API for importers, which means we can add this to wp-admin/import.php the moment it is done.
When anyone gets access to their zip, please share it (or at least a sample so we can learn the format).
Aaron D. Campbell 12:29 am on December 17, 2012 Permalink
Looks like my account has the option. I’m requesting an archive and will post it somewhere to use.
Aaron D. Campbell 12:39 am on December 17, 2012 Permalink
Looks like it comes down to a bunch of poorly formatted CSV files. Here’s what they sent me: http://s.ran.ge/0o2F011v3I2p
The CSVs don’t quote data, so commas in the tweet will be a bit of a pain. However, since it’s ID, time, tweet you can probably just process the ID and time, then assume the rest is the tweet (trimming the strange trailing comma).
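A rough sketch of that workaround in Python, assuming each row really is ID, time, tweet in that order with one trailing comma, and that the time column itself never contains a comma. The sample row and its timestamp format are made up for illustration.

```python
def parse_tweet_line(line):
    """Split an unquoted 'id,time,tweet,' row into its three fields."""
    # Split on the first two commas only; everything after them is the tweet,
    # so commas inside the tweet text are left alone.
    tweet_id, timestamp, text = line.rstrip("\n").split(",", 2)
    # Drop the single trailing comma that the export appends to each row.
    if text.endswith(","):
        text = text[:-1]
    # The ID is kept as a string rather than converted to an integer.
    return tweet_id, timestamp, text


# Hypothetical row; the real export may format the time differently.
sample = "123456789012345678,2012-12-16 23:05:00,Hello, world, again,"
print(parse_tweet_line(sample))
```

A tweet that genuinely ends in a comma would lose nothing here, because only the one appended comma is trimmed.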
deltafactory 6:04 am on December 17, 2012 Permalink
I’d be curious to peek at the format. Your link is broken – not sure if that was intentional or not.
Aaron D. Campbell 8:56 pm on December 17, 2012 Permalink
Sorry about the broken link. I uploaded the export to the Trac ticket: http://core.trac.wordpress.org/ticket/22981
Jane Wells 6:12 pm on December 19, 2012 Permalink
I mentioned to Ev that commas in tweets were requiring some hackiness.
Pete Mall 9:23 pm on December 19, 2012 Permalink
We don’t have to worry about it since we are using the JSON instead of the CSV.
Alex Mills (Viper007Bond) 9:30 pm on December 19, 2012 Permalink
True, but still a bug. Delimiters in CSVs need to be escaped.
Myatu 6:53 am on December 17, 2012 Permalink
Wouldn’t it be more sensible to have this as a 3rd party plugin, rather than having to maintain more bloat?
Peter Westwood 10:40 am on December 17, 2012 Permalink
It will be a plugin anyway, not in the core download – all the importers are plugins now.
Myatu 6:21 am on December 18, 2012 Permalink
Actually forgot about that! Shows how often I’ve used that feature. I stand corrected
Jane Wells 11:39 am on December 17, 2012 Permalink
As @westi states, all importers are plugins, not core code. I didn’t specify that in my post, since I took it as a given that core developers know that.
Simon Wheatley 8:34 am on December 17, 2012 Permalink
Note that the current Twitter IDs overflow (?) and corrupt if you convert them to integers on a 32-bit system. That one has got me before. (Apologies if that’s teaching everyone to suck eggs.)
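A small Python illustration of why the ID is safer kept as a string. The ID below is hypothetical, and the wrap-around is only simulated: how a given 32-bit runtime actually mangles the value (wrapping, clamping, or falling back to a float) varies, so treat this as a sketch of the problem rather than a description of any one platform.

```python
INT32_MAX = 2**31 - 1


def fits_int32(value):
    """Return True if value can be stored in a signed 32-bit integer."""
    return -2**31 <= value <= INT32_MAX


def wrap_int32(value):
    """Simulate two's-complement wrap-around into a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value


tweet_id = "280455311488946176"  # hypothetical ID, kept as a string

print(fits_int32(int(tweet_id)))                   # does it fit in 32 bits?
print(wrap_int32(int(tweet_id)) == int(tweet_id))  # does it survive wrapping?
```

Storing and comparing the ID as a string sidesteps the question entirely.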
Also, would you mind putting in a filter for the post data before save… I’d prefer to store tweets in a custom post type. Ta!
Brad Williams 2:59 pm on December 17, 2012 Permalink
Was thinking the same thing regarding CPTs. A filter would be perfect
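For anyone unsure what such a filter would buy, here is a language-neutral sketch in Python of the pattern being asked for: the importer builds its default post data, then passes it through any registered callbacks before saving. Every name here (`add_filter`, `apply_filters`, `build_post`, the dictionary keys, the `tweet` post type) is hypothetical and only mirrors the idea; it is not the importer's actual code.

```python
_filters = []


def add_filter(callback):
    """Register a callback that may modify post data before it is saved."""
    _filters.append(callback)


def apply_filters(post_data):
    """Pass post data through every registered callback, in order."""
    for callback in _filters:
        post_data = callback(post_data)
    return post_data


def build_post(tweet):
    """Turn a tweet into post data, letting filters adjust it before save."""
    post = {
        "post_type": "post",
        "post_format": "status",  # default suggested in the original post
        "post_content": tweet["text"],
    }
    return apply_filters(post)


def use_tweet_post_type(post):
    """Example filter: store tweets in a custom post type instead."""
    return dict(post, post_type="tweet")


add_filter(use_tweet_post_type)
print(build_post({"text": "Hello from the archive"}))
```

The importer keeps one default behaviour, and sites that want a custom post type change it without patching the plugin.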
Andrew Nacin 6:49 pm on December 17, 2012 Permalink
I’ve outlined a potential plan for such an importer on a Trac ticket: http://core.trac.wordpress.org/ticket/22981. If you want to continue to discuss the idea, feel free to do so here. Implementation can occur on the ticket. (This is a plugin, but an official importer is also a core priority, hence the use of core trac.)
I’ve also uploaded a tweet archive contributed by @chadhuber to the ticket. It does contain sane json.
Pete Mall 6:56 pm on December 17, 2012 Permalink
@aaroncampbell and I worked on the plugin yesterday. There’s an early patch on the ticket already. I’m working on it today and will add an updated patch eod.
Beau Lebens 9:23 pm on December 17, 2012 Permalink
In case it helps, I already made one, packaged in here: http://wordpress.org/extend/plugins/keyring-social-importers/
Pete Mall 9:26 pm on December 17, 2012 Permalink
Looks like we should be able to use your “parser”.