WordPress.org

Make WordPress Core

Tagged: github

  • Mark Jaquith 4:20 am on January 10, 2013 Permalink
    Tags: github

    Git Mirror History Breakage 

    A few years ago, I started publishing a mirror of WordPress on GitHub. It was subsequently promoted to WordPress/WordPress. What I neglected to do, however, was provide an appropriate authors.txt file, until recently. That means that earlier commits are attributed to dummy e-mail addresses and as such cannot be associated with user accounts on GitHub. Considering the recent introduction of contributions on GitHub, this seems a shame. Also, if we were to move to Git in the future, we would probably want our official mirror to have the best possible data.

    Proposed

    That we re-run the git-svn import with a proper authors.txt file.

    Upsides

    We’ll have a proper Git mirror with good and consistent author data, that we can, if desired, use for a future migration to Git. Commits will be properly attributed in GitHub.

    Downsides

    This will break Git history. If you have a Git checkout of WordPress, either standalone or in a submodule, that’ll mean that you’ll have to rebase your master branch off of origin (or even better, blow the whole thing away and re-clone).
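A minimal sketch of that recovery path, run entirely against throwaway local repositories so it never touches the real mirror. The directory names `upstream` and `checkout`, the file names, and the author identities are hypothetical. The re-import is simulated by amending a commit's author, which changes its hash much as a fresh git-svn import with a new authors file would.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=dummy@example.invalid
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=dummy@example.invalid

# "upstream" stands in for the published mirror with the old author data.
git init -q upstream
echo core > upstream/core.txt
git -C upstream add core.txt
git -C upstream commit -q -m "r1"
branch=$(git -C upstream symbolic-ref HEAD)   # refs/heads/<default branch>
branch=${branch#refs/heads/}

# A downstream checkout with one local commit on top of the old history.
git clone -q upstream checkout
old_tip=$(git -C checkout rev-parse HEAD)
echo local > checkout/local.txt
git -C checkout add local.txt
git -C checkout commit -q -m "local work"

# Simulated re-import: same content, corrected author, therefore a new hash.
git -C upstream commit -q --amend --no-edit --author="Real Name <real@example.com>"

# Recover: fetch the rewritten history, then replay only the local commit.
git -C checkout fetch -q origin
git -C checkout rebase -q --onto "origin/$branch" "$old_tip" "$branch"

new_base=$(git -C checkout rev-parse HEAD~1)
echo "old base: $old_tip"
echo "new base: $new_base"
```

If a checkout carries no local commits, re-cloning (or `git fetch origin` followed by `git reset --hard origin/master`) gets to the same place with less ceremony.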

    So: thoughts? Would this ruin your day?

     
    • Gustavo Bordoni 4:25 am on January 10, 2013 Permalink | Log in to Reply

      If this means that WordPress is taking any sort of steps towards using Git as a solution for code versioning I’m all for it!

    • Scott Taylor 4:25 am on January 10, 2013 Permalink | Log in to Reply

      DO IT! And document the ideal way to mirror with authors.txt after, please. I mirrored a bunch of repos and forgot to do the authors part and I haven’t collected the energy to start over yet.

      • Bryan Petty 5:18 am on January 10, 2013 Permalink | Log in to Reply

        Authors file is simple: one author per line: “loginname = Joe User <user@example.com>”.

        Run this to generate your initial authors file (from the root of your SVN checkout):
        $ svn log -q | awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); print $2" = "$2" <"$2">"}' | sort -u > authors.txt

        Fill in the file with real names and email addresses.

        I use a modified version of Mark’s script to mirror a ton of repos myself:
        https://gist.github.com/3061041

        It’s mostly self-explanatory, see forked gist for Mark’s version.
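To see what that pipeline produces without an SVN checkout at hand, here is a small sketch that feeds it canned `svn log -q` output. The revisions and the login names `alice` and `bob` are made up for illustration; only the awk and sort stages mirror the command above.

```shell
#!/bin/sh
# Hypothetical `svn log -q` output: separator lines plus one "r…" line per revision.
sample_log() {
cat <<'EOF'
------------------------------------------------------------------------
r3 | alice | 2013-01-09 10:00:00 +0000 (Wed, 09 Jan 2013)
------------------------------------------------------------------------
r2 | bob | 2013-01-08 09:00:00 +0000 (Tue, 08 Jan 2013)
------------------------------------------------------------------------
r1 | alice | 2013-01-07 08:00:00 +0000 (Mon, 07 Jan 2013)
------------------------------------------------------------------------
EOF
}

# Take the second "|"-separated field of each revision line, trim the
# surrounding spaces, and emit a placeholder "login = login <login>" entry.
authors=$(sample_log \
  | awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); print $2" = "$2" <"$2">"}' \
  | sort -u)

printf '%s\n' "$authors"
```

Each placeholder on the right-hand side is then edited by hand into a real name and address, and the finished file is what `git svn clone --authors-file=authors.txt` reads.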

        • Mark Jaquith 6:02 am on January 10, 2013 Permalink | Log in to Reply

          Another thing I’d do is contact everyone in that file and get them to doublecheck that we have an e-mail address that they’re likely to control for life. Probably best to use e-mail at a personal domain, if they have one, instead of Gmail or a company e-mail address that they might lose in the future.

    • Daryl Koopersmith 4:25 am on January 10, 2013 Permalink | Log in to Reply

      I think an accurate repository is worth the temporary breakage.

    • Ryan McCue 4:27 am on January 10, 2013 Permalink | Log in to Reply

      +1, I’d say we do it.

      What’d be really cool is if we can get the props parsed so that git lists the commit author as whoever was prop’d, and the committer as the person who actually committed it. AFAIK, that’s not possible without a complicated script though.

      • Bryan Petty 4:56 am on January 10, 2013 Permalink | Log in to Reply

        That would be cool, but I really can’t even think of any way to do something like this with the current repo as is without amending commits after the initial clone, which would be extremely resource intensive and could take weeks to do. Given that and the work involved with integrating the same process into the mirror updates for future commits as well, I would just say forget it.

      • Mark Jaquith 5:58 am on January 10, 2013 Permalink | Log in to Reply

        Interesting idea. But it wouldn’t be able to handle commits with multiple props recipients. We could give it to the first person, or in that case just give it to the committer.

        • Ryan McCue 7:39 am on January 10, 2013 Permalink | Log in to Reply

          Multiple props authors mean that it’s ambiguous who actually created the patch, so the committer should be assigned credit lest we accidentally attribute it to the wrong person.

          (Also, we’d probably want to make sure that we fix up typos. `rmmcue` for example. ;)

      • Peter Westwood 9:39 am on January 10, 2013 Permalink | Log in to Reply

        While parsing props like this would be cool I don’t think it would accurately reflect the way our process has worked and I would much rather put effort into collecting the props to commit data into a format we can integrate into the WP.org profiles more easily.

        I started on this a while back but haven’t finished yet; what I’m mostly missing is a 100% accurate props extraction method.

        • Ryan McCue 9:55 am on January 10, 2013 Permalink | Log in to Reply

          At the moment, there are basically two forms of commits with props: 1) the committer is merely committing a patch that was on a ticket (this is where we’d want to split author/committer); and 2) the committer is writing the patch with inspiration from someone (we’d want author = committer in this case).

          As far as I’ve seen, 1 seems to be the much more common case, but 2 is fairly common too. It could be a problem. (Regarding effort, it’s relatively simple using git filter-branch, so that shouldn’t be much of an issue.)
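A rough sketch of the attribution rule discussed in this sub-thread: a single props recipient becomes the author, and anything ambiguous falls back to the committer. The message format is an assumption (the word "props", then names separated by commas or "and", ending at the next period), and `props_author` is a hypothetical helper, not an existing tool. It cannot tell case 1 from case 2 above, and it makes no attempt to fix typos in names.

```shell
#!/bin/sh
# props_author <committer> <commit message> -> prints the login to credit as author
props_author() {
  committer=$1
  # Grab the first "props …" run up to the next period, drop the word "props",
  # split on commas and spaces, and discard "and" plus empty lines.
  names=$(printf '%s\n' "$2" \
    | grep -oiE 'props [^.]+' | head -n 1 \
    | cut -d' ' -f2- \
    | tr -s ', ' '\n\n' \
    | grep -v -x -e and -e '')
  count=$(printf '%s\n' "$names" | grep -c .)
  if [ "$count" -eq 1 ]; then
    printf '%s\n' "$names"       # exactly one recipient: credit them
  else
    printf '%s\n' "$committer"   # none or several: credit the committer
  fi
}

single=$(props_author markjaquith "Fix the thing. props rmccue. fixes #1234")
several=$(props_author markjaquith "Fix it. props nacin, rmccue. see #1")
none=$(props_author markjaquith "Typo fix.")
printf '%s\n%s\n%s\n' "$single" "$several" "$none"
```

A function along these lines could, in principle, supply the author name that a `git filter-branch --env-filter` pass assigns to each commit, but that wiring is left out here.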

    • Michael Beckwith 4:27 am on January 10, 2013 Permalink | Log in to Reply

      I do all my pulling of WP from the svn repo anyway, but I keep an eye on some development via github. No harm for my stuff

    • topdown 4:31 am on January 10, 2013 Permalink | Log in to Reply

      I think that authors/contributors should be recognized whenever possible…
      +1 I say fix it.

    • Mike Schinkel 4:48 am on January 10, 2013 Permalink | Log in to Reply

      Go for it!

    • Bryan Petty 4:59 am on January 10, 2013 Permalink | Log in to Reply

      I think you’re already aware that I actually use my own clone of the WP repo partly for this reason, but also because it’s nice having branch and tag names that are exactly the same as the branch and tag names in SVN. It would be nice if those were fixed up as well if you do this.

      • Mark Jaquith 5:56 am on January 10, 2013 Permalink | Log in to Reply

        Yeah, if we’re doing this, we should take the time to iron out all other niggling issues. Would love to have your input on that. My issue with branch names is that it creates ambiguous references. So if you go to checkout “3.5” it will check out the 3.5 branch. In order to check out the 3.5 tag, you need to do git checkout tags/3.5. Not the end of the world. Might be worth it to get everything cleaned up.

        Hey, maybe we can just rebase me and retroactively teach me all these Git and Git-SVN subtleties! :-) Just don’t push me, man.

        • Ryan McCue 7:41 am on January 10, 2013 Permalink | Log in to Reply

          The way I do it for SimplePie is to name the branches spelled out (ala WP.org release notice slugs), such as one-dot-two. That avoids the ambiguity there. However, that’s probably a pain for WP.

          Another popular option I’ve seen: rename all tags (or all branches) to vX.X so that any name starting with v is a tag (or branch) and any name without it is the opposite.

          • Boone Gorges 11:55 am on January 10, 2013 Permalink | Log in to Reply

            For my own stuff I do something like this. `3.5` is the tag, and `3.5.x` is the branch. I think Drupal does it this way.

            • Aaron D. Campbell 2:57 pm on January 10, 2013 Permalink

              Or enforce 3 digits for all tags and 2 for all branches, so 3.5 is a branch and 3.5.0 is the first 3.5.x tag

          • Bryan Petty 3:48 pm on January 10, 2013 Permalink | Log in to Reply

            Mark does already do this, which is why the branches are named #.#-branch.

            Anyway, git does assume you wanted the branch instead of the tag, but that’s almost always the case for me anyway. I almost never checkout the tags, and I don’t think anyone else does either (definitely not with SVN either). In the 5 months or so that I’ve had my mirror running, this has never gotten in my way once or annoyed me in any way.
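The branch-versus-tag ambiguity from this sub-thread can be reproduced in a scratch repository. This is only a sketch: the commit messages and the identity are placeholders, and Git may print an "ambiguous refname" warning on stderr along the way.

```shell
#!/bin/sh
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "release 3.5"
git tag 3.5          # the tag marks the release commit
git branch 3.5       # a branch with the same short name

# The bare name resolves to the branch, so new commits move the branch only.
git checkout -q 3.5
git commit -q --allow-empty -m "3.5 maintenance fix"
head_ref=$(git symbolic-ref HEAD)

# An explicit tags/ prefix selects the tag and leaves HEAD detached.
git checkout -q tags/3.5
tag_commit=$(git rev-parse HEAD)

echo "bare name checked out: $head_ref"
echo "tags/3.5 checked out:  $tag_commit"
```

Any of the naming schemes suggested above (`3.5-branch`, `3.5.x`, a `v` prefix) sidesteps the issue by making the two short names differ.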

        • Bryan Petty 3:58 pm on January 10, 2013 Permalink | Log in to Reply

          One other issue that’s really minor is that there’s still an iis branch in SVN that didn’t make it into your mirror that probably should.

    • sourceforge 5:12 am on January 10, 2013 Permalink | Log in to Reply

      it would be good, i have been asking /systems guys to install git as revision control, but it seemed only someone in some driver’s seat could ask for stuff there! git is fast, no problem if it breaks for a while! thanks for this! full ahead flank :)

    • Ozh 6:54 am on January 10, 2013 Permalink | Log in to Reply

      I think it’s possible to modify the author of each commit afterwards, so you don’t break the whole history
      https://gist.github.com/4032945

      • Ryan McCue 7:42 am on January 10, 2013 Permalink | Log in to Reply

        That will change the commit hashes, since the author/committer is stored as part of the commit object (which is used to create the hashes). There’s no way (by design) to change these after the fact without doing this.

      • Ryan McCue 7:44 am on January 10, 2013 Permalink | Log in to Reply

        (Also, forgot to note: even if this only changed one commit, this would cascade down through all subsequent commits, since the parent’s hash is also included in the commit object)
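Both points can be checked in a scratch repository. The sketch below builds commits with `git commit-tree`, holding everything fixed except the author e-mail; the addresses, messages, and the `commit_as` helper are invented for the demonstration.

```shell
#!/bin/sh
# Throwaway repository; nothing here touches a real checkout.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

# Pin the committer, the author name, and both dates so that the author
# e-mail (and later the parent) is the only input that varies.
export GIT_COMMITTER_NAME="Committer" GIT_COMMITTER_EMAIL="committer@example.com"
export GIT_AUTHOR_NAME="Author"
export GIT_AUTHOR_DATE="1357776000 +0000" GIT_COMMITTER_DATE="1357776000 +0000"

# The empty tree is enough: file contents are irrelevant to this demo.
tree=$(git hash-object -w -t tree /dev/null)

# commit_as <author-email> <message> [parent-hash] -> prints the new commit hash
commit_as() {
  if [ -n "$3" ]; then
    GIT_AUTHOR_EMAIL="$1" git commit-tree "$tree" -p "$3" -m "$2"
  else
    GIT_AUTHOR_EMAIL="$1" git commit-tree "$tree" -m "$2"
  fi
}

# Same commit, different author e-mail: the hashes differ.
dummy=$(commit_as "markjaquith@svn.invalid" "Initial import")
real=$(commit_as "mark@example.com" "Initial import")

# Identical children on top of each: the difference cascades via the parent hash.
child_of_dummy=$(commit_as "same@example.com" "Second commit" "$dummy")
child_of_real=$(commit_as "same@example.com" "Second commit" "$real")

echo "root:  $dummy vs $real"
echo "child: $child_of_dummy vs $child_of_real"
```

The two child commits have the same tree, author, dates, and message; only the recorded parent differs, which is enough to give them different hashes.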

    • aristath 6:57 am on January 10, 2013 Permalink | Log in to Reply

      I think it would be a great step forward. Drupal also used to be in SVN and switched to Git a couple of years ago. It was entitled “the great git migration” and took almost a year to design, layout and implement the whole process but it was worth it. Using Git has many advantages! I believe that breaking the history is worth it in the long run.
      Sure it might be a bit inconvenient at first, but I believe that it could really give a new boost to WordPress development.

      • aristath 6:59 am on January 10, 2013 Permalink | Log in to Reply

        correction… Drupal used to be CVS, not SVN. But the principle is the same… :)

      • Ryan McCue 9:24 am on January 10, 2013 Permalink | Log in to Reply

        To clarify: this isn’t about moving WordPress to Git, this is about fixing up the Git mirror of the SVN repo. This is a step we’d need to take if it was decided to move WP to Git, but it’s not the main goal.

    • Remkus de Vries 7:28 am on January 10, 2013 Permalink | Log in to Reply

      Git ‘er done I say. Having to do a rebase / clone is no biggy at this stage.

    • Baki Goxhaj 8:59 am on January 10, 2013 Permalink | Log in to Reply

      Re-cloning WordPress is not a big deal and adding appropriate author information is the way to go toward the future, thus I think it should be done — the sooner the better.

    • Tareq Hasan 9:33 am on January 10, 2013 Permalink | Log in to Reply

      Surely go for it. A step towards SVN to Git.

    • Abhishek Ghosh 10:58 am on January 10, 2013 Permalink | Log in to Reply

      Git is always a better option, but it needs care on an individual basis. There are many ways for a user to download. The developers get the chance to create better documentation or a guide. Cloning is not really difficult.
      There are basic problems too; a good guide is needed to increase awareness.
      As we are not actually shifting yet, there is time.

    • Mark Rowatt Anderson 12:06 pm on January 10, 2013 Permalink | Log in to Reply

      Two thumbs up – go for it!

    • Edward Caissie 1:26 pm on January 10, 2013 Permalink | Log in to Reply

      It reads like a lot of great points above … and I am all for them, too. Any rebase / clone issues would be far outweighed by the eventual benefits this will bring.

    • Amy Hendrix (sabreuse) 1:31 pm on January 10, 2013 Permalink | Log in to Reply

      +1 It’s really not a big deal to rebase now compared to not having a good history sometime later.

    • Tom Willmot 2:16 pm on January 10, 2013 Permalink | Log in to Reply

      +1 Do it. We run everything with WordPress as submodule, would not be hard to re-clone.

    • aaronholbrook 2:30 pm on January 10, 2013 Permalink | Log in to Reply

      +1, anything that would move us closer to using Git would be fantastic. Also not a big deal to re-clone if needed.

    • Chris Jean 3:00 pm on January 10, 2013 Permalink | Log in to Reply

      Sounds like a bandaid that needs to be ripped off. Better now than later when even more people use it.

    • mojowill 3:04 pm on January 10, 2013 Permalink | Log in to Reply

      I’d love to see a full move to GIT for everything on wporg!

    • Sam Parsons 10:45 pm on January 10, 2013 Permalink | Log in to Reply

      I’m all for the update in order to improve the history and prepare for a possible move to git. I’m wondering whether you plan to send a little message (could it be automated?) to all those who have forked the repo on github?

      https://github.com/WordPress/WordPress/network/members

      That would be hugely helpful in communicating the upcoming changes in case those people don’t read this blog (perish the thought).

      • Mark Jaquith 6:47 am on January 11, 2013 Permalink | Log in to Reply

        GitHub removed their private messaging feature, so I’d have no automated way of notifying everyone. This doesn’t concern me so much as we don’t accept pull requests on GitHub, so it’s not like their forks are functional in that way. I also think a lot of people fork repos and never update them from the upstream again. So they probably wouldn’t notice. And it’s easy enough to destroy a fork and refork it.

        What I was considering doing was putting a note on our project description on GitHub, for the next few months, providing a link to a post that explained what happened and how to resolve the divergent Git history.

    • Mark Jaquith 6:51 am on January 11, 2013 Permalink | Log in to Reply

      As the response was overwhelmingly positive (even from some of you who are traditionally serial devil’s advocates), I think we’re going to move forward with this. Thanks, all, for your feedback.

      What I’ll likely do is consult with various people (@bpetty, notably) about implementation, doublecheck the e-mail addresses in my authors.txt file (recommending that everyone use addresses at personal domains that they’re likely to control indefinitely), and then push out a WordPress-Fixup repo for people to audit, before pushing the new history to the WordPress repo.

      • Bryan Petty 7:02 pm on January 11, 2013 Permalink | Log in to Reply

        Confirming email addresses used would definitely be a good idea. I think a large portion of what you have now originally came from my list, which was meticulously put together from scouring plugin readmes, wp-hackers archives, and personal sites for publicly visible addresses since, at the time, I knew I wouldn’t be able to simply pull them from WP.org accounts used to make the commits (which would likely be the best source, aside from contacting everyone individually).

    • BFTrick 10:12 pm on January 15, 2013 Permalink | Log in to Reply

      This sounds brilliant.

  • Nikolay Bachiyski 5:04 am on May 14, 2010 Permalink
    Tags: github

    For forking pleasure: http://github.com/wordpress/

     
    • Matt 6:08 am on May 14, 2010 Permalink | Log in to Reply

      How does this stay in sync?

      • Zoran Zaric 8:29 am on May 14, 2010 Permalink | Log in to Reply

        There’s an SVN module in Git. With git-svn you can pull changes from a master Subversion repository and push to it while having all the benefits of a Git repository. These include cheap and easy branching and merging; being able to commit, branch, merge, whatever without a connection to the SVN server; and – what I like the most for WordPress – the opportunity to branch and commit without commit privileges in the Subversion repository. With this, people can make little commits locally and then let the person who manages the master Git repository pull from them and commit to SVN.

        Having a GitHub git repository can give wordpress a lot of momentum, there are a lot of wise people on GitHub.

      • Nikolay Bachiyski 12:51 pm on May 14, 2010 Permalink | Log in to Reply

        Cron, which didn’t work so well last night.

    • Edho 8:22 am on May 14, 2010 Permalink | Log in to Reply

      Where did the tags go?

    • Aaron Brazell 2:13 pm on May 14, 2010 Permalink | Log in to Reply

      Is this an official thing or just a pet project? Should we consider moving to git altogether?

    • Doug Stewart 3:34 pm on May 14, 2010 Permalink | Log in to Reply

      Any interest in a BitBucket/hg clone as well? I’ve been looking to do something similar…

    • Ryan 3:32 am on May 15, 2010 Permalink | Log in to Reply

      Is WordPress 3.0 still on track to be released on April 15th?

      (Can an admin delete my previous post? I accidentally included my email as my name. Thanks!)

      • dd32 10:32 am on May 17, 2010 Permalink | Log in to Reply

        May 15th was a suggested release date (when all dates were pushed back 2 weeks). That has obviously passed, however. There is no firm date for release yet (that I’m aware of); hopefully it’ll be released shortly.

    • NICKD 4:48 pm on May 17, 2010 Permalink | Log in to Reply

      Any new release dates for WordPress 3.0? I know it was supposed to be released on May 15, 2010, as per the schedule on this site.

    • hakre 2:07 pm on May 19, 2010 Permalink | Log in to Reply

      Which sync strategy can you suggest for the core repository? It is pretty big to do a full sync of all properties every day, right?

    • Tom Adams 9:19 am on May 27, 2010 Permalink | Log in to Reply

      Here’s one that’s kept more up-to-date and also has tags: http://github.com/dxw/wordpress
