Update Git author script

During the svn => git sync process we rewrite the author using a bash script that fetches the user details: get-author.sh, in the git home directory on the svn servers.

It appears that names containing non-ASCII characters are being corrupted, most likely because of a character-set mismatch.
For example:

$ git clone git://develop.git.wordpress.org/
$ cd develop.git.wordpress.org
$ git show f047b94d71e780cbd7595047f28a644955d35fff | head -n3
commit f047b94d71e780cbd7595047f28a644955d35fff
Author: Greg Ziółkowski <gziolo@git.wordpress.org>
Date:   Fri Apr 21 10:41:58 2023 +0000

Greg Ziółkowski should be Greg Ziółkowski as shown on his profile.
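
The corrupted string is consistent with UTF-8 bytes being read back as latin1 (which MySQL treats as cp1252). A minimal sketch to check that diagnosis, assuming that is what is happening; the helper name to_mojibake is hypothetical and not part of get-author.sh:

```python
def to_mojibake(name: str) -> str:
    """Simulate UTF-8 bytes being misread as cp1252 (MySQL's latin1)."""
    return name.encode("utf-8").decode("cp1252")


# Compare this value with the Author line in the git show output.
corrupted = to_mojibake("Greg Ziółkowski")
```

If the diagnosis is right, corrupted matches the author name git reports for that commit.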

The SQL used for this is CONCAT(display_name, '|', user_nicename). I can’t test it as I don’t have MySQL tools on my sandbox, but I suspect one of the following would fix it:

  • Specify the character set on the mysql command; either --default-character-set=latin1 or --default-character-set=utf8mb4 would probably work.
  • Have the CONCAT above do the character-set conversion; I think CONCAT( CONVERT( CAST( CONVERT( display_name USING latin1 ) AS BINARY ) USING utf8 ), '|', user_nicename ) would work.
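
For reference, the CONVERT/CAST chain in the second option amounts to re-encoding the text as latin1 bytes and then decoding those bytes as UTF-8. A rough Python equivalent, as a sketch only: repair_display_name is a hypothetical name, and the fallback for names that are not double-encoded is an addition that the SQL expression does not have.

```python
def repair_display_name(name: str) -> str:
    """Undo UTF-8 text that was misread as cp1252 (MySQL's latin1)."""
    try:
        # CONVERT(... USING latin1) + CAST(... AS BINARY): back to raw bytes.
        raw = name.encode("cp1252")
        # CONVERT(... USING utf8): reinterpret those bytes as UTF-8.
        return raw.decode("utf-8")
    except UnicodeError:
        # Assumption: a name that does not round-trip was never corrupted.
        return name


fixed = repair_display_name("Greg Ziółkowski")
```

Names that are already correct, such as plain ASCII ones, pass through unchanged, so the conversion should be safe to apply to every author under these assumptions.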

To reproduce it, you should be able to run this on the svn host:
get-author.sh gziolo

Note: You can likely remove the 2015-era logging/debugging from the file at the same time. I don’t recall the outcome of that, but I suspect it was fixed long ago. Review the logs, I guess!

Let me know if you’d like me to test or debug anything.

cc @dmsnell @gziolo (Apologies for the months-long delay!)
#prio2 #git #svn