During the svn => git sync process we rewrite the author using a bash script that fetches the user details: get-author.sh,
in the git home directory on the svn servers.
Names containing non-ASCII characters are being corrupted along the way, apparently because of a character-set mismatch.
For example:
$ git clone git://develop.git.wordpress.org/
$ cd develop.git.wordpress.org
$ git show f047b94d71e780cbd7595047f28a644955d35fff | head -n3
commit f047b94d71e780cbd7595047f28a644955d35fff
Author: Greg Ziółkowski <gziolo@git.wordpress.org>
Date: Fri Apr 21 10:41:58 2023 +0000
In the real output the author name is mis-encoded (the ó and ł come out as mojibake); it should be Greg Ziółkowski,
as shown on his profile.
The SQL used for this is CONCAT(display_name, '|', user_nicename). I can’t test it as I don’t have MySQL tools on my sandbox, but I suspect either:
- Character sets need to be specified on the mysql command; I suspect either --default-character-set=latin1 or --default-character-set=utf8mb4 would work.
- The above concat should do some character-set conversions; I think CONCAT( CONVERT( CAST( CONVERT( display_name USING latin1) AS BINARY) USING utf8), '|', user_nicename) would work.
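The second option reverses what looks like a double encoding: UTF-8 bytes read as latin1 and then encoded to UTF-8 a second time. As a rough illustration outside MySQL, and assuming that is really what happens during the sync, the same round trip can be reproduced with iconv. CP1252 stands in for latin1 here because MySQL's latin1 is Windows-1252; the variable names are made up for the example and nothing below comes from get-author.sh.

```shell
#!/usr/bin/env bash
# Hypothetical demonstration only; not part of get-author.sh.
name='Greg Ziółkowski'

# Simulate the suspected corruption: treat the UTF-8 bytes as Windows-1252
# (MySQL's "latin1") and encode them to UTF-8 again.
garbled=$(printf '%s' "$name" | iconv -f CP1252 -t UTF-8)

# Undo it: map each character back to its single Windows-1252 byte, which
# restores the original UTF-8 byte sequence. This is the same idea as
# CONVERT(CAST(CONVERT(... USING latin1) AS BINARY) USING utf8).
fixed=$(printf '%s' "$garbled" | iconv -f UTF-8 -t CP1252)

printf 'garbled: %s\n' "$garbled"
printf 'fixed:   %s\n' "$fixed"
```

If the name stored in the database is already clean UTF-8 and only the connection charset is wrong, the first option (setting the charset on the mysql command) is probably the smaller change.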
To reproduce it, you should be able to run this on the svn host:
get-author.sh gziolo
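To check the result without eyeballing it, a heuristic along these lines might help. It is only a sketch: looks_double_encoded is a hypothetical helper, it assumes the corruption is the UTF-8-read-as-latin1 kind, and it assumes iconv is available on the host.

```shell
#!/usr/bin/env bash
# Heuristic sketch: returns 0 when the argument looks like UTF-8 text that was
# misread as Windows-1252 and then encoded to UTF-8 a second time.
looks_double_encoded() {
  local input=$1 decoded
  # Step 1: map the characters back to single Windows-1252 bytes. This fails
  # for text with characters outside that code page (such as "ł").
  decoded=$(printf '%s' "$input" | iconv -f UTF-8 -t CP1252 2>/dev/null) || return 1
  # Step 2: pure ASCII input is unchanged by step 1, so it is not garbled.
  [ "$decoded" != "$input" ] || return 1
  # Step 3: the recovered bytes must themselves be valid UTF-8.
  printf '%s' "$decoded" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Intended use on the svn host (commented out because the script's output
# format is an assumption):
# looks_double_encoded "$(get-author.sh gziolo)" && echo "still garbled"
```

A correctly encoded name either fails step 1 or yields invalid UTF-8 in step 3, so it should not be flagged, though a heuristic like this can still be fooled by unusual input.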
Note: You can likely remove the 2015-era logging/debugging from the file at the same time. I don’t recall the outcome of that, but I suspect the underlying issue was fixed long ago. Review the logs first, I guess!
Let me know if you’d like me to test or debug anything.
cc @dmsnell @gziolo (Apologies for the months-long delay!)
#prio2 #git #svn