Right now WordCampWordCamp WordCamps are casual, locally-organized conferences covering everything related to WordPress. They're one of the places where the WordPress community comes together to teach one another what they’ve learned throughout the year and share the joy. Learn more. devs use a small subset of the production database that was manually created, because it wouldn’t be safe to keep copies of the production database in local environments.
That works good enough for most things, but we keep running into situations where reproducing bugs and testing fixes is much harder, and takes much longer, than it would if we had real-world data to work with.
So, I’d like to create a way to safely use a whitelisted copy of production data in local environments. Here’s how I envision it working:
- Create a script that runs on the production web server once a day
- It would create a copy of the primary database on the production database server
- Then run lots of SQL commands against that copy in order to redact anything that hasn’t been whitelisted
- Have another script in dev environments that uses
sftp
to download a copy of the whitelisted database once a day
The whitelist would contain a list of tables, columns, and keys that have been determined to not have any sensitive data. For example:
wp_users
– The table itself would be whitelisted, but only theID
,user_login
,user_nicename
,user_registered
,user_status
,display_name
,spam
, anddeleted
fields would be whitelisted. Becauseuser_pass
,user_email
, anduser_activation_key
would not be in the whitelist, the script would replace the contents of those columns with[redacted]
(or in the case ofuser_email
,redacted@example.org
).wp_usermeta
– The table itself would be whitelisted, along with theumeta_id
,user_id
,meta_key
, andmeta_value
columns, but only certainmeta_key
rows would be whitelisted. For instance,first_name
,last_name
,description
, andwp_capabilities
would be whitelisted, butsession_tokens
andwordcamp-qbo-oauth
would not be.
Additionally, the script would have some logic to redact potentially sensitive values within whitelisted columns. For example, any e-mail addresses inside a meta_value
value would be replaced with redacted@example.org
.
What does Systems think about that? I’d do all the work to build the script, but I want to make sure you don’t have any security/privacy concerns.