Ticket #394 (closed task: fixed)

Opened 6 months ago

Last modified 4 months ago

Improve sandbox functionality/provide anonymized version of data

Reported by: https://id.mayfirst.org/jamie Owned by:
Priority: critical Milestone: 4
Keywords: Cc:
Project Area: Software/Developer Issues Project:
Skill Set Required: Code/Development/Programming

Description

Currently, we have no reliable way of testing changes we pull into the live sites, particularly testing the interactions between the sites (such as registration triggering a user account creation on the organize site).

In addition, we don't have a good way of training people on our systems without giving them access to the live data.

I'm proposing that we merge this "sandbox" functionality (allowing people to play around on the non-live sites for testing purposes) with "staging" functionality (testing changes before pulling them to the live site). I'm referring to both functionalities as "sandbox" sites.

To be effective as both a sandbox and staging server, we will need a way to use the same data as the live site, yet anonymize the data to protect the privacy of the people whose data is in our live site.

By devising a system for anonymizing the data in the sandbox, we also gain an additional benefit: we can alter our development workflow so that developers are also sync'ing with the anonymized data from the sandbox sites, rather than sync'ing data from the live site. By avoiding a situation where all the developers have private data on their computers, we reduce the risk of exposing private data if a laptop is lost or seized.

With this kind of workflow, we can provide not only our "trusted" developers with a simple way to synchronize their data, but we can provide anyone in the world, trusted or not, with the same process of setting up a developer environment without having to worry about permissions.

Here's my proposed plan:

  • Extend drush with a new command called ussf_sync
  • This command would check the settings.php file for a variable called $ussf_staging. If set to true:
    • it would continue looking in settings.php for variables $live_db_host, $live_db_user, $live_db_pass and if present, it would use those variables to dump the live database into the staging database.
    • Next, it would run an anonymizing routine which would:
      • Change all email addresses (in the Drupal users table, user profiles, and CiviCRM) so that the user portion of the address would be an MD5 hash of the entire original email address and the domain part would be example.net. Because the hash is deterministic, each email address is anonymized the same way across all sites (so that the user/password sync operations work).
      • Change all Drupal passwords to be "password". This protects people's real passwords while also allowing developers to easily log in as any user for testing purposes.
      • Change the Drupal username to an MD5 hash of the username, since many usernames are generated from real names. The script would make an exception for the user with user id 1 (which has the username admin on all sites) to allow easy access with full privileges.
      • Change first/last names in Drupal profile (on organize) and CiviCRM to be randomized pronounceable strings (alternating consonants with vowels).
      • Replace all phone numbers in the Drupal profile (on organize) and CiviCRM with random numbers.
      • Change address lines in CiviCRM to be a random number followed by USSF Drive.
      • Leave city, state, zip as is.
      • Randomize transactions. Since it will still be possible to trace a Drupal user id on the organize site, via the anonymized email address, to a CiviCRM account on community, we will need to randomize the transactions so you can't trace a donation to an individual. Keeping the transaction records otherwise intact gives us a real body of transactions for testing.
    • After anonymizing, the script would dump the anonymized database into a sql file that would be made available for public download.
  • If $ussf_staging is set to false (e.g. on developer computers):
    • The script would download via http the anonymized sql file and import it.
    • It would then check the settings.php file for an array called $rename_users, which could contain user ids and names that should be updated. For example, if you have a login on the live site with user id 11 and the username jamie, you could specify that after running ussf_sync, the Drupal account with user id 11 should have its username changed back from the anonymized username to jamie, making it easier to log in.
  • On all sites, staging or otherwise, a number of additional parameters can be set in settings.php to run after anonymizing/importing, including:
    • Turn CiviMail on or off or change the configuration
    • Change the CiviCRM javascript/css domain name information
    • Enable the developers module and set proper permissions
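
To make the anonymizing rules above concrete, here is a minimal sketch of them as plain functions. It is written in Python for brevity rather than as the PHP a drush command would use, it works on in-memory values instead of the real Drupal/CiviCRM tables, and every function name is hypothetical, not part of ussf_sync. Two details are assumptions the plan doesn't specify: phone numbers are ten random digits, and "randomizing transactions" is read as shuffling which contact each transaction points at.

```python
import hashlib
import random

VOWELS = "aeiou"
CONSONANTS = "bcdfghjklmnprstvz"


def anonymize_email(email):
    """User part becomes the MD5 of the whole original address."""
    digest = hashlib.md5(email.encode("utf-8")).hexdigest()
    return digest + "@example.net"


def anonymize_username(uid, name):
    """Hash the username, except for user id 1 (admin), which is kept."""
    if uid == 1:
        return name
    return hashlib.md5(name.encode("utf-8")).hexdigest()


def pronounceable_name(rng, length=6):
    """Alternate consonants and vowels to get a pronounceable string."""
    letters = [rng.choice(CONSONANTS if i % 2 == 0 else VOWELS)
               for i in range(length)]
    return "".join(letters).capitalize()


def random_phone(rng):
    """Ten random digits (the length is an assumption)."""
    return "".join(rng.choice("0123456789") for _ in range(10))


def random_address(rng):
    """A random street number followed by 'USSF Drive'."""
    return "%d USSF Drive" % rng.randint(1, 9999)


def shuffle_transaction_owners(rng, transactions):
    """Reassign contact ids among transactions; other fields stay intact.

    Assumption: this is one possible reading of 'randomize transactions'.
    """
    owners = [t["contact_id"] for t in transactions]
    rng.shuffle(owners)
    return [dict(t, contact_id=o) for t, o in zip(transactions, owners)]


def apply_renames(users, rename_users):
    """Restore chosen usernames after import, keyed by user id."""
    return {uid: rename_users.get(uid, name) for uid, name in users.items()}
```

Because the email and username functions are pure hashes, running them separately on organize and community gives matching results for the same person, which is what the cross-site sync relies on. A plain shuffle can leave some transactions attached to their original contact, so a real routine would likely need something stronger than this.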

Change History

Changed 6 months ago by https://id.mayfirst.org/jamie

  • priority changed from major to critical

Changed 5 months ago by alfredo@…

Because this staging area is critical to the testing policy we want to propose for the USSF, I think this ticket is as critical as the priority listing implies. Has much been done on this, or do we need to "deploy" to get it done? :-)

Alfredo

Changed 4 months ago by https://id.mayfirst.org/jamie

I've finished the first stage of this - it's complete for the organize.ussf2010.org site - I'll be writing up documentation, but for now, if you are up to date with our git repo, you can sync data with:

drush ussf_sync

Next up is community.ussf2010.org.

Changed 4 months ago by https://id.mayfirst.org/jamie

  • status changed from new to closed
  • resolution set to fixed

Done! And info is updated on development workflow.

jamie
