Maintaining Data Integrity of our CiviCRM database
Due to some conservative settings, we've ended up with some duplicate records in our database. I've written the following SQL statements to help us keep track of the dupes.
Contact with the same Email
You don't really need this one - you can simply run the Dupe/Merge rule for contacts - all of our de-dupe rules assume that same contact if the email address is the same.
SELECT e1.contact_id, e2.contact_id,e1.email FROM civicrm_email e1 JOIN civicrm_email e2 ON e1.email = e2.email WHERE e1.contact_id != e2.contact_id;
Dupe organizations
If you are looking for dupe organizations (based on just the name of the organization), you can use the Organization strict or fuzzy rule to find them.
I wrote this SQL query so I could focus on the duplicates that are registered for the social forum:
SELECT DISTINCT c1.id,c1.organization_name FROM civicrm_contact c1 JOIN civicrm_contact c2 ON c1.organization_name = c2.organization_name JOIN civicrm_relationship AS r WHERE c1.id != c2.id AND (r.relationship_type_id = 11 OR r.relationship_type_id = 12) AND r.contact_id_b = c1.id;
Even these are failry straight forward merges. However, if we have duplicates and more than one of the duplicates is registered, then we have to take special care when merging.
Here are organizations with contacts with more than one registration:
SELECT DISTINCT d1.id,d1.organization_name FROM ussf_distinct_registered_org_dupes d1 JOIN ussf_distinct_registered_org_dupes d2 ON d1.organization_name = d2.organization_name WHERE d1.id != d2.id;
That statement requires the following view to be created:
CREATE VIEW ussf_distinct_registered_org_dupes AS SELECT DISTINCT c1.id,c1.organization_name FROM civicrm_contact c1 JOIN civicrm_contact c2 ON c1.organization_name = c2.organization_name WHERE c1.id != c2.id AND c1.id IN ( SELECT DISTINCT contact_id_b FROM civicrm_relationship WHERE relationship_type_id = 11 OR relationship_type_id = 12 ) ;
And here's one that show how they paid (if you want to eliminate the paid laters.
SELECT DISTINCT d1.id,d2.organization_name,c.id,c.first_name,c.last_name,is_pay_later FROM ussf_distinct_registered_org_dupes d1 JOIN ussf_distinct_registered_org_dupes d2 ON d1.organization_name = d2.organization_name JOIN civicrm_relationship r ON d1.id = r.contact_id_b JOIN civicrm_contact c ON r.contact_id_a = c.id JOIN civicrm_participant p ON c.id = p.contact_id WHERE d1.id != d2.id AND r.relationship_type_id = 12
Properly registered contacts
We have some duplicate data about registration in CiviCRM:
- Participant record
- Tag USSF2010 Registered
This queries show contacts coded with USSF2010 tag, but who do not have a participant record.
SELECT t.contact_id FROM civicrm_entity_tag t JOIN civicrm_contact c ON c.id = t.contact_id LEFT JOIN civicrm_participant p USING(contact_id) WHERE p.contact_id IS NULL AND contact_type = 'Individual' AND t.tag_id = 11;
This one shows the opposite: contacts not coded with USSF2010 tag, but who do have a participant record.
SELECT p.contact_id FROM civicrm_participant p JOIN civicrm_contact c ON c.id = p.contact_id LEFT JOIN civicrm_entity_tag t USING(contact_id) WHERE p.contact_id IS NULL AND contact_type = 'Individual';
Duplicate participant records
We sometimes get individuals that have registered more than once:
SELECT p1.contact_id FROM civicrm_participant p1 JOIN civicrm_participant p2 ON p1.contact_id = p2.contact_id WHERE p1.id != p2.id;
General Statistics
Total registered individuals:
SELECT count(distinct c.id) FROM civicrm_participant p JOIN civicrm_contact c ON c.id = p.contact_id WHERE contact_type = 'Individual' AND p.event_id = 4 AND (p.status_id = 5 OR p.status_id = 1);
Total registered organizations:
SELECT count(distinct c.id) FROM civicrm_entity_tag t JOIN civicrm_contact c ON c.id = t.contact_id WHERE contact_type = 'Organization' AND t.tag_id = 11;
Count by state:
SELECT sp.name, COUNT(DISTINCT c.id) AS count FROM civicrm_contact c JOIN civicrm_entity_tag et ON et.contact_id = c.id JOIN civicrm_address a ON c.id = a.contact_id JOIN civicrm_state_province sp ON a.state_province_id = sp.id WHERE a.country_id = 1228 AND tag_id = 11 GROUP BY sp.name ORDER BY count;
Country by count:
SELECT co.name, COUNT(DISTINCT c.id) AS count FROM civicrm_contact c JOIN civicrm_entity_tag et ON et.contact_id = c.id JOIN civicrm_address a ON c.id = a.contact_id JOIN civicrm_country co ON a.country_id = co.id GROUP BY co.name ORDER BY count;
Count of registered users with country
SELECT count(DISTINCT c.id) FROM civicrm_contact c JOIN civicrm_entity_tag et ON et.contact_id = c.id JOIN civicrm_address a ON c.id = a.contact_id WHERE country_id IS NOT NULL;
Count of registered users with state
SELECT count(DISTINCT c.id) FROM civicrm_contact c JOIN civicrm_entity_tag et ON et.contact_id = c.id JOIN civicrm_address a ON c.id = a.contact_id WHERE state_province_id IS NOT NULL;
~ ~
