Healthcare · 16 weeks · A healthcare provider
Unified customer data platform
Client name anonymized under NDA. Industry, technical approach, tools, and measured outcomes are reported as-is. Named references available on request.
# RESULTS
- Duplicates resolved: 87%
- Source of truth: 1
- End to end: 16 weeks
- Privacy-by-design: GDPR
# THE-CHALLENGE
4 CRMs, 3 billing platforms, no unified view. Every department had its own version of the patient — and none of them agreed. Marketing sent duplicate mailings. Billing couldn't reconcile accounts. Clinical teams worked from partial records.
The root cause wasn't just technical fragmentation — it was organizational. Each department had chosen its own system over the past decade. Nobody had authority to merge them, so they'd built point-to-point integrations that silently broke and produced conflicting data. A patient could have three different addresses across four systems, and nobody knew which was current.
# THE-TRANSFORMATION
Before
- 4 CRMs and 3 billing platforms, none of them talking to each other
- Marketing sent duplicate mailings; billing couldn't reconcile accounts
- Patient #10042 in CRM ≠ Patient #10042 in billing
- GDPR compliance was "we think we're fine"
After
- Single source of truth across all 7 systems
- 87% of duplicate records identified and merged
- Full GDPR audit trail: compliance is documented, not assumed
- Downstream systems pull from the unified platform via API
# OUR-APPROACH
Data archaeology
Audited all 7 source systems. Mapped entity relationships, identified overlap, and cataloged every field that contributed to the "who is this patient?" question.
Identity resolution
Built a probabilistic matching pipeline in Python — not just name + DOB, but address history, contact patterns, and billing identifiers. Validated against a 500-record manually verified sample.
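To illustrate what scoring across several fields looks like compared with name + DOB alone, here is a minimal sketch that sums per-field agreement weights in the Fellegi-Sunter style. The field names, the m/u probabilities, and the thresholds are hypothetical placeholders chosen for illustration, not the values used in this engagement.

```python
import math
from dataclasses import dataclass

# Hypothetical per-field probabilities (illustrative only):
#   m = P(field agrees | same patient)
#   u = P(field agrees | different patients)
FIELD_PROBS = {
    "name": (0.95, 0.01),
    "dob": (0.98, 0.003),
    "postcode": (0.85, 0.02),
    "phone": (0.80, 0.001),
    "billing_ref": (0.90, 0.0005),
}


@dataclass(frozen=True)
class Record:
    name: str = ""
    dob: str = ""
    postcode: str = ""
    phone: str = ""
    billing_ref: str = ""


def _norm(value: str) -> str:
    """Lowercase and remove all whitespace so trivial formatting differences agree."""
    return "".join(value.lower().split())


def match_score(a: Record, b: Record) -> float:
    """Sum log2 likelihood ratios over fields that are present on both records."""
    score = 0.0
    for field_name, (m, u) in FIELD_PROBS.items():
        va, vb = _norm(getattr(a, field_name)), _norm(getattr(b, field_name))
        if not va or not vb:
            continue  # a missing value is treated as neutral evidence
        if va == vb:
            score += math.log2(m / u)  # agreement adds weight
        else:
            score += math.log2((1 - m) / (1 - u))  # disagreement subtracts weight
    return score


def classify(score: float, upper: float = 20.0, lower: float = 0.0) -> str:
    """Map a score to a decision; the thresholds are assumed, not calibrated."""
    if score >= upper:
        return "match"
    if score <= lower:
        return "non-match"
    return "review"
```

With these placeholder weights, agreement on name and DOB alone is not enough to clear the upper threshold, so such pairs fall into manual review. In practice the thresholds would be tuned against a labeled sample such as the 500-record set mentioned above.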
Platform build
Unified data layer in Snowflake with pseudonymization, consent tracking, and full GDPR audit trails. Fivetran for ingestion, dbt for transformation, Terraform for infrastructure.
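The text does not say how pseudonymization was implemented. One common approach is a keyed hash (HMAC-SHA256) over the source identifier, sketched below. The function name, the identifier format, and the key handling are assumptions for illustration; a real deployment would load the key from a secrets manager rather than pass it around in code.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Return a stable, non-reversible token for an identifier.

    Hypothetical helper: HMAC-SHA256 keyed with a secret, so tokens cannot be
    recomputed by anyone who lacks the key.
    """
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()
```

Because the same key and input always produce the same token, pseudonymized records can still be joined across tables without exposing the raw identifier.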
API & handover
Created API endpoints so downstream systems pull from the unified platform instead of each other. Documented everything. Trained internal team to maintain the pipeline.
Three of the seven systems had overlapping patient IDs in different formats. The worst case: two systems used the same ID field name but assigned values from different sequences, so Patient #10042 in the CRM was a completely different person from Patient #10042 in billing. We caught it during validation. Had we missed it, the unified platform would have merged two strangers' medical records.
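A simple guard against this kind of collision is to never compare raw IDs across systems, and instead key every record by the pair (source system, local ID). The sketch below shows the idea; the type and function names are hypothetical, and the normalization rules (stripping a leading `#` and leading zeros) are assumed formats, not the ones found in the audit.

```python
from typing import NamedTuple


class SourceKey(NamedTuple):
    """Identifies a record by where it came from as well as its local ID."""
    system: str
    local_id: str


def normalize_local_id(raw: str) -> str:
    """Reduce assumed format variants ('#10042', '010042') to one canonical form."""
    cleaned = raw.strip().lstrip("#").lstrip("0")
    return cleaned or "0"


def source_key(system: str, raw_id: str) -> SourceKey:
    """Build a namespaced key so equal numbers from different systems never collide."""
    return SourceKey(system.strip().lower(), normalize_local_id(raw_id))
```

With keys built this way, `source_key("crm", "#10042")` and `source_key("billing", "10042")` are distinct, so linking the two records becomes an explicit decision for the matching pipeline rather than an accident of shared numbering.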
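# TECH-STACK
- Python (identity resolution pipeline)
- Snowflake (unified data layer)
- Fivetran (ingestion)
- dbt (transformation)
- Terraform (infrastructure)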
# OPERATING-CONTEXT
Operating constraint
Patient data meant zero tolerance for record mis-merges — a false positive in identity resolution could mean combining two strangers' medical records. Every matching rule required sign-off from the compliance lead. GDPR audit trails had to be in place before a single record was merged, not retrofitted after.
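The two constraints above (sign-off per matching rule, audit trail before any merge) can be expressed as a gate in code. The following is a minimal in-memory sketch, assuming a hypothetical `MergeLedger` class and made-up rule and record identifiers; the real platform would persist the audit entries rather than hold them in a list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class MergeLedger:
    """Hypothetical gatekeeper: merges only under approved rules, audit entry first."""
    approved_rules: set[str]
    audit_log: list[dict] = field(default_factory=list)
    merged: list[tuple[str, str]] = field(default_factory=list)

    def merge(self, rule_id: str, record_a: str, record_b: str, score: float) -> None:
        # Refuse any rule that has not been signed off by compliance.
        if rule_id not in self.approved_rules:
            raise PermissionError(f"rule {rule_id!r} has no compliance sign-off")
        # Write the audit entry before the merge so no merge exists without a trail.
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "rule_id": rule_id,
            "records": (record_a, record_b),
            "score": score,
        })
        self.merged.append((record_a, record_b))
```

A rejected merge raises before anything is written, so the audit log and the merge list stay in step.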
Adoption & rollout
Downstream systems (CRM, billing, clinical) were re-pointed to pull from the unified platform via API instead of from each other, with no big-bang cutover. Rollout was sequenced by department over 4 weeks, with a rollback path at each step. The internal data team took over ongoing pipeline ownership 4 weeks before the engagement ended; we stayed on call through the first month of independent operation.