INNOVATE
Healthcare SaaS · EU + North America

Going from one VM to three regions — without breaking a HIPAA audit

A clinical operations platform serving hospitals on three continents was running on a single Frankfurt VM. We moved them to active-active Kubernetes across EU, US, and APAC in nine months, with HIPAA and GDPR controls baked into the infrastructure-as-code.

20×

faster API responses globally

800ms → 40ms median

Client

Series B clinical operations platform

Timeline

9 months (architecture + build + migration + audit support)

Team

4 engineers, 1 architect, 1 SRE, 1 security engineer

Engagement

Migration project, fixed scope with weekly steering committee

01 — Challenge

The situation we walked into.

A Series B clinical SaaS company had outgrown its origin story. The platform — built quickly to win the first 50 hospital customers — ran on a single beefy VM in Frankfurt with daily snapshots and a hopeful disaster-recovery plan. As US and Australian customers came online, latency complaints became deal-blockers and enterprise procurement teams started asking pointed questions about data residency.

  • Single-region (eu-central-1) VM serving customers in 14 countries; baseline median API response of 800ms; APAC users reporting 1.2s+ on every interaction.
  • No active disaster-recovery story — a snapshot-based "restore in 4 hours, hopefully" plan that had never been tested end-to-end.
  • HIPAA audit pressure from US customer growth; GDPR data-residency questions blocking three enterprise deals in the EU.
  • Monolithic PostgreSQL instance becoming the throughput ceiling; vertical scaling already at the largest available instance class.
  • Engineering team of 14 with strong product instincts but limited DevOps depth.

02 — Approach

What we actually did, in order.

Multi-region active-active was the only design that solved both the latency and data-residency problems simultaneously. We treated HIPAA + GDPR controls as part of the infrastructure code, not a layer added afterward.

01

Target architecture & compliance design

Three regions (eu-central-1, us-east-1, ap-southeast-1) on EKS with PostgreSQL logical replication, Redis edge cache, and CloudFront-based traffic routing. Data residency rules baked into routing — EU customer data never leaves the EU plane, full stop.
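The residency rule above can be sketched as a pure function from request metadata to a serving region. This is an illustrative sketch, not the client's code — the country-to-region mapping and the `serving_region` function are assumptions; only the three region names come from the case study.

```python
# Hypothetical sketch of the residency-aware routing rule: residency-bound
# (PHI) data is always served from the customer's residency region, and
# everything else from the nearest region. The mapping is illustrative.

REGIONS = {"eu-central-1", "us-east-1", "ap-southeast-1"}

# Nearest region by customer country (illustrative, not exhaustive).
NEAREST = {
    "DE": "eu-central-1", "FR": "eu-central-1",
    "US": "us-east-1", "CA": "us-east-1",
    "AU": "ap-southeast-1", "SG": "ap-southeast-1",
}

def serving_region(country: str, data_is_phi: bool, residency: str) -> str:
    """Pick the region that serves a request.

    PHI (or any residency-bound data) is pinned to the customer's
    residency region; everything else goes to the nearest region.
    """
    if data_is_phi:
        assert residency in REGIONS
        return residency  # residency-bound: never leaves its plane
    return NEAREST.get(country, "eu-central-1")  # default to EU

# EU hospital PHI stays in the EU even when the request lands in the US:
assert serving_region("US", data_is_phi=True, residency="eu-central-1") == "eu-central-1"
# Non-PHI reads are served from the nearest region:
assert serving_region("AU", data_is_phi=False, residency="eu-central-1") == "ap-southeast-1"
```

In practice the same constraint would be enforced at the edge (CloudFront/Route53) and again in the data layer, so a routing bug cannot silently move EU data out of the EU plane.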

02

Compliance-as-code foundation

HIPAA and GDPR controls written into Terraform modules: encryption at rest and in transit, audit logging, key rotation schedules, BAA-aligned access controls, role separation. Auditors get a Terraform plan, not a PDF.
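The "auditors get a Terraform plan, not a PDF" idea can be illustrated with a small policy check run against rendered infrastructure config. This is a hedged sketch, not the client's modules: the config shape, control names, and rotation threshold are invented for the example.

```python
# Illustrative compliance-as-code check: validate that a rendered
# infrastructure config carries the controls listed above (encryption
# at rest and in transit, audit logging, key rotation). Names and the
# 90-day rotation limit are assumptions for this sketch.

REQUIRED_CONTROLS = {
    "encryption_at_rest": True,
    "encryption_in_transit": True,
    "audit_logging": True,
}
MAX_KEY_ROTATION_DAYS = 90  # assumed rotation schedule

def compliance_findings(config: dict) -> list[str]:
    """Return a list of control violations; an empty list means compliant."""
    findings = []
    for control, required in REQUIRED_CONTROLS.items():
        if config.get(control) != required:
            findings.append(f"{control} must be {required}")
    if config.get("key_rotation_days", 10**9) > MAX_KEY_ROTATION_DAYS:
        findings.append(f"key rotation must be <= {MAX_KEY_ROTATION_DAYS} days")
    return findings

good = {"encryption_at_rest": True, "encryption_in_transit": True,
        "audit_logging": True, "key_rotation_days": 30}
bad = {"encryption_at_rest": True, "encryption_in_transit": False,
       "audit_logging": True}

assert compliance_findings(good) == []
assert "encryption_in_transit must be True" in compliance_findings(bad)
assert any("key rotation" in f for f in compliance_findings(bad))
```

The point is that the check runs in CI on every plan, so a control regression fails the pipeline instead of surfacing in the next audit.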

03

Database split & replication topology

PostgreSQL primary in EU with logical replication to regional read replicas. Region-local writes for non-PHI data; PHI writes routed to the appropriate residency region. Connection pooling via PgBouncer per region.

04

Application refactor for region-awareness

Refactored the application's data access layer to be region-aware — every read served locally, writes routed by data classification. Roughly 8% of the codebase touched, with comprehensive integration tests for the routing logic.
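A minimal sketch of the region-aware data access layer described above — the class, the connection-string scheme, and the classification enum are assumptions for illustration: reads always hit the local replica, and writes are routed by data classification.

```python
# Hypothetical region-aware data access layer: every read is served
# locally; writes go to the local primary for non-PHI data and to the
# customer's residency region for PHI. Names are illustrative.
from enum import Enum

class DataClass(Enum):
    PHI = "phi"
    NON_PHI = "non_phi"

class RegionAwareDAL:
    def __init__(self, local_region: str):
        self.local_region = local_region

    def read_target(self) -> str:
        # Reads always hit the local replica; logical replication
        # keeps it current enough for this workload.
        return f"replica://{self.local_region}"

    def write_target(self, data_class: DataClass, residency_region: str) -> str:
        if data_class is DataClass.PHI:
            return f"primary://{residency_region}"  # residency-pinned write
        return f"primary://{self.local_region}"     # region-local write

dal = RegionAwareDAL("us-east-1")
assert dal.read_target() == "replica://us-east-1"
assert dal.write_target(DataClass.PHI, "eu-central-1") == "primary://eu-central-1"
assert dal.write_target(DataClass.NON_PHI, "eu-central-1") == "primary://us-east-1"
```

Concentrating the routing decision in one layer is what kept the blast radius to roughly 8% of the codebase — callers ask for a read or write target and never hard-code a region.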

05

Phased cutover by customer cohort

Migrated customers in five cohorts over six weeks, starting with the smallest accounts. Each cohort got a one-week soak period before the next moved. Zero data-loss incidents, two minor rollbacks for unrelated config issues.
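The cohort cadence above — five cohorts, smallest accounts first, a one-week soak before the next move — is simple enough to sketch as a schedule generator. Account names, sizes, and dates here are invented for the example.

```python
# Illustrative cutover scheduler: split accounts into cohorts, smallest
# first, with a soak period before each subsequent cohort moves.
from datetime import date, timedelta

def cohort_schedule(accounts: dict[str, int], cohorts: int,
                    start: date, soak_days: int = 7) -> list[tuple[date, list[str]]]:
    """Return (cutover_date, account_batch) pairs, smallest accounts first."""
    ordered = sorted(accounts, key=accounts.get)  # smallest accounts first
    size = -(-len(ordered) // cohorts)            # ceiling division
    schedule = []
    for i in range(cohorts):
        batch = ordered[i * size:(i + 1) * size]
        if batch:
            schedule.append((start + timedelta(days=i * soak_days), batch))
    return schedule

accounts = {"clinic-a": 5, "hosp-b": 200, "clinic-c": 12,
            "hosp-d": 90, "clinic-e": 8}
plan = cohort_schedule(accounts, cohorts=5, start=date(2024, 1, 8))
assert plan[0][1] == ["clinic-a"]            # smallest account moves first
assert plan[-1][1] == ["hosp-b"]             # largest account moves last
assert (plan[1][0] - plan[0][0]).days == 7   # one-week soak between cohorts
```

Moving the smallest accounts first means the earliest cohorts exercise the full migration path while the blast radius of any surprise is smallest.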

06

DR drill, audit, handover

Full active-active DR drill (intentional region failure with traffic shifted) before the audit. HIPAA and GDPR audits ran in parallel; both closed without findings. Engineering team trained on the new operational model.

03 — Stack

What it was built on.

Full technology stack
AWS EKS · RDS PostgreSQL 16 · ElastiCache Redis · CloudFront · Route53 · Terraform · ArgoCD · Helm · PgBouncer · OpenTelemetry · Datadog · Vault

04 — Results

The numbers we will stand behind.

20×

faster median API responses

800ms → 40ms globally

99.99%

SLA delivered over 12 months post-launch

3

active-active regions with sub-4-minute RTO

0

audit findings across HIPAA and GDPR re-certifications

8%

of the codebase touched to support region-awareness

05 — Outcome

What changed for the business.

The three EU enterprise deals that had been blocked on data-residency questions closed within 90 days of the EU plane going active. US enterprise growth accelerated visibly once latency stopped being an objection — three of the five largest US customers in the company's history signed in the year following the migration.

The disaster-recovery story is no longer hopeful. The team runs a quarterly intentional region failure (a real region taken down, not a tabletop game day) — full traffic shift, real customer requests, recovery measured in minutes, not hours. Auditors love it.

We continue to support the platform on a fractional-SRE basis — roughly 0.5 FTE of senior infrastructure capacity that scales up and down with their needs.

06 — Timeline

How the engagement ran.

Our delivery process

01

Architecture & compliance design

6 weeks

Multi-region topology, data-residency rules, HIPAA/GDPR control mapping, target IaC structure.

02

Foundation build

10 weeks

EKS clusters in 3 regions, Terraform modules, ArgoCD, observability, encryption + audit logging baseline.

03

Application refactor

10 weeks

Region-aware data access layer, integration tests, performance benchmarking against the legacy system.

04

Phased customer cutover

6 weeks

Five cohorts, smallest first. One-week soak per cohort. Zero data-loss incidents.

05

DR drill, audit, stabilization

4 weeks

Active-active DR exercise, HIPAA + GDPR re-audits, on-call rotation handover, runbook authoring.

07 — FAQ

What we get asked about this engagement.

Why three regions and not two?
Three regions gives us active-active without requiring a 'master' region during normal operations — every region is authoritative for its own customers. Two regions either gives you active-passive (which doesn't solve the latency problem for the passive region's users) or forces synchronous writes across regions, which is too slow for transactional workloads. Three is the minimum for true active-active with regional autonomy.
How did you handle PHI write routing without latency penalties?
PHI writes are routed to the customer's residency region — which, by design, is also the region with the lowest latency to that customer. Cross-region writes only happen for non-PHI shared data (configuration, anonymized analytics) and are async. The architecture is set up so 'compliant routing' and 'fast routing' are the same path for 99% of operations.
What happened to the original Frankfurt VM?
Decommissioned 30 days post-cutover, after the final snapshot was archived to long-term storage per the client's data retention policy. The total infrastructure spend post-migration is roughly 40% higher than the single-VM setup — but capacity is roughly 8× higher, so cost-per-customer dropped substantially.

Have a similar problem?

Most engagements start with a 30-minute discovery call. No pitch deck, no NDAs on day one — just an honest conversation about your situation.

Schedule a Call