A 12-year-old PHP monolith was failing PCI-DSS audits and choking under peak load. Fourteen months later: p95 down to 320ms, zero-downtime cutover, audit closed without findings.
p95 latency reduction
1.8s → 320ms
Client
Mid-market payments processor
Timeline
14 months (build + migration + 90-day stabilization)
Team
6 engineers, 1 architect, 1 SRE
Engagement
Architecture-first, fixed-quarter milestones with monthly demos
A mid-market European payments processor had been running their gateway on the same PHP 5.6 monolith they shipped in 2012. It worked — until it didn't. PCI-DSS auditors flagged the underlying infrastructure stack as end-of-life. Peak-window p95 latency had crept above 1.8 seconds. Merchant complaints were mounting and a competitor was openly using their slowness as a sales angle.
We chose a strangler-fig migration over a rewrite. The legacy gateway kept handling traffic while we extracted services one transaction path at a time, with continuous shadow-traffic comparison so every cutover was reversible.
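The strangler-fig routing decision can be sketched as a thin proxy check: forward a request to the new platform only when its transaction path has already been extracted, and fall back to the legacy gateway otherwise. This is an illustrative sketch, not the client's code; the path prefixes are hypothetical.

```typescript
type Backend = "legacy" | "extracted";

// Paths whose transaction flows have already been cut over.
// These prefixes are illustrative, not the client's real routes.
const extractedPrefixes: string[] = [
  "/v1/refunds",          // e.g. an early high-volume, low-risk service
  "/v1/merchant-portal",
];

/** Decide which backend should handle a given request path. */
function routeFor(path: string): Backend {
  return extractedPrefixes.some((p) => path.startsWith(p))
    ? "extracted"
    : "legacy";
}
```

The useful property is that each cutover is just adding a prefix to the list, and a rollback is removing it: the routing change itself is instant and reversible.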
01
Architecture & risk assessment
Mapped the entire transaction graph, tagged every PCI-scoped surface, and ranked services by extraction risk. The first three services were deliberately high-volume but low-risk, chosen to build team confidence.
02
Platform foundation
Stood up a new Next.js merchant portal and Node.js authorization microservices on EKS with Terraform-managed infrastructure. Built CI/CD with automated PCI-DSS controls (image scanning, secrets management, change approvals) before a single service migrated.
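The deploy gate behind those automated controls amounts to a simple rule: a release proceeds only when every control has passed. A minimal sketch, with control names that are assumptions for illustration:

```typescript
// Illustrative CI/CD deploy-gate logic; control names are hypothetical.
interface ControlResults {
  imageScanClean: boolean;   // container image has no critical CVEs
  secretsFromVault: boolean; // no plaintext secrets in the manifest
  changeApproved: boolean;   // change ticket approved before deploy
}

/** Return the controls blocking this release; empty means deploy may proceed. */
function blockingControls(results: ControlResults): string[] {
  return (Object.entries(results) as [string, boolean][])
    .filter(([, passed]) => !passed)
    .map(([name]) => name);
}
```

Surfacing the failing control by name, rather than a bare pass/fail, is what makes the gate auditable: the pipeline log doubles as evidence for the assessor.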
03
Strangler-fig migration
Each service ran in shadow mode — receiving production traffic, returning real responses, but not authoritative — for two weeks. Diff dashboards compared old vs. new on every transaction. Cutover happened only after a clean week.
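The comparison feeding those diff dashboards can be sketched as a field-level diff between the authoritative legacy response and the shadow response, with known-noisy fields excluded. The field names here are assumptions, not the client's schema:

```typescript
// Illustrative shadow-mode response comparison; field names are hypothetical.
type TxnResponse = Record<string, unknown>;

// Fields expected to differ legitimately between the two backends.
const ignoredFields = new Set(["timestamp", "traceId"]);

/** Return the names of fields where legacy and shadow responses disagree. */
function diffResponses(legacy: TxnResponse, shadow: TxnResponse): string[] {
  const keys = new Set([...Object.keys(legacy), ...Object.keys(shadow)]);
  const mismatches: string[] = [];
  for (const key of keys) {
    if (ignoredFields.has(key)) continue;
    // Serialize so nested values compare by content rather than identity.
    if (JSON.stringify(legacy[key]) !== JSON.stringify(shadow[key])) {
      mismatches.push(key);
    }
  }
  return mismatches;
}
```

A day with zero mismatches across all production traffic counts toward the clean week; any mismatch resets the clock for that service.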
04
Authorization migration & cutover
The riskiest 30% of code, migrated last. Blue/green deployments per service, with sub-second rollback wired into the dashboard. Cutover completed during a deliberately quiet window with full customer comms.
05
Audit support & stabilization
Worked through the PCI-DSS re-audit alongside the client's compliance team. Established OpenTelemetry-driven observability so the in-house team could spot latency regressions before merchants did. Embedded with their team for a 90-day stabilization period.
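The latency-regression check handed over to the in-house team reduces to computing p95 over a window of request durations and comparing it against a budget. A minimal sketch; the 400ms budget is illustrative, not the client's real SLO:

```typescript
/** Nearest-rank p95 of a non-empty sample of durations, in milliseconds. */
function p95(durationsMs: number[]): number {
  const sorted = [...durationsMs].sort((a, b) => a - b);
  const rank = Math.ceil(0.95 * sorted.length) - 1;
  return sorted[rank];
}

/** True when the window's p95 exceeds the agreed latency budget. */
function latencyRegressed(durationsMs: number[], budgetMs = 400): boolean {
  return p95(durationsMs) > budgetMs;
}
```

In practice the durations would come from the OpenTelemetry pipeline's exported spans; the point of the handover was that this check fires as an alert before a merchant notices the slowdown.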
p95 latency reduction
1.8s → 320ms
uptime sustained over 12 months post-launch
transaction headroom on the same monthly infrastructure spend
Zero PCI-DSS audit findings on the rebuilt platform
Zero minutes of customer-visible downtime during the 14-month migration
Both at-risk merchant accounts re-signed multi-year contracts within 60 days of cutover. The compliance team closed the PCI-DSS re-audit with zero findings — the first clean audit the company had in five years.
More importantly, the shape of the engineering organization changed. The same 12-person team that had been firefighting now runs a documented release schedule. Their on-call burn rate dropped roughly 70% in the six months after handover, freeing capacity for the product roadmap that had been frozen since 2022.
We continue to support the platform on a quarterly health-check basis — an arrangement we strongly recommend for production financial systems.
01
Discovery & architecture
Transaction graph mapping, PCI-scope tagging, risk-ranked extraction backlog, target architecture sign-off.
02
Platform foundation
EKS cluster, Terraform IaC, CI/CD with PCI-DSS controls, observability stack, merchant portal shell.
03
Strangler-fig migration
9 services extracted in priority order, each with two weeks of shadow traffic and a blue/green cutover.
04
Authorization migration & cutover
The PCI-scoped core, migrated last. Sub-second rollback wired in. Final cutover during a planned low-volume window.
05
Audit support & stabilization
PCI-DSS re-audit, observability handover, on-call rotation training, runbook authoring.
End-to-end web applications — from API design to deployment pipelines. React, Next.js, Node.js, and the rest of the stack you'll actually run in production.
Infrastructure that doesn't keep you up at night. AWS, GCP, Azure. Kubernetes when it earns its place. CI/CD with rollback. Cost-aware from day one.
Bespoke business systems built around the workflow you actually have, not the one a generic SaaS forces on you.
Single-region VM in Frankfurt → three-region active-active EKS deployment. API response time dropped from 800ms to 40ms globally. HIPAA + GDPR audit closed without findings.
A 12-person ops team manually classifying 800+ shipping documents a day became a 4-engineer + LLM pipeline doing it in minutes, with a 0.3% error rate. Customs holds dropped 80%.
Most engagements start with a 30-minute discovery call. No pitch deck, no NDAs on day one — just an honest conversation about your situation.
Schedule a Call