FinTech · Western Europe

Cutting payment gateway latency 6× without a single hour of downtime

A 12-year-old PHP monolith was failing PCI-DSS audits and choking under peak load. Fourteen months later: p95 down to 320ms, zero-downtime cutover, audit closed without findings.

6×

p95 latency reduction

1.8s → 320ms

Client

Mid-market payments processor

Timeline

14 months (build + migration + 90-day stabilization)

Team

6 engineers, 1 architect, 1 SRE

Engagement

Architecture-first, fixed-quarter milestones with monthly demos

01 — Challenge

The situation we walked into.

A mid-market European payments processor had been running their gateway on the same PHP monolith they shipped in 2012, still stuck on PHP 5.6. It worked — until it didn't. PCI-DSS auditors flagged the underlying infrastructure stack as end-of-life. Peak-window p95 latency had crept above 1.8 seconds. Merchant complaints were mounting, and a competitor was openly using the slowness as a sales angle.

  • p95 response time of 1.8 seconds during peak processing windows; sustained timeouts during European market open.
  • PCI-DSS audit failing on infrastructure age, missing CI/CD, and informal change management.
  • No automated test coverage on the payment authorization path — every release was a manual smoke test.
  • Merchant churn directly attributable to checkout latency, with two large accounts threatening to leave.
  • A 12-engineer team that knew the system well but had no time to modernize it while keeping it alive.

02 — Approach

What we actually did, in order.

We chose a strangler-fig migration over a rewrite. The legacy gateway kept handling traffic while we extracted services one transaction path at a time, with continuous shadow-traffic comparison so every cutover was reversible.
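
To make the mechanics concrete, here is a minimal sketch of the kind of edge router a strangler-fig cutover depends on (TypeScript, assuming Node 20 and Express; the route table, service names, and URLs are illustrative, not the client's actual topology):

```ts
import express from 'express';

// Per-path cutover table: each transaction path is either still owned by the
// legacy monolith or already cut over to a new service. Paths in shadow mode
// are mirrored to the new service, whose response is never returned to the merchant.
type Rule = { target: 'legacy' | 'modern'; shadow?: boolean };

const LEGACY_BASE = 'http://legacy-gateway.internal';
const MODERN_BASE = 'http://auth-service.internal';

const rules: Record<string, Rule> = {
  '/v1/refunds': { target: 'modern' },                  // already cut over
  '/v1/authorize': { target: 'legacy', shadow: true },  // still shadowing
};

const app = express();
app.use(express.raw({ type: '*/*' }));

app.all('*', async (req, res) => {
  const rule = rules[req.path] ?? { target: 'legacy' };
  const body = req.body?.length ? req.body : undefined;

  const call = (base: string) =>
    fetch(base + req.originalUrl, {
      method: req.method,
      headers: { 'content-type': req.headers['content-type'] ?? 'application/json' },
      body,
    });

  // Fire-and-forget mirror; a failure in the candidate service must never
  // affect the authoritative response.
  if (rule.shadow) call(MODERN_BASE).catch(() => {});

  const upstream = await call(rule.target === 'modern' ? MODERN_BASE : LEGACY_BASE);
  res.status(upstream.status).send(Buffer.from(await upstream.arrayBuffer()));
});

app.listen(8080);
```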

01

Architecture & risk assessment

Mapped the entire transaction graph, tagged every PCI-scoped surface, and ranked services by extraction risk. The first three services were deliberately high-volume but low-risk, chosen to build team confidence.
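
For illustration only, the resulting extraction backlog can be pictured as a scored list along these lines; the fields and weights are assumptions made for this sketch, not the scoring model we actually used:

```ts
// Illustrative shape of the risk-ranked extraction backlog.
type ServiceCandidate = {
  name: string;
  dailyVolume: number;          // transactions per day
  pciScoped: boolean;           // touches cardholder data?
  inboundDependencies: number;  // callers inside the monolith
  undocumentedBranches: number; // edge cases found only by reading code
};

// Lower score = extract earlier: prefer high traffic (fast feedback, visible
// wins) with a small blast radius and no PCI scope.
function extractionRisk(s: ServiceCandidate): number {
  return (
    (s.pciScoped ? 100 : 0) +
    s.inboundDependencies * 5 +
    s.undocumentedBranches * 3 -
    Math.log10(s.dailyVolume + 1) * 10
  );
}

const backlog: ServiceCandidate[] = []; // populated from the transaction graph
const extractionOrder = [...backlog].sort(
  (a, b) => extractionRisk(a) - extractionRisk(b),
);
```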

02

Greenfield platform foundation

Stood up a new Next.js merchant portal and Node.js authorization microservices on EKS with Terraform-managed infrastructure. Built CI/CD with automated PCI-DSS controls (image scanning, secrets management, change approvals) before a single service migrated.
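
One concrete example of a pipeline-enforced control is a deploy gate that refuses to promote an image whose scan report contains critical findings. The sketch below assumes a Trivy-style JSON report and a placeholder file name; the real controls were wired into GitHub Actions alongside secrets management and change approvals:

```ts
import { readFileSync } from 'node:fs';

// Hypothetical deploy gate: block promotion if the container image scan
// reports critical vulnerabilities.
type Finding = { VulnerabilityID: string; Severity: string };
type ScanReport = { Results?: { Vulnerabilities?: Finding[] }[] };

const report: ScanReport = JSON.parse(readFileSync('image-scan.json', 'utf8'));

const critical = (report.Results ?? [])
  .flatMap((result) => result.Vulnerabilities ?? [])
  .filter((vuln) => vuln.Severity === 'CRITICAL');

if (critical.length > 0) {
  console.error(
    `Blocking deploy: ${critical.length} critical finding(s):`,
    critical.map((vuln) => vuln.VulnerabilityID).join(', '),
  );
  process.exit(1);
}
```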

03

Strangler-fig cutover (per service)

Each service ran in shadow mode — receiving production traffic, returning real responses, but not authoritative — for two weeks. Diff dashboards compared old vs. new on every transaction. Cutover happened only after a clean week.
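
In essence, each shadowed transaction produced a record like the one sketched below, and the diff dashboards were aggregations of those records. Field names here are placeholders, not the real response schema:

```ts
// Illustrative per-transaction diff recorded during shadow mode.
type GatewayResponse = { status: number; authCode?: string; declineReason?: string };

type ShadowDiff = {
  transactionId: string;
  legacy: GatewayResponse;
  candidate: GatewayResponse;
  match: boolean;
  latencyDeltaMs: number; // candidate minus legacy
};

function diffResponses(
  transactionId: string,
  legacy: GatewayResponse,
  candidate: GatewayResponse,
  legacyMs: number,
  candidateMs: number,
): ShadowDiff {
  const match =
    legacy.status === candidate.status &&
    legacy.authCode === candidate.authCode &&
    legacy.declineReason === candidate.declineReason;
  return { transactionId, legacy, candidate, match, latencyDeltaMs: candidateMs - legacyMs };
}
```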

04

Authorization path migration

The riskiest 30% of code, migrated last. Blue/green deployments per service, with sub-second rollback wired into the dashboard. Cutover completed during a deliberately quiet window with full customer comms.
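
The sub-second rollback came down to a switch the routing layer consults on every request. It is sketched here with Redis, which was already in the stack; the key names and module layout are assumptions:

```ts
import Redis from 'ioredis';

// Illustrative rollback switch: the edge router reads the active deployment
// colour from Redis on every request, so flipping a single key reverts
// traffic immediately.
const redis = new Redis(process.env.REDIS_URL ?? 'redis://localhost:6379');

export async function activeColour(service: string): Promise<'blue' | 'green'> {
  const colour = await redis.get(`deploy:${service}:active`);
  return colour === 'green' ? 'green' : 'blue'; // default to blue
}

export async function rollback(service: string): Promise<void> {
  const current = await activeColour(service);
  await redis.set(`deploy:${service}:active`, current === 'blue' ? 'green' : 'blue');
}
```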

05

Audit, monitor, hand off

Worked through the PCI-DSS re-audit alongside the client's compliance team. Established OpenTelemetry-driven observability so the in-house team could spot latency regressions before merchants did. Embedded with their team for a 90-day stabilization period.
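
The handover was built on the standard OpenTelemetry Node SDK. A minimal service-side bootstrap looks roughly like this, with the collector endpoint and service name as placeholders:

```ts
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';

// Minimal tracing bootstrap for one service.
const sdk = new NodeSDK({
  serviceName: 'authorization-service',
  traceExporter: new OTLPTraceExporter({ url: 'http://otel-collector:4318/v1/traces' }),
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();

// Flush spans on shutdown so the last requests before a rollout are not lost.
process.on('SIGTERM', () => {
  sdk.shutdown().finally(() => process.exit(0));
});
```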

03 — Stack

What it was built on.

Full technology stack
Next.js 15 · React 19 · Node.js 20 · PostgreSQL 16 · Redis 7 · Stripe · Kubernetes (EKS) · Terraform · GitHub Actions · OpenTelemetry · Datadog · Vault
04 — Results

The numbers we will stand behind.

6×

p95 latency reduction

1.8s → 320ms

99.97%

uptime sustained over 12 months post-launch

transaction headroom on the same monthly infrastructure spend

0

PCI-DSS audit findings on the rebuilt platform

0

minutes of customer-visible downtime during the 14-month migration

05 — Outcome

What changed for the business.

Both at-risk merchant accounts re-signed multi-year contracts within 60 days of cutover. The compliance team closed the PCI-DSS re-audit with zero findings — the first clean audit the company had in five years.

More importantly, the shape of the engineering organization changed. The same 12-person team that had been firefighting now runs a documented release schedule. Their on-call burn rate dropped roughly 70% in the six months after handover, freeing capacity for the product roadmap that had been frozen since 2022.

We continue to support the platform on a quarterly health-check basis — an arrangement we strongly recommend for production financial systems.

06 — Timeline

How the engagement ran.

Our delivery process

01

Discovery & architecture

8 weeks

Transaction graph mapping, PCI-scope tagging, risk-ranked extraction backlog, target architecture sign-off.

02

Platform foundation

12 weeks

EKS cluster, Terraform IaC, CI/CD with PCI-DSS controls, observability stack, merchant portal shell.

03

Strangler-fig migration

32 weeks

9 services extracted in priority order, each with two weeks of shadow traffic and a blue/green cutover.

04

Authorization migration & cutover

8 weeks

The PCI-scoped core, migrated last. Sub-second rollback wired in. Final cutover during a planned low-volume window.

05

Audit support & stabilization

12 weeks

PCI-DSS re-audit, observability handover, on-call rotation training, runbook authoring.

07 — FAQ

What we get asked about this engagement.

Why a strangler-fig migration instead of a rewrite?
Two reasons. First, payment systems cannot afford a 'big bang' cutover — the blast radius is unacceptable. Second, a rewrite assumes you already understand every undocumented edge case in the legacy system, which is almost never true for a 12-year-old codebase. Strangler-fig lets you discover those edge cases in production with a safety net, instead of in a dev environment where you'll miss them.
How did you keep PCI-DSS scope from expanding during the migration?
We tagged PCI-scoped surfaces explicitly during the discovery phase and treated scope reduction as a hard design constraint, not a side effect. The rebuilt platform actually has a smaller PCI footprint than the legacy one, because we offloaded card-data handling to a tokenized vault pattern that the legacy monolith couldn't support architecturally.
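
As a simplified illustration of that pattern, using Stripe (which appears in the stack) purely as an example of the token-not-PAN flow, with hypothetical function names:

```ts
import Stripe from 'stripe';

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY as string);

// The service only ever receives an opaque payment-method id (e.g. "pm_...")
// created client-side; the raw PAN never reaches our servers, which keeps
// the service out of direct card-data scope.
async function authorizePayment(
  amountMinorUnits: number,
  currency: string,
  paymentMethodId: string,
) {
  return stripe.paymentIntents.create({
    amount: amountMinorUnits,
    currency,
    payment_method: paymentMethodId,
    confirm: true,
    capture_method: 'manual', // authorize now, capture at settlement
  });
}
```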
Could the in-house team have done this themselves?
Possibly, given two more years and permission to slow product work to a crawl. The constraint here was time and capacity, not capability. They are running the platform now, post-handover, and have shipped meaningful improvements without our involvement.

Have a similar problem?

Most engagements start with a 30-minute discovery call. No pitch deck, no NDAs on day one — just an honest conversation about your situation.

Schedule a Call