FinTech · Western Europe

Cutting payment gateway latency 6× without a single hour of downtime

A 12-year-old PHP monolith was failing PCI-DSS audits and choking under peak load. Fourteen months later: p95 down to 320ms, zero-downtime cutover, audit closed without findings.

6×

p95 latency reduction

1.8s → 320ms

Client

Mid-market payments processor

Timeline

14 months (build + migration + 90-day stabilization)

Team

6 engineers, 1 architect, 1 SRE

Engagement

Architecture-first, fixed-quarter milestones with monthly demos

01 — Challenge

The situation we walked into.

A mid-market European payments processor had been running their gateway on the same PHP monolith they shipped in 2012, still stuck on PHP 5.6. It worked — until it didn't. PCI-DSS auditors flagged the underlying infrastructure stack as end-of-life. Peak-window p95 latency had crept above 1.8 seconds. Merchant complaints were mounting, and a competitor was openly using the slowness as a sales angle.

  • p95 response time of 1.8 seconds during peak processing windows; sustained timeouts during European market open.
  • PCI-DSS audit failing on infrastructure age, missing CI/CD, and informal change management.
  • No automated test coverage on the payment authorization path — every release was a manual smoke test.
  • Merchant churn directly attributable to checkout latency, with two large accounts threatening to leave.
  • A 12-engineer team that knew the system well but had no time to modernize it while keeping it alive.

02 — Approach

What we actually did, in order.

We chose a strangler-fig migration over a rewrite. The legacy gateway kept handling traffic while we extracted services one transaction path at a time, with continuous shadow-traffic comparison so every cutover was reversible.
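
To make the mechanics concrete, here is a minimal sketch of the kind of edge router a strangler-fig cutover depends on (TypeScript, assuming Node 20 and Express; the route table, service names, and URLs are illustrative, not the client's actual topology):

```ts
import express from 'express';

// Per-path cutover table: each transaction path is either still owned by the
// legacy monolith or already cut over to a new service. Paths in shadow mode
// are mirrored to the new service, whose response is never returned to the merchant.
type Rule = { target: 'legacy' | 'modern'; shadow?: boolean };

const LEGACY_BASE = 'http://legacy-gateway.internal';
const MODERN_BASE = 'http://auth-service.internal';

const rules: Record<string, Rule> = {
  '/v1/refunds': { target: 'modern' },                  // already cut over
  '/v1/authorize': { target: 'legacy', shadow: true },  // still shadowing
};

const app = express();
app.use(express.raw({ type: '*/*' }));

app.all('*', async (req, res) => {
  const rule = rules[req.path] ?? { target: 'legacy' };
  const body = req.body?.length ? req.body : undefined;

  const call = (base: string) =>
    fetch(base + req.originalUrl, {
      method: req.method,
      headers: { 'content-type': req.headers['content-type'] ?? 'application/json' },
      body,
    });

  // Fire-and-forget mirror; a failure in the candidate service must never
  // affect the authoritative response.
  if (rule.shadow) call(MODERN_BASE).catch(() => {});

  const upstream = await call(rule.target === 'modern' ? MODERN_BASE : LEGACY_BASE);
  res.status(upstream.status).send(Buffer.from(await upstream.arrayBuffer()));
});

app.listen(8080);
```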

01

Architecture & risk assessment

Mapped the entire transaction graph, tagged every PCI-scoped surface, and ranked services by extraction risk. The first three services were deliberately high-volume but low-risk, chosen to build team confidence.
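
For illustration only, the resulting extraction backlog can be pictured as a scored list along these lines; the fields and weights are assumptions made for this sketch, not the scoring model we actually used:

```ts
// Illustrative shape of the risk-ranked extraction backlog.
type ServiceCandidate = {
  name: string;
  dailyVolume: number;          // transactions per day
  pciScoped: boolean;           // touches cardholder data?
  inboundDependencies: number;  // callers inside the monolith
  undocumentedBranches: number; // edge cases found only by reading code
};

// Lower score = extract earlier: prefer high traffic (fast feedback, visible
// wins) with a small blast radius and no PCI scope.
function extractionRisk(s: ServiceCandidate): number {
  return (
    (s.pciScoped ? 100 : 0) +
    s.inboundDependencies * 5 +
    s.undocumentedBranches * 3 -
    Math.log10(s.dailyVolume + 1) * 10
  );
}

const backlog: ServiceCandidate[] = []; // populated from the transaction graph
const extractionOrder = [...backlog].sort(
  (a, b) => extractionRisk(a) - extractionRisk(b),
);
```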

02

Greenfield platform foundation

Stood up a new Next.js merchant portal and Node.js authorization microservices on EKS with Terraform-managed infrastructure. Built CI/CD with automated PCI-DSS controls (image scanning, secrets management, change approvals) before a single service migrated.
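
One concrete example of a pipeline-enforced control is a deploy gate that refuses to promote an image whose scan report contains critical findings. The sketch below assumes a Trivy-style JSON report and a placeholder file name; the real controls were wired into GitHub Actions alongside secrets management and change approvals:

```ts
import { readFileSync } from 'node:fs';

// Hypothetical deploy gate: block promotion if the container image scan
// reports critical vulnerabilities.
type Finding = { VulnerabilityID: string; Severity: string };
type ScanReport = { Results?: { Vulnerabilities?: Finding[] }[] };

const report: ScanReport = JSON.parse(readFileSync('image-scan.json', 'utf8'));

const critical = (report.Results ?? [])
  .flatMap((result) => result.Vulnerabilities ?? [])
  .filter((vuln) => vuln.Severity === 'CRITICAL');

if (critical.length > 0) {
  console.error(
    `Blocking deploy: ${critical.length} critical finding(s):`,
    critical.map((vuln) => vuln.VulnerabilityID).join(', '),
  );
  process.exit(1);
}
```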

03

Strangler-fig cutover (per service)

Each service ran in shadow mode — receiving production traffic, returning real responses, but not authoritative — for two weeks. Diff dashboards compared old vs. new on every transaction. Cutover happened only after a clean week.
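
In essence, each shadowed transaction produced a record like the one sketched below, and the diff dashboards were aggregations of those records. Field names here are placeholders, not the real response schema:

```ts
// Illustrative per-transaction diff recorded during shadow mode.
type GatewayResponse = { status: number; authCode?: string; declineReason?: string };

type ShadowDiff = {
  transactionId: string;
  legacy: GatewayResponse;
  candidate: GatewayResponse;
  match: boolean;
  latencyDeltaMs: number; // candidate minus legacy
};

function diffResponses(
  transactionId: string,
  legacy: GatewayResponse,
  candidate: GatewayResponse,
  legacyMs: number,
  candidateMs: number,
): ShadowDiff {
  const match =
    legacy.status === candidate.status &&
    legacy.authCode === candidate.authCode &&
    legacy.declineReason === candidate.declineReason;
  return { transactionId, legacy, candidate, match, latencyDeltaMs: candidateMs - legacyMs };
}
```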

04

Authorization path migration

The riskiest 30% of code, migrated last. Blue/green deployments per service, with sub-second rollback wired into the dashboard. Cutover completed during a deliberately quiet window with full customer comms.
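
The sub-second rollback came down to a switch the routing layer consults on every request. It is sketched here with Redis, which was already in the stack; the key names and module layout are assumptions:

```ts
import Redis from 'ioredis';

// Illustrative rollback switch: the edge router reads the active deployment
// colour from Redis on every request, so flipping a single key reverts
// traffic immediately.
const redis = new Redis(process.env.REDIS_URL ?? 'redis://localhost:6379');

export async function activeColour(service: string): Promise<'blue' | 'green'> {
  const colour = await redis.get(`deploy:${service}:active`);
  return colour === 'green' ? 'green' : 'blue'; // default to blue
}

export async function rollback(service: string): Promise<void> {
  const current = await activeColour(service);
  await redis.set(`deploy:${service}:active`, current === 'blue' ? 'green' : 'blue');
}
```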

05

Audit, monitor, hand off

Worked through the PCI-DSS re-audit alongside the client's compliance team. Established OpenTelemetry-driven observability so the in-house team could spot latency regressions before merchants did. Embedded with their team for a 90-day stabilization period.
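
The handover was built on the standard OpenTelemetry Node SDK. A minimal service-side bootstrap looks roughly like this, with the collector endpoint and service name as placeholders:

```ts
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';

// Minimal tracing bootstrap for one service.
const sdk = new NodeSDK({
  serviceName: 'authorization-service',
  traceExporter: new OTLPTraceExporter({ url: 'http://otel-collector:4318/v1/traces' }),
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();

// Flush spans on shutdown so the last requests before a rollout are not lost.
process.on('SIGTERM', () => {
  sdk.shutdown().finally(() => process.exit(0));
});
```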

03 — Stack

What it was built on.

Full technology stack
Next.js 15 · React 19 · Node.js 20 · PostgreSQL 16 · Redis 7 · Stripe · Kubernetes (EKS) · Terraform · GitHub Actions · OpenTelemetry · Datadog · Vault
04 — Results

The numbers we will stand behind.

6×

p95 latency reduction

1.8s → 320ms

99.97%

uptime sustained over 12 months post-launch

transaction headroom on the same monthly infrastructure spend

0

PCI-DSS audit findings on the rebuilt platform

0

minutes of customer-visible downtime during the 14-month migration

05 — Outcome

What changed for the business.

Both at-risk merchant accounts re-signed multi-year contracts within 60 days of cutover. The compliance team closed the PCI-DSS re-audit with zero findings — the first clean audit the company had in five years.

More importantly, the shape of the engineering organization changed. The same 12-person team that had been firefighting now runs a documented release schedule. Their on-call burn rate dropped roughly 70% in the six months after handover, freeing capacity for the product roadmap that had been frozen since 2022.

We continue to support the platform on a quarterly health-check basis — an arrangement we strongly recommend for production financial systems.

06 — Timeline

How the engagement ran.

Our delivery process

01

Discovery & architecture

8 weeks

Transaction graph mapping, PCI-scope tagging, risk-ranked extraction backlog, target architecture sign-off.

02

Platform foundation

12 weeks

EKS cluster, Terraform IaC, CI/CD with PCI-DSS controls, observability stack, merchant portal shell.

03

Strangler-fig migration

32 weeks

9 services extracted in priority order, each with two weeks of shadow traffic and a blue/green cutover.

04

Authorization migration & cutover

8 weeks

The PCI-scoped core, migrated last. Sub-second rollback wired in. Final cutover during a planned low-volume window.

05

Audit support & stabilization

12 weeks

PCI-DSS re-audit, observability handover, on-call rotation training, runbook authoring.

07 — FAQ

What we get asked about this engagement.

Why a strangler-fig migration instead of a rewrite?
Two reasons. First, payment systems cannot afford a 'big bang' cutover — the blast radius is unacceptable. Second, a rewrite assumes you already understand every undocumented edge case in the legacy system, which is almost never true for a 12-year-old codebase. Strangler-fig lets you discover those edge cases in production with a safety net, instead of in a dev environment where you'll miss them.
How did you keep PCI-DSS scope from expanding during the migration?
We tagged PCI-scoped surfaces explicitly during the discovery phase and treated scope reduction as a hard design constraint, not a side effect. The rebuilt platform actually has a smaller PCI footprint than the legacy one, because we offloaded card-data handling to a tokenized vault pattern that the legacy monolith couldn't support architecturally.
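
As a simplified illustration of that pattern, using Stripe (which appears in the stack) purely as an example of the token-not-PAN flow, with hypothetical function names:

```ts
import Stripe from 'stripe';

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY as string);

// The service only ever receives an opaque payment-method id (e.g. "pm_...")
// created client-side; the raw PAN never reaches our servers, which keeps
// the service out of direct card-data scope.
async function authorizePayment(
  amountMinorUnits: number,
  currency: string,
  paymentMethodId: string,
) {
  return stripe.paymentIntents.create({
    amount: amountMinorUnits,
    currency,
    payment_method: paymentMethodId,
    confirm: true,
    capture_method: 'manual', // authorize now, capture at settlement
  });
}
```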
Could the in-house team have done this themselves?
Possibly, given two more years and permission to slow product work to a crawl. The constraint here was time and capacity, not capability. They are running the platform now, post-handover, and have shipped meaningful improvements without our involvement.

Have a similar problem?

Most engagements start with a 30-minute discovery call. No pitch deck, no NDAs on day one — just an honest conversation about your situation.

Schedule a Call