External: US document processing latency and failures (2026-06-05)

Summary

On 2026-06-05, Reducto experienced elevated latency and job failures for document processing requests in our US production environment. The primary impact window was 07:24-08:25 PT.

Customers saw requests hang for longer than normal, and some jobs failed with server-side errors. Reducto on-call responded during the incident, increased processing capacity, and the system returned to baseline by approximately 08:15 PT.

Impact

Window: 07:24-08:10 PT on 2026-06-05.
Affected: US production document processing, primarily parse-dependent workflows including parse and extract.
Symptoms: Increased processing latency, hanging requests, and server-side job failures.

Across customer production traffic in the primary window, excluding internal/test traffic, we measured:

Metric	Count
Submitted jobs	112,374
Server-error failed jobs	9,576
Jobs lasting >=10 min	29,977
Jobs lasting >=20 min	11,297
Server-error failed or >=20 min	19,040 jobs across 178 customer orgs

Timeline

Time PT	Event
07:24	Reducto detected elevated pending work in the US processing system.
07:25	Reducto on-call engaged.
07:49	Investigation confirmed that most queued work was document parsing work.
07:56	Reducto increased downstream processing capacity.
07:59	Parse throughput began recovering.
08:08	During backlog drain, some jobs began failing against OCR dependencies.
08:15	The original pending-work alert resolved.
08:20	Customer-facing service was considered recovered.

Root cause

The incident was caused by a sudden spike in document-processing work combined with insufficient effective compute capacity in part of our inference stack. We had cutover to a new inference server yesterday that required higher CPUs under load. Under the spike, our server had cross model contention and the infrastructure provider could not supply the extra CPUs consistently, which caused model inference work to slow down while the queue continued to grow.

As the backlog built up, some requests waited a long time before processing and eventually failed or timed out. During recovery, the system drained the backlog quickly, which created a secondary burst of OCR traffic. That burst hit OCR-provider limits and caused additional server-side failures before the system fully stabilized.

We did not find evidence that this incident was triggered by a recent full production deploy immediately before the impact window.

What we're doing about it

Separate Inference service: We’re separating out one of our models from the inference server to scale independently to prevent contention.
Tighten CPU reservation and capacity validation. We are updating capacity settings for our inference services so containers reserve the CPU they need under peak load.
Add backpressure during recovery. We are changing backlog-drain behavior so recovery does not overwhelm OCR dependencies and create secondary failures.

Summary

Impact

Timeline

Root cause

What we're doing about it

What you can do