Use Case · Document Processing

OCR plus LLM extraction. One pipeline. Six days to running.

Customer: National health network
Vertical: Healthcare
Workload: Patient-document extraction at scale
Replaced: Custom Python orchestration
Time to Running: 6 days
Runtime: 75% faster end-to-end on equivalent batches of patient PDFs.
Infrastructure: 80%+ less compute required for the same extraction throughput.
Time to Live: 6 days from kickoff to running across thousands of patient documents.
Operators: 1. One engineer maintains the pipeline. No separate OCR service to babysit.

Patient documents. Custom Python. Brittle at scale.

The customer needed structured metadata extracted from thousands of patient documents — clinical PDFs of varying formats, scanned and digital, requiring a multi-stage pipeline: rasterize, OCR, then run an LLM to pull structured fields. The original implementation was a custom Python orchestrator stitching together separate services for each stage.

It worked. It was also slow, brittle when one of the services hiccuped, and required ongoing engineering attention to keep running. Adding a document type meant touching three systems. Scaling throughput meant scaling each service independently and reconciling the queues between them.

The pipeline shape was right; the architecture was wrong. Every stage was its own deployment, and the glue between stages was where the failures lived.

PDF → image → Tesseract → LLM. One Falcon job.

Falcon was deployed against the customer's existing patient-document corpus. The full extraction pipeline — rasterization, OCR via Tesseract, LLM-based field extraction — ran as a single Falcon job. No orchestrator, no inter-service queues, no separate scaling decisions per stage.

  • One pipeline, four stages. Rasterize, OCR, prompt the LLM, write structured output. All operators in the same Falcon graph.
  • Same OCR engine. Tesseract ran in-process inside Falcon, not behind a service. The image-to-text step is an operator, not a network round trip.
  • LLM calls scheduled by the engine. Falcon batched and parallelized model calls based on pipeline backpressure, replacing the manual queue tuning the previous implementation required.
  • Six days from kickoff to running on the production document set. Not a prototype on synthetic data — the real corpus, the real downstream consumers.
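
To make the four-stage shape concrete, here is a minimal sketch in plain Python. It is not Falcon's API, which this page does not show. The `Pipeline` class, the stage signatures, and the `fake_*` stand-ins are all hypothetical; in a real setup the OCR stage would call Tesseract and the extract stage would call the customer's model.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator

# Hypothetical stage signatures; none of these names come from Falcon.
Rasterize = Callable[[bytes], list[bytes]]   # PDF bytes -> one image per page
Ocr = Callable[[bytes], str]                 # page image -> text
Extract = Callable[[str], dict[str, str]]    # document text -> structured fields


@dataclass
class Pipeline:
    rasterize: Rasterize
    ocr: Ocr
    extract: Extract

    def run(self, pdfs: Iterable[tuple[str, bytes]]) -> Iterator[dict[str, str]]:
        """Stream each document through all four stages in one process."""
        for doc_id, pdf in pdfs:
            pages = self.rasterize(pdf)                    # stage 1: PDF -> images
            text = "\n".join(self.ocr(p) for p in pages)   # stage 2: images -> text
            fields = self.extract(text)                    # stage 3: text -> fields
            yield {"doc_id": doc_id, **fields}             # stage 4: structured output


# Stand-in stages so the sketch runs without Tesseract or a model.
def fake_rasterize(pdf: bytes) -> list[bytes]:
    return pdf.split(b"|")  # pretend each "|"-separated chunk is a page image


def fake_ocr(image: bytes) -> str:
    return image.decode("utf-8")


def fake_extract(text: str) -> dict[str, str]:
    # Pretend the model returns "key: value" lines as fields.
    return dict(line.split(": ", 1) for line in text.splitlines() if ": " in line)


pipeline = Pipeline(fake_rasterize, fake_ocr, fake_extract)
records = list(pipeline.run([("doc-1", b"name: Jane Doe|dob: 1980-01-01")]))
```

Each stage is a function call on the previous stage's output, so there is no queue or network hop between them. Under this framing, supporting a new document type means swapping or adding a stage function rather than deploying another service.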
Pipeline Topology
Before · Custom Python orchestration: PDF service → Queue → OCR service → Queue → LLM service → Writer
After · Falcon: one Falcon job (PDF → Image → Tesseract → LLM → Structured output)
End-to-End Runtime · Equivalent Batch
Before · Custom Python orchestration: 100% (baseline)
After · Falcon: 25%

Production, not pilot, in under a week.

Six days from kickoff, the Falcon pipeline was running across the customer's production document set. End-to-end runtime dropped 75% on equivalent batches; the infrastructure footprint required to sustain target throughput dropped more than 80%. One engineer could operate the pipeline end-to-end where the previous architecture required cross-team coordination across three services.
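
As a quick check on what those percentages mean as ratios, the arithmetic can be sketched as follows. The helper name `ratio_from_reduction` is illustrative, and the 0.80 input is the lower bound of the "more than 80%" figure, so the second ratio is a floor rather than a measured value.

```python
def ratio_from_reduction(reduction: float) -> float:
    """Convert a fractional reduction (0.75 means 75% less) into a before/after ratio."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    return 1 / (1 - reduction)


# A 75% drop in runtime leaves 25% of the baseline, i.e. a 4x speedup.
runtime_speedup = ratio_from_reduction(0.75)

# An 80% drop in compute leaves 20% of the baseline, i.e. roughly 5x less infrastructure.
compute_ratio = ratio_from_reduction(0.80)
```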

"By integrating Falcon into our pipeline, we achieved substantial gains in performance and scalability while reducing overhead."

Co-Founder · Production customer

The shift was less about throughput and more about scope: a single artifact replaced a distributed system, and the operational surface area collapsed accordingly. New document types are added by editing the graph, not by deploying another service.

Document pipelines are graphs, not microservices.

Three properties of Falcon are responsible for the result.

  • Mixed-workload execution. Image processing, OCR, and LLM calls run as operators in the same compiled graph. Falcon schedules CPU work, IO, and model calls together — no per-stage cluster, no per-stage scaling.
  • In-process libraries. Tesseract executes inside the pipeline. The OCR step is a function call, not a service deployment.
  • Backpressure, batching, and parallelism are engine concerns. The previous implementation handled these manually with queues and tuning knobs. Falcon's scheduler derives them from the pipeline shape.
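
For readers unfamiliar with the term, backpressure can be illustrated with bounded queues between stages: a fast stage blocks when the stage after it falls behind. The sketch below uses only Python's standard library and is a generic illustration of the concept, not a description of how Falcon's scheduler is implemented. The names `stage`, `run_pipeline`, and `maxsize` are assumptions made for the example.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the stream


def stage(fn, inbox, outbox):
    """Apply fn to each item from inbox and put the result on outbox."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)
            return
        # put() blocks while outbox is full, which slows this stage down
        # to the pace of the next one: that blocking is the backpressure.
        outbox.put(fn(item))


def run_pipeline(items, fns, maxsize=2):
    """Run fns as a chain of threaded stages joined by bounded queues."""
    queues = [queue.Queue(maxsize=maxsize) for _ in range(len(fns) + 1)]
    threads = [
        threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(fns)
    ]

    def feed():
        for item in items:
            queues[0].put(item)  # blocks when the first stage is saturated
        queues[0].put(SENTINEL)

    threads.append(threading.Thread(target=feed))
    for t in threads:
        t.start()

    results = []
    while True:
        out = queues[-1].get()
        if out is SENTINEL:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


results = run_pipeline(range(5), [lambda x: x + 1, lambda x: x * 10])
```

In a hand-built system, `maxsize` and the number of workers per stage are the tuning knobs someone has to own. The claim above is that an engine that sees the whole graph can set them itself.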

The customer kept their model, their document corpus, and their downstream consumers. What changed was the shape of the system that connected them.

Test Flight

Bring the document pipeline that's three services pretending to be one.

We'll run it as a single Falcon job and benchmark against your current orchestration. Free. 2-4 weeks. Your documents, your model, your downstream consumers.
