Document Processing — Falcon Use Case

Runtime

75%

Less end-to-end runtime on equivalent batches of patient PDFs.

Infrastructure

80%+

Less compute required for the same extraction throughput.

Time to Live

6d

From kickoff to running across thousands of patient documents.

Operators

1

One engineer maintains the pipeline. No separate OCR service to babysit.

01 · The Challenge

Patient documents. Custom Python. Brittle at scale.

The customer needed structured metadata extracted from thousands of patient documents. These were clinical PDFs in varying formats, both scanned and digital, and extraction required a multi-stage pipeline: rasterize, OCR, then run an LLM to pull structured fields. The original implementation was a custom Python orchestrator stitching together a separate service for each stage.

It worked. It was also slow, brittle when one of the services hiccuped, and required ongoing engineering attention to keep running. Adding a document type meant touching three systems. Scaling throughput meant scaling each service independently and reconciling the queues between them.

The pipeline shape was right; the architecture was wrong. Every stage was its own deployment, and the glue between stages was where the failures lived.

02 · The Approach

PDF → image → Tesseract → LLM. One Falcon job.

Falcon was deployed against the customer's existing patient-document corpus. The full extraction pipeline — rasterization, OCR via Tesseract, LLM-based field extraction — ran as a single Falcon job. No orchestrator, no inter-service queues, no separate scaling decisions per stage.

One pipeline, four stages. Rasterize, OCR, prompt the LLM, write structured output. All operators in the same Falcon graph.
Same OCR engine. Tesseract ran in-process inside Falcon, not behind a service. The image-to-text step is an operator, not a network round trip.
LLM calls scheduled by the engine. Falcon batched and parallelized model calls based on pipeline backpressure, replacing the manual queue tuning the previous implementation required.
Six days from kickoff to running on the production document set. Not a prototype on synthetic data — the real corpus, the real downstream consumers.

Pipeline Topology

Before · Custom Python orchestration

PDF service → Queue → OCR service → Queue → LLM service → Writer

After · Falcon

Falcon job: PDF → Image → Tesseract → LLM → Structured output

End-to-End Runtime · Equivalent Batch

Before · Custom Python orchestration · 100% (baseline)

Falcon · 25%
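The runtime figure is a reduction, not a speed multiplier. Taking the two bars as runtimes for the same batch, as labeled, the implied speedup is:

```latex
\frac{T_{\text{Falcon}}}{T_{\text{before}}} = 0.25
\quad\Longrightarrow\quad
\text{speedup} = \frac{T_{\text{before}}}{T_{\text{Falcon}}} = \frac{1}{0.25} = 4\times
```

In other words, removing 75% of the runtime means the same batch finishes in a quarter of the time.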

03 · The Results

Production, not pilot, in under a week.

Six days from kickoff, the Falcon pipeline was running across the customer's production document set. End-to-end runtime dropped 75% on equivalent batches; the infrastructure footprint required to sustain target throughput dropped more than 80%. One engineer could operate the pipeline end-to-end where the previous architecture required cross-team coordination across three services.

"By integrating Falcon into our pipeline, we achieved substantial gains in performance and scalability while reducing overhead."

Co-Founder · Production customer

The shift was less about throughput and more about scope: a single artifact replaced a distributed system, and the operational surface area collapsed accordingly. New document types are added by editing the graph, not by deploying another service.

04 · Why It Worked

Document pipelines are graphs, not microservices.

Three properties of Falcon are responsible for the result.

Mixed-workload execution. Image processing, OCR, and LLM calls run as operators in the same compiled graph. Falcon schedules CPU work, IO, and model calls together — no per-stage cluster, no per-stage scaling.
In-process libraries. Tesseract executes inside the pipeline. The OCR step is a function call, not a service deployment.
Backpressure, batching, and parallelism are engine concerns. The previous implementation handled these manually with queues and tuning knobs. Falcon's scheduler does it from the pipeline shape.
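Backpressure and batching can be illustrated with a minimal in-process sketch that uses only Python's standard library. This is not Falcon's API or scheduler, which this case study does not show. It is an illustration of the general technique under stated assumptions. `rasterize`, `ocr`, and `extract_fields` are hypothetical stubs standing in for PDF rendering, Tesseract, and an LLM call. Each stage runs in its own thread. Bounded queues between stages provide backpressure, because a full queue blocks the upstream `put`. The last stage groups documents into batches before each simulated model call.

```python
import queue
import threading

_DONE = object()  # sentinel that marks the end of the stream


# Hypothetical stand-ins for the real work: PDF rendering, Tesseract OCR,
# and a batched LLM call. None of these is a Falcon or Tesseract API.
def rasterize(pdf):
    return f"image({pdf})"


def ocr(image):
    return f"text({image})"


def extract_fields(texts):
    # One simulated "model call" handles a whole batch of documents.
    return [{"source": t, "batch_size": len(texts)} for t in texts]


def feed(items, outbox):
    for item in items:
        outbox.put(item)  # blocks while the first stage is saturated
    outbox.put(_DONE)


def map_stage(fn, inbox, outbox):
    # Transform one item at a time. put() blocks when the bounded outbox
    # is full, so a slow downstream stage throttles this one (backpressure).
    while (item := inbox.get()) is not _DONE:
        outbox.put(fn(item))
    outbox.put(_DONE)


def batch_stage(fn, inbox, outbox, batch_size):
    # Accumulate up to batch_size items, then make a single batched call;
    # flush any partial batch when the stream ends.
    batch = []
    while (item := inbox.get()) is not _DONE:
        batch.append(item)
        if len(batch) == batch_size:
            for result in fn(batch):
                outbox.put(result)
            batch = []
    if batch:
        for result in fn(batch):
            outbox.put(result)
    outbox.put(_DONE)


def run_pipeline(pdfs, capacity=4, batch_size=8):
    # Bounded queues between stages: capacity caps in-flight work per hop.
    q_pdf, q_img, q_txt, q_out = (queue.Queue(maxsize=capacity) for _ in range(4))
    threads = [
        threading.Thread(target=feed, args=(pdfs, q_pdf)),
        threading.Thread(target=map_stage, args=(rasterize, q_pdf, q_img)),
        threading.Thread(target=map_stage, args=(ocr, q_img, q_txt)),
        threading.Thread(
            target=batch_stage, args=(extract_fields, q_txt, q_out, batch_size)
        ),
    ]
    for t in threads:
        t.start()
    results = []
    while (record := q_out.get()) is not _DONE:  # drain the writer end
        results.append(record)
    for t in threads:
        t.join()
    return results
```

For example, `run_pipeline([f"doc{i}.pdf" for i in range(7)], capacity=2, batch_size=3)` returns seven records in input order. Their `batch_size` fields read 3, 3, 3, 3, 3, 3, 1: two full batches followed by one partial batch flushed at the end of the stream. In this sketch, queue capacity and batch size are still hand-set parameters. The case study's claim is that Falcon derives such decisions from the pipeline shape instead.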

The customer kept their model, their document corpus, and their downstream consumers. What changed was the shape of the system that connected them.

OCR plus LLM extraction. One pipeline. Six days to running.

Bring the document pipeline that's three services pretending to be one.

More patterns where Falcon wins.

Replace always-on Databricks Photon with scale-to-zero compute. 93% less infra.

Embed ONNX execution inside the pipeline. 14× to 57× faster than the alternatives.

Cloud-grade processing on disconnected hardware. Same binary. Same model.