Batch ETL + AI Inference — Falcon Use Case

vs. NiFi

57×

End-to-end runtime on the customer's reference batch + inference job.

vs. PySpark

14×

Apples-to-apples on the same input data and the same ONNX model.

Wall-clock

12s

From 690 seconds. Same pipeline, same outputs, in a Falcon-native build.

Pipelines

1

One Falcon job replaces ETL orchestration plus separate model-serving infrastructure.

01 The Challenge

Two stacks for one pipeline.

The customer's batch ETL pipeline produced records that needed to be classified by an ONNX model before landing. The architecture was the standard one: NiFi handled extraction, transform, and routing; a separate model-serving layer ran inference; results were rejoined downstream. Two clusters, two control planes, two on-call rotations, one logical job.

The end-to-end runtime on the reference workload was 690 seconds. The team had already spent a quarter tuning batch sizes, model concurrency, and serialization between the two stacks.

The problem wasn't the model. It wasn't the ETL. It was the boundary between them. Every record had to leave the data plane, cross the wire to the inference service, and come back — for every batch, every run.
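The boundary cost described above can be made concrete with a toy sketch. Everything here is a hypothetical stand-in, not the customer's code: `FakeModelServer` imitates a separate serving layer, `toy_model` imitates the ONNX classifier, and JSON imitates whatever wire format the two stacks actually used. The point is only that the two-stack path serializes and crosses the boundary once per batch, while the inline path produces the same labels with no crossings.

```python
import json


def toy_model(feature: float) -> str:
    """Hypothetical stand-in for the ONNX classifier."""
    return "high" if feature >= 0.5 else "low"


class FakeModelServer:
    """Hypothetical stand-in for a separate model-serving layer.

    It counts how many times a batch crosses the boundary.
    """

    def __init__(self) -> None:
        self.round_trips = 0

    def predict(self, payload: str) -> str:
        self.round_trips += 1
        features = json.loads(payload)          # deserialize on the server side
        return json.dumps([toy_model(f) for f in features])


def classify_via_service(batches, server):
    """Two-stack path: serialize, cross the boundary, deserialize, per batch."""
    labels = []
    for batch in batches:
        response = server.predict(json.dumps(batch))
        labels.extend(json.loads(response))
    return labels


def classify_inline(batches):
    """Single-stack path: inference is a plain function call per record."""
    return [toy_model(f) for batch in batches for f in batch]


batches = [[0.1, 0.9], [0.6], [0.3, 0.4]]
server = FakeModelServer()
remote_labels = classify_via_service(batches, server)
inline_labels = classify_inline(batches)
```

Both paths return identical labels. The only difference is the number of serialization round trips, which grows with the batch count on every run.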

02 The Approach

One Falcon job. Inference inline. Same model.

Falcon was scoped against the customer's reference job in a Test Flight. The test used the same input data, the same ONNX model, and the same expected outputs, and Falcon was measured against both the existing NiFi pipeline and a PySpark equivalent the team had built for comparison.

The ONNX model ran inside the Falcon pipeline. No external model server. Inference is an operator in the graph, not a network call.
No model retraining. The ONNX artifact the team was already shipping ran as-is. Falcon loaded it, executed it, and passed records through the rest of the graph.
Compiled pipeline, not interpreted. The ETL + inference graph compiled to a native Rust binary before execution. There was no JVM, no Python interpreter loop, no per-batch dispatch tax.
Apples-to-apples benchmark. Falcon was measured against NiFi and PySpark on identical inputs, an identical hardware envelope, and an identical model.
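As a rough illustration of "inference is an operator in the graph", the sketch below chains plain Python generators. It is not Falcon's API, which this case study does not show. The names `load_model`, `make_inference_operator`, and `run_pipeline` are hypothetical, and a trivial threshold function stands in for the ONNX artifact. The model is loaded once when the operator is built, then applied to each record as an ordinary function call.

```python
from typing import Callable, Iterable, Iterator

Record = dict
Operator = Callable[[Iterator[Record]], Iterator[Record]]

load_count = 0  # tracks how many times the (hypothetical) model is loaded


def load_model() -> Callable[[float], str]:
    """Hypothetical loader; a threshold function stands in for the ONNX model."""
    global load_count
    load_count += 1
    return lambda feature: "high" if feature >= 0.5 else "low"


def extract(rows: Iterable[tuple]) -> Iterator[Record]:
    for rec_id, value in rows:
        yield {"id": rec_id, "value": value}


def transform(records: Iterator[Record]) -> Iterator[Record]:
    for rec in records:
        yield {**rec, "feature": rec["value"] / 100.0}


def make_inference_operator() -> Operator:
    model = load_model()  # loaded once, when the graph is built

    def infer(records: Iterator[Record]) -> Iterator[Record]:
        for rec in records:
            # Inference is a function call on the record, not a network hop.
            yield {**rec, "label": model(rec["feature"])}

    return infer


def run_pipeline(rows: Iterable[tuple], operators: list) -> list:
    stream = extract(rows)
    for op in operators:
        stream = op(stream)
    return list(stream)  # "land" step: materialize the results


landed = run_pipeline(
    [(1, 20.0), (2, 75.0), (3, 50.0)],
    [transform, make_inference_operator()],
)
```

Each record flows through extract, transform, inference, and landing inside one process. A compiled engine would remove the interpreter overhead this Python sketch still carries, but the shape of the graph is the same.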

Pipeline Topology

Before · NiFi

Extract → Transform → Route → Model server (ONNX) → Rejoin → Land

After · Falcon

Falcon job: Extract → Transform → ONNX inference → Land

End-to-End Runtime · Reference Job

NiFi: 690 s
PySpark: 168 s
Falcon: 12 s

03 The Results

57× faster. Half the stack. Edge-deployable.

Falcon ran the reference job in 12 seconds. The same job took 168 seconds on PySpark and 690 seconds on the existing NiFi pipeline. The numbers are not a tuning win — they are the difference between an interpreted, network-bounded, multi-cluster topology and a single compiled binary that loads the model once and runs it inline.
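The headline multipliers follow directly from the three runtimes quoted above. A minimal check, using only the figures stated in this case study:

```python
# End-to-end runtimes on the reference job, in seconds (from the case study).
runtimes_s = {"NiFi": 690, "PySpark": 168, "Falcon": 12}


def speedup(baseline: str, target: str = "Falcon") -> float:
    """Ratio of the baseline runtime to the target runtime."""
    return runtimes_s[baseline] / runtimes_s[target]


nifi_speedup = speedup("NiFi")        # 690 / 12 = 57.5
pyspark_speedup = speedup("PySpark")  # 168 / 12 = 14.0
```

The 57× headline is 57.5 rounded down, and the 14× PySpark figure is exact.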

The architectural collapse mattered as much as the runtime. The Falcon deployment removed the model-serving cluster, the orchestration glue between stacks, and the per-batch network round trip. The same job that required two systems and a coordination layer was now a single binary the team could ship to a forward node.

"Reduces processing time and compute needs and is the best we have seen. Very fitting for the Army's edge compute needs."

Program Manager · Test Flight

A separate technical reviewer on the engagement put it more bluntly: "When we first met, what you told me sounded too good to be true. From the tests we've run, you've kept your word."

04 Why It Worked

Inference is an operator, not a service.

Three architectural properties drove the result. None of them are achievable by tuning the existing stack.

ONNX execution is native to the engine. Falcon links the model into the same compiled binary as the rest of the pipeline. Every record's inference call is a function call, not a network hop.
Compiled pipelines, not interpreted. The full ETL graph — including inference — is lowered to native code before execution. There is no per-record interpreter overhead and no JVM warm-up tax.
Single artifact, single deployment. One binary holds the data logic and the model. It runs the same way on a beefy cloud node, an air-gapped data center, or an edge device with a constrained compute envelope.

This is why the comparison reads as a step change. NiFi and PySpark were optimized as far as they could be; the boundary between data and inference was the floor. Falcon removes the boundary.

Run inference inside the pipeline. Not next to it.

Bring the inference job that's two stacks pretending to be one.

More patterns where Falcon wins.

Replace always-on Databricks Photon with scale-to-zero compute. 93% less infra.

OCR + LLM extraction at production scale. Six days to running.

Cloud-grade processing on disconnected hardware. Same binary. Same model.