Use Case · Batch ETL + AI Inference

Run inference inside the pipeline. Not next to it.

Customer: Tier-1 defense prime
Vertical: Defense / Aerospace
Workload: Batch ETL + ONNX classification
Replaced: NiFi + separate model serving
Engagement: Test Flight benchmark
vs. NiFi: 57×. End-to-end runtime on the customer's reference batch + inference job.
vs. PySpark: 14×. Apples-to-apples on the same input data and the same ONNX model.
Wall-clock: 12 s, down from 690 seconds. Same pipeline, same outputs, in a Falcon-native build.
Pipelines: 1. One Falcon job replaces ETL orchestration plus separate model-serving infrastructure.

Two stacks for one pipeline.

The customer's batch ETL pipeline produced records that needed to be classified by an ONNX model before landing. The architecture was the standard one: NiFi handled extraction, transform, and routing; a separate model-serving layer ran inference; results were rejoined downstream. Two clusters, two control planes, two on-call rotations, one logical job.

The end-to-end runtime on the reference workload was 690 seconds. The team had already spent a quarter tuning batch sizes, model concurrency, and serialization between the two stacks.

The problem wasn't the model. It wasn't the ETL. It was the boundary between them. Every record had to leave the data plane, cross the wire to the inference service, and come back — for every batch, every run.
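A rough way to see why the boundary dominates is a back-of-the-envelope cost model. The sketch below assumes batches are processed one after another, with no overlap between transfer and compute. The function names and every number passed to them are illustrative, not measurements from this engagement.

```python
def boundary_overhead_s(n_batches: int, round_trip_s: float, serde_s: float) -> float:
    """Time spent only on crossing the data/inference boundary.

    Assumes each batch pays one network round trip plus one
    serialize/deserialize cost, strictly in sequence.
    """
    return n_batches * (round_trip_s + serde_s)


def total_runtime_s(compute_s: float, n_batches: int,
                    round_trip_s: float, serde_s: float) -> float:
    """Total runtime = useful compute + boundary overhead."""
    return compute_s + boundary_overhead_s(n_batches, round_trip_s, serde_s)


# Made-up values, chosen only to show the shape of the trade-off.
two_stack = total_runtime_s(compute_s=10.0, n_batches=1000,
                            round_trip_s=0.25, serde_s=0.125)
inline = total_runtime_s(compute_s=10.0, n_batches=1000,
                         round_trip_s=0.0, serde_s=0.0)
```

With inference inline, the round-trip and serialization terms drop to zero and only the compute term remains.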

One Falcon job. Inference inline. Same model.

Falcon was scoped against the customer's reference job in a Test Flight. The same input data, the same ONNX model, the same expected outputs — measured against both the existing NiFi pipeline and a PySpark equivalent the team had built for comparison.

  • The ONNX model ran inside the Falcon pipeline. No external model server. Inference is an operator in the graph, not a network call.
  • No model retraining. The ONNX artifact the team was already shipping ran as-is. Falcon loaded it, executed it, and passed records through the rest of the graph.
  • Compiled pipeline, not interpreted. The ETL + inference graph compiled to a native Rust binary before execution. There was no JVM, no Python interpreter loop, no per-batch dispatch tax.
  • Apples-to-apples benchmark against NiFi and PySpark on identical inputs, identical hardware envelope, identical model.
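An apples-to-apples comparison of this kind can be sketched as a small harness: every candidate gets an identical copy of the input, and a run only counts if its output matches the expected result. This is a minimal sketch, not the harness used in the Test Flight; `compare_pipelines` and `time_run` are hypothetical names.

```python
import copy
import time
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
Pipeline = Callable[[List[Record]], List[Record]]


def time_run(pipeline: Pipeline, records: List[Record]) -> Tuple[float, List[Record]]:
    """Run one pipeline and return (elapsed seconds, output records)."""
    start = time.perf_counter()
    output = pipeline(records)
    return time.perf_counter() - start, output


def compare_pipelines(pipelines: Dict[str, Pipeline],
                      records: List[Record],
                      expected: List[Record]) -> Dict[str, float]:
    """Time each pipeline on identical input and reject any output mismatch."""
    timings: Dict[str, float] = {}
    for name, pipeline in pipelines.items():
        # Deep-copy so one pipeline cannot mutate another's input.
        elapsed, output = time_run(pipeline, copy.deepcopy(records))
        if output != expected:
            raise ValueError(f"{name} produced different outputs")
        timings[name] = elapsed
    return timings
```

Checking outputs before recording a time keeps a faster but incorrect run from counting as a win.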
Pipeline Topology
Before · NiFi
Extract → Transform → Route → Model server (ONNX) → Rejoin → Land
After · Falcon
Falcon job: Extract → Transform → ONNX inference → Land
End-to-End Runtime · Reference Job
NiFi: 690 s
PySpark: 168 s
Falcon: 12 s

57× faster. Half the stack. Edge-deployable.

Falcon ran the reference job in 12 seconds. The same job took 168 seconds on PySpark and 690 seconds on the existing NiFi pipeline. The numbers are not a tuning win — they are the difference between an interpreted, network-bounded, multi-cluster topology and a single compiled binary that loads the model once and runs it inline.
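The headline multiples follow directly from the three reported runtimes. A quick check, where the helper name `speedup` is ours:

```python
# Runtimes reported for the reference job, in seconds.
runtimes_s = {"NiFi": 690, "PySpark": 168, "Falcon": 12}


def speedup(baseline_s: float, candidate_s: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s


falcon_s = runtimes_s["Falcon"]
vs_nifi = speedup(runtimes_s["NiFi"], falcon_s)        # 57.5, reported as 57x
vs_pyspark = speedup(runtimes_s["PySpark"], falcon_s)  # 14.0, reported as 14x
```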

The architectural collapse mattered as much as the runtime. The Falcon deployment removed the model-serving cluster, the orchestration glue between stacks, and the per-batch network round trip. The job that had required two systems and a coordination layer became a single binary the team could ship to a forward node.

"Reduces processing time and compute needs and is the best we have seen. Very fitting for the Army's edge compute needs."

Program Manager · Test Flight

A separate technical reviewer on the engagement put it more bluntly: "When we first met, what you told me sounded too good to be true. From the tests we've run, you've kept your word."

Inference is an operator, not a service.

Three architectural properties drove the result. None of them are achievable by tuning the existing stack.

  • ONNX execution is native to the engine. Falcon links the model into the same compiled binary as the rest of the pipeline. Every record's inference call is a function call, not a network hop.
  • Compiled pipelines, not interpreted. The full ETL graph — including inference — is lowered to native code before execution. There is no per-record interpreter overhead and no JVM warm-up tax.
  • Single artifact, single deployment. One binary holds the data logic and the model. It runs the same way on a beefy cloud node, an air-gapped data center, or an edge device with a constrained compute envelope.
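The operator idea can be illustrated in a few lines. This is a conceptual sketch in Python, not Falcon's API: `load_model` is a hypothetical stand-in for loading an ONNX artifact, and the threshold classifier inside it is made up so the example runs without a model file. The point is structural: the model is loaded once per job, and inference is one more function in the chain.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Record = Dict[str, Any]
Model = Callable[[List[float]], str]


def load_model() -> Model:
    """Hypothetical stand-in for loading a model once at job start."""
    threshold = 0.5  # made-up decision rule, for illustration only

    def predict(features: List[float]) -> str:
        mean = sum(features) / len(features)
        return "flagged" if mean > threshold else "clear"

    return predict


def extract(rows: Iterable[Record]) -> Iterator[Record]:
    yield from rows


def transform(records: Iterable[Record]) -> Iterator[Record]:
    # Scale raw fields into a feature vector.
    for r in records:
        yield {**r, "features": [r["a"] / 10.0, r["b"] / 10.0]}


def infer(records: Iterable[Record], model: Model) -> Iterator[Record]:
    # Inference is a plain function call per record, not a network request.
    for r in records:
        yield {**r, "label": model(r["features"])}


def land(records: Iterable[Record]) -> List[Record]:
    return list(records)


def run_job(rows: Iterable[Record]) -> List[Record]:
    model = load_model()  # loaded once, reused for every record
    return land(infer(transform(extract(rows)), model))
```

Calling `run_job([{"a": 9, "b": 8}, {"a": 1, "b": 2}])` labels the first record `flagged` and the second `clear`, with no step leaving the process.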

This is why the comparison reads as a step change. NiFi and PySpark were optimized as far as they could be; the boundary between data and inference was the floor. Falcon removes the boundary.

Test Flight

Bring the inference job that's two stacks pretending to be one.

We'll run it as a single Falcon pipeline and benchmark against your existing topology. Free. 2–4 weeks. Apples-to-apples results, your model, your data.
