
Build an Inference Pipeline

Difficulty: Beginner
Estimated Read Time: 5 minutes
Labels: graph, build, run, pipeline

Chapter 001 ran a model in three lines. That convenience hides a two-part lifecycle that every non-trivial Neat program uses directly: you first describe a pipeline as a Graph, then build that description into a runnable Run. This chapter makes that lifecycle visible by composing the smallest possible pipeline — one input node wired to one output node, no model in between — and pushing a single frame through it.

The payoff is conceptual: a Graph is a reusable definition you build once and execute many times, not a one-off call. By the end you will have created a graph, turned it into a runnable pipeline, and read back the rank of the output tensor — proving the frame made it through.

Walkthrough

Describe the input

Before wiring up nodes, declare what a frame looks like. InputOptions is that contract: pixel format, width/height, channel depth, and whether the runtime timestamps each buffer. The input node built from these options validates incoming frames against the shape the pipeline expects.

C++ additionally sets is_live = false to mark this as a non-live (file/tensor) source.

tutorials/004_build_inference_pipeline/build_inference_pipeline.cpp
simaai::neat::InputOptions in;
in.format = "RGB";
in.width = width;
in.height = height;
in.depth = 3;
in.is_live = false;
in.do_timestamp = true;
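
To make the idea of an input contract concrete, here is a minimal standalone sketch of a shape check. It is not the Neat implementation: FrameSpec, expected_bytes, and frame_matches are hypothetical names invented for this illustration, and the sketch assumes tightly packed 8-bit channels.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the shape part of an input contract.
// This is NOT simaai::neat::InputOptions; it only mirrors the idea.
struct FrameSpec {
  std::string format; // e.g. "RGB"
  int width;
  int height;
  int depth; // channels per pixel
};

// Bytes occupied by one tightly packed 8-bit frame with this spec (assumption).
inline std::size_t expected_bytes(const FrameSpec& spec) {
  assert(spec.width > 0 && spec.height > 0 && spec.depth > 0);
  return static_cast<std::size_t>(spec.width) * static_cast<std::size_t>(spec.height) *
         static_cast<std::size_t>(spec.depth);
}

// True when a candidate frame agrees with the declared contract.
inline bool frame_matches(const FrameSpec& spec, int width, int height, int channels,
                          std::size_t byte_count) {
  return width == spec.width && height == spec.height && channels == spec.depth &&
         byte_count == expected_bytes(spec);
}
```

With the tutorial's default 320x240 RGB frame, expected_bytes gives 320 * 240 * 3 = 230400, and a 640x480 frame would be rejected by frame_matches.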

Compose the graph

Now build the structure. A fresh Graph is an empty composition surface, and add() appends nodes in order. We add exactly two — an input node (configured above) and a bare output node. That is the entire topology: frames enter at the input and leave at the output, with nothing in between. This is the seam where, in later chapters, a model or preprocessing stage slots in.

Nodes come from simaai::neat::nodes::Input(...) and nodes::Output().

tutorials/004_build_inference_pipeline/build_inference_pipeline.cpp
simaai::neat::Graph graph;
graph.add(simaai::neat::nodes::Input(in));
graph.add(simaai::neat::nodes::Output());

Build the pipeline

build() is the transition from description to executable. It resolves the added nodes into a concrete pipeline, validates the input/output contracts against a real sample, and creates a reusable Run handle. We pass a representative frame so build() can lock in the negotiated tensor shapes; the next step uses Run::run(...) for a deterministic one-at-a-time call.

The sample frame is a cv::Mat, and run_opt.output_memory = Owned asks the runtime to return owned output buffers.

tutorials/004_build_inference_pipeline/build_inference_pipeline.cpp
auto run = graph.build(std::vector<cv::Mat>{input}, run_opt);
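
The describe-then-build lifecycle can be sketched without the SDK. The toy below is only an analogy under stated assumptions: ToyGraph, ToyRun, and Stage are hypothetical types, stages are plain functions on int rather than tensor nodes, and nothing here reflects how Neat resolves nodes internally.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical toy types; they only mimic the describe-then-build lifecycle
// and are not part of the Neat API.
using Stage = std::function<int(int)>;

// Executable form: a frozen copy of the stages, callable many times.
class ToyRun {
public:
  explicit ToyRun(std::vector<Stage> stages) : stages_(std::move(stages)) {}

  // Pass one value through every stage in the order the stages were added.
  int run(int value) const {
    for (const Stage& stage : stages_)
      value = stage(value);
    return value;
  }

private:
  std::vector<Stage> stages_;
};

// Description form: collects stages but cannot execute anything itself.
class ToyGraph {
public:
  void add(Stage stage) { stages_.push_back(std::move(stage)); }

  // Validate once at build time so later run() calls do not repeat the check.
  ToyRun build() const {
    assert(!stages_.empty() && "a graph needs at least one stage");
    return ToyRun(stages_);
  }

private:
  std::vector<Stage> stages_;
};
```

Adding two pass-through stages and calling build() once yields a ToyRun whose run() can be called repeatedly, which is the same pattern the tutorial follows with its Input and Output nodes.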

Run a frame and read the result

With a Run in hand, run() pushes one frame and pulls one result synchronously. Because there's no model, the output mirrors the input contract — so reading the tensor's rank is enough to confirm a frame completed the round trip. In real pipelines this same run()/push/pull surface is how you drive inference.

run() returns a TensorList; read sample.front().shape.size().

tutorials/004_build_inference_pipeline/build_inference_pipeline.cpp
simaai::neat::TensorList sample = run.run(std::vector<cv::Mat>{input}, /*timeout_ms=*/1000);
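
Rank is easy to confuse with size, so a small standalone sketch may help. The helpers below are hypothetical and independent of Neat; the {height, width, channels} ordering is an assumption, since the tutorial only confirms that the printed rank is 3.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: the shape of an interleaved image tensor.
// The {height, width, channels} ordering is an assumption.
inline std::vector<std::size_t> frame_shape(std::size_t height, std::size_t width,
                                            std::size_t channels) {
  assert(height > 0 && width > 0 && channels > 0);
  return {height, width, channels};
}

// Rank is the number of dimensions, not the number of elements.
inline std::size_t rank_of(const std::vector<std::size_t>& shape) {
  return shape.size();
}

// Element count is the product of all dimensions.
inline std::size_t element_count(const std::vector<std::size_t>& shape) {
  std::size_t count = 1;
  for (std::size_t dim : shape)
    count *= dim;
  return count;
}
```

For a 240-row, 320-column, 3-channel frame the rank is 3 while the element count is 230400, which is why the rank alone is a cheap signal that a frame came back with the expected structure.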

Run

Run it and you should see the rank of the output tensor printed to stdout. Run the Python and C++ (prebuilt) commands from the Neat install root (the directory that contains share/ and lib/); run the build-from-source commands from the repo root. This chapter needs no model archive.

C++ (prebuilt):

./lib/sima-neat/tutorials/tutorial_004_build_inference_pipeline \
--width 320 --height 240

C++ (build from source):

./build.sh --target tutorial_004_build_inference_pipeline
./build/tutorials-standalone/tutorial_004_build_inference_pipeline \
--width 320 --height 240

Expected output:

tensor_rank=3
[OK] 004_build_inference_pipeline

(The Python build prints output_rank=....) To integrate this chapter's C++ source into your own project with a custom CMakeLists.txt (no extras folder required), see How to Run Tutorials on the landing page.

In Practice

This section covers how build/run, execution modes, the push/pull surface, and RunOptions fit together once you move past a single sync call.

Build vs run

  • Graph::build(...) constructs the pipeline and returns a Run handle for push/pull control.
  • Graph::run(...) is the synchronous convenience path: it builds (if needed), pushes one input, and pulls one output.

Sync vs async

  • Use Graph::run(...) for a simple one-shot call.
  • Use Graph::build(...) when you want a reusable runner and explicit push(...) / pull(...) control — see Run Inference Asynchronously.

Push/pull API

Run exposes:

  • push(...) / try_push(...) for inputs (cv::Mat, Tensor, or Sample).
  • pull(...), pull_tensor(...), pull_tensor_or_throw(...) for outputs.

If you need output metadata (timestamps, stream ids), use pull() to get a Sample. If you only need the tensor payload, use pull_tensor().
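
The shape of a push/pull surface with a pull timeout can be sketched with standard C++ only. ToyChannel below is a hypothetical class written for this illustration; it is not the Neat Run type, and how Neat reports a timeout (return value or exception) is not covered here.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <utility>

// Hypothetical FIFO channel; it only illustrates push plus timed pull.
template <typename T>
class ToyChannel {
public:
  // Hand one item to the consumer side and wake a waiting pull().
  void push(T item) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      items_.push_back(std::move(item));
    }
    ready_.notify_one();
  }

  // Wait up to timeout_ms for an item.
  // Returns false on timeout and leaves `out` untouched.
  bool pull(T& out, int timeout_ms) {
    assert(timeout_ms >= 0);
    std::unique_lock<std::mutex> lock(mutex_);
    const bool have_item =
        ready_.wait_for(lock, std::chrono::milliseconds(timeout_ms),
                        [this] { return !items_.empty(); });
    if (!have_item)
      return false;
    out = std::move(items_.front());
    items_.pop_front();
    return true;
  }

private:
  std::mutex mutex_;
  std::condition_variable ready_;
  std::deque<T> items_;
};
```

Items come out in the order they were pushed, and a pull on an empty channel returns false once the timeout elapses instead of waiting forever. That bounded wait is the design reason for passing a timeout to every pull-style call.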

RunOptions (simple API)

Common knobs:

  • preset: latency/safety profile (Realtime, Balanced, Reliable).
  • queue_depth: runtime queue depth.
  • overflow_policy: queue overflow behavior (Block, KeepLatest, DropIncoming).
  • output_memory: output ownership policy (Auto, ZeroCopy, Owned).
  • on_input_drop: callback hook for dropped input events.

For queue-depth, overflow, and measurement under load, see Tune Throughput and Queue Depth.
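
To show how queue_depth and overflow_policy could interact, here is a single-threaded sketch of a bounded queue. The policy names mirror the list above, but the behavior attached to each name is inferred from the name alone and is an assumption, not documented Neat semantics; ToyInputQueue and PushResult are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Names mirror the options listed above; the behavior is an assumption.
enum class OverflowPolicy { Block, KeepLatest, DropIncoming };

enum class PushResult { Accepted, ReplacedOldest, Dropped, WouldBlock };

// Hypothetical bounded queue of frame ids, single-threaded for clarity.
class ToyInputQueue {
public:
  ToyInputQueue(std::size_t depth, OverflowPolicy policy)
      : depth_(depth), policy_(policy) {
    assert(depth_ > 0);
  }

  PushResult try_push(int frame_id) {
    if (frames_.size() < depth_) {
      frames_.push_back(frame_id);
      return PushResult::Accepted;
    }
    switch (policy_) {
    case OverflowPolicy::KeepLatest:
      frames_.pop_front(); // evict the oldest queued frame
      frames_.push_back(frame_id);
      return PushResult::ReplacedOldest;
    case OverflowPolicy::DropIncoming:
      return PushResult::Dropped; // queue content is unchanged
    case OverflowPolicy::Block:
      break;
    }
    // A real blocking queue would wait for space; the sketch only reports it.
    return PushResult::WouldBlock;
  }

  std::size_t size() const { return frames_.size(); }

  int oldest() const {
    assert(!frames_.empty());
    return frames_.front();
  }

private:
  std::size_t depth_;
  OverflowPolicy policy_;
  std::deque<int> frames_;
};
```

Under these assumptions, a depth-2 KeepLatest queue that receives frames 1, 2, 3 ends up holding 2 and 3, while a DropIncoming queue keeps what it already had. The trade-off is freshness versus completeness of the input stream.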

RunAdvancedOptions (expert API)

Advanced knobs are opt-in under RunOptions::advanced:

  • advanced.max_input_bytes: cap input buffer growth.
  • advanced.copy_input: force defensive input copies.

Use Run::start_measurement() to inspect latency, throughput, input counters, plugin/edge timing, and optional board PMIC power telemetry in one measured window.

To include board power, enable it in code (no environment variable required) and read it from the measurement report:

C++:

simaai::neat::RunOptions run_opt;
run_opt.enable_board_power(); // default 100 ms sampling, auto-detects built-in profile
auto run = graph.build(inputs, run_opt);
auto scope = run.start_measurement();
run.push(inputs);
(void)run.pull_tensors(5000);
auto report = scope.stop();

Python:

run_opt = neat.RunOptions()
run_opt.enable_board_power() # default 100 ms sampling, auto-detects built-in profile
run = graph.build(tensor, run_opt)
scope = run.start_measurement()
run.push(tensor)
_ = run.pull_tensors(5000)
report = scope.stop()

Model::build(run_opt), Model::build(route_opt, run_opt), and Graph::build(run_opt) forward the same runtime options to the underlying Run, so one graph-level board power monitor is used instead of per-pipeline duplicate rail sampling. If you need to force a specific built-in profile, board-specific helpers remain available: enable_modalix_som_power(), enable_modalix_dvt_power().

Full source

tutorials/004_build_inference_pipeline/build_inference_pipeline.cpp
// Build a minimal Graph (Input -> Output), run a frame, read the tensor rank.
//
// Usage:
// tutorial_004_build_inference_pipeline [--width <w>] [--height <h>]

#include "neat.h"

#include <opencv2/core.hpp>

#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

namespace {

// Look up "--key value" on the command line; returns false if the key is absent.
bool get_arg(int argc, char** argv, const std::string& key, std::string& out) {
  for (int i = 1; i + 1 < argc; ++i) {
    if (key == argv[i]) {
      out = argv[i + 1];
      return true;
    }
  }
  return false;
}

// Parse an integer option, falling back to def when the option is not given.
int parse_int_arg(int argc, char** argv, const std::string& key, int def) {
  std::string value;
  if (!get_arg(argc, argv, key, value))
    return def;
  return std::stoi(value);
}

} // namespace

int main(int argc, char** argv) {
  try {
    const int width = parse_int_arg(argc, argv, "--width", 320);
    const int height = parse_int_arg(argc, argv, "--height", 240);

    // Synthetic solid-color frame; no image file or model archive is needed.
    cv::Mat input(height, width, CV_8UC3, cv::Scalar(30, 60, 90));
    if (!input.isContinuous())
      input = input.clone();

    simaai::neat::InputOptions in;
    in.format = "RGB";
    in.width = width;
    in.height = height;
    in.depth = 3;
    in.is_live = false;
    in.do_timestamp = true;

    simaai::neat::RunOptions run_opt;
    run_opt.output_memory = simaai::neat::OutputMemory::Owned;

    // CORE LOGIC
    // Compose a Graph from Input and Output nodes, then build+run one frame.
    simaai::neat::Graph graph;
    graph.add(simaai::neat::nodes::Input(in));
    graph.add(simaai::neat::nodes::Output());
    auto run = graph.build(std::vector<cv::Mat>{input}, run_opt);
    simaai::neat::TensorList sample = run.run(std::vector<cv::Mat>{input}, /*timeout_ms=*/1000);

    if (sample.empty())
      throw std::runtime_error("missing tensor output");
    std::cout << "tensor_rank=" << sample.front().shape.size() << "\n";
    std::cout << "[OK] 004_build_inference_pipeline\n";
    return 0;
  } catch (const std::exception& e) {
    std::cerr << "[FAIL] " << e.what() << "\n";
    return 1;
  }
}
