Rust for Production AI: Why and How

Dipankar Sarkar · 2 min read

The models are commoditising. GPT-5, Claude 4, and Gemini 2 score within about 5% of each other on most benchmarks. The next decade of competitive advantage in AI is not about the models. It is about the infrastructure that surrounds them. Rust is the right language for that infrastructure.

Why Rust for AI infrastructure

Three reasons:

  1. Performance: Rust delivers performance comparable to C and has no garbage collector, so there are no GC pauses. For inference servers, agent runtimes, and data pipelines, this means consistent low latency. Python’s GIL, which serialises CPU-bound threads, and its GC pauses are unacceptable in production at scale.
  2. Memory safety: Rust’s ownership model rules out entire classes of bugs in safe code (use-after-free, buffer overflows, data races). For AI infrastructure that runs in production 24/7, this is not optional.
  3. Ecosystem: the Rust AI ecosystem is maturing fast. Notable projects include candle (Hugging Face’s Rust ML framework), tokenizers (Rust tokenisation), rig (AI agent framework), and our own harmony-protocol (OpenAI response format in Rust).
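To make the first two points concrete, here is a minimal sketch using only the standard library. The function name `parallel_token_count` and the whitespace "tokeniser" are assumptions for illustration, not part of any crate named above. Worker threads borrow the input directly and each returns its own partial sum, so there is no shared mutable state. A version that mutated one shared counter from every thread without synchronisation would be rejected by the compiler.

```rust
use std::thread;

/// Counts whitespace-separated tokens across `docs` using up to `workers` threads.
/// Hypothetical helper for illustration only.
pub fn parallel_token_count(docs: &[String], workers: usize) -> usize {
    let workers = workers.max(1);
    // Ceiling division so every document lands in exactly one chunk.
    let chunk_size = ((docs.len() + workers - 1) / workers).max(1);

    // Scoped threads may borrow `docs` because the scope guarantees that all
    // workers finish before the borrow ends. No reference counting or GC needed.
    thread::scope(|scope| {
        let handles: Vec<_> = docs
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk
                        .iter()
                        .map(|doc| doc.split_whitespace().count())
                        .sum::<usize>()
                })
            })
            .collect();

        // Each worker hands back a partial sum; the main thread adds them up.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .sum::<usize>()
    })
}
```

The same shape applies to CPU-bound pre-processing in general: split the input, process the chunks on separate threads, and combine the results.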

What to build in Rust

  1. Agent runtime: the orchestration layer that manages agents, their tools, their memory, and their safety gates. This is what Neul Labs builds.
  2. Inference server: a custom inference server that wraps vLLM or TGI with Rust-accelerated pre/post-processing. fast-litellm and fast-langgraph (Neul Labs open source) are examples.
  3. CLI tools for AI agents: tools that AI agents call from the command line. gdelt-cli, gsheet-cli, hubspot-cli, and apollo-io-cli are all Rust and all open source.
  4. Data pipeline: high-throughput data processing for ML training and evaluation. Polars (Rust DataFrames) is typically faster than pandas for most operations.
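As a rough illustration of the first item, the sketch below shows one possible shape for the core of an agent runtime: a registry of tools plus a safety gate that is consulted before every call. All names here (`Tool`, `Runtime`, `SafetyGate`, `Uppercase`) are hypothetical and are not taken from the Neul Labs codebase. A real runtime would also need async execution, memory, and structured errors.

```rust
use std::collections::HashMap;

/// A tool an agent can invoke. Hypothetical trait for illustration.
pub trait Tool {
    fn name(&self) -> &str;
    fn call(&self, input: &str) -> Result<String, String>;
}

/// Hypothetical example tool that uppercases its input.
pub struct Uppercase;

impl Tool for Uppercase {
    fn name(&self) -> &str {
        "uppercase"
    }

    fn call(&self, input: &str) -> Result<String, String> {
        Ok(input.to_uppercase())
    }
}

/// A safety gate: given (tool name, input), decides whether the call may run.
pub type SafetyGate = Box<dyn Fn(&str, &str) -> bool>;

pub struct Runtime {
    tools: HashMap<String, Box<dyn Tool>>,
    gate: SafetyGate,
}

impl Runtime {
    pub fn new(gate: SafetyGate) -> Self {
        Self { tools: HashMap::new(), gate }
    }

    /// Registers a tool under its own name, replacing any earlier tool with that name.
    pub fn register(&mut self, tool: Box<dyn Tool>) {
        self.tools.insert(tool.name().to_string(), tool);
    }

    /// Looks up the tool, checks the safety gate, and only then runs the tool.
    pub fn invoke(&self, tool_name: &str, input: &str) -> Result<String, String> {
        let tool = self
            .tools
            .get(tool_name)
            .ok_or_else(|| format!("unknown tool: {tool_name}"))?;
        if !(self.gate)(tool_name, input) {
            return Err(format!("blocked by safety gate: {tool_name}"));
        }
        tool.call(input)
    }
}
```

Because the gate sits inside `invoke`, no registered tool can run without passing through it. That is the property a runtime of this kind is meant to guarantee.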

What NOT to build in Rust

  1. Model training: stick with PyTorch. The ecosystem is there. Rust training is not ready.
  2. Prototyping: stick with Python. The iteration speed is what matters for prototyping.
  3. Notebooks: Jupyter is Python. Rust notebooks exist but are not production-grade.

The PyO3 bridge

For teams that want Rust performance without abandoning Python, PyO3 is the bridge. Write the hot path in Rust and expose it as a native Python extension module that the Python side imports like any other package. The Python team keeps their workflow; the Rust team handles performance.

This is the pattern we use at Neul Labs: Python for the model layer, Rust for the runtime and the performance-critical paths, PyO3 to connect them.
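A minimal sketch of that split, assuming PyO3 0.21 or later (the `Bound` API) and a build tool such as maturin. The module name `fast_paths`, the Cargo feature name `python`, and the choice of cosine similarity as the hot path are all assumptions for illustration. The hot path is plain Rust that can be tested without Python, and the bindings sit behind a feature flag as a thin wrapper.

```rust
/// Hot path in plain Rust: cosine similarity between two embedding vectors.
/// Returns `None` if the lengths differ or either vector has zero norm.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|y| y * y).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

// Python bindings, compiled only when the (hypothetical) `python` feature is on.
#[cfg(feature = "python")]
mod python {
    use pyo3::exceptions::PyValueError;
    use pyo3::prelude::*;

    /// Thin wrapper: converts Python lists to Vec<f32> and maps `None` to ValueError.
    #[pyfunction]
    fn cosine_similarity(a: Vec<f32>, b: Vec<f32>) -> PyResult<f32> {
        super::cosine_similarity(&a, &b)
            .ok_or_else(|| PyValueError::new_err("vectors must be non-zero and of equal length"))
    }

    /// Module definition; `fast_paths` is an assumed module name.
    #[pymodule]
    fn fast_paths(m: &Bound<'_, PyModule>) -> PyResult<()> {
        m.add_function(wrap_pyfunction!(cosine_similarity, m)?)?;
        Ok(())
    }
}
```

Once built with the feature enabled, the Python side would call it as `fast_paths.cosine_similarity([1.0, 0.0], [0.0, 1.0])`. Keeping the wrapper thin means the Rust logic stays testable with `cargo test` alone.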

How to engage

The Rust Consulting engagement is designed for teams that want to introduce Rust into their AI stack. Performance audit: USD 10K. Full engagement: USD 30K-100K.

See the open-source Rust work at dipankar.name/projects/.

Dipankar Sarkar

Fractional CTO & Technology Consultant