Rust for Production AI: Why and How
The models are commoditising. GPT-5, Claude 4, and Gemini 2 score within 5% of each other on most benchmarks. The next decade of competitive advantage in AI is not about the models. It is about the infrastructure that surrounds them. Rust is the right language for that infrastructure.
Why Rust for AI infrastructure
Three reasons:
- Performance: Rust delivers performance comparable to C, and it has no garbage collector, so there are no GC pauses. For inference servers, agent runtimes, and data pipelines, this means consistent low latency. Python’s GIL, which serialises CPU-bound threads, and its GC pauses make that consistency hard to achieve in production at scale.
- Memory safety: Rust’s ownership model rules out entire classes of bugs in safe code at compile time (use-after-free, buffer overflows, data races). For AI infrastructure that runs in production 24/7, this is not optional.
- Ecosystem: the Rust AI ecosystem is maturing fast: candle (Hugging Face’s Rust ML framework), tokenizers (Rust tokenisation), rig (AI agent framework), and our own harmony-protocol (OpenAI response format in Rust).
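The data-race guarantee in the memory safety point is easiest to see in a small example. The sketch below uses only the standard library; the function name count_requests is hypothetical and chosen for illustration. Several worker threads increment one shared counter, and the compiler only accepts the program because the counter is wrapped in Arc<Mutex<...>>. Handing each thread a plain mutable reference to the same integer would be rejected at compile time instead of failing intermittently in production.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Counts simulated requests across several worker threads.
/// (Hypothetical helper, not taken from any Neul Labs project.)
pub fn count_requests(workers: usize, requests_per_worker: usize) -> usize {
    // Arc gives shared ownership across threads; Mutex gives exclusive access.
    let counter = Arc::new(Mutex::new(0usize));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            // Each thread gets its own handle to the same counter.
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..requests_per_worker {
                    // The lock is held only for the duration of this statement.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    // Wait for every worker to finish before reading the total.
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }

    let total = *counter.lock().unwrap();
    total
}
```

Calling count_requests(4, 1000) returns 4000 on every run, because no increment can be lost to an unsynchronised write.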
What to build in Rust
- Agent runtime: the orchestration layer that manages agents, their tools, their memory, and their safety gates. This is what Neul Labs builds.
- Inference server: a custom inference server that wraps vLLM or TGI with Rust-accelerated pre/post-processing. fast-litellm and fast-langgraph (Neul Labs open source) are examples.
- CLI tools for AI agents: tools that AI agents call from the command line. gdelt-cli, gsheet-cli, hubspot-cli, and apollo-io-cli are all Rust and all open source.
- Data pipeline: high-throughput data processing for ML training and evaluation. Polars (Rust DataFrames) is faster than Pandas for most operations.
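To make the agent runtime item more concrete, here is a minimal sketch of an orchestration layer with tools, memory, and a safety gate, using only the standard library. Every name in it (Tool, SafetyGate, Runtime, RuntimeError, Echo) is hypothetical and invented for this illustration; it is not the Neul Labs runtime, and a real gate would likely be policy-driven rather than a simple allow-list.

```rust
use std::collections::{HashMap, HashSet};

/// A tool the agent may call. Hypothetical trait for illustration.
pub trait Tool {
    fn name(&self) -> &str;
    fn call(&self, input: &str) -> Result<String, String>;
}

/// Safety gate, assumed here to be an allow-list of tool names.
pub struct SafetyGate {
    allowed: HashSet<String>,
}

impl SafetyGate {
    pub fn new<I, S>(allowed: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        Self { allowed: allowed.into_iter().map(Into::into).collect() }
    }

    pub fn permits(&self, tool: &str) -> bool {
        self.allowed.contains(tool)
    }
}

#[derive(Debug, PartialEq)]
pub enum RuntimeError {
    Denied(String),
    UnknownTool(String),
    ToolFailed(String),
}

/// Owns the registered tools, the gate, and a simple call log as "memory".
pub struct Runtime {
    tools: HashMap<String, Box<dyn Tool>>,
    gate: SafetyGate,
    memory: Vec<String>,
}

impl Runtime {
    pub fn new(gate: SafetyGate) -> Self {
        Self { tools: HashMap::new(), gate, memory: Vec::new() }
    }

    pub fn register(&mut self, tool: Box<dyn Tool>) {
        let name = tool.name().to_string();
        self.tools.insert(name, tool);
    }

    /// The gate is checked before the tool is even looked up.
    pub fn invoke(&mut self, tool: &str, input: &str) -> Result<String, RuntimeError> {
        if !self.gate.permits(tool) {
            return Err(RuntimeError::Denied(tool.to_string()));
        }
        let handler = self
            .tools
            .get(tool)
            .ok_or_else(|| RuntimeError::UnknownTool(tool.to_string()))?;
        let output = handler.call(input).map_err(RuntimeError::ToolFailed)?;
        // Only successful calls are recorded in memory.
        self.memory.push(format!("{tool}({input}) -> {output}"));
        Ok(output)
    }

    pub fn memory(&self) -> &[String] {
        &self.memory
    }
}

/// Trivial example tool that returns its input unchanged.
pub struct Echo;

impl Tool for Echo {
    fn name(&self) -> &str {
        "echo"
    }

    fn call(&self, input: &str) -> Result<String, String> {
        Ok(input.to_string())
    }
}
```

With a gate built from ["echo"] and the Echo tool registered, invoking "echo" succeeds and is logged, while invoking any tool outside the allow-list returns RuntimeError::Denied without touching the tool. Putting the check in one place, ahead of dispatch, is the design choice the sketch is meant to show.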
What NOT to build in Rust
- Model training: stick with PyTorch. The ecosystem is there. Rust training is not ready.
- Prototyping: stick with Python. The iteration speed is what matters for prototyping.
- Notebooks: Jupyter is Python. Rust notebooks exist but are not production-grade.
The PyO3 bridge
For teams that want Rust performance without abandoning Python, PyO3 is the bridge. Write the hot path in Rust, expose it as a Python module. The Python team keeps their workflow; the Rust team handles performance.
This is the pattern we use at Neul Labs: Python for the model layer, Rust for the runtime and the performance-critical paths, PyO3 to connect them.
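A rough sketch of that split, under several assumptions: PyO3 0.21 or later (the Bound API), a crate built as a cdylib, and a Cargo feature named python that turns on the bindings. The function cosine_similarity, the feature name, and the module name fastpath are all hypothetical and stand in for whatever the real hot path is. The core logic is plain Rust with no Python dependency, so it can be tested and reused on its own; the PyO3 layer is a thin wrapper around it.

```rust
/// Hot path in plain Rust: cosine similarity of two equal-length vectors.
/// Returns None for mismatched lengths, empty input, or a zero vector.
pub fn cosine_similarity(a: &[f64], b: &[f64]) -> Option<f64> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let mut dot = 0.0_f64;
    let mut norm_a = 0.0_f64;
    let mut norm_b = 0.0_f64;
    for (x, y) in a.iter().zip(b.iter()) {
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a.sqrt() * norm_b.sqrt()))
}

// Python bindings, compiled only when the (hypothetical) `python` feature is on.
// Assumes PyO3 0.21+; older releases use a different #[pymodule] signature.
#[cfg(feature = "python")]
mod python {
    use pyo3::exceptions::PyValueError;
    use pyo3::prelude::*;

    /// Python-facing wrapper: converts the Option into a Python ValueError.
    #[pyfunction]
    fn cosine_similarity(a: Vec<f64>, b: Vec<f64>) -> PyResult<f64> {
        super::cosine_similarity(&a, &b)
            .ok_or_else(|| PyValueError::new_err("vectors must be non-zero and equal length"))
    }

    /// Module name `fastpath` is an assumption; it must match the library name.
    #[pymodule]
    fn fastpath(m: &Bound<'_, PyModule>) -> PyResult<()> {
        m.add_function(wrap_pyfunction!(cosine_similarity, m)?)?;
        Ok(())
    }
}
```

Once the extension is built (for example with maturin), the Python side would call it as fastpath.cosine_similarity([1.0, 0.0], [1.0, 0.0]). Keeping the Rust function free of PyO3 types means the Rust team can benchmark and unit-test it without a Python interpreter in the loop.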
How to engage
The Rust Consulting engagement is designed for teams that want to introduce Rust into their AI stack. Performance audit: USD 10K. Full engagement: USD 30K-100K.
See the open-source Rust work at dipankar.name/projects/.