How I Avoided C-Linker Hell by Decoupling Rust & Python for an AI Memory Daemon

Posted on Jun 6

When I set out to build null-drift - a lightweight, local memory daemon for AI agents - my original goal was the holy grail of modern backend development: a single, blazingly fast Rust binary.

I wanted a self-contained application that could handle both machine learning inference (generating text embeddings) and the heavy lifting of a highly concurrent state machine.

It sounded great on paper. In practice, I ran straight into a brick wall.

The Problem: C-Linker Hell

To do machine learning inference in Rust, you generally rely on bindings to C/C++ libraries. I chose the ort crate (ONNX Runtime) to handle the embedding models.

However, trying to cross-compile this setup for Windows immediately resulted in absolute chaos. I encountered endless MSVC linker errors caused by conflicts between static (/MT) and dynamic (/MD) C-runtimes. Even when I managed to get it compiling, I ran into bizarre C-runtime deadlocks.

I realized I was spending more time fighting the C/C++ build toolchain than actually writing my memory daemon.

The Solution: Decoupled Microservices

I decided to stop fighting the ecosystem and instead play to the strengths of different languages. I decoupled the project into a two-container microservice architecture:

Python (FastAPI): Python is the undisputed king of ML tooling. Setting up a FastAPI service to handle sentence-transformer embeddings was trivial, and the ML toolchain "just works" across all operating systems without any linker headaches.
Rust (Axum/Tokio): Rust took over the job it was born to do: managing a highly contended, continuous 10k-dimensional state array.

By splitting the workload, the Python service acts as a pure, stateless compute node, while Rust handles the high-concurrency memory indexing and disk synchronization.
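From the Rust side, the split can be expressed as a narrow boundary. The daemon only needs "text in, vector out", and whether that vector comes from the FastAPI container or a test stub is an implementation detail. The sketch below is hypothetical. `Embedder`, `StubEmbedder` and `MemoryStore` are illustrative names, not null-drift's actual API. The 384-dimension constant mirrors the validation check shown later in this post. A real implementation of `embed` would make an HTTP call to the Python service.

```rust
// Hypothetical sketch: these names are illustrative, not null-drift's actual API.

/// Embedding size assumed here (mirrors the daemon's 384-dimension check).
const EMBEDDING_DIM: usize = 384;

/// The only contract the Rust daemon needs from the Python service:
/// turn text into a fixed-size vector. The service keeps no state.
trait Embedder {
    fn embed(&self, text: &str) -> Result<Vec<f32>, String>;
}

/// Stand-in for the FastAPI container; a real impl would call it over HTTP.
struct StubEmbedder;

impl Embedder for StubEmbedder {
    fn embed(&self, text: &str) -> Result<Vec<f32>, String> {
        // Deterministic fake vector derived from the text length.
        Ok(vec![text.len() as f32; EMBEDDING_DIM])
    }
}

/// All mutable state lives on the Rust side.
struct MemoryStore<E: Embedder> {
    embedder: E,
    entries: Vec<(String, Vec<f32>)>,
}

impl<E: Embedder> MemoryStore<E> {
    fn new(embedder: E) -> Self {
        Self { embedder, entries: Vec::new() }
    }

    fn remember(&mut self, text: &str) -> Result<(), String> {
        let vector = self.embedder.embed(text)?;
        // Don't trust the remote service's output shape blindly.
        if vector.len() != EMBEDDING_DIM {
            return Err(format!("expected {} dims, got {}", EMBEDDING_DIM, vector.len()));
        }
        self.entries.push((text.to_string(), vector));
        Ok(())
    }
}

fn main() {
    let mut store = MemoryStore::new(StubEmbedder);
    store.remember("hello").unwrap();
    println!("entries stored: {}", store.entries.len());
}
```

Because the trait is the only coupling point, the Python side can be swapped, scaled or restarted without touching the daemon's locking logic.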

Scaling Concurrency in Rust

In the Rust daemon, the core data structure is constantly being queried and updated. To handle this, I wrapped the daemon's state in an asynchronous read-write lock:

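use std::sync::Arc;
use tokio::sync::RwLock;

// ... (DaemonState definition elided)

// Shared handle cloned into every Axum handler: Arc gives shared ownership,
// RwLock lets many readers query the state while writers take turns.
type SharedState = Arc<RwLock<DaemonState>>;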

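We specifically chose tokio::sync::RwLock over the standard library lock. Like std's RwLock, it allows many concurrent readers while writers occasionally take exclusive access to mutate the state. Unlike std's, waiting on it yields to the Tokio runtime instead of blocking a worker thread, and its guards can be held across .await points. But there's a hidden security benefit to this choice as well.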

Securing the State Machine (Why Tokio Locks Matter)

If you use standard library locks (std::sync::RwLock) in Rust, you have to deal with lock poisoning. If a thread panics (crashes) while holding the write lock, Rust marks that lock as "poisoned" to warn other threads that the data may be corrupted. From then on, every call to read() or write() returns an Err(PoisonError) instead of a plain guard.

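In a web server, this is a massive vulnerability. An attacker might craft a malicious request that triggers a panic while a handler holds the write lock, for example a division by zero or an out-of-bounds index on unexpected input. The lock is then poisoned, and every subsequent request that calls .read().unwrap() or .write().unwrap() panics too. The process keeps running but can no longer serve its own state. It's a trivial Denial of Service (DoS) attack.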
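The standard-library behavior is easy to reproduce with nothing but `std`. In this minimal sketch, plain threads stand in for request handlers and the `Vec<f32>` stands in for the daemon's state. The helper names are hypothetical. It also shows two nuances. First, a panic while holding only a *read* guard does not poison a `std::sync::RwLock`. Second, the data is still reachable through `PoisonError::into_inner`, so the outage really comes from handlers that blindly `unwrap()`.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical handler that panics while holding the *write* guard.
fn poison_with_writer(lock: &Arc<RwLock<Vec<f32>>>) {
    let lock = Arc::clone(lock);
    let result = thread::spawn(move || {
        let mut guard = lock.write().unwrap();
        guard.push(1.0); // partial update...
        panic!("simulated bug mid-update"); // ...then the handler dies
    })
    .join();
    assert!(result.is_err()); // the thread did panic
}

/// Hypothetical handler that panics while holding only a *read* guard.
fn panic_with_reader(lock: &Arc<RwLock<Vec<f32>>>) {
    let lock = Arc::clone(lock);
    let _ = thread::spawn(move || {
        let _guard = lock.read().unwrap();
        panic!("simulated bug in a reader");
    })
    .join();
}

fn main() {
    let readers_only = Arc::new(RwLock::new(vec![0.0_f32]));
    panic_with_reader(&readers_only);
    println!("poisoned after reader panic: {}", readers_only.is_poisoned()); // false

    let state = Arc::new(RwLock::new(vec![0.0_f32]));
    poison_with_writer(&state);
    println!("poisoned after writer panic: {}", state.is_poisoned()); // true

    // `state.read().unwrap()` would now panic; the data is still recoverable,
    // including the half-finished update.
    let guard = state.read().unwrap_or_else(|poisoned| poisoned.into_inner());
    println!("recovered data: {:?}", *guard); // [0.0, 1.0]
}
```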

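tokio::sync::RwLock does not implement lock poisoning at all: read().await and write().await hand back guards directly, with no Result to unwrap. If an Axum handler panics while holding a guard, the guard is dropped as the task unwinds and the lock is released normally. A single bad request cannot permanently lock the daemon's memory state! The flip side is that nothing warns the next reader if a writer panicked halfway through an update, so inputs need to be validated before the write lock is taken.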
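Without poisoning, nothing flags state that a panicking writer left half-modified. One defensive pattern is to do all of the following before touching the state:

1. Validate the input.
2. Build the updated value in a separate buffer.
3. Commit it with a single assignment.

A panic anywhere before the commit then leaves the old state intact. The sketch below assumes the state is a plain `Vec<f32>`. `apply_update` and its moving-average update rule are hypothetical, not null-drift's actual logic. In a Tokio handler it would receive `&mut *guard` from `state.write().await`, but it works with any `&mut` reference.

```rust
/// Hypothetical helper: blends `delta` into `state` as a moving average.
/// `state` is only mutated by the final assignment, so an early return or a
/// panic anywhere before it leaves the previous state untouched.
fn apply_update(state: &mut Vec<f32>, delta: &[f32], rate: f32) -> Result<(), String> {
    // 1. Validate before doing any work.
    if delta.len() != state.len() {
        return Err(format!("dimension mismatch: {} vs {}", delta.len(), state.len()));
    }
    if !(0.0..=1.0).contains(&rate) {
        return Err(format!("rate out of range: {rate}"));
    }

    // 2. Compute into a fresh buffer while only reading from `state`.
    let next: Vec<f32> = state
        .iter()
        .zip(delta)
        .map(|(old, new)| (1.0 - rate) * old + rate * new)
        .collect();

    // 3. Commit in one step.
    *state = next;
    Ok(())
}

fn main() {
    let mut state = vec![0.0_f32; 4];
    apply_update(&mut state, &[1.0; 4], 0.5).unwrap();
    println!("{:?}", state); // [0.5, 0.5, 0.5, 0.5]

    // A malformed update is rejected and the state is unchanged.
    assert!(apply_update(&mut state, &[1.0; 3], 0.5).is_err());
    println!("{:?}", state); // [0.5, 0.5, 0.5, 0.5]
}
```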

Defensive Programming for Local Daemons

You might be wondering: "It's a local daemon running on localhost. Why worry about attackers?"

Actually, there are three major reasons we built strict defenses into null-drift:

Untrusted AI Inputs: The daemon is designed for AI agents. Agents scrape the web and ingest raw, untrusted data. If an agent blindly dumps malformed data into its memory daemon, we need it to fail gracefully rather than crashing the entire pipeline.
Network Exposure: By default, the daemon falls back to binding 0.0.0.0, which exposes it to every device on the local network. Anyone sharing that network, such as a public Wi-Fi hotspot, could theoretically send it payloads.
Localhost CSRF: Even when the daemon is bound only to 127.0.0.1, a malicious website you visit could use JavaScript to perform Cross-Site Request Forgery (CSRF), silently sending POST requests to localhost:3000.
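The last two threats can be narrowed with two small checks. The sketch below is an assumption, not null-drift's code. `NULL_DRIFT_BIND` is a hypothetical environment variable, and the allowed-origin list is illustrative. The first check makes loopback the default and requires an explicit address to listen more widely. The second rejects requests whose `Origin` header names an untrusted site. Modern browsers attach `Origin` to cross-site POST requests, while local agents and `curl` usually send none, so a missing header is treated here as a non-browser client. In an Axum app, a check like `origin_allowed` would run in a middleware layer before any handler touches the lock.

```rust
use std::net::{IpAddr, Ipv4Addr, SocketAddr};

/// Port from the localhost:3000 example in the text.
const DEFAULT_PORT: u16 = 3000;

/// Hypothetical policy: loopback unless the operator explicitly configures an
/// address. An unparsable value also falls back to loopback, never 0.0.0.0.
fn bind_addr(configured: Option<&str>) -> SocketAddr {
    configured
        .and_then(|s| s.parse::<SocketAddr>().ok())
        .unwrap_or(SocketAddr::new(IpAddr::V4(Ipv4Addr::LOCALHOST), DEFAULT_PORT))
}

/// CSRF guard for state-changing requests. `origin` is the raw `Origin`
/// header value, if the request carried one.
fn origin_allowed(origin: Option<&str>, allowlist: &[&str]) -> bool {
    match origin {
        // No Origin header: typically a non-browser client (agent, curl).
        None => true,
        // Browser requests must come from an explicitly trusted origin.
        // "null" (sandboxed or file:// pages) is rejected unless allowlisted.
        Some(o) => allowlist.iter().any(|allowed| *allowed == o),
    }
}

fn main() {
    let from_env = std::env::var("NULL_DRIFT_BIND").ok();
    println!("binding to {}", bind_addr(from_env.as_deref()));

    let allow = ["http://localhost:3000"];
    println!("{}", origin_allowed(None, &allow)); // true
    println!("{}", origin_allowed(Some("https://evil.example"), &allow)); // false
}
```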

To counter these threats, we implemented multiple layers of defense, most of which run before a request ever reaches the lock:

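use axum::{extract::DefaultBodyLimit, Router};
use bincode::Options; // trait that provides with_limit() and deserialize() (bincode 1.x)

// 1. Strict dimensionality validation before linear algebra
//    (384 = the embedding size the daemon expects from the Python service)
if payload.embedding.len() != 384 {
    return Err(DaemonError::InvalidDimension);
}

// 2. Strict body limits to prevent memory exhaustion (64 KiB per request body)
let app = Router::new()
    // ... routes
    .layer(DefaultBodyLimit::max(64 * 1024));

// 3. Bounded deserialization for state restoration (50 MiB cap)
//    Note: DefaultOptions uses varint integer encoding, unlike bincode::serialize,
//    so the state file must be written with these same options.
let bincode_opts = bincode::DefaultOptions::new().with_limit(50 * 1024 * 1024);
let cog_state: CognitiveState = bincode_opts.deserialize(&bytes)?;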

We use bincode for extremely fast, direct-to-disk binary serialization of our massive 10k-dimensional state arrays. Wrapping it in with_limit() ensures a corrupted state file can't blow up system RAM upon a restart. Once decoding would exceed the 50 MiB budget, bincode stops with a size-limit error instead of continuing to allocate.
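A limit matters because binary formats store a length prefix in front of each sequence. A decoder that trusts a corrupted prefix may try to reserve memory for billions of elements before reading a single one. The std-only sketch below illustrates that idea; it is not bincode's implementation or wire format. It assumes a hypothetical layout: a little-endian `u64` element count followed by that many little-endian `f32` values. `read_f32_vec` checks the declared size against a byte budget before allocating.

```rust
use std::io::{self, Read};

/// Hypothetical format: u64 LE element count, then that many f32 LE values.
/// Rejects the input *before* allocating if the declared size exceeds `limit` bytes.
fn read_f32_vec<R: Read>(mut r: R, limit: u64) -> io::Result<Vec<f32>> {
    let mut len_buf = [0u8; 8];
    r.read_exact(&mut len_buf)?;
    let count = u64::from_le_bytes(len_buf);

    // checked_mul guards against overflow from an absurd count.
    let needed = count
        .checked_mul(4)
        .filter(|&bytes| bytes <= limit)
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "declared size exceeds limit"))?;

    // Safe to allocate now: `needed` is bounded by `limit`.
    let mut out = Vec::with_capacity((needed / 4) as usize);
    let mut buf = [0u8; 4];
    for _ in 0..count {
        r.read_exact(&mut buf)?; // a truncated file errors out here
        out.push(f32::from_le_bytes(buf));
    }
    Ok(out)
}

fn main() {
    // Well-formed input: 2 elements.
    let mut good = 2u64.to_le_bytes().to_vec();
    good.extend_from_slice(&1.5f32.to_le_bytes());
    good.extend_from_slice(&(-2.0f32).to_le_bytes());
    println!("{:?}", read_f32_vec(&good[..], 1024).unwrap()); // [1.5, -2.0]

    // A corrupted prefix claiming 2^60 elements is rejected without allocating.
    let corrupted = (1u64 << 60).to_le_bytes();
    println!("{}", read_f32_vec(&corrupted[..], 50 * 1024 * 1024).is_err()); // true
}
```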

Wrapping Up

Building null-drift was a great lesson in choosing the right tool for the job. By letting Python handle the ML friction and Rust handle the concurrent state, the architecture became drastically simpler to deploy, compile, and maintain.

If you want to join the broader discussion, see the original visual phase-space hook, or share this project with other local-AI builders, check out the launch thread on X:

If you're interested in checking out the RwLock-based, multi-threaded state architecture or the Docker setup, you can find the repository here:

🔗 null-drift on GitHub

Let me know what you think in the comments!
