The Benchmark That Sparked This Project
A simple M/M/1 queue benchmark showed a 60–80× performance gap—and convinced me to build a new simulation engine.
Why I Ran This Benchmark
Before committing to building a full simulation engine, I wanted a sanity check: how much performance am I leaving on the table by using standard simulation tools? I started with a simple example you probably remember from Simulation 101:
- M/M/1 queue
- Exponential interarrival
- Exponential service
- Run until N events
- Measure throughput and wait times
This is the “hello world” of discrete-event simulation. If there are major performance differences here, then real-world examples will likely be even more extreme.
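The recipe above can be sketched in plain Python with a heap-based event loop. This is a minimal illustration, not the benchmark's actual code; the rates and seed are placeholders.

```python
import heapq
import random

def mm1_mean_sojourn(n_events, lam=0.9, mu=1.0, seed=42):
    """Run an M/M/1 queue for n_events events; return mean time in system."""
    rng = random.Random(seed)
    heap = [(rng.expovariate(lam), 0)]  # (time, kind): 0 = arrival, 1 = departure
    queue = []                          # arrival times of customers in the station
    waits = []
    processed = 0
    while processed < n_events:
        t, kind = heapq.heappop(heap)
        processed += 1
        if kind == 0:                   # arrival
            queue.append(t)
            if len(queue) == 1:         # server was idle: start service now
                heapq.heappush(heap, (t + rng.expovariate(mu), 1))
            heapq.heappush(heap, (t + rng.expovariate(lam), 0))
        else:                           # departure
            arrived = queue.pop(0)
            waits.append(t - arrived)   # sojourn = wait + service
            if queue:                   # next customer starts service
                heapq.heappush(heap, (t + rng.expovariate(mu), 1))
    return sum(waits) / max(len(waits), 1)
```

With a fixed seed the run is fully reproducible, which is what makes the cross-engine comparison meaningful.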
The Setup
I compared SimPy (Python) against a minimal Rust prototype:
- Same random seeds
- Same arrival/service logic
- Same termination condition
- No parallelism
- No fancy data structures—yet
Hardware: M4 MacBook Pro, nothing exotic.
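One way to honor “same random seeds” across two engines is to derive separate, reproducible streams for arrivals and services from a single master seed, so both implementations consume identical variates in the same order. A sketch; the actual RNG plumbing in either engine may differ:

```python
import random

def make_streams(seed):
    """Derive two independent, reproducible RNG streams from one master seed."""
    master = random.Random(seed)
    arrival_rng = random.Random(master.getrandbits(64))
    service_rng = random.Random(master.getrandbits(64))
    return arrival_rng, service_rng
```

Keeping arrivals and services on separate streams means a change in one draw sequence (say, a different queue discipline) can't desynchronize the other.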
The Results
At 1,000,000 events, the gap was obvious:
| Engine | Time (ms) | Speedup |
|---|---|---|
| SimPy | ~4200 | 1× |
| Rust prototype | ~55–70 | 60–80× |
Different seeds move the exact numbers around, but the ratio holds.
Why the Gap Exists
SimPy is excellent for teaching and small models. But once you scale past trivial workloads, you hit three walls fast:
- Python call overhead (processes, events, generators)
- GC churn from object-heavy event representations
- No real control over memory layout or branching costs
The Rust version avoids all three:
- No objects per event
- Sequential, cache-friendly arrays
- Inlineable state transitions
- Zero-alloc hot loops
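For M/M/1 specifically, the hot loop needs no heap and no event objects at all: the next event is always the smaller of two clocks. A pure-Python sketch of that shape, which is roughly what the Rust hot loop reduces to (scalar updates and a branch, no allocation):

```python
import random

def mm1_flat(n_events, lam=0.9, mu=1.0, seed=42):
    """M/M/1 with no heap and no per-event objects: advance whichever of the
    two pending clocks (arrival, departure) fires first.
    Returns the time-average number in system."""
    rng = random.Random(seed)
    INF = float("inf")
    next_arrival = rng.expovariate(lam)
    next_departure = INF
    in_system = 0
    area = 0.0  # integral of number-in-system over time
    t = 0.0
    for _ in range(n_events):
        if next_arrival <= next_departure:
            area += in_system * (next_arrival - t)
            t = next_arrival
            in_system += 1
            if in_system == 1:  # server was idle
                next_departure = t + rng.expovariate(mu)
            next_arrival = t + rng.expovariate(lam)
        else:
            area += in_system * (next_departure - t)
            t = next_departure
            in_system -= 1
            next_departure = (t + rng.expovariate(mu)) if in_system else INF
    return area / t
```

Even in Python this version is noticeably faster than the heap-based one; in Rust, where the whole loop fits in registers and the cache, the gap widens further.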
Even without tuning, it ran laps around SimPy.
What This Means
This benchmark proved two things:
- There’s room for a modern simulation engine that doesn’t carry decades of academic baggage.
- Performance headroom matters.
When the baseline model is 60–80× faster, everything downstream becomes possible:
- Monte Carlo replications
- Sensitivity sweeps
- Real-time dashboards
- SaaS execution layers
- Large supply-chain sims without cluster-level hardware
A fast core opens the door to an entirely different class of tooling.
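To make the Monte Carlo point concrete: once a single run is cheap, replications are just a loop over seeds. A sketch with a hypothetical `run_once(seed)` standing in for the fast core (the real core would sit behind the Rust engine's bindings):

```python
import random
import statistics

def run_once(seed):
    """Hypothetical stand-in for one fast simulation run.
    Here: the sample mean of 1,000 exponential draws (true mean 1.0)."""
    rng = random.Random(seed)
    return sum(rng.expovariate(1.0) for _ in range(1000)) / 1000

def monte_carlo(n_reps, base_seed=0):
    """Independent replications with distinct seeds.
    Returns (point estimate, 95% CI half-width)."""
    results = [run_once(base_seed + i) for i in range(n_reps)]
    mean = statistics.fmean(results)
    half_width = 1.96 * statistics.stdev(results) / (n_reps ** 0.5)
    return mean, half_width
```

The replication loop is embarrassingly parallel, which is exactly where a fast single-run core pays off.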
Where I’m Taking It Next
This project is now heading in four directions:
- A Rust core for maximum throughput
- Python bindings for analysts and notebooks
- A WASM build for interactive demos
- A configurable process/flow DSL for rapid model definition
This first benchmark wasn’t complicated, but it was enough to know the direction is right.
If You Want to See the Code
I’ll post more detailed breakdowns soon:
- event loop structure
- memory layout decisions
- RNG/seed strategy
- planned parallelism model
For now, this is the benchmark that kicked off the whole journey.