The Benchmark That Sparked This Project
A simple M/M/1 queue benchmark showed a 60–80× performance gap—and convinced me to build a new simulation engine.
Why I Ran This Benchmark
Before committing to building a full simulation engine, I wanted a sanity check: how much performance am I leaving on the table by using standard simulation tools? I started with a simple example you probably remember from Simulation 101:
- M/M/1 queue
- Exponential interarrival
- Exponential service
- Run until N events
- Measure throughput and wait times
This is the “hello world” of discrete-event simulation. If there are major performance differences here, then real-world examples will likely be even more extreme.
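The recipe above can be sketched in plain Python with a heap-based event loop. This is a minimal illustration, not the benchmark's actual code; the rates and seed are placeholders.

```python
import heapq
import random

def mm1_mean_sojourn(n_events, lam=0.9, mu=1.0, seed=42):
    """Run an M/M/1 queue for n_events events; return mean time in system."""
    rng = random.Random(seed)
    heap = [(rng.expovariate(lam), 0)]  # (time, kind): 0 = arrival, 1 = departure
    queue = []                          # arrival times of customers in the station
    waits = []
    processed = 0
    while processed < n_events:
        t, kind = heapq.heappop(heap)
        processed += 1
        if kind == 0:                   # arrival
            queue.append(t)
            if len(queue) == 1:         # server was idle: start service now
                heapq.heappush(heap, (t + rng.expovariate(mu), 1))
            heapq.heappush(heap, (t + rng.expovariate(lam), 0))
        else:                           # departure
            arrived = queue.pop(0)
            waits.append(t - arrived)   # sojourn = wait + service
            if queue:                   # next customer starts service
                heapq.heappush(heap, (t + rng.expovariate(mu), 1))
    return sum(waits) / max(len(waits), 1)
```

With a fixed seed the run is fully reproducible, which is what makes the cross-engine comparison meaningful.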
The Setup
I compared SimPy (Python) against a minimal Rust prototype:
- Same random seeds
- Same arrival/service logic
- Same termination condition
- No parallelism
- No fancy data structures—yet
Hardware: M4 MacBook Pro, nothing exotic.
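One way to honor “same random seeds” across two engines is to derive separate, reproducible streams for arrivals and services from a single master seed, so both implementations consume identical variates in the same order. A sketch; the actual RNG plumbing in either engine may differ:

```python
import random

def make_streams(seed):
    """Derive two independent, reproducible RNG streams from one master seed."""
    master = random.Random(seed)
    arrival_rng = random.Random(master.getrandbits(64))
    service_rng = random.Random(master.getrandbits(64))
    return arrival_rng, service_rng
```

Keeping arrivals and services on separate streams means a change in one draw sequence (say, a different queue discipline) can't desynchronize the other.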
The Results
At 1,000,000 events, the gap was obvious:
| Engine | Time (ms) | Speedup |
|---|---|---|
| SimPy | ~4200 | 1× |
| Rust prototype | ~55–70 | 60–80× |
Different seeds move the exact numbers around, but the ratio holds.
Why the Gap Exists
SimPy is excellent for teaching and small models. But once you scale past trivial workloads, you hit three walls fast:
- Python call overhead (processes, events, generators)
- GC churn from object-heavy event representations
- No real control over memory layout or branching costs
The Rust version avoids all three:
- No objects per event
- Sequential, cache-friendly arrays
- Inlineable state transitions
- Zero-alloc hot loops
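For M/M/1 specifically, the hot loop needs no heap and no event objects at all: the next event is always the smaller of two clocks. A pure-Python sketch of that shape, which is roughly what the Rust hot loop reduces to (scalar updates and a branch, no allocation):

```python
import random

def mm1_flat(n_events, lam=0.9, mu=1.0, seed=42):
    """M/M/1 with no heap and no per-event objects: advance whichever of the
    two pending clocks (arrival, departure) fires first.
    Returns the time-average number in system."""
    rng = random.Random(seed)
    INF = float("inf")
    next_arrival = rng.expovariate(lam)
    next_departure = INF
    in_system = 0
    area = 0.0  # integral of number-in-system over time
    t = 0.0
    for _ in range(n_events):
        if next_arrival <= next_departure:
            area += in_system * (next_arrival - t)
            t = next_arrival
            in_system += 1
            if in_system == 1:  # server was idle
                next_departure = t + rng.expovariate(mu)
            next_arrival = t + rng.expovariate(lam)
        else:
            area += in_system * (next_departure - t)
            t = next_departure
            in_system -= 1
            next_departure = (t + rng.expovariate(mu)) if in_system else INF
    return area / t
```

Even in Python this version is noticeably faster than the heap-based one; in Rust, where the whole loop fits in registers and the cache, the gap widens further.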
Even without tuning, it ran laps around SimPy.
What This Means
This benchmark proved two things:
- There’s room for a modern simulation engine that doesn’t carry decades of academic baggage.
- Performance headroom matters.
When the baseline model is 60–80× faster, everything downstream becomes possible:
- Monte Carlo replications
- Sensitivity sweeps
- Real-time dashboards
- SaaS execution layers
- Large supply-chain sims without cluster-level hardware
A fast core opens the door to an entirely different class of tooling.
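To make the Monte Carlo point concrete: once a single run is cheap, replications are just a loop over seeds. A sketch with a hypothetical `run_once(seed)` standing in for the fast core (the real core would sit behind the Rust engine's bindings):

```python
import random
import statistics

def run_once(seed):
    """Hypothetical stand-in for one fast simulation run.
    Here: the sample mean of 1,000 exponential draws (true mean 1.0)."""
    rng = random.Random(seed)
    return sum(rng.expovariate(1.0) for _ in range(1000)) / 1000

def monte_carlo(n_reps, base_seed=0):
    """Independent replications with distinct seeds.
    Returns (point estimate, 95% CI half-width)."""
    results = [run_once(base_seed + i) for i in range(n_reps)]
    mean = statistics.fmean(results)
    half_width = 1.96 * statistics.stdev(results) / (n_reps ** 0.5)
    return mean, half_width
```

The replication loop is embarrassingly parallel, which is exactly where a fast single-run core pays off.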
Where I’m Taking It Next
This project is now heading in four directions:
- A Rust core for maximum throughput
- Python bindings for analysts and notebooks
- A WASM build for interactive demos
- A configurable process/flow DSL for rapid model definition
This first benchmark wasn’t complicated, but it was enough to know the direction is right.
If You Want to See the Code
I’ll post more detailed breakdowns soon:
- event loop structure
- memory layout decisions
- RNG/seed strategy
- planned parallelism model
For now, this is the benchmark that kicked off the whole journey.