Jan 03, 2026 2 min read

The Benchmark That Sparked This Project

A simple M/M/1 queue benchmark showed a 60–80× performance gap—and convinced me to build a new simulation engine.

Why I Ran This Benchmark

Before committing to building a full simulation engine, I wanted a sanity check: how much performance am I leaving on the table by using standard simulation tools? I started with a simple example you probably remember from Simulation 101:

  • M/M/1 queue
  • Exponential interarrival
  • Exponential service
  • Run until N events
  • Measure throughput and wait times

This is the “hello world” of discrete-event simulation. If there are major performance differences here, then real-world examples will likely be even more extreme.
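For reference, the recipe above can be sketched in a few dozen lines of Rust using the Lindley recursion for a single-server FIFO queue. This is an illustrative sketch, not the benchmark code: the xorshift64 generator, the seed, and the λ = 0.9, μ = 1.0 parameters are all stand-ins.

```rust
// Minimal M/M/1 sketch via the Lindley recursion. Illustrative only;
// the RNG is a simple xorshift64 stand-in, not the benchmark's generator.
struct Xorshift64(u64);

impl Xorshift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
    /// Uniform draw in (0, 1], then inverse-CDF sample from Exp(rate).
    fn exp(&mut self, rate: f64) -> f64 {
        let u = ((self.next_u64() >> 11) as f64 + 1.0) / (1u64 << 53) as f64;
        -u.ln() / rate
    }
}

/// Runs `n` customers; returns (throughput, mean queue wait).
fn simulate(lambda: f64, mu: f64, n: u64, seed: u64) -> (f64, f64) {
    let mut rng = Xorshift64(seed);
    let (mut arrival, mut depart, mut total_wait) = (0.0_f64, 0.0_f64, 0.0_f64);
    for _ in 0..n {
        arrival += rng.exp(lambda);      // exponential interarrival
        let start = arrival.max(depart); // service begins once the server is free
        total_wait += start - arrival;   // time this customer waits in queue
        depart = start + rng.exp(mu);    // exponential service
    }
    (n as f64 / depart, total_wait / n as f64)
}

fn main() {
    let (throughput, mean_wait) = simulate(0.9, 1.0, 1_000_000, 42);
    println!("throughput: {throughput:.3}  mean wait: {mean_wait:.3}");
}
```

For a stable M/M/1 at λ = 0.9, μ = 1.0, throughput should converge to roughly 0.9 and the mean queue wait to about λ/(μ(μ−λ)) = 9, which makes sanity-checking the run easy.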

The Setup

I compared SimPy (Python) against a minimal Rust prototype:

  • Same random seeds
  • Same arrival/service logic
  • Same termination condition
  • No parallelism
  • No fancy data structures—yet

Hardware: M4 MacBook Pro, nothing exotic.

The Results

At 1,000,000 events, the gap was obvious:

Engine            Time (ms)   Speedup
SimPy             ~4200       baseline
Rust prototype    ~55–70      60–80×

Different seeds move the exact numbers around, but the ratio holds.

Why the Gap Exists

SimPy is excellent for teaching and small models. But once you scale past trivial workloads, you hit three walls fast:

  1. Python call overhead (processes, events, generators)
  2. GC churn from object-heavy event representations
  3. No real control over memory layout or branching costs

The Rust version avoids all three:

  • No objects per event
  • Sequential, cache-friendly arrays
  • Inlineable state transitions
  • Zero-alloc hot loops

Even without tuning, it ran circles around SimPy.
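To make “no objects per event” concrete, here is a toy sketch: events as plain Copy structs in a binary heap with capacity reserved up front, so the hot loop triggers no per-event allocation. The event logic is deliberately trivial (deterministic time offsets, no queue state) and every name and number is illustrative.

```rust
// Toy illustration of object-free events: each event is a Copy struct in a
// preallocated min-heap, so the hot loop does no per-event allocation.
use std::cmp::Ordering;
use std::collections::BinaryHeap;

#[derive(Clone, Copy)]
struct Ev { time: f64, kind: u8 } // 0 = arrival, 1 = departure

impl PartialEq for Ev { fn eq(&self, other: &Self) -> bool { self.time == other.time } }
impl Eq for Ev {}
impl PartialOrd for Ev {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> { Some(self.cmp(other)) }
}
impl Ord for Ev {
    // reversed comparison turns Rust's max-heap into a min-heap on event time
    fn cmp(&self, other: &Self) -> Ordering { other.time.total_cmp(&self.time) }
}

fn run(max_events: u64) -> u64 {
    // capacity reserved up front: the pushes below never reallocate
    let mut calendar = BinaryHeap::with_capacity(1024);
    calendar.push(Ev { time: 0.0, kind: 0 });
    let mut processed = 0u64;
    while let Some(ev) = calendar.pop() {
        processed += 1;
        if processed >= max_events { break; } // termination condition
        if ev.kind == 0 {
            // arrival: schedule the next arrival and this customer's departure
            // (fixed offsets stand in for the exponential draws)
            calendar.push(Ev { time: ev.time + 1.0, kind: 0 });
            calendar.push(Ev { time: ev.time + 0.5, kind: 1 });
        }
        // kind == 1 (departure): a real model would update server/queue state here
    }
    processed
}

fn main() {
    println!("processed {} events", run(10));
}
```

The point is the shape, not the logic: 16-byte Copy events in contiguous heap storage keep the scheduler in cache, where SimPy spends the same loop allocating and tearing down Python objects and generator frames.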

What This Means

This benchmark proved two things:

  1. There’s room for a modern simulation engine that doesn’t carry decades of academic baggage.
  2. Performance headroom matters.
    When the baseline model is 60–80× faster, everything downstream becomes possible:
    • Monte Carlo replications
    • Sensitivity sweeps
    • Real-time dashboards
    • SaaS execution layers
    • Large supply-chain sims without cluster-level hardware

A fast core opens the door to an entirely different class of tooling.

Where I’m Taking It Next

This project is now heading in four directions:

  • A Rust core for maximum throughput
  • Python bindings for analysts and notebooks
  • A WASM build for interactive demos
  • A configurable process/flow DSL for rapid model definition

This first benchmark wasn’t complicated, but it was enough to know the direction is right.

If You Want to See the Code

I’ll post more detailed breakdowns soon:

  • event loop structure
  • memory layout decisions
  • RNG/seed strategy
  • planned parallelism model

For now, this is the benchmark that kicked off the whole journey.