LLMServingSim

Toward unified simulation of heterogeneous and disaggregated LLM serving infrastructure

Production fidelity

Powered by a vLLM-based layerwise profiler, so end-to-end TTFT, TPOT, and throughput stay close to what production serving delivers.

Heterogeneous HW

Mix GPU, CPU, CXL, and PIM tiers. Drop in any hardware target via the per-hardware CSV bundle format.

Disaggregated serving

First-class TP / PP / EP / DP+EP support across multiple instances, with wave-synchronized ALLTOALL on 2D ASTRA-Sim topologies.

Real workloads

Drive the simulator with ShareGPT-style traces or closed-loop agentic sessions (sub-requests + tool calls) for SWE-bench-style scenarios.

Recognition

Three publications. Three awards.

3 Publications
2 Best Paper Awards
1 Distinguished Artifact