LLMServingSim

Powered by a vLLM-based layerwise profiler, so simulated end-to-end TTFT (time to first token), TPOT (time per output token), and throughput closely track what production serving actually delivers.

Mix GPU, CPU, CXL, and PIM tiers in a single simulated system, and add any new hardware target by supplying a bundle of per-hardware CSV files.

First-class support for tensor (TP), pipeline (PP), expert (EP), and combined data-plus-expert (DP+EP) parallelism across multiple instances, with wave-synchronized ALLTOALL collectives on 2D ASTRA-Sim topologies.

Drive the simulator with ShareGPT-style request traces, or with closed-loop agentic sessions (sub-requests plus tool calls) for SWE-bench-style scenarios.

Recognition

Three publications. Three awards.

Publications

Best Paper Awards

Distinguished Artifact