Production fidelity
Powered by a vLLM-based layerwise profiler. End-to-end time to first token (TTFT), time per output token (TPOT), and throughput stay close to what production serving delivers.
Heterogeneous HW
Mix GPU, CPU, CXL, and PIM tiers. Drop in any hardware target via the per-hardware CSV bundle format.
Disaggregated serving
First-class TP / PP / EP / DP+EP support across multiple instances, with wave-synchronized ALLTOALL on 2D ASTRA-Sim topologies.
Real workloads
Drive the simulator with ShareGPT-style traces or closed-loop agentic sessions (sub-requests + tool calls) for SWE-bench-style scenarios.
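As a rough illustration of the metrics named under "Production fidelity", the sketch below computes TTFT, TPOT, and throughput from per-request token timestamps using the standard definitions. The `RequestTiming` type and the helper functions are hypothetical names introduced here for illustration; they are not the simulator's API or its output format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RequestTiming:
    """Hypothetical record: when a request arrived and when each output token was emitted."""
    arrival_s: float
    token_times_s: List[float]  # emission time of each output token, in seconds


def ttft(r: RequestTiming) -> float:
    """Time to first token: delay between arrival and the first output token."""
    return r.token_times_s[0] - r.arrival_s


def tpot(r: RequestTiming) -> float:
    """Time per output token: mean gap between consecutive tokens after the first."""
    n = len(r.token_times_s)
    if n < 2:
        return 0.0  # a single-token response has no inter-token gap
    return (r.token_times_s[-1] - r.token_times_s[0]) / (n - 1)


def throughput(requests: List[RequestTiming]) -> float:
    """Output tokens per second, from the earliest arrival to the last emitted token."""
    start = min(r.arrival_s for r in requests)
    end = max(r.token_times_s[-1] for r in requests)
    total_tokens = sum(len(r.token_times_s) for r in requests)
    return total_tokens / (end - start)


if __name__ == "__main__":
    req = RequestTiming(arrival_s=0.0, token_times_s=[0.5, 0.75, 1.0, 1.25])
    print(f"TTFT: {ttft(req):.3f} s")
    print(f"TPOT: {tpot(req):.3f} s")
    print(f"Throughput: {throughput([req]):.2f} tokens/s")
```

Comparing these three numbers between a simulated run and a measured run of the same trace is one simple way to check how closely the two agree.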
Recognition
Three publications. Three awards.
3 Publications, 2 Best Paper Awards, 1 Distinguished Artifact Award
- IISWC 2024: Best Paper Award, Distinguished Artifact Award
- ISPASS 2026: Best Paper Award
