“With GB200 NVL72 and Together AI’s custom optimizations, we are exceeding customer expectations for large-scale inference workloads for MoE models like DeepSeek-V3,” said Vipul Ved Prakash, cofounder and CEO of Together AI. “The performance gains come from NVIDIA’s full-stack optimizations coupled with Together AI Inference breakthroughs across kernels, runtime engine and speculative decoding.”
This performance advantage is evident across other frontier models.
Kimi K2 Thinking, the most intelligent open-source model, serves as another proof point, achieving a 10x generational performance gain when deployed on GB200 NVL72.




