🎉 Congratulations to Together AI for raising the bar with record-fast #inference on the DeepSeek-R1-0528 model, accelerated by our #NVIDIABlackwell platform—built for next-level compute, memory, and bandwidth to uplift the entire AI ecosystem. #AcceleratedComputing Learn more now. ⬇️
🚀 Together AI Sets a New Bar: Fastest Inference for DeepSeek-R1-0528

We’ve upgraded the Together Inference Engine to run on NVIDIA Blackwell GPUs, and the results speak for themselves:

📈 Highest known serverless throughput: 334 tokens/sec
🏃➡️ Fastest time to first answer token: 7.3 sec
⏱️ Lowest end-to-end response time: 9 sec

Need more performance? Our Dedicated Endpoints hit 386 tokens/sec. Contact us to customize an NVIDIA HGX B200 deployment optimized for speed, quality, and cost.

Our in-house stack delivers best-in-class performance and throughput, full stop.

Agentic AI. Advanced reasoning. Blazing speed. Together AI is deploying Blackwell GPUs to power the next generation of real-world AI.

(Read more - link in comments)