➀ A 30-billion-parameter LLM is demonstrated on a prototype inference appliance equipped with 16 IBM AIU NorthPole processors, achieving a system throughput of 28,356 tokens/second at a latency below 1 ms/token; ➁ at the lowest achievable GPU latency, NorthPole delivers 72.7× better energy efficiency than GPUs while also achieving lower latency; ➂ the brain-inspired NorthPole architecture is optimized for AI inference and demonstrates superior performance on LLM inference.