➀ A 3-billion-parameter LLM is demonstrated on a prototype inference appliance equipped with 16 IBM AIU NorthPole processors, achieving a system throughput of 28,356 tokens/second at a latency below 1 ms/token; ➁ at the lowest GPU latency, NorthPole delivers 72.7 times better energy efficiency than GPUs while also achieving lower latency; ➂ the NorthPole architecture is brain-inspired and optimized for AI inference, and these results demonstrate its strength for LLM inference.