➀ A 30-billion-parameter LLM is demonstrated on a prototype inference appliance equipped with 16 IBM AIU NorthPole processors, achieving a system throughput of 28,356 tokens/second at a latency below 1 ms/token; ➁ compared against the lowest-latency GPU configuration, NorthPole delivers 72.7 times better energy efficiency while also achieving lower latency; ➂ the brain-inspired NorthPole architecture is optimized for AI inference and demonstrates superior performance in LLM inference.
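The reported figures can be sanity-checked with simple arithmetic. A minimal sketch, assuming the 28,356 tokens/second is the aggregate throughput across all 16 NorthPole cards and the sub-1 ms figure is the per-token (inter-token) latency:

```python
# Back-of-the-envelope check of the reported NorthPole figures.
# Assumptions (not stated explicitly in the summary): throughput is
# aggregate across the whole 16-card appliance, and "latency below
# 1 ms/token" refers to the inter-token interval of a single stream.
SYSTEM_THROUGHPUT = 28_356  # tokens/second, whole appliance
NUM_CARDS = 16
LATENCY_PER_TOKEN = 1e-3    # seconds, upper bound per the summary

# Average throughput contributed by each NorthPole card.
per_card = SYSTEM_THROUGHPUT / NUM_CARDS
print(f"~{per_card:.0f} tokens/s per card")  # ~1772 tokens/s

# A sub-1 ms inter-token latency implies each individual stream
# receives more than 1,000 tokens per second.
per_stream_rate = 1 / LATENCY_PER_TOKEN
print(f">{per_stream_rate:.0f} tokens/s per stream")
```

Note that aggregate throughput and per-stream latency are independent axes: the appliance reaches the aggregate figure by serving many streams concurrently, while each stream individually stays under the 1 ms/token bound.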