➀ A 30-billion-parameter LLM is demonstrated on a prototype inference appliance equipped with 16 IBM AIU NorthPole processors, achieving a system throughput of 28,356 tokens/second at a latency below 1 ms/token; ➁ compared with GPUs running at their lowest latency, NorthPole offers 72.7 times better energy efficiency while still delivering lower latency; ➂ the brain-inspired NorthPole architecture is optimized for AI inference and demonstrates superior performance in LLM inference.
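As a quick sanity check on the headline numbers, the arithmetic below works out the implied per-processor throughput and the single-stream token rate at the stated latency bound. This is an interpretation sketch only: it assumes the 28,356 tokens/second figure is aggregate across all 16 processors, which is not stated explicitly above.

```python
# Assumed interpretation: throughput is aggregate across the 16-processor appliance.
system_throughput = 28_356      # tokens/second for the whole system
num_processors = 16
per_chip = system_throughput / num_processors
print(f"per-processor throughput: {per_chip:.0f} tokens/s")

# A latency below 1 ms/token means each generation stream emits more than
# 1000 tokens/s, so the aggregate figure above implies many request
# streams (batching) being served concurrently.
single_stream_rate = 1 / 0.001  # tokens/s at exactly 1 ms/token
print(f"single-stream rate at 1 ms/token: {single_stream_rate:.0f} tokens/s")
```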