➀ A 30-billion-parameter LLM runs on a prototype inference appliance equipped with 16 IBM AIU NorthPole processors, achieving a system throughput of 28,356 tokens/second at a latency below 1 ms/token. ➁ At the GPUs' lowest achievable latency, NorthPole delivers 72.7× better energy efficiency while also providing lower latency than the GPUs. ➂ The brain-inspired NorthPole architecture is optimized for AI inference and demonstrates superior performance on LLM inference.
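A quick sanity check on the stated figures: dividing the reported system throughput evenly across the 16 NorthPole processors gives a rough per-processor rate (a simplifying assumption, since the paper reports only the aggregate number). Note that the sub-1 ms/token latency is a separately reported per-token figure, not derived from throughput.

```python
# Derived arithmetic from the stated results (assumption: throughput
# is split evenly across processors; the source reports only totals).
system_throughput = 28_356   # tokens/second across the appliance
num_processors = 16          # IBM AIU NorthPole chips in the prototype

per_processor = system_throughput / num_processors
print(f"~{per_processor:.1f} tokens/second per NorthPole processor")
# Roughly 1,772 tokens/second per processor under the even-split assumption.
```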