Recent #RoCE news in the semiconductor industry

9 months ago
➀ The article traces the evolution of large models from billion-parameter language models to trillion-parameter multimodal models, which necessitates a significant boost in the underlying computing capability of clusters exceeding a thousand cards. ➁ It describes the network architectures of the AI clusters at ByteDance, Baidu, Alibaba, and Tencent, highlighting their use of Broadcom Tomahawk 5 chips, InfiniBand, and RoCE. ➂ It also examines Baidu's HPN-AIPod architecture and Alibaba's HPN7 network, emphasizing their high-performance, scalable designs.
AI Cluster, InfiniBand, Network Architecture, RoCE
10 months ago
➀ RDMA allows direct access to remote memory without kernel intervention, offering high throughput and low latency. ➁ RDMA protocols include InfiniBand, RoCE, and iWARP, each with unique advantages and deployment scenarios. ➂ Load balancing in large-scale networks is challenging due to the prevalence of large data flows, necessitating advanced techniques like PLB and SDN-based traffic engineering.
InfiniBand, RDMA, RoCE
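
The load-balancing point in the summary above can be illustrated with a small simulation. This is only a sketch under stated assumptions: it assumes the fabric places each flow on one link by hashing a flow identifier (per-flow hashing, which the summary does not name), and the addresses, flow sizes, link count, and the helper names `assign_flows` and `imbalance` are all made up for illustration. It compares many small flows with a few large flows carrying the same total volume.

```python
import zlib

def assign_flows(flows, n_links):
    """Place each flow on one link by hashing its identifier (assumed per-flow hashing)."""
    load = [0] * n_links
    for flow_id, size in flows:
        link = zlib.crc32(flow_id.encode()) % n_links
        load[link] += size
    return load

def imbalance(load):
    """Ratio of the busiest link's load to the mean link load (1.0 is perfectly even)."""
    mean = sum(load) / len(load)
    return max(load) / mean

N_LINKS = 8  # hypothetical number of equal-cost uplinks

# Both workloads carry 4000 units in total; only the number of flows differs.
# 4791 is the UDP destination port used by RoCEv2; the addresses are invented.
mice = [(f"10.0.0.1:{port}->10.0.1.1:4791", 1) for port in range(10000, 14000)]
elephants = [(f"10.0.0.{i}:49152->10.0.1.{i}:4791", 1000) for i in range(1, 5)]

mice_load = assign_flows(mice, N_LINKS)
elephant_load = assign_flows(elephants, N_LINKS)

print("many small flows:", mice_load, "imbalance", round(imbalance(mice_load), 2))
print("few large flows: ", elephant_load, "imbalance", round(imbalance(elephant_load), 2))
```

With only four flows spread over eight links, at least four links stay idle, so the busiest link carries at least twice the mean load no matter how good the hash is. Many small flows tend to average out instead. This is one way to see why a traffic mix dominated by a few large flows calls for techniques beyond static per-flow hashing.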