➀ AMD is struggling to catch up with NCCL; to close the gap it needs exclusive access to a persistent cluster of at least 1,024 MI300-class GPUs.
➁ AMD's RCCL library is a fork of NVIDIA's NCCL, and keeping it in sync with NVIDIA's major refactor requires significant engineering hours.
➂ AMD plans to rewrite RCCL from scratch so that it is no longer a fork of NCCL.
➃ NVIDIA's NCCL continues to advance with new features and performance improvements.
➄ AMD has made progress in software infrastructure but is falling behind in ML libraries.
➅ AMD lacks support for features such as disaggregated prefill and NVMe KV cache tiering.
➆ Recommendations are made to both AMD and NVIDIA for improving their competitive positions.