SemiVoice

semianalysis


May 27

AMD vs NVIDIA Inference Benchmark: Who Wins on Performance and Cost per Million Tokens?
➀ The article compares the performance and cost efficiency of AMD and NVIDIA GPUs for various AI tasks such as chat, translation, reasoning, and summarization.
➁ It highlights the MI325X and MI300X as cost-effective options for Llama3 70B chat and translation tasks.
➂ The analysis reveals that AMD GPUs are less cost-effective in rental scenarios due to limited availability and higher prices.
➃ The article discusses the need for better inference benchmarks and explores the features and capabilities of NVIDIA's Dynamo framework.
AMD GPU NVIDIA benchmark performance

April 29

AMD's New Sense of Urgency: MI450X, Chance to Beat NVIDIA, and NVIDIA's New Moat
➀ AMD is struggling to catch up with NCCL and needs exclusive access to a persistent cluster of at least 1,024 MI300-class GPUs.
➁ AMD's RCCL library is a fork of NVIDIA's NCCL, and syncing it with NVIDIA's major refactor requires significant engineering hours.
➂ AMD is planning to rewrite RCCL from scratch to stop being a fork of NCCL.
➃ NVIDIA's NCCL continues to advance with new features and performance improvements.
➄ AMD has made progress in software infrastructure but is falling behind in ML libraries.
➅ AMD lacks support for features like disaggregated prefill and NVMe KV Cache Tiering.
➆ Recommendations are made to both AMD and NVIDIA for improving their competitive positions.
AI AMD GPU NVIDIA

April 28

Microsoft’s Datacenter Freeze: 1.5GW Self-Build Slowdown & Lease Cancellation Misconceptions
➀ The market has focused on '2GW of lease cancellations', but that figure covers only non-binding LOIs, not firm contracts.
➁ Microsoft has ~5GW of pre-leased capacity under binding contracts set to start operations between 2025 and 2028.
➂ Microsoft walked away from significantly more than 2GW of non-binding contracts over the last two quarters.
Microsoft

April 22

Huawei AI CloudMatrix 384 – China's Answer to Nvidia GB200 NVL72
➀ Huawei has unveiled the CloudMatrix 384, an AI accelerator and rack-scale architecture that competes with Nvidia's GB200 NVL72.
➁ The system uses 384 Ascend 910C chips and achieves impressive performance even though each chip delivers only one-third the performance of an Nvidia Blackwell GPU.
➂ The CloudMatrix 384 delivers 300 PFLOPS of dense BF16 compute, almost double that of the GB200 NVL72, along with more than 3.6x the aggregate memory capacity and 2.1x the memory bandwidth.
AI Huawei NVIDIA

October 28

Fab Whack-a-Mole: Chinese Companies' Evasion of Export Controls
➀ Current Western export controls have slowed China's progress in advanced logic, but they are far from infallible.
➁ Loopholes in restrictions include offshore manufacturing, end-use workarounds, and renaming/reclassifying technologies.
➂ Huawei's fab network poses a national security concern: it exploits gaps in sanctions while advancing the domestic semiconductor supply chain.
➃ WFE suppliers' lobbying for relaxed controls is undercut by their own strong business performance and by the long-term market share they stand to lose to domestic Chinese firms.
➄ Suggestions for improving export controls include expanding the entity list, aligning ally restrictions, tightening supply chain restrictions, and improving enforcement.
Export Controls national security