semianalysis


May 27

  • AMD vs NVIDIA Inference Benchmark: Who Wins on Performance and Cost per Million Tokens?

    ➀ The article compares the performance and cost efficiency of AMD and NVIDIA GPUs for various AI tasks such as chat, translation, reasoning, and summarization.

    ➁ It highlights the MI325X and MI300X as cost-effective options for Llama3 70B chat and translation tasks.

    ➂ The analysis reveals that AMD GPUs are less cost-effective in rental scenarios due to limited availability and higher prices.

    ➃ The article discusses the need for better inference benchmarks and explores the features and capabilities of NVIDIA's Dynamo framework.

    AMD, GPU, NVIDIA, benchmark, performance

April 29

  • AMD's New Sense of Urgency: MI450X, Chance to Beat NVIDIA, and NVIDIA's New Moat

    ➀ AMD is struggling to catch up with NCCL and needs exclusive access to a persistent cluster of at least 1,024 MI300-class GPUs.

    ➁ AMD's RCCL library is a fork of NVIDIA's NCCL and requires significant engineering hours to sync with NVIDIA's major refactor.

    ➂ AMD is planning to rewrite RCCL from scratch to stop being a fork of NCCL.

    ➃ NVIDIA's NCCL continues to advance with new features and performance improvements.

    ➄ AMD has made progress in software infrastructure but is falling behind in ML libraries.

    ➅ AMD lacks support for features like disaggregated prefill and NVMe KV Cache Tiering.

    ➆ Recommendations are made to both AMD and NVIDIA for improving their competitive positions.

    AI, AMD, GPU, NVIDIA

April 28

April 22

  • Huawei AI CloudMatrix 384 – China's Answer to Nvidia GB200 NVL72

    ➀ Huawei has unveiled the CloudMatrix 384, an AI accelerator and rack-scale architecture that competes with Nvidia's GB200 NVL72.

    ➁ The system uses 384 Ascend 910C chips and achieves impressive system-level performance even though each chip delivers only about one-third the performance of an Nvidia Blackwell GPU.

    ➂ The CloudMatrix 384 offers 300 PFLOPs of dense BF16 compute, almost double that of the GB200 NVL72, with over 3.6x the aggregate memory capacity and 2.1x more memory bandwidth.

    AI, Huawei, NVIDIA

October 28

  • Fab Whack-a-Mole: Chinese Companies' Evasion of Export Controls

    ➀ Current Western export controls have slowed China's progress in advanced logic, but they are not infallible.

    ➁ Loopholes in restrictions include offshore manufacturing, end-use workarounds, and renaming/reclassifying technologies.

    ➂ Huawei's fab network poses a national security concern: it exploits gaps in sanctions and advances domestic semiconductor supply chains.

    ➃ Wafer fab equipment (WFE) suppliers' lobbying for relaxed controls is undercut by their strong business performance and by the long-term market share they stand to lose to domestic Chinese firms.

    ➄ Suggestions for improving export controls include expanding the entity list, aligning ally restrictions, tightening supply chain restrictions, and improving enforcement.

    Export Controls, national security