
Recent performance tests using the demanding Llama 3.1 405B model have revealed a substantial gap between leading AI accelerators: early benchmarks indicate that Nvidia's H200 Tensor Core GPU delivers markedly higher throughput than Intel's Gaudi 3 accelerator when running this massive language model.
Llama 3.1 405B is a state-of-the-art language model whose immense size and computational requirements make it a challenging benchmark for even the most powerful AI chips. Efficiently serving this model is a key indicator of an accelerator's readiness for future large-scale AI workloads.
According to initial test data, the Nvidia H200 achieved significantly higher inference performance: its throughput was observed to be as much as nine times greater than that of the Gaudi 3 accelerator on this particular Llama 3.1 405B benchmark, underscoring the H200's ability to serve extremely large models with far greater speed in this scenario.
The H200, an enhanced version of the popular H100, features significantly more high-bandwidth memory: 141 GB of HBM3e with up to 4.8 TB/s of bandwidth, versus the H100's 80 GB of HBM3. That headroom is a crucial advantage for models like Llama 3.1 405B, which require vast amounts of weight data to be streamed quickly, and this increased memory capacity and bandwidth appear to be critical factors in the H200's superior performance on this workload.
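As a rough illustration (not part of the reported benchmark), the memory-bandwidth argument can be sketched as a roofline-style upper bound: in single-stream decoding, each generated token requires streaming the full weight set from HBM, so peak bandwidth caps tokens per second. The bandwidth figures below are published vendor peak specs; the FP8 weight assumption and the helper function are illustrative, not from the source article.

```python
def memory_bound_tokens_per_sec(params_billion: float,
                                bytes_per_param: float,
                                hbm_bandwidth_tbs: float) -> float:
    """Rough upper bound on single-stream decode throughput for a
    memory-bandwidth-bound LLM: every generated token streams all
    model weights from HBM once."""
    weight_bytes = params_billion * 1e9 * bytes_per_param
    return hbm_bandwidth_tbs * 1e12 / weight_bytes

# Published peak HBM bandwidths: H200 ~4.8 TB/s (HBM3e), Gaudi 3 ~3.7 TB/s.
# Assuming FP8 weights (1 byte per parameter) for the 405B model.
h200 = memory_bound_tokens_per_sec(405, 1, 4.8)    # ~11.9 tok/s per stream
gaudi3 = memory_bound_tokens_per_sec(405, 1, 3.7)  # ~9.1 tok/s per stream
print(f"H200 bound:    {h200:.1f} tok/s")
print(f"Gaudi 3 bound: {gaudi3:.1f} tok/s")
```

Notably, raw bandwidth alone would predict only a modest (~1.3x) gap between the two parts, so a nine-fold measured difference also reflects factors such as memory capacity (a 405B model must be sharded across multiple devices), interconnect, batching, and software-stack maturity, consistent with the caveats about configuration noted below.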
Intel’s Gaudi 3 is positioned as a competitive alternative in the AI accelerator market, offering strong performance on various AI tasks. However, these early benchmark results on the Llama 3.1 405B model suggest that the H200 holds a considerable lead for processing such extraordinarily large and memory-intensive models.
It is important to note that benchmark results can vary with the specific model, dataset, software optimization, and system configuration used. Still, these initial findings on a model as large and relevant as Llama 3.1 405B offer valuable insight into the current capabilities of the two accelerators, and they underscore the H200's present lead in handling the most demanding large language models.
Source: https://www.datacenterdynamics.com/en/news/nvidia-h200-outperforms-intel-gaudi-3-by-factor-of-nine-across-first-llama-31-405b-benchmark-test-exclusive/