Intel® Gaudi 3 AI Accelerator
Big for Gen AI, Even Bigger for ROI.
Introducing the Intel® Gaudi® 3 AI Accelerator
With performance, scalability, and efficiency that give more choice to more customers, Intel® Gaudi® 3 accelerators help enterprises unlock insights, innovations, and income.
The growing demand for generative AI compute has brought increasing demand for solution alternatives that give customers choice. The Intel® Gaudi® 3 AI accelerator is designed to deliver that choice with:
Price-Performance Efficiency
Competitive price performance so enterprise AI teams can train more, deploy more, and spend less.
Massive Scalability
Flexible networking based on open, industry-standard Ethernet that efficiently scales systems up and out to address the compute needs of LLMs.
Easy-to-Use Development Platform
Efficient model migration and development on popular open frameworks and software, saving time and preserving software investment.
Supported by the Intel® Tiber™ Developer Cloud
Discover the performance and ease of use of the Intel® Gaudi® accelerator on the Intel® Tiber™ Developer Cloud.
Efficiency, Performance, and Scale for Data Center AI Workloads
Intel® Gaudi® 3 AI accelerators support state-of-the-art generative AI and LLMs for the data center and pair with Intel® Xeon® Scalable processors, the host CPU of choice for leading AI systems, to deliver enterprise performance and reliability.
Architected at Inception for Gen AI Training and Inference
The Intel® Gaudi® 3 accelerator challenges the industry’s legacy performance leader with speed and efficiency born of its purpose-built AI compute design. The architecture features heterogeneous compute engines: eight matrix multiplication engines (MMEs) and 64 fully programmable Tensor Processor Cores (TPCs), with support for the data types most commonly required for deep learning: FP32, TF32, BF16, FP16, and FP8. See the Intel® Gaudi® 3 accelerator white paper for details on how the architecture delivers AI compute performance more efficiently.
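As a minimal sketch of how these data types come into play in practice, the snippet below runs a BF16 forward pass on a Gaudi device using PyTorch autocast. It assumes the Intel® Gaudi® PyTorch bridge (the habana_frameworks package) is installed and a Gaudi device is available; the model and tensor sizes are placeholders for illustration.

import torch
import habana_frameworks.torch.core as htcore  # Intel Gaudi PyTorch bridge

device = torch.device("hpu")  # Gaudi devices are exposed to PyTorch as "hpu"

model = torch.nn.Linear(1024, 1024).to(device)
x = torch.randn(8, 1024, device=device)

# Run the matmul-heavy forward pass in BF16 under autocast;
# the model parameters remain in FP32.
with torch.autocast(device_type="hpu", dtype=torch.bfloat16):
    y = model(x)

htcore.mark_step()  # in lazy mode, flush the accumulated graph for execution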
Scale Large Systems, Scale Great Performance
Great networking performance starts at the processor: the Intel® Gaudi® 3 accelerator integrates twenty-four 200 Gigabit Ethernet ports on chip, enabling more efficient scale-up within the server and massive scale-out capacity for cluster-scale systems that support blazing-fast training and inference of models large and small. Near-linear scalability of Intel® Gaudi® networking preserves the cost-performance advantage, whether you’re scaling out four nodes or four hundred. See the Intel® Gaudi® documentation for more information about scaling Intel® Gaudi® accelerators, and the sketch below for how scale-out looks in code.
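As a hedged sketch of scale-out training over that Ethernet fabric, the snippet below uses PyTorch DistributedDataParallel with the HCCL collective backend from the Intel® Gaudi® software stack. It assumes the habana_frameworks package is installed and that a launcher such as torchrun or mpirun sets the usual rank and world-size environment variables; the model and data are placeholders.

import torch
import torch.distributed as dist
import habana_frameworks.torch.core as htcore
import habana_frameworks.torch.distributed.hccl  # registers the "hccl" backend for Gaudi

# Rank and world size are assumed to be provided by the launcher.
dist.init_process_group(backend="hccl")

device = torch.device("hpu")
model = torch.nn.Linear(4096, 4096).to(device)
ddp_model = torch.nn.parallel.DistributedDataParallel(model)

opt = torch.optim.SGD(ddp_model.parameters(), lr=1e-3)
x = torch.randn(32, 4096, device=device)

loss = ddp_model(x).pow(2).mean()
loss.backward()     # gradient all-reduce rides the integrated Ethernet fabric
opt.step()
htcore.mark_step()  # flush the lazy-mode graph

dist.destroy_process_group()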
Bringing the Productivity and Freedom to Inspire AI Innovation
Intel® Gaudi® Software
Intel® Gaudi® software eases development through its integration with the PyTorch framework, the foundation of the majority of Gen AI and LLM development. Migrating PyTorch code from Nvidia GPUs to Intel® Gaudi® accelerators typically requires changing only three to five lines of code, as shown in the sketch below. Through the Optimum Habana library, Intel® Gaudi® software also gives developers easy access to thousands of popular generative transformer and diffusion models on the Hugging Face hub. See the Intel® Gaudi® developer documentation for more information on developing with Intel® Gaudi® software.
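To make the few-line migration concrete, here is a minimal sketch of a CUDA-style PyTorch training step adapted for Gaudi. The three Gaudi-specific changes are marked in the comments; the tiny model and random data are placeholders for illustration only.

import torch
import habana_frameworks.torch.core as htcore  # Gaudi change 1: load the PyTorch bridge

device = torch.device("hpu")                   # Gaudi change 2: "cuda" becomes "hpu"

# Placeholder model and data, purely for illustration.
model = torch.nn.Linear(16, 1).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(64, 16, device=device)
y = torch.randn(64, 1, device=device)

loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()
opt.step()
htcore.mark_step()                             # Gaudi change 3: flush the lazy-mode graph

For Hugging Face workloads, the Optimum Habana library offers GaudiTrainer and GaudiTrainingArguments as near drop-in replacements for the standard transformers Trainer classes.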
An Introduction to Game-Changing Intel® Gaudi® 3 AI Accelerators
To Address Your Enterprise’s Specific Needs, Intel® Gaudi® 3 Accelerator Provides These Hardware Options
Every enterprise has different AI compute requirements with many considerations—desired system performance, scale, power, footprint, and more.
Notices and Disclaimers
Product and Performance Information
1. Gaudi 3 training vs. H100; average performance projected across multiple models and configurations: Llama2 7B & 13B, GPT-3 175B.
2. Gaudi 3 inference vs. H200; average performance projected across multiple models and configurations: Llama2 7B & 70B, Falcon 180B.
3. Gaudi 3 inference vs. H100; average performance projected across multiple models and configurations: Llama2 7B & 70B, Falcon 180B.
4. Gaudi 3 inference power efficiency vs. H100; average performance projected across multiple models and configurations: Llama2 7B & 70B, Falcon 180B.

H100 and H200 data sources: https://developer.nvidia.com/deep-learning-performance-training-inference/ai-inference and https://developer.nvidia.com/deep-learning-performance-training-inference/training. Intel results obtained in April 2024. Results may vary.