Artificial Intelligence

NVIDIA and xAI unveil Colossus, world’s largest AI supercomputer

30th October 2024

Paige West

In a landmark development, NVIDIA and xAI revealed that the Colossus supercomputer cluster, composed of 100,000 NVIDIA Hopper GPUs, has been deployed in Memphis, Tennessee.

Built on NVIDIA’s Spectrum-X Ethernet networking platform, which is designed for multi-tenant, hyperscale AI operations using standards-based Ethernet, the supercomputer represents a significant leap in scalable AI.

Colossus is used to train xAI’s Grok language models, which power the AI chatbots available to X Premium subscribers. The system reaches this scale through Spectrum-X’s advanced capabilities, particularly its Remote Direct Memory Access (RDMA) support, which enables high-performance networking without the flow collisions that affect conventional Ethernet systems. Colossus marks the highest performance rate for any AI supercomputer, sustaining 95% data throughput across its network fabric with zero latency degradation or packet loss.

In an impressive feat of engineering and collaboration, NVIDIA and xAI constructed the Colossus facility in just 122 days, whereas infrastructure of this kind often takes years to complete. Training began only 19 days after the first racks were installed.

“AI is becoming mission-critical and requires increased performance, security, scalability, and cost-efficiency,” stated Gilad Shainer, NVIDIA’s Senior Vice President of Networking. “The NVIDIA Spectrum-X Ethernet networking platform is designed to provide innovators such as xAI with faster processing, analysis, and execution of AI workloads, accelerating the development, deployment, and time-to-market of AI solutions.”

NVIDIA’s Spectrum-X networking architecture is anchored by the Spectrum SN5600 Ethernet switch, which is built on the Spectrum-4 ASIC and supports port speeds of up to 800Gb/s. This switch, coupled with NVIDIA BlueField-3 SuperNICs, allowed xAI to achieve the colossal bandwidth required for the system while minimising latency. The Colossus network platform provides adaptive routing with NVIDIA Direct Data Placement technology, congestion control, and enhanced visibility, making it well suited to generative AI clouds and expansive enterprise environments.

Elon Musk remarked on X: “Colossus is the most powerful training system in the world. Nice work by the xAI team, NVIDIA, and our many partners and suppliers.”

An xAI spokesperson added: “With NVIDIA’s Hopper GPUs and Spectrum-X, we’re pioneering large-scale AI model training with unmatched acceleration and efficiency.”

As Colossus now begins an ambitious expansion to 200,000 NVIDIA Hopper GPUs, the system demonstrates the power of accelerated AI ‘factories’ grounded in Ethernet technology, setting new standards for high-performance, scalable AI infrastructure.
