THE LATEST NEWS
Broadcom Expands AI Ethernet Offerings

Broadcom Inc. continues to advance its co-packaged optics (CPO) offerings to meet the demands of artificial intelligence (AI) scale-out, while also positioning Ethernet technologies as complementary and just as critical.

In the first half of October, the company shipped its Tomahawk 6 – Davisson (TH6-Davisson) CPO Ethernet switch as well as Thor Ultra, which Broadcom claims is the industry’s first 800G AI Ethernet network interface card (NIC). Both are aimed at speeding up and scaling AI networking.

TH6-Davisson, Broadcom’s third-generation CPO Ethernet switch, delivers 102.4 Terabits per second of optically enabled switching capacity. (Source: Broadcom)

In a briefing with EE Times, Manish Mehta, VP of marketing and operations for Broadcom’s optical systems division, said the TH6-Davisson, Broadcom’s third-generation CPO Ethernet switch, doubles the bandwidth of any CPO switch available today, delivering 102.4 Terabits per second of optically enabled switching capacity. 

But performance is not the only metric that matters for AI data centers – the TH6-Davisson also makes advances in power efficiency and traffic stability, which Mehta said are necessary to support AI cluster scale-up and scale-out. “The optical interconnect space is exploding for AI networks,” he said. “It’s probably had an order of magnitude growth in volumes compared to when devices were just needed in the front end of the cloud network.”

The primary reason for the massive uptick in AI networking is that back-end GPUs use about 10 times the optical bandwidth for scale-out as front-end CPUs, Mehta said; the latter saw their own uptick in adoption begin around 2010 with the growing use of cloud networking. “The first layer tends to be a copper link from the CPU server to the top-of-rack switch, and then you go to optical connectivity to the leaf or spine layers,” he said.

AI networking involves connecting a large number of GPUs to a switch or network of switches, and scale-out networking demands reach that can only be achieved with optics, Mehta said.

When Broadcom introduced its CPO portfolio in 2021, he said, the company was focused on reducing power consumption and only beginning to densify silicon photonics such that high-density optical engines could be placed on a common substrate with an ASIC. “We were focused on front-end compute,” Mehta said.

Earlier this year, Broadcom released its third-generation 200G/lane CPO product line. The company’s optical systems division has been built in part through acquisition, including the fiber optic product division of Avago, which was building optical transceivers based on multimode lasers. Broadcom has been investing in its CPO platform for the past five years and has leveraged several decades of laser technology development.

The migration to AI networking and the high-volume requirements for optics are driving rapid advancement of CPO technology, he said. “We’re seeing that the power consumption continues to go up if you use traditional optics. CPO can help.”

Mehta cited data presented by Facebook parent company Meta last week showing it had achieved more than one million cumulative 400 Gb/s-equivalent port device hours without any data drops using Broadcom’s second-generation Tomahawk 5 Bailly, released in March 2024. He said this indicates its readiness for large-scale deployment in data centers, while also noting a 65% reduction in optics power consumption. “There’s a significant improvement in link performance. If you can reduce that link flap frequency, end users can improve GPU utilization for monetization.”

TH6-Davisson migrates lane speed from 100G per lane to 200G per lane, Mehta said, while switch bandwidth grows from 51.2 Tb/s to 102.4 Tb/s. “We’re already in development on Gen 4, which will increase the line rate to 400G per lane.”

He said Broadcom is using the same back-end manufacturing platform as previous generations. Tomahawk 6 has 512 logical ports, allowing it to talk to 512 endpoints to provide the connectivity customers demand while improving link performance and GPU cluster efficiency.
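
As a quick back-of-the-envelope check of how those figures relate (a sketch only, not a breakdown published by Broadcom), aggregate switch capacity is simply the port count multiplied by the per-lane rate; pairing the 512-port figure with the quoted lane rates reproduces the stated 51.2 Tb/s and 102.4 Tb/s totals:

```python
# Back-of-the-envelope sketch only: aggregate switch bandwidth as
# port count x per-lane rate. Pairing 512 ports with these lane rates
# is an inference from the figures quoted above, not a Broadcom spec sheet.

def switch_capacity_tbps(ports: int, lane_rate_gbps: int) -> float:
    """Aggregate switching capacity in terabits per second."""
    return ports * lane_rate_gbps / 1_000

print(switch_capacity_tbps(512, 100))  # 51.2  -> Tomahawk 5 class
print(switch_capacity_tbps(512, 200))  # 102.4 -> TH6-Davisson
```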

Mehta said Broadcom’s focus in advancing CPO is on delivering Ethernet-based solutions, leveraging Ethernet for AI because it is open, scalable, and power efficient.

Separate but related was Broadcom’s announcement of Thor Ultra, now sampling, the company’s 800G AI Ethernet NIC, which can interconnect hundreds of thousands of XPUs. The new NIC adopts the Ultra Ethernet Consortium (UEC) specification to provide the ability to scale AI workloads using high-performance, advanced remote direct memory access (RDMA) capabilities.

In a briefing with EE Times, Hasan Siraj, Broadcom’s head of software products and ecosystem, said traditional RDMA lacks multipathing, out-of-order packet delivery, selective retransmit and scalable congestion control. By using the UEC specification, Thor Ultra supports packet-level multipathing for efficient load balancing and out-of-order packet delivery directly to XPU memory for maximizing fabric utilization. 
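
To illustrate what out-of-order delivery directly to memory means in practice, the toy model below (illustrative only; the class and field names are hypothetical, not UEC or Thor Ultra data structures) places each packet at its offset in the destination buffer as it arrives, regardless of order, and tracks which sequence numbers are still outstanding:

```python
# Illustrative-only model of out-of-order packet delivery into a destination
# buffer, in the spirit of the UEC behavior described above. Names and
# structures are hypothetical, not the actual NIC or UEC data structures.

class Message:
    def __init__(self, num_packets: int, packet_size: int):
        self.buffer = bytearray(num_packets * packet_size)
        self.packet_size = packet_size
        self.received = [False] * num_packets  # tracks which packets have landed

    def deliver(self, seq: int, payload: bytes) -> None:
        """Place a packet directly at its offset, regardless of arrival order."""
        offset = seq * self.packet_size
        self.buffer[offset:offset + len(payload)] = payload
        self.received[seq] = True

    def missing(self) -> list[int]:
        """Sequence numbers still outstanding -- candidates for selective retransmit."""
        return [i for i, ok in enumerate(self.received) if not ok]

# Packets 2, 0, 3 arrive out of order (e.g. via different paths); packet 1 is lost.
msg = Message(num_packets=4, packet_size=4)
for seq in (2, 0, 3):
    msg.deliver(seq, f"pkt{seq}".encode())
print(msg.missing())  # [1] -> only packet 1 needs retransmission
```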

Facebook parent company Meta has achieved more than one million cumulative 400 Gb/s-equivalent port device hours without any data drops using Broadcom’s second-generation Tomahawk 5 Bailly. (Source: Broadcom)

Siraj said the thinking when the UEC was formed was that as AI clusters grew bigger, RDMA needed to be modernized. “It cannot do out-of-order packet delivery. If packets arrive out of order at the destination, they are dropped.”

The lack of selective retransmit capabilities and the challenges of tuning the protocol for congestion control further drove the need to enhance RDMA so it could support AI scale-out effectively, Siraj said.

Using UEC also enables Thor Ultra to support selective retransmission for efficient data transfer and to provide programmable receiver-based and sender-based congestion control algorithms. And by delivering these advanced RDMA capabilities in an open ecosystem, customers can connect with any XPUs, optics, or switches they want, Siraj said, reducing their dependency on proprietary solutions. “We wanted to make sure the existing switches that are out there that are the equivalent of Tomahawk 5 can be supported with this NIC, and the future generations like Tomahawk 6.”
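
The sketch below (again illustrative only, with hypothetical names rather than the Thor Ultra or UEC API) shows the basic idea of selective retransmission: after a first pass, only the sequence numbers the receiver reports as missing are resent, and each detected loss is a natural point for a programmable congestion-control policy to react:

```python
# Illustrative sketch only: selective retransmission driven by receiver-reported
# gaps. Names are hypothetical, not the Thor Ultra or UEC API.

packets = {seq: f"pkt{seq}".encode() for seq in range(6)}
delivered: set[int] = set()

def transmit(seq: int, payload: bytes, drop: set[int]) -> None:
    """Simulated link: packets whose sequence numbers are in `drop` are lost."""
    if seq not in drop:
        delivered.add(seq)

# First pass: packets 1 and 4 are lost in flight.
for seq, payload in packets.items():
    transmit(seq, payload, drop={1, 4})

# Receiver reports its gaps; only those packets are resent (no go-back-N replay).
missing = sorted(set(packets) - delivered)
for seq in missing:
    # A receiver- or sender-based congestion-control policy could be consulted
    # here before each retransmission; that logic is out of scope for this sketch.
    transmit(seq, packets[seq], drop=set())

print(missing, sorted(delivered))  # [1, 4] [0, 1, 2, 3, 4, 5]
```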

He said there are 2x400G variants of Ethernet NICs available, but Thor Ultra is the first that can support a single 800G flow.

Siraj said Thor Ultra is a critical piece of the puzzle as Broadcom builds out its AI infrastructure portfolio and addresses the limits of traditional RDMA, the baseline protocol used in AI and high-performance computing environments. “It was not built to get the best performance at scale.”

From EETimes
