Is software-defined memory the answer to AI bandwidth bottlenecks?
Chicago-based Kove thinks so, and its Kove:SDM Memory Tower promises to pool memory without requiring the CXL protocol. Developed in collaboration with Viking Enterprise Solutions, Red Hat and Computacenter, the Memory Tower is literally that: a fully integrated rack of memory servers equipped with Kove:SDM in combination with Red Hat Linux, which the company says will be able to tap into unlimited server memory across the data center.
In an online briefing, Kove CEO John Overton said the Memory Tower is essentially plug and play, and the culmination of the company’s focus on software-defined memory (SDM) over the past decade. “We do pretty much anything you want to do with memory.”
Overton said that includes making the Memory Tower flexible and usable across long distances. “You can roll in a rack of hardware, and you can create petabytes of memory that’s shareable across the data center with no fuss, no muss. There are no code changes.”
He added that the Memory Tower is available globally, with engineering resources and support provided by its partner, Computacenter. It can be slotted into a data center environment just like any Red Hat product because it works with any x86 hardware. The latency is the same, he said, but the power consumption is lower.
The Memory Tower currently supports standard network fabrics, including command and control on Ethernet and data-plane on InfiniBand; RoCE will be available in the first quarter of 2025.
Cost-wise, Overton said the tower is the same price as adding DIMMs to an existing server, but the virtual DIMM in the tower can be logically allocated anywhere across the data center. “You can’t do that with regular server memory,” he said. “It gets stranded inside of the server.”
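To make the "stranded" memory point concrete, here is a toy model in Python. It is only an illustrative sketch: the server names, capacities, and job size are hypothetical, and it does not represent how Kove:SDM actually allocates memory. It compares the largest job a single server can hold from its own free DIMM capacity with what the same free capacity could hold if it were logically allocatable from one pool.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """Hypothetical server with fixed local DIMM capacity (GiB)."""
    name: str
    installed_gib: int
    used_gib: int

    @property
    def free_gib(self) -> int:
        # Free memory that only this server can use: it is "stranded" here.
        return self.installed_gib - self.used_gib


def largest_local_fit(servers: list[Server]) -> int:
    """Biggest allocation any single server can satisfy on its own."""
    return max(s.free_gib for s in servers)


def pooled_free(servers: list[Server]) -> int:
    """Free capacity if the same memory were allocatable from one pool."""
    return sum(s.free_gib for s in servers)


# Made-up numbers purely for illustration.
servers = [
    Server("rack1-a", installed_gib=512, used_gib=200),
    Server("rack1-b", installed_gib=512, used_gib=450),
    Server("rack2-c", installed_gib=512, used_gib=300),
]
job_gib = 400

fits_locally = largest_local_fit(servers) >= job_gib
fits_in_pool = pooled_free(servers) >= job_gib

print(f"largest local fit: {largest_local_fit(servers)} GiB")
print(f"pooled free:       {pooled_free(servers)} GiB")
print(f"400 GiB job fits locally: {fits_locally}, in pool: {fits_in_pool}")
```

In this made-up scenario no single server has 400 GiB free, yet the combined free capacity exceeds it, which is the gap that pooled allocation is meant to close.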
Bill Wright, edge AI technology evangelist at Red Hat, said Kove’s Memory Tower is an industry first. “Memory virtualization is really kind of that last mile or that last brick wall that we had a tremendously hard time breaking through.”
This virtualization capability is ideal for AI workloads, he said. “AI is a heavily memory-dependent application, depending on the utilization. Memory has always been a limiting factor and that’s no longer the case.”
Wright added that Red Hat sees Kove being part of its AI-related offerings going forward, along with OpenShift AI and Enterprise Linux AI. “It’s been a tremendous performance enhancement for us.”
Sherman Tang, emerging solutions specialist at Viking, said figuring out how to share memory is a decades-old problem that many companies have tried to solve. “Some have tried successfully up to maybe 10 or 15 systems, but not even a rack or two racks or a cluster or data center.”
Tang said the Kove Memory Tower is easy for Viking to deploy in customer environments. “When you build a pool, you can share that anywhere that the tower is connected. Different workloads don’t have to be on the same computer. It could be spread out through different racks.”
The Kove solutions support the creation of servers with any amount of memory on the fly, including amounts far greater than can fit inside the physical box, up to the processor limit of 128 TiB of real memory per process. That means memory resources can be quickly redeployed across the data center. Memory recovery in the event of a DIMM failure is instantaneous thanks to the SDM allocation capabilities.
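For a sense of scale, the 128 TiB figure can be checked with a few lines of arithmetic. The sketch below assumes, and the article does not state this, that the limit corresponds to the 47-bit user-space virtual address range available to a process on x86-64 Linux with four-level paging. The 2 TiB local DIMM capacity is a hypothetical comparison point.

```python
# Assumption (not from the article): the 128 TiB per-process figure matches
# the 47-bit user-space virtual address range on x86-64 Linux with
# four-level paging.
TIB = 2**40  # bytes in one tebibyte

USER_ADDRESS_BITS = 47
per_process_limit_bytes = 2**USER_ADDRESS_BITS
per_process_limit_tib = per_process_limit_bytes // TIB

# Hypothetical server with 2 TiB of local DIMMs, for comparison only.
local_dram_tib = 2
ratio = per_process_limit_tib // local_dram_tib

print(f"per-process limit: {per_process_limit_tib} TiB")
print(f"that is {ratio}x a {local_dram_tib} TiB server")
```

Under that assumption, a single process could address 64 times the memory physically installed in the hypothetical 2 TiB box.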
Overton said SDM is no different than any other virtualization play, except that memory represents 65-85% of the cost of a server. “Many people throw away compute in traditional computational environments to save the cost of memory.”
He said the SDM Memory Tower is faster than NVMe at less than 15 nanoseconds across the data center, and CXL is simply not required. “You don’t have to upgrade your CPU; you don’t have to upgrade your infrastructure. It works with anything. It’s only software.”
Kove has been working on this problem for nearly 15 years, Overton said. The initial approach was to investigate the physics of flash and its density to develop a better alternative to DRAM, but the company encountered all the known trade-offs that come with storage-class memory. The next step, he said, was to look at building specialized hardware. “The problem with specialized hardware is that you can’t compete at scale,” he said. “Then we figured out that this is actually solvable with software.”
Overton added that Kove’s software solution has gone “through the wringer” for many months of testing by Red Hat to prove the tower could attain local memory speeds. “Because we have the ability to hide latency, we can move stuff around behind the scenes at a much greater efficiency than anybody ever thought.”
From EETimes