
Santa Clara, Calif. – SiFive, a leader in RISC-V computing, has launched its second-generation Intelligence family of processors, a major advancement in accelerating artificial intelligence workloads across a broad spectrum of applications.
The new lineup features five RISC-V-based products, including the entirely new X100 series (X160 Gen 2 and X180 Gen 2), alongside upgraded X280 Gen 2, X390 Gen 2, and XM Gen 2 offerings for high-performance edge and data-center applications.
With the new series, SiFive aims to capitalize on rapidly growing demand for AI solutions, which Deloitte predicts will see at least 20% growth across every technology sector, including a remarkable 78% surge in edge AI computing.

The new Intelligence family is designed to enhance scalar, vector, and, with the XM series, matrix processing capabilities tailored for modern AI workloads. Patrick Little, CEO of SiFive, emphasized the pivotal role of AI, stating, “AI is catalyzing the next era of the RISC-V revolution.”
The company is already seeing strong adoption, with two Tier 1 U.S. semiconductor companies having licensed the new X100 series even before its public announcement. These early adopters are deploying the X100 IP for two distinct use cases: one pairs the SiFive scalar/vector core with a matrix engine, with the SiFive core acting as an accelerator control unit, while the other uses the vector engine as a standalone AI accelerator.
SiFive’s second-generation products address critical challenges in AI deployment, particularly in memory management and nonlinear function acceleration. A core innovation across the X-Series IPs is their ability to function as an Accelerator Control Unit (ACU).
In this role, a SiFive core provides essential control and assist functions for a customer’s custom accelerator engine through two specialized coprocessor interfaces: the SiFive Scalar Coprocessor Interface (SSCI) and the Vector Coprocessor Interface eXtension (VCIX).
The architecture empowers customers to concentrate on data processing innovations at the platform level, streamlining the software stack.
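To make the division of labor concrete, here is a minimal Python sketch of the ACU pattern; the class and function names are hypothetical illustrations, not SiFive APIs, and the customer accelerator is modeled as a simple matrix-multiply engine.

```python
# Hypothetical sketch of the ACU pattern: the SiFive core runs control,
# pre-processing, and post-processing, while the MAC-heavy work is handed
# to the customer's accelerator over a coprocessor interface.
import numpy as np

class CustomMatrixEngine:
    """Stand-in for a customer's fixed-function engine reached via SSCI/VCIX."""
    def matmul(self, a: np.ndarray, b: np.ndarray) -> np.ndarray:
        return a @ b  # on real silicon this runs on the accelerator, not the core

def acu_run_layer(engine: CustomMatrixEngine, x, w, scale):
    x = x * scale                 # pre-processing on the vector unit (stays local, no bus traffic)
    y = engine.matmul(x, w)       # offload the matrix work to the accelerator
    return np.maximum(y, 0.0)     # post-processing (ReLU) back on the core

if __name__ == "__main__":
    eng = CustomMatrixEngine()
    out = acu_run_layer(eng, np.random.randn(4, 8), np.random.randn(8, 16), 0.5)
    print(out.shape)  # (4, 16)
```

In a real design the offloaded call would travel over SSCI or VCIX to fixed-function hardware, while the pre- and post-processing remain on the core’s scalar and vector units, which is the tighter coupling SiFive describes.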
In a briefing with EE Times, John Simpson, senior principal architect at SiFive, elaborated on the advantages over traditional approaches. “The traditional industry uses hardened approaches which are fixed. You can’t change the finite state machine,” he explained, referring to a conventional accelerator diagram.
He noted that in the fast-evolving world of AI, where “AI graphs come out every week,” a fixed finite state machine cannot adapt to new types of graphs. In contrast, SiFive’s intelligent cores offer flexibility, reduce system bus traffic by allowing local processing on the accelerator chip, and facilitate tighter coupling for pre- and post-processing tasks.
SiFive has introduced two significant advancements in memory architecture that directly address performance bottlenecks: Memory Latency Tolerance and a More Efficient Memory Subsystem.
Simpson expressed particular pride in the Memory Latency Tolerance feature, an elegant design that hides load latency. He explained that the scalar unit, which processes all instructions, dispatches committed vector instructions to a Vector Command Queue (VCQ). Crucially, when a vector load is encountered, its address is sent to the memory system (L2 or beyond) at the same time the instruction is placed in the VCQ.
The early dispatch, decoupled from execution, allows memory responses to return and be reordered into a configurable Vector Load Data Queue (VLDQ). “The intent here is that the load will pick the data up from the vector load data queue where it’s sitting waiting to be picked up,” Simpson stated.
This ensures data is ready when the load instruction eventually pops out of the VCQ, yielding a “vector load-to-use of one cycle.” Simpson highlighted the competitive edge, noting, “The Xeon announced at Hot Chips can do 128 outstanding requests, and that is the top of the line Xeon, and we are at 1,024 in the four-core.” This “beautiful technique” ensures continuous processing by effectively preventing pipeline stalls.
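A toy simulation of the mechanism (not SiFive’s implementation; the latencies and instruction names below are assumed purely for illustration) shows how issuing the load address at dispatch time lets the data arrive while earlier vector work is still executing:

```python
# Toy illustration of Memory Latency Tolerance: a vector load's address goes
# out to memory the moment the scalar unit places it in the Vector Command
# Queue (VCQ); the response waits in the Vector Load Data Queue (VLDQ), so the
# load picks its data up immediately when it reaches the head of the VCQ.
from collections import deque

MEM_LATENCY = 8   # assumed cycles for L2/memory to respond
OP_CYCLES   = 4   # assumed cycles per long-vector arithmetic op

def simulate(program):
    vcq, vldq, inflight = deque(), deque(), []
    # The scalar unit runs ahead of the vector unit: dispatch everything up front.
    for instr in program:
        vcq.append(instr)
        if instr[0] == "load":
            inflight.append((MEM_LATENCY, instr[1]))  # address issued at dispatch (cycle 0 here)

    cycle = 0
    while vcq:
        # Memory responses land in the VLDQ as they return.
        for ready, addr in [x for x in inflight if x[0] <= cycle]:
            vldq.append(addr)
            inflight.remove((ready, addr))
        kind = vcq[0][0]
        if kind == "load":
            if vldq:  # data already waiting: one-cycle load-to-use
                print(f"cycle {cycle}: {vcq.popleft()[0]} picks up data for {hex(vldq.popleft())}")
            cycle += 1  # otherwise the load would stall until the data returns
        else:
            print(f"cycle {cycle}: execute {vcq.popleft()[1]}")
            cycle += OP_CYCLES

simulate([("op", "vfmacc_a"), ("op", "vfmacc_b"), ("load", 0x1000), ("op", "vfmacc_c")])
```

In this run the load’s memory response arrives while the two preceding vector ops are still executing, so the load consumes its data the cycle it reaches the head of the queue rather than stalling for the full memory latency.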

The More Efficient Memory Subsystem represents another substantial upgrade, moving from an inclusive to a non-inclusive cache hierarchy. Simpson detailed the previous generation’s inclusive cache system, in which data held in the shared L3 cache was replicated in the private L2 and L1 caches, resulting in only 40% effective utilization of the total 2.5-megabyte-equivalent area.
The second-generation design eliminates this replication. “Now the data is not replicated anywhere,” Simpson explained, leading to 100% utilization of the 1.5-megabyte-equivalent area.
This translates to “1.5x capacity over the Gen One, and the area works out to be 60%,” making it “much more efficient” and a “simple win.”
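The arithmetic behind those figures is straightforward; the percentages and capacities below are the ones quoted in the briefing, and only the calculation is ours:

```python
# Back-of-the-envelope check of the quoted cache numbers: an inclusive
# hierarchy duplicates L1/L2 contents in L3, so only part of the total
# cache area holds unique data.
gen1_total_mb     = 2.5
gen1_effective_mb = 0.40 * gen1_total_mb   # 40% effective utilization -> 1.0 MB unique
gen2_total_mb     = 1.5
gen2_effective_mb = 1.00 * gen2_total_mb   # no replication -> 1.5 MB unique

print(gen2_effective_mb / gen1_effective_mb)  # 1.5 -> 1.5x capacity over Gen 1
print(gen2_total_mb / gen1_total_mb)          # 0.6 -> the area works out to 60%
```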

Beyond memory, SiFive has also integrated a new hardware pipelined exponential unit. While multiply-accumulate (MAC) operations dominate AI workloads, once those are offloaded, exponentiation becomes the next major bottleneck. For instance, in BERT large models accelerated by a matrix engine, softmax operations, which involve exponentiation, account for over 50% of the remaining cycles.
SiFive’s software optimization reduced the exponentiation function from 22 to 15 cycles, but the new hardware unit dramatically reduces it to a single instruction, lowering the total function time to five cycles. This “nonlinear speedup built-in” is crucial for maximizing acceleration in all AI models.
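To see why exponentiation looms so large once a matrix engine has absorbed the MACs, consider the softmax inside every attention layer of a BERT-style model: each attention score needs its own exponential. A minimal NumPy sketch (illustrative only, not SiFive’s software stack):

```python
# Per-row softmax as used in attention: after the matrix engine produces the
# score matrix, the remaining work is dominated by the np.exp() step, which is
# what a hardware exponential unit accelerates.
import numpy as np

def softmax(scores: np.ndarray) -> np.ndarray:
    shifted = scores - scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(shifted)                                     # one exponential per score element
    return e / e.sum(axis=-1, keepdims=True)

attn_scores = np.random.randn(16, 128, 128)  # toy (heads, seq, seq) attention scores
probs = softmax(attn_scores)                 # heads * seq^2 exponentials per layer
```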
SiFive’s second-generation Intelligence family is also making inroads with hyperscalers, who are increasingly developing their own custom chips. While these companies continue to rely on ARM for their application cores, they are actively integrating multiple SiFive XM cores or their own hardware matrix engines with SiFive’s intelligence cores for control and assist functions.
These hyperscalers harbor ambitions to “replace Nvidia in the data center,” with customers targeting performance levels of up to four petaflops with XM cores.
The company also acknowledges the burgeoning RISC-V ecosystem in China, noting significant design wins across the entire Intelligence family, from data centers to the edge. While specific customer names remain confidential due to non-disclosure agreements, SiFive indicates a strong demand for their IP from several Chinese customers.
“SiFive’s announcement is promising and great news for the ecosystem since it reflects how real and powerful RISC-V is for tackling the complexity of current and future challenges, mainly in the domain of AI,” says Teresa Cervero, leading research engineer at the Barcelona Supercomputing Center (BSC), and RISC-V ambassador. “Having RISC-V industrial hardware supporting RVA23, RVV 1.0, and customizable for adapting the hardware to specific needs of different markets is key to unlocking the real power for designing future high-performance infrastructures, strengthening the software layers, and becoming a full competitive alternative to existing commercial architectures.”
The robust software stack for the second-generation Intelligence family, built on SiFive’s extensive four-plus years of investment in RISC-V AI, supports scalability. For the XM series, the ML runtime already distributes workloads across multiple XM clusters on a single chip. While scaling beyond a single die requires further development on an interprocessor communication (IPC) library, it remains a clear roadmap item driven by customer demand for instantiating multiple XMs in a single chip.
SiFive’s new Intelligence family, with its innovative memory architecture, specialized interfaces, and enhanced processing capabilities, positions RISC-V as a significant player in the evolving landscape of AI hardware, offering unparalleled flexibility and performance from the smallest edge devices to the largest data centers.
From EETimes