LLMs in 2.5 Watts: Hailo Targets Lower Power Market
AI chip startup Hailo is pitching the first available member of its second-generation Hailo-10 family at a lower power-performance point than it detailed a year ago. Now available in volume, the Hailo-10H can run 2B-parameter LLMs at around 2.5 W, based on measured performance. By comparison, the 10 series was originally billed at 7B parameters in 5 W, a figure based on simulation of the same silicon. The 10 series will eventually include members at various power-performance points, but a significant gap in the market made the 2.5-W power envelope an obvious first choice, Hailo CEO Orr Danon told EE Times.
Feedback from customers and potential customers was that there is a gap for an LLM-capable chip in the 2 to 2.5 W range, Danon said.
“This isn’t achievable with any other device,” he said. “At the edge, the majority of people are looking to run workloads between 1 and 3 billion parameters. This is the popular configuration from a performance perspective, from a memory capacity perspective, and also from a cost perspective.”
Orr Danon (Source: Hailo)
Hailo has internally demonstrated several 2B language and multi-modal models running on the 10H with a time-to-first-token below 1 second and a throughput above 10 tokens per second. The second-generation architecture adds the ability to run LLMs and generative AI alongside what Danon calls “classic AI” – existing edge workloads like computer vision and audio processing. For example, it can run YOLOv11m in real time on a 4K video stream.
“Classic AI on the same device, using the same software stack is [a request] that’s been coming from customers all over [the industry],” Danon said. “We have had thousands of inquiries in the last year with people having all kinds of ideas what to do with generative AI at the edge, how to combine multiple modalities, LLM, VLMs, CNNs, and all of this together with transformer architectures in the same platform.”
In general, device makers have no trouble reaching the proof-of-concept stage with cloud-based AI, Danon said, but the actual compute costs, along with practical concerns such as connectivity, privacy, and availability, lead many to consider edge solutions.
“Then, especially when you’re looking at embedded, fanless devices with size constraints, they get to these practical considerations of, okay, what can I do with two or three Watts? What can fit into my M.2? And that’s where we come into play,” he said.
Software Community
Crucially, the Hailo-10H works with Hailo’s existing software stack.
“We’ve put in a lot of effort to launch this together with a proper SDK, keeping our standards very high in terms of what we are providing to keep the reputation that we have with the Hailo-8,” Danon said. “So we took a lot of care to make sure that we are providing something that is viable to the market. It’s not just a PoC level solution, but something that you can actually productize.”
Hailo’s developer community has reached 10,000 monthly active users, the majority of whom (around 80%) joined through the company’s compatibility with the hobbyist Raspberry Pi platform.
The scale of the developer community this move produced was somewhat unexpected, Danon said.
“Because I run an edge chip company, I was [surprised] at how many people were interested in this,” he said. “When we partnered with Raspberry Pi, they told us they thought many people would be interested, and they were right.”
A sizeable community of developers is, of course, a great proof point for a startup’s software stack.
“My concern when launching [with Raspberry Pi] was, are we ready for this?” Danon said. “Because it’s different to brag about our software stack than to actually put it to the test. But we felt it’s where it should be. And of course, we also got improvement requests, some nice, some less nice, but overall, the feedback is very, very good.”
Hailo’s job now is to nurture the community by being responsive and releasing content people can use, amplify, and modify, Danon said. The company has a dedicated team to support this effort.
“We do find that many professional users also come through the Raspberry Pi,” Danon said. “Everybody likes the idea of just buying one, getting it in a few days, and then you can start playing around with it, and then look at something more deployment-oriented, be it through the Raspberry Pi ecosystem or in some other way.”
While it’s difficult to track exactly where design wins come from, it’s been gratifying to visit customers and see Hailo-equipped Raspberry Pis on their lab benches, Danon said.
Automotive Qualified
The Hailo-10H is automotive-qualified to AEC-Q100 Grade 2 (the vision-focused first-generation Hailo-8 is also automotive-qualified, Danon said). The 10H is specifically positioned for automotive cockpit systems.
“Typically in the automotive landscape, you can find devices with very high AI capacity, the thing is that they cost a lot and consume a lot of power,” Danon said. “In many cases, it is more desirable to accelerate an existing design… if you wanted to add, let’s say a sentry mode that requires real-time video processing, which is quite a heavy task, you might not want to upgrade the whole system but might rather add an accelerator with much lower cost and lower power consumption that can handle that job.”
The majority of Hailo’s automotive business is currently in the Far East, including but by no means limited to China, with the European market “finally starting to look more interesting,” Danon said.
Point of Sale
The Hailo-10H’s first public customer is HP, which is supplying the 10H on an M.2 card as an add-on for its point-of-sale systems.
“The most exciting applications are things that are multimodal, be it in situational awareness, human machine interfaces, or security applications,” Danon said. “These are things that are exciting and currently unattainable at the edge. So this is something that we are very happy to support.”
Many companies want to deploy LLMs at the edge but are still at the beginning of that journey. The market will take years to stabilize as edge use cases for LLMs become better understood.
“People are finding use cases which are practical, which have positive ROI, which are deployable,” Danon said. “Doing a PoC is easy. Validating that this is actually something that can go into production at the hardware, software, and most importantly, application level is very hard. With generative AI, people are carefully finding their way into real impact.”
As for the other members of the Hailo-10 family – “we certainly see additional demands for different working points,” Danon said.