THE LATEST NEWS
Nvidia Boosts LLM Inference with Open-Source Library

SANTA CLARA, CALIF. — Nvidia has doubled large language model (LLM) inference performance on its H100, A100 and L4 GPUs with a new open-source software library called TensorRT-LLM.
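TensorRT-LLM is driven from Python. As a rough illustration only (not taken from the article), the sketch below uses the library's high-level LLM interface as found in recent TensorRT-LLM releases; the model name and prompt are placeholders.

```python
# Minimal sketch of running inference with TensorRT-LLM's high-level
# Python API (recent releases). The model name and prompt below are
# illustrative placeholders, not examples from the announcement.
from tensorrt_llm import LLM, SamplingParams

# Loading the model builds an optimized TensorRT engine for the local GPU.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

params = SamplingParams(max_tokens=64, temperature=0.8)
for output in llm.generate(["What is accelerated computing?"], params):
    print(output.outputs[0].text)
```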

As benchmark results that improve round after round on the same hardware demonstrate, software is often as important as hardware when it comes to squeezing the best possible performance out of specialized AI chips.

“A huge part of what we do is a combination of hardware and software, and today Nvidia has more software engineers than hardware engineers,” Ian Buck, VP and general manager of Nvidia’s hyperscale and HPC computing business, told EE Times. “This is part of a decision going back to the original CUDA and the motivation around delivering not just a chip with an instruction set, but a complete stack to meet developers where they are.

“This offers an opportunity to innovate at all the levels: change the hardware architecture, change the instruction set, change the compilers, change the drivers, change the tools, the libraries, everything, so we can move the whole platform forward,” he said. “That’s played itself out multiple times in the last 20 years of doing accelerated computing, and it’s true for AI inference too.”

From EE Times
