Coming across “I transcribed hours of interviews offline using this open-source tool” in my news feeds, I can’t help but wish this approach to applied AI were more common in this era of ChatGPT.
There’s plenty of reason to run models in the cloud, particularly if you want truly large or complex models. The more computationally intensive the task, the smarter a data center starts looking; ditto if you’re serving many users. But that doesn’t mean it’s not possible to do useful things with LLMs on commodity hardware.
The catch, of course, tends to be the need for a powerful computer by modern standards. PrivateLLM’s quantized models, for example, range from ones that will probably fit on a several-year-old iPhone (15/14 series) up to ones that want a pimped-out Mac Studio.
Considering that many Intel/AMD systems from the past decade max out in the 16–64 GB RAM range, and that 16 GB is basically the floor for a modern laptop, I think people underestimate the possibilities for squeezing smaller models onto PCs for specialized tasks. I mostly feel that the drive towards NPUs is marketing snake oil, but to be fair, it’s pretty unlikely that we’re going to start seeing beefier GPUs in the typical computer. As impressive as modern integrated graphics are compared to when I was young, common designs still fall far short of even laptop dedicated graphics, never mind six pounds of RTX!
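To put some back-of-envelope numbers on why smaller quantized models can squeeze into that 16–64 GB range, here’s a minimal sketch of the arithmetic: weights at N bits each, parameter count times N/8 bytes. Note this only counts the weights themselves; the KV cache, activations, and runtime overhead all add more on top, so real requirements run higher.

```python
# Rough estimate of the RAM needed just to hold a quantized model's
# weights. Ignores KV cache, activations, and runtime overhead.

def weight_footprint_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal GB for a model quantized
    to the given number of bits per weight."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

for params in (3, 7, 13):
    for bits in (16, 8, 4):
        gb = weight_footprint_gb(params, bits)
        print(f"{params}B @ {bits}-bit: ~{gb:.1f} GB")
```

By this math a 7B model at 4-bit quantization needs roughly 3.5 GB for weights alone, which is why it’s plausible on a 16 GB laptop, while the same model at full 16-bit precision (14 GB) is not.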
Here’s hoping, at least, that those fancy ASICs see some useful value rather than being today’s equivalent of the Transistor Wars. If nothing else, I suppose they help bring the base of installed RAM a little higher in between price hikes and push faster CPUs and SoCs down people’s throats.
