<html><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">I am glad to say that I seem to have been wrong when I argued that it would be a while before we could run good large language models on our puny hardware. A Bulgarian programmer has managed to quantize Meta’s torrented language model LLaMA down to 4 bits per parameter, shrinking its memory footprint enough that the 13-billion-parameter version can now run on consumer hardware. See <a href="https://simonwillison.net/2023/Mar/11/llama/" class="">Large language models are having their Stable Diffusion moment (simonwillison.net)</a> for the full story.</body></html>