<div dir="auto"><div><br><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Apr 19, 2023, 5:51 PM Darin Sunley via extropy-chat <<a href="mailto:extropy-chat@lists.extropy.org">extropy-chat@lists.extropy.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">There's a running thread in LLM discourse about how LLMs don't have a world model and therefore are a non-starter on the path to AGI.<div><br></div><div>And indeed, on a surface level, this is true. LLMs are a function - they map token vectors to other token vectors. No cognition involved.</div><div><br></div><div>And yet, I wonder if this perspective is missing something important. The token vectors LLMs are working with aren't just structureless streams of tokens. They're human language - generated by human-level general processing.</div></div></blockquote></div></div><div dir="auto"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><br></div><div>That seems like it's the secret sauce - the ginormous hint that got ignored by data/statistics-centric ML researchers. When you learn how to map token streams with significant internal structure, the function your neural net is being trained to approximate will inevitably come to implement at least some of the processing that generated your token streams. </div><div><br></div><div>It won't do it perfectly, and it'll be broken in weird ways. But not completely nonfunctional. Actually pretty darned useful. A direct analogy that comes to mind would be training a deep NN on mapping assembler program listings to output. 
What you will end up with is a learned model that, to paraphrase Greenspun's Tenth Rule, "contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of" a Turing-complete computer.</div><div><br></div><div>The token streams GPT is trained on represent an infinitesimal fraction of a tiny corner of the space of all token streams, but they're token streams generated by human-level general intelligences. This seems to me to suggest that an LLM could very well be implementing significant pieces of General Intelligence, and that this is why they're so surprisingly capable.</div><div><br></div><div>Thoughts?</div><div></div></div></blockquote></div></div><div dir="auto"><br></div><div dir="auto"><br></div><div dir="auto">I believe the only way to explain their demonstrated capacities is to assume that they create models. Here is a good explanation by the chief researcher at OpenAI:</div><div dir="auto"><br></div><div dir="auto"><a href="https://twitter.com/bio_bootloader/status/1640512444958396416?t=MlTHZ1r7aYYpK0OhS16bzg&s=19">https://twitter.com/bio_bootloader/status/1640512444958396416?t=MlTHZ1r7aYYpK0OhS16bzg&s=19</a></div><div dir="auto"><br></div><div dir="auto">As designed, a single invocation of GPT cannot be Turing complete, as it has only a finite memory capacity and no innate ability for recursion. But a simple wrapper around it, something like the AutoGPTs, could in theory transform GPT into a Turing-complete system. Some have likened a single invocation of GPT to a single instruction executed by a computer's CPU.</div><div dir="auto"><br></div><div dir="auto">(Or you might think of it as a single short, finite stretch of processing by a human brain.)</div><div dir="auto"><br></div><div dir="auto">Jason </div></div>
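<div dir="auto"><br></div><div dir="auto">P.S. To make the wrapper point concrete, here is a minimal sketch. It does not call GPT or AutoGPT. The names <code>single_invocation</code>, <code>run_wrapper</code> and <code>RULES</code> are hypothetical, and the rule table is just a toy stand-in for "one bounded step of computation." The step function has no loop and no memory of its own; the wrapper supplies both (an external tape and repeated calls), which is the part a single invocation lacks. The step cap is only a safety guard for the example.</div>

```python
from collections import defaultdict

# Hypothetical stand-in for ONE bounded invocation: a fixed, finite mapping
# from (state, symbol) to (new_state, symbol_to_write, head_move).
# This toy table adds 1 to a binary number, starting at the last digit.
RULES = {
    ("carry", "1"): ("carry", "0", -1),  # 1 + carry = 0, keep carrying left
    ("carry", "0"): ("halt", "1", 0),    # 0 + carry = 1, finished
    ("carry", "_"): ("halt", "1", 0),    # moved past the left edge: new digit
}


def single_invocation(state, symbol):
    """One step: no loop, no memory beyond the arguments it is handed."""
    return RULES[(state, symbol)]


def run_wrapper(digits, max_steps=10_000):
    """Outer loop that provides the external memory (tape) and the repetition."""
    tape = defaultdict(lambda: "_", enumerate(digits))  # "_" marks a blank cell
    head = len(digits) - 1
    state = "carry"
    for _ in range(max_steps):  # cap is an assumption, to keep the sketch safe
        if state == "halt":
            break
        new_state, write, move = single_invocation(state, tape[head])
        tape[head] = write
        head += move
        state = new_state
    return "".join(tape[i] for i in sorted(tape))


print(run_wrapper("1011"))  # binary 11 plus 1
```

<div dir="auto">The design choice worth noticing is that all the unbounded parts (the tape and the loop) live outside <code>single_invocation</code>, much as a CPU instruction only becomes general computation once it is fetched and executed repeatedly against memory.</div>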