[ExI] Why do the language model and the vision model align?

Jason Resch jasonresch at gmail.com
Wed Feb 11 15:27:36 UTC 2026


On Wed, Feb 11, 2026, 8:27 AM John Clark <johnkclark at gmail.com> wrote:

> On Tue, Feb 10, 2026 at 9:21 AM Jason Resch via extropy-chat <
> extropy-chat at lists.extropy.org> wrote:
>
> *>> What's surprising to me is that an AI that was only trained on words
>>> can converge on ANYTHING, let alone to something congruent to the real
>>> world. If we had access to an extra terrestrial's library that was huge but
>>> contained no pictures, just 26 different types of squiggles arranged into
>>> many trillions of words, I don't see how we could ever make any sense out
>>> of it, I don't see how we could ever write a new sentence in ET's language
>>> that was not only grammatically correct but also expressed an idea that was
>>> true and non-trivial; but somehow an AI could. Don't ask me how.*
>>>
>>
>> *> An ideal compression of Wikipedia would require not only understanding
>> all the grammar of our languages, but also how our minds think, and what
>> our world looks like and how it behaves. Our base 10 numerical
>> representation system would quickly fall to such compression, given all the
>> numbered lists that so commonly appear across the site.*
>>
>
> *In general it's impossible to prove that you're using the ideal
> compression algorithm on your data, for all you know there might be a way
> to compress it even more.*
>


Not impossible, just computationally infeasible.
For example, given a fixed decompression scheme, we can determine by
brute force whether we've found the shortest compression of a 40-bit
string: iterate over all 2^40 strings of shorter length, check which
ones decompress to it, and keep the shortest.
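As a sketch of that brute-force idea (with a made-up toy run-length
decoder standing in for a real decompression scheme, since exhausting
2^40 candidates is impractical in an example):

```python
from itertools import product

def rle_decode(code: bytes):
    """Toy decompressor: code is a sequence of (count, byte) pairs."""
    if len(code) % 2:
        return None                      # malformed code
    out = bytearray()
    for i in range(0, len(code), 2):
        out += bytes([code[i + 1]]) * code[i]
    return bytes(out)

def shortest_encoding(data: bytes, max_len: int):
    """Exhaustively try every code of length 0..max_len and return the
    first (hence shortest) one that decodes to `data`.  Provably
    optimal for this decoder, but the cost grows as 256^max_len."""
    for n in range(max_len + 1):
        for cand in product(range(256), repeat=n):
            code = bytes(cand)
            if rle_decode(code) == data:
                return code
    return None

print(shortest_encoding(b"aaaaaaaa", max_len=2))  # b'\x08a'
```

The same exhaustive search works in principle for any fixed decoder;
only the size of the search space makes it infeasible.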

But the intractability of finding the ideal compression, versus a good
but not perfect one, doesn't undermine my point: the information is
there to be extracted, and all the patterns and meanings can be learned
from the mere patterns in the data, absent any external references.

After all, consider that the brain receives only neural spikes,
arriving from different nerves at different times. These are just
symbols, bits. And yet, from these mere patterns of firings, the brain
is able to construct your entire world.

This feat is no less magical than what LLMs can do from "just text".
The text, like the sensory input, embodies patterns that it inherits
from a world of regularities. These regularities can be learned and
used to predict. And having models that can be applied for prediction
is all "understanding" is.
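A minimal illustration of prediction learned from raw patterns alone
(the tiny corpus here is made up): a bigram model sees nothing but the
token stream, yet its counts already let it predict.

```python
from collections import Counter, defaultdict

# Count which word follows which, using nothing but the raw token
# stream -- no external grounding of any kind (toy made-up corpus).
corpus = ("the cat sat on the mat . the cat ate . "
          "the dog sat on the rug .").split()

follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict(word: str) -> str:
    """Predict the most frequent continuation seen in training."""
    return follows[word].most_common(1)[0][0]

print(predict("the"))  # 'cat' -- seen twice, vs. once for the others
```

An LLM is this idea scaled up: a vastly richer statistical model of
what follows what, which is why it ends up modeling the regularities
behind the text.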

> * OK, I can see how you might be able to determine that a book written by
> ET was about arithmetic, and that some of the symbols represented integers
> and not letters, and that a base 10 system was used. But I don't see how
> you could use the same procedure to determine that another book was about
> chemistry unless you already had a deep understanding of how real world
> chemistry worked, and I don't see how you could obtain such chemical
> knowledge without experimentation and by just looking at sequences of
> squiggles. But apparently there is a way. *
>

A student can learn a great deal from reading a chemistry textbook,
without ever entering a lab or taking out a beaker.



> *> The periodic table could then be understood as elements containing
>> different numbers of fundamental particles.*
>
>
> *The existence of isotopes would greatly complicate things, for example we
> know that the element Tin has 10 stable isotopes and 32 radioactive ones,
> they all have identical chemical properties but, because of neutrons, they
> all have different masses. *
>

The periodic table, and Wikipedia's article on each element, list the
atomic number (the number of protons) in addition to the atomic weight.



> *I don't see how you could ever deduce that fact without experimentation if
> you started with nothing but a knowledge of arithmetic.*
>

It is deduced from all the other information contained in the articles.

Learning the meaning of even a few words allows understanding the meaning
of many more, and so on. With those initial meanings, a dictionary allows
the entire language to be bootstrapped piece by piece.
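That bootstrapping can be sketched concretely (a made-up four-entry
mini-dictionary and a single seed word, purely for illustration): a
word becomes understandable once every word in its definition is
already understood.

```python
# Hypothetical mini-dictionary: each word maps to the words defining it.
dictionary = {
    "pair":   ["two"],
    "couple": ["pair"],
    "four":   ["two", "pair"],
    "quad":   ["four"],
}
known = {"two"}  # the assumed seed meaning

# Repeatedly sweep the dictionary, admitting any word whose entire
# definition is already understood, until nothing new can be learned.
changed = True
while changed:
    changed = False
    for word, definition in dictionary.items():
        if word not in known and all(d in known for d in definition):
            known.add(word)
            changed = True

print(sorted(known))  # ['couple', 'four', 'pair', 'quad', 'two']
```

From one grounded word, the whole little language falls.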


> * We obtained our knowledge of chemistry through experimentation but how
> did an AI, which has never observed anything except sequences of squiggles,
> obtain such knowledge?   *
>


From reading every book written on the subject.

You'll ask: how did it come to understand the meaning of the words? This is
what I am trying to explain in this email and the one before.

There is no meaning other than having a mental model for the behavior
and properties of a thing. Such models are necessary for prediction. We
train these networks to be good predictors, and that requires that the
network construct models for all the things it encounters (words, as
well as the objects described by those words: math, chemistry, physics,
psychology, etc.).


>
> *> If there's an article on the "triple alpha process" the ideal
>> compression algorithm would know that "carbon" is the most likely word that
>> follows "three helium nuclei can combine to yield ".*
>
>
> *To be able to predict that the three alpha process produces carbon, or
> even be able to predict that something called an "alpha particle" exists,
> you'd need to already have a deep understanding of a thing as unintuitive
> as Quantum Mechanics, and that would be even more difficult to obtain from
> first principles than knowledge of chemistry. *
>

I am just talking about sentence completion here. This was a test I ran
on GPT-3 back when it was just a raw decoder: it successfully completed
the sentence with "a carbon nucleus". But for it to know that, it has
to have a world model that includes the possible behavior of three
helium nuclei.


> * Perhaps Mr. Jupiter Brain could deduce the existence of the physical
> world starting from nothing but arithmetic (but I doubt it) however it is
> certainly far far beyond the capability of any existing AI, so they must be
> using some other method. I just wish I knew what it was. *
>


Are you familiar with the universal approximation theorem? Neural
networks can learn any pattern (given a large enough network and enough
training). So just as a Turing machine can replicate any computable
behavior given enough time and memory, a neural network can learn any
behavior given enough neurons and training. Both of these classes of
universality are incredible, but true.
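The representational half of that claim can be shown directly. The
sketch below hand-constructs (rather than trains) a one-hidden-layer
ReLU network that approximates sin(x); adding more hidden units drives
the error as low as you like, which is the universal approximation
theorem in miniature.

```python
import numpy as np

def relu_interpolant(knots, values):
    """Build f(x) = v0 + sum_i w_i * relu(x - knots[i]): a one-hidden-
    layer ReLU network equal to the piecewise-linear interpolant
    through the points (knots, values)."""
    slopes = np.diff(values) / np.diff(knots)
    weights = np.diff(slopes, prepend=0.0)   # slope change at each knot
    def f(x):
        return values[0] + np.maximum(0.0, x[:, None] - knots[:-1]) @ weights
    return f

# 64 hidden units already pin sin(x) down to ~1e-3 on [0, 2*pi].
knots = np.linspace(0, 2 * np.pi, 65)
net = relu_interpolant(knots, np.sin(knots))

x = np.linspace(0, 2 * np.pi, 1000)
err = float(np.max(np.abs(net(x) - np.sin(x))))
print(err < 1e-2)  # True
```

The theorem's content is that this works for any continuous target
function, not just sin; training is then the (harder) problem of
finding such weights from data.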

Given that the patterns can be learned (by the UAT), how networks learn
the meanings of words and of the objects of the world becomes clear:
any network that has learned to predict the patterns in language well
will, by necessity, come to understand the patterns of the world that
the language describes.

"What does it mean to predict the next token well enough? [...] It's a
deeper question than it seems. Predicting the next token well means that
you understand the underlying reality that led to the creation of that
token."
-- Ilya Sutskever (of OpenAI)
https://www.youtube.com/watch?v=YEUclZdj_Sc



> *> To compress perfectly is to understand perfectly.*
>
>
> *But perfection is not possible. In general, finding the "perfect"
> compression, the absolute shortest representation of a piece of data, is an
> uncomputable problem.*
>

Infeasible, not uncomputable (relative to any fixed decompressor that
halts on all inputs). But this is irrelevant to my point: that data,
even absent any external context, can contain meaning.

Example:
0010010000111111011010101000100010000101101000110000100011010011000100110001100110001010001011100000...

It may not look meaningful to you in this form, but most would recognize:
141592653589793238462643383279...

It's the same number. The digits of Pi are self-descriptive. Now
consider that this sequence appears commonly around terms like
"circle", "radius", and "diameter", and consider the web of words that
surrounds each of those words. The meaning is all there, and it can be
extracted so long as the data is compressible (which is only another
way of saying there exist patterns within the data ripe to be learned).
Compression algorithms (even imperfect ones) depend on this.
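To check that the two strings really are the same number, here is a
small conversion of the binary string above back into decimal, using
exact rational arithmetic so no floating-point rounding intrudes:

```python
from fractions import Fraction

# The 100 bits quoted above: the fractional part of Pi in base 2.
bits = ("0010010000111111011010101000100010000101"
        "1010001100001000110100110001001100011001"
        "10001010001011100000")

# Interpret the bits as the binary fraction 0.b1b2b3...
x = Fraction(int(bits, 2), 2 ** len(bits))

# Peel off decimal digits by repeated multiplication by 10.
digits = ""
for _ in range(25):
    x *= 10
    digits += str(int(x))
    x -= int(x)

print(digits)  # 1415926535897932384626433
```

One hundred bits of precision is enough to pin down about thirty
decimal digits, so the first 25 printed here are exact.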


Jason


> * By chance you might be using the best possible compression algorithm on
> your data, but there's no way to prove to yourself or to anybody else that
> you are.*
>
> *  John K Clark*
>
>
>

