[ExI] Why do the language model and the vision model align?
Jason Resch
jasonresch at gmail.com
Tue Feb 10 14:16:19 UTC 2026
On Tue, Feb 10, 2026, 8:23 AM John Clark via extropy-chat <
extropy-chat at lists.extropy.org> wrote:
> On Tue, Feb 10, 2026 at 5:59 AM Ben Zaiboc via extropy-chat <
> extropy-chat at lists.extropy.org> wrote:
>
>> If we take, for example, Beauty (or Justice, or Homesickness, etc.,
>> etc.), Plato's philosophy regards it as a real thing that exists
>> somewhere, somehow, independently of human minds. My philosophy, and I
>> suspect (or at least hope) that of most sensible people, holds that it is
>> not.
>
>
> I agree. I too think beauty, justice and homesickness are entirely
> subjective, but that's OK because subjectivity is the most important thing
> in the universe. Or at least it is in my opinion.
>
>>
>> Another category contains things like the number 4. That's an abstract
>> concept, but it could be argued that it represents something that 'really'
>> exists, as in, it can be said to be an objective property
>
>
> If there is one and only one fundamental layer of underlying reality (and
> not an infinite number of layers) then the number 4 would be part of it. I
> think a necessary, but not sufficient, condition for being truly
> fundamental is that it does not have a unique location in space or time;
> the number 4 doesn't, and neither does consciousness.
>
>
>> of certain collections of things.
>>
>
> In the case of consciousness the "collection of things" are bits, and
> consciousness is the way bits feel when they are being processed
> intelligently. And that's about all one can say about consciousness. And
> that's why I'm much more interested in intelligence than consciousness.
>
>> Does Colour 'really' exist, in a Platonic sense?
>
>
> Yes, because if there's one thing that we (or at least I) can be
> absolutely certain of, it's that subjectivity exists, and the experience of
> color is part of that. Thus, IF there is a fundamental reality (and not an
> infinity of layers) THEN color is part of it.
>
>>
>> It's hardly surprising that thinking systems would converge on
>> efficient ways of representing what exists in the real world,
>
>
> What's surprising to me is that an AI that was only trained on words can
> converge on ANYTHING, let alone on something congruent with the real world.
> If we had access to an extraterrestrial's library that was huge but
> contained no pictures, just 26 different types of squiggles arranged into
> many trillions of words, I don't see how we could ever make any sense out
> of it; I don't see how we could ever write a new sentence in ET's language
> that was not only grammatically correct but also expressed an idea that was
> true and non-trivial; but somehow an AI could. Don't ask me how.
>
I used to wonder about that, but I have found a satisfactory explanation:
Imagine aliens (even from another universe with alien laws) got access to
the raw text of Wikipedia (no images, only text). Would they have any hope
of understanding anything about it?
The answer is yes!
This is because "compressibility" is a property of pure information, even
when it is nothing but meaningless symbols or a string of 1s and 0s.
Every improvement in compressing the data requires finding some underlying
pattern in it.
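To see the point concretely, here is a tiny Python sketch, with zlib
standing in for "any pattern-finding compressor" (the byte strings are
just illustrative):

    import os, zlib

    patterned = b"abab" * 2500    # 10,000 bytes with a simple pattern
    noise = os.urandom(10000)     # 10,000 bytes with (almost surely) none

    print(len(zlib.compress(patterned)))  # a few dozen bytes: pattern found
    print(len(zlib.compress(noise)))      # ~10,000 bytes: nothing to find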
An ideal compression of Wikipedia would require not only understanding all
the grammar of our languages, but also how our minds think, and what our
world looks like and how it behaves.
Our base 10 numerical representation system would quickly fall to such
compression, given all the numbered lists that so commonly appear across
the site. Our arithmetic equations would then be easy to compress as well.
The periodic table could then be understood as elements containing
different numbers of fundamental particles.
If there's an article on the "triple-alpha process", the ideal compression
algorithm would know that "carbon" is the most likely word to follow
"three helium nuclei can combine to yield". To compress well requires the
ability to predict well, and compressing this article well requires an
understanding of atomic physics and nuclear interactions.
If Wikipedia contains data from experiments that we don't fully understand,
a perfect compressor could even reveal new physical laws we haven't yet
discovered, and which haven't yet been postulated in any article on
Wikipedia.
Such is the power of compression in revealing patterns in information.
To compress well is to understand well. To compress perfectly is to
understand perfectly.
There is a very close relationship between compression and prediction. LLMs
are trained to predict (which is a fundamental element of being able to
compress).
If you had an ideal word prediction model, you could convert it to an ideal
word compression model as follows:
To compress a sequence of words, ask the model to guess the first word; if
it's wrong, ask it for its next choice, and so on until it guesses correctly.
Take note of N, where N is the number of guesses it took the model to
correctly guess the first word.
Now store the number of guesses as the first number in the compressed data.
Now ask the model to guess the next word given the previous one. Count how
many guesses it took, and store that number as the second number in the
compressed data.
At the end of the compression you have only a list of numbers representing
how many guesses this prediction model took to guess correctly. Most of
those numbers will be small, and for a very good model, most will be "1".
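As a minimal sketch of that scheme in Python (a toy bigram counter stands
in for the ideal prediction model, and the sender and receiver are assumed
to share the same trained model; all names here are illustrative):

    from collections import Counter, defaultdict

    def train(words):
        # Count overall word frequencies and which word follows which.
        unigrams = Counter(words)
        bigrams = defaultdict(Counter)
        for prev, nxt in zip(words, words[1:]):
            bigrams[prev][nxt] += 1
        return unigrams, bigrams

    def guesses(prev, unigrams, bigrams):
        # The model's guesses, best first: words seen after `prev`,
        # then every other known word by overall frequency.
        first = [w for w, _ in bigrams[prev].most_common()]
        rest = [w for w, _ in unigrams.most_common() if w not in first]
        return first + rest

    def compress(words, unigrams, bigrams):
        # Each word becomes N = how many guesses the model needed (1-based).
        # (A real scheme would also need an escape code for unseen words.)
        ranks, prev = [], None
        for w in words:
            ranks.append(guesses(prev, unigrams, bigrams).index(w) + 1)
            prev = w
        return ranks

    def decompress(ranks, unigrams, bigrams):
        # Re-run the same deterministic model; its Nth guess is the word.
        words, prev = [], None
        for n in ranks:
            w = guesses(prev, unigrams, bigrams)[n - 1]
            words.append(w)
            prev = w
        return words

    text = "the cat sat on the mat and the cat sat on the hat".split()
    uni, bi = train(text)
    ranks = compress(text, uni, bi)
    print(ranks)                               # mostly 1s and 2s
    assert decompress(ranks, uni, bi) == text  # lossless round trip

The better the model predicts, the more 1s appear in the list, and the
less information is left to store.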
As a final pass, you can further compress such a list using run-length
encoding, Elias coding, Huffman coding, and so on, but the core of the
compression comes from a model that understands the patterns in language;
and when the language refers to objects in the real world, an ideal
compression model must understand the patterns of that world.
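For instance, here is a quick sketch of one such final pass (Elias gamma
coding, chosen purely for illustration): it spends a single bit on rank 1,
so a list dominated by 1s shrinks dramatically.

    def elias_gamma(n):
        # For n >= 1: emit (bit-length - 1) zeros, then n in binary.
        # 1 -> "1", 2 -> "010", 3 -> "011", 4 -> "00100", ...
        b = bin(n)[2:]
        return "0" * (len(b) - 1) + b

    ranks = [1, 1, 2, 1, 1, 1, 3, 1, 1, 7, 1, 1]
    bits = "".join(elias_gamma(n) for n in ranks)
    print(bits, len(bits), "bits")  # far fewer than 8 bits per rank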
Jason