[ExI] Why do the language model and the vision model align?
Jason Resch
jasonresch at gmail.com
Fri Feb 13 15:07:45 UTC 2026
On Fri, Feb 13, 2026, 6:59 AM John Clark <johnkclark at gmail.com> wrote:
> On Thu, Feb 12, 2026 at 8:54 AM Jason Resch via extropy-chat <
> extropy-chat at lists.extropy.org> wrote:
>
> *>> before the average student has even opened a chemistry book he has
>>> already been exposed to nearly two decades of real-world experience and has
>>> an intuitive feel for things like mass, velocity, heat and position, so he
>>> can understand what the book is saying, *
>>>
>>
>> *> But for the human brain to reach that point, it has to construct its
>> full understanding of the real world using nothing more than the *raw
>> statistics* it finds in the patterns and correlations of nerves firing
>> at different times. This is all the brain ever sees of the outside world.
>> The student didn't receive images and sounds and smells; the brain had to
>> invent those out of raw "dots and dashes" from the nerves. If the brain can
>> do such a thing, then is it any more surprising that another neural network
>> could bootstrap itself to a degree of understanding from nothing more than
>> data that contains the statistical correlations?*
>>
>
> *Yes, I think it is more surprising. For humans, most of those dots and
> dashes came directly from the outside physical world, but for an AI that
> was trained on nothing but text, none of them did; all those dots and dashes
> came from another brain and not directly from the physical world. *
>
Both reflect the physical world. I don't see directness or indirectness as
relevant. Throughout your brain there are many levels of transformation of
inputs. At each stage, the higher levels receive increasingly less direct
and less raw inputs, yet each stage of processing finds a way to make sense
of the previous stage's outputs. So whether inputs are raw or pre-processed
in some way should make no difference.
Consider: I could train a neural network to monitor the health of a retail
store by training it on every transaction between every customer and
vendor, or I could train a neural network on the quarterly reports issued
by the retail store's accountant to perform the same analysis. That one set
of data has gone through an accountant's brain doesn't make the data
meaningless or inscrutable. If anything, the prior sense-making by the
accountant should make the second network much easier to train than giving
it raw transaction data. Likewise, giving an LLM distilled human thought
should be a shortcut compared to giving a network raw sensory data as
received by human retinas and eardrums.
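To make the analogy concrete, here is a toy sketch in plain Python (the
numbers and the quarterly_report helper are invented for illustration). The
accountant's summary is a deterministic function of the raw transaction
stream, so whatever signal about store health survives in the report was
already present, just less digested, in the raw data:

    import random

    random.seed(0)
    # Raw "sense data": 1,000 individual transactions (invented values)
    transactions = [round(random.uniform(-20, 100), 2) for _ in range(1000)]

    def quarterly_report(txns):
        """The 'accountant': a lossy but meaningful compression of the raw stream."""
        q = len(txns) // 4
        return [round(sum(txns[i * q:(i + 1) * q]), 2) for i in range(4)]

    report = quarterly_report(transactions)
    print(report)  # 4 numbers standing in for 1,000: easier to learn from, not meaningless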
>
> *> Think about the problem from the perspective of a human brain, alone
>> inside a dark, quiet, hollow bone, with only a fiber-optic cable
>> connecting it to the outside world. This cable sends only bits. The brain
>> must figure out how to make sense of it all, to understand the "real world"
>> from this pattern of information alone.*
>>
>
> *Yes, we are a brain in a vat made of bone, but we have something that an
> AI trained on nothing but text does not have: a fiber-optic cable connected
> directly to the outside world and not indirectly through an intermediary.
> However, that complication apparently makes no difference, because the AI can
> still figure things out about the physical world, and that is what I find
> surprising. *
>
If the signal is totally random, and hence incompressible, then the task is
hopeless: a network trained to predict random data cannot succeed. For a
signal to be scrutable, all that's needed are regularities, patterns: an
entropy rate below 1 bit per bit of signal.
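As a minimal illustration (plain Python; the two toy bitstrings are made
up), a first-order conditional entropy estimate already separates learnable
structure from pure noise:

    import math
    import random
    from collections import Counter

    def conditional_entropy(data, order=1):
        """Estimate H(X_t | previous `order` symbols) in bits, from counts."""
        ctx, joint = Counter(), Counter()
        for i in range(order, len(data)):
            c = data[i - order:i]
            ctx[c] += 1
            joint[(c, data[i])] += 1
        n = sum(joint.values())
        return -sum(v / n * math.log2(v / ctx[c]) for (c, _), v in joint.items())

    random.seed(0)
    noise   = "".join(random.choice("01") for _ in range(10_000))  # incompressible
    pattern = "01" * 5_000                                         # pure regularity

    print(conditional_entropy(noise))    # ~1.0 bits: nothing for a predictor to learn
    print(conditional_entropy(pattern))  # 0.0 bits: each bit fully determined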
The level of directness or indirectness is of no importance for learning
the patterns of the outside world, so long as the patterns in the outside
world are reflected in the text. If, on the other hand, we gave the LLM
English text that was all about life and events in some alien universe with
different laws, particles, chemistry, etc., then the LLM would learn and
understand that different universe, not ours.
> *>>> The periodic table, and Wikipedia's article on each element, lists
>>>> atomic number (number of protons) in addition to atomic weights.*
>>>>
>>>
>>> *>> But how could the poor AI make sense out of that Wikipedia article
>>> if it had no understanding of what the sequence of squiggles
>>> "w-e-i-g-h-t-s" even means? I don't deny that it can understand what it
>>> means; I just don't know how. *
>>>
>>
>> *> I gave you an example regarding Pi and its surrounding words.*
>>
>
> *I can see how an AI that was trained on nothing but text could understand
> that Pi is the sum of a particular infinite sequence, but I don't see how
> it could understand the use of Pi in geometry because it's not at all clear
> to me how it could even have an understanding of the concept of "space";
> and even if it could, the formulas that we learned in grade school about
> how Pi can be used to calculate the circumference and area of a circle from
> just its radius would be incorrect except for the special case where space
> is flat. *
>
Even if the LLM lacks direct sensory experience of the 3-dimensional world,
it can still develop an intellectual and even an intuitive understanding of
it, in the same way that a human geometer with no direct experience of
5-dimensional spaces can still tell you all about the various properties of
shapes in such a space, and reason about their relationships and
interactions.
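For instance, here is a short, runnable computation (standard-library
Python) of a property of a space no one has ever perceived, using the
well-known n-ball volume formula V_n(r) = pi^(n/2) / Gamma(n/2 + 1) * r^n:

    from math import pi, gamma

    def ball_volume(n, r=1.0):
        """Volume of an n-dimensional ball of radius r."""
        return pi ** (n / 2) / gamma(n / 2 + 1) * r ** n

    for n in (2, 3, 5):
        print(n, round(ball_volume(n), 5))
    # 2 -> 3.14159 (pi r^2, the familiar disc)
    # 3 -> 4.18879 (4/3 pi r^3)
    # 5 -> 5.26379 (8 pi^2 / 15: derived, never perceived)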
Consider how much theoretical physicists understood about black holes and
atomic nuclei long before anyone had ever observed one. Intellectual
understanding can be honed even in the absence of sensory experience.
>
>> *>> a neural network can model any continuous function with arbitrary
>>> precision, but the vast majority of continuous functions do not model
>>> anything fundamental in either Newtonian or Quantum Physics, so how
>>> does an AI differentiate between those that do and those that don't?*
>>>
>>
>> *> Are you asking how neural networks learn functions from samples of
>> input and output? *
>>
>
> *No.*
>
> > *If you are asking how a neural network can approximate any computable
>> logic circuit? *[...] *If you are asking something else not covered
>> here, you will need to be more specific.*
>>
>
> *I'm asking how a neural network that was trained on nothing but a
> sequence of squiggles (a.k.a. text) can differentiate between a computable
> function that models a fundamental physical law and a computable function
> that does NOT model a fundamental physical law. It is now beyond doubt that
> a neural network can do exactly that; I'm just saying I'm surprised and a
> little confused by that fact, and I think even some of the top people at
> the AI companies are also a little confused. *
>
Among the terabytes or petabytes of training material supplied to these
LLMs are physics textbooks and problem sets. Completing the sentences on
the pages that pose physics problems requires understanding the relevant
formulae and knowing when to apply the right one in the right context.
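A minimal sketch of why (the problem text and numbers here are invented):
to match the corpus continuation of a worked problem, the model's next
token has to agree with the result of actually applying the formula:

    # Hypothetical training snippet, as it might appear in a problem set:
    prompt = ("A 2 kg mass accelerates at 3 m/s^2. "
              "By Newton's second law, F = m * a = ")

    # The continuation the corpus will contain:
    m, a = 2.0, 3.0
    answer = m * a
    print(prompt + f"{answer:g} N")  # -> "... F = m * a = 6 N"

    # A model minimizing next-token loss on millions of such pages is
    # rewarded only when its prediction tracks the computation itself.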
> *> how can we explain that LLMs can play chess? To play chess well
>> requires a model/function that understands chess: how the pieces move,
>> relate, attack, and what the goal of the game is. This is far beyond the
>> mere "stochastic parrot" some have made LLMs out to be.*
>>
>
> *I certainly agree with that! But to play a game of chess, even at the
> grandmaster level, it would not be necessary for the AI to understand the
> concept of "space" or to have even a rudimentary understanding of any of
> the laws of physics. *
>
No, but you can see how one is just a toy example of the other.
If one can understand how objects in the "universe of chess" operate,
merely from reading squiggles about chess and squiggles of recorded games
of chess, then understanding the physical universe is not fundamentally
different; it's the exact same problem, only at a larger scale (since the
physical universe is more complex than the chess universe).
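As a small demonstration of how much "world" is latent in the squiggles
alone, here is a sketch using the third-party python-chess package
(pip install chess); the game fragment is arbitrary:

    import chess

    # A game record is nothing but squiggles...
    game_text = "e4 e5 Nf3 Nc6 Bb5"  # the Ruy Lopez opening, as plain text

    board = chess.Board()
    for squiggle in game_text.split():
        board.push_san(squiggle)  # each token constrains the hidden "world"

    print(board)                         # a full board position, reconstructed
    print(board.turn == chess.BLACK)     # whose move it is: never stated in the text
    print(len(list(board.legal_moves)))  # every lawful continuation, implied by the text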
Jason
>