[ExI] Why do the language model and the vision model align?
Jason Resch
jasonresch at gmail.com
Sun Feb 15 14:34:36 UTC 2026
On Sun, Feb 15, 2026, 7:27 AM John Clark <johnkclark at gmail.com> wrote:
> On Sat, Feb 14, 2026 at 9:29 AM Jason Resch via extropy-chat <
> extropy-chat at lists.extropy.org> wrote:
>
> *>>> Both reflect the physical world. Directness or indirectness I don't
>>>> see as relevant. Throughout your brain there are many levels of
>>>> transformation of inputs.*
>>>>
>>>
>>> *>> But most of those transformations make sense only in light of other
>>> things that the brain knows, chief among them being an intuitive
>>> understanding of everyday physics. Nevertheless as it turns out, that fact
>>> doesn't matter. Perhaps I shouldn't have been but that surprised me. *
>>>
>>
>> *> I don't think much knowledge of physics is pre-wired.*
>>
>
> *I agree, most physical intuition is the result of direct contact with the
> outside world with no intermediary between. Human teachers were able to
> help me learn to read English because they had brains similar to mine and
> they, like me, had direct contact with the outside world; for example: they
> showed me this sequence of squiggles "tree" and then they pointed to a tall
> thing with green stuff on top, and I got the idea.*
>
Think about what this means from the brain's internal perspective. It sees
certain patterns of nerves firing over here being correlated with certain
patterns of nerves firing over there. All the brain needs to form meaning
is correlation. Correlation enables compression: "the sound of the word
tree and the image of the word tree are two representations of the same
concept."
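To make "correlation enables compression" concrete, here is a minimal toy
sketch of my own (not anything from the thread): two perfectly correlated
"sensory" streams take roughly half the space when compressed together,
because the second stream carries almost no new information once the first
is known.

    # Toy illustration only: correlated streams compress better jointly.
    import random, zlib

    random.seed(0)
    audio = bytes(random.randrange(256) for _ in range(10_000))  # one "modality"
    vision = audio  # a perfectly correlated second "modality" (identical here)

    separate = len(zlib.compress(audio)) + len(zlib.compress(vision))
    together = len(zlib.compress(audio + vision))
    print(separate, together)  # "together" is roughly half of "separate"
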
> *But how did an AI that has never known anything except squiggles manage
> to make that same connection? I don't know but somehow it did. *
>
Now think of all the correlations that exist within the corpus the AI has.
The word "tree" co-occurs with (correlates with) other words like: foliage,
plant, life, tall, branches, leaves, shade, roots, sapling, seeds, wood,
bark, etc.
And each of these words has its own correlations. These correlations enable
compression, as does any pattern of usage that surrounds these words. When
the LLM trains on the sentence "When the tree falls," it learns that falling
is a possible behavior trees are capable of. Then it can look to all the
correlations the word "fall" has, and so on. This correlation map, I
believe, is the same sort of structure revealed in the Platonic
Representation Hypothesis work. Consider that the features extracted by an
object-recognition vision network would be quite similar to the word
correlations: a "tree," for the vision model, is something that correlates
with having the features trunk, branches, bark, leaves, etc. It's the same
map, because both our language and our pictures correlate with the same
underlying physical world.
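As a rough sketch of that idea (a toy of my own construction with invented
counts, not the actual methodology of the Platonic Representation Hypothesis
paper): factor a word co-occurrence matrix into low-dimensional vectors, and
words that share correlates land near each other, while unrelated words land
far apart.

    # Toy co-occurrence matrix over a tiny vocabulary; the counts are invented.
    import numpy as np

    vocab = ["tree", "branches", "leaves", "bark", "fall", "car", "wheel"]
    C = np.array([
        [0, 8, 9, 7, 3, 0, 0],   # tree
        [8, 0, 6, 4, 2, 0, 0],   # branches
        [9, 6, 0, 3, 4, 0, 0],   # leaves
        [7, 4, 3, 0, 1, 0, 0],   # bark
        [3, 2, 4, 1, 0, 1, 0],   # fall
        [0, 0, 0, 0, 1, 0, 9],   # car
        [0, 0, 0, 0, 0, 9, 0],   # wheel
    ], dtype=float)

    # Compress: a low-rank SVD turns each word's correlation profile
    # into a short dense vector (a crude word embedding).
    U, S, _ = np.linalg.svd(C)
    emb = U[:, :2] * S[:2]

    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    i = {w: k for k, w in enumerate(vocab)}
    print(cos(emb[i["tree"]], emb[i["branches"]]))  # high: shared correlates
    print(cos(emb[i["tree"]], emb[i["wheel"]]))     # low: disjoint correlates

A vision model that extracts trunk/branch/leaf features would induce a
similar geometry over the same concepts, which is the alignment at issue.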
> *> The informational complexity of an adult human brain is approximately a
>> million times that of the informational complexity of the genome.*
>>
>
> *Yes, and that's why I always thought the argument that true AI would
> never be possible because it would need to be so ridiculously complex we
> could never understand it, was bogus. The amount of information required to
> make a seed AI is actually quite small. *
>
True. In fact, AIXI shows that perfect universal intelligence can be written
down in only a couple of lines (even if evaluating those lines is
incomputable).
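For reference, here is the definition as I recall it from Hutter's work
(quoted from memory, so treat the exact notation with care), where U is a
universal Turing machine, \ell(q) is the length of program q, and a, o, r
are actions, observations, and rewards:

    a_k := \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m}
           (r_k + \cdots + r_m) \sum_{q : U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
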
> *> what's important to take away from the Chess example is that an
>> understanding of how things interact can be extracted *merely from textual
>> examples and descriptions* of those things interacting.*
>>
>
> *Even i**f the fundamental laws of physics were radically different it
> would not change chess anymore than it would change the fact that there are
> an infinite number of prime numbers, but the vast majority of things that
> we believe are the most important would change. *
>
This seems like an incomplete thought; what is its implication or point?
Note that chess is no more unique among possible games than our physical
laws are among possible physical universes.
Jason
>>> *Humans have found lots of text written in "Linear A" that was used by
>>> the inhabitants of Crete about 4000 years ago, and the even older writing
>>> system used by the Indus Valley Civilization, but modern scholars have been
>>> unable to decipher either of them even though, unlike the AI, they were
>>> written by members of their own species. And the last person who could read
>>> ancient Etruscan was the Roman emperor Claudius. The trouble is those
>>> civilizations are a complete blank, we have nothing to go on, today we
>>> don't even know what spoken language family those civilizations used. *
>>>
>>> *Egyptian hieroglyphics would have also remained undeciphered except
>>> that we got a lucky break, we found the Rosetta Stone which contained the
>>> same speech written in both hieroglyphics and an early form of Greek which
>>> scholars could already read. Somehow AI has found their own "Rosetta
>>> Stone", I just wish I knew what it was. *
>>>