[ExI] Why do the language model and the vision model align?

Jason Resch jasonresch at gmail.com
Mon Feb 16 16:33:57 UTC 2026


On Mon, Feb 16, 2026, 8:07 AM John Clark <johnkclark at gmail.com> wrote:

> On Sun, Feb 15, 2026 at 9:36 AM Jason Resch via extropy-chat <
> extropy-chat at lists.extropy.org> wrote:
>
> *> the sound of the word tree, and the image of the word tree are two
>> representations of the same concept."*
>>
>
> *Yes but could you know that if you've never seen an image of a tree and
> in fact had never been exposed to anything except text? Apparently the
> answer is "yes", but that is not the answer I would have guessed.  *
>

As I explained in the part you deleted, your brain doesn't receive images
from the outside world; it receives the timings of neural spikes. So your
understanding of the correlation between the sound of the word tree and
the image of a tree, and indeed all your understanding of the outside
world, is built from nothing but patterns of correlation between otherwise
meaningless symbols (the neural spikes).



> *> think of all the correlations that exist among the corpus the AI has.
>> The word "tree" co-occurs with (correlates with) other words like: foliage,
>> plant, life, tall, branches, leaves, shade, roots, sapling, seeds, wood,
>> bark, etc.*
>>
>
> *I can see how an AI could figure out that the squiggle "tree" is often
> associated with the squiggle "foliage" and various other squiggles, but how
> it manages to make an association between any of those squiggles and
> something that exists in the external physical world is a mystery, at least
> to me. And a dictionary would be of no help, that's just a list of more
> squiggles.  *
>

A dictionary does help. Remember my "a X is a Y" and "X is made of Y"
example, and what those two linguistic patterns, combined with a dictionary
or a large corpus, can reveal about the structure and organization of the
world; a rough sketch follows below. And those are just two linguistic
patterns out of thousands. Consider all the prepositions and the spatial
relations they encode: in, on, above, below, between, across, around, etc.
You just need to think more about all the information that is there. It's
not the circular loop you claim it to be.
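
Here is that sketch in Python. To be clear, the toy corpus, the regexes,
and the variable names are all invented for illustration; this is not any
real system, just the two patterns applied mechanically:

import re
from collections import defaultdict

# Toy corpus: every sentence is made up purely for illustration.
corpus = [
    "An oak is a tree.",
    "A tree is a plant.",
    "A tree is made of wood, bark and leaves.",
    "A plank is made of wood.",
    "Wood is a material.",
]

is_a = defaultdict(set)     # taxonomy:    X -> {Y, ...}
made_of = defaultdict(set)  # composition: X -> {Y, ...}

for sentence in corpus:
    s = sentence.lower().rstrip(".")
    m = re.match(r"(?:an? )?(\w+) is made of (.+)", s)
    if m:
        x, ys = m.groups()
        made_of[x].update(y.strip() for y in re.split(r",| and ", ys) if y.strip())
        continue
    m = re.match(r"(?:an? )?(\w+) is an? (\w+)", s)
    if m:
        is_a[m.group(1)].add(m.group(2))

print(dict(is_a))     # {'oak': {'tree'}, 'tree': {'plant'}, 'wood': {'material'}}
print(dict(made_of))  # e.g. {'tree': {'wood', 'bark', 'leaves'}, 'plank': {'wood'}}

Two patterns over five sentences already yield a little hierarchy (oak ->
tree -> plant) and a part/material structure. Scale the corpus up to
billions of sentences, add the thousands of other patterns, and you get a
very detailed map of how the world is organized.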


> *> And each of these words has its own correlations. *
>>
>
> *Yes, the word "consciousness" is defined by the word "awareness", and
> "awareness" is defined by the word "consciousness", and round and round we
> go. *
>

Have you checked the definitions of these words? These are the definitions I
found:

consciousness:
“awareness of one’s own existence, sensations,
thoughts, surroundings, etc.”

awareness:
“having knowledge”

knowledge:
“acquaintance with facts, truths, or principles”

acquaintance:
“personal knowledge as a result of study, experience, etc.”

So putting these together does not result in meaningless circularity, but
rather, it expands to mean:

Consciousness =
having personal knowledge of the facts, truths, or principles of one's own
existence, sensations, thoughts, surroundings, etc.


>
>> *> This correlation map, I believe, is the same sort of structure
>> revealed in the platonic representation hypothesis work. Consider that the
>> features extracted in an object recognition visual network would be quite
>> similar to the word correlations: a "tree", for the vision model, is
>> something that correlates with having the features: trunk, branches, bark,
>> leaves, etc. It's the same map, because both our language and our pictures
>> correlate to the same underlying physical world.*
>>
>
> *Very recent evidence indicates that something like that must be true, and
> because of that superintelligence will arrive even sooner than I thought.  *
>

Now consider that images are nothing but pixels, and pixels are nothing but
bytes, and bytes are just "meaningless symbols". Yet, when we give a model
billions of images, it comes to understand the world -- from nothing other
than the patterns of correlation among these "meaningless symbols." Does
this analogy help show that with language, it's not the symbols that convey
meaning, but the patterns between those symbols?
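
As a toy illustration of that (everything below is invented for the
example, it has nothing to do with any real vision model), here is
structure being recovered from a grid of bare numbers, using nothing but
correlations between the "symbols":

import random

random.seed(0)

# A made-up 8x8 "image": to the program it is nothing but integers.
img = [[16 * (r + c) + random.randint(-10, 10) for c in range(8)]
       for r in range(8)]

def corr(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Horizontally adjacent pixel pairs vs. randomly chosen pixel pairs.
adjacent = [(img[r][c], img[r][c + 1]) for r in range(8) for c in range(7)]
randoms = [(img[random.randrange(8)][random.randrange(8)],
            img[random.randrange(8)][random.randrange(8)]) for _ in range(56)]

print(corr(*zip(*adjacent)))  # close to 1: neighbouring "symbols" co-vary
print(corr(*zip(*randoms)))   # typically much lower: distant ones don't

The learner never receives anything but numbers like these; whatever it
comes to know about edges, surfaces, or trees has to be built from which
numbers co-vary with which.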


>
>
> *>> I always thought the argument that true AI would never be possible
>>> because it would need to be so ridiculously complex we could never
>>> understand it, was bogus. The amount of information required to make a seed
>>> AI is actually quite small.  *
>>>
>>
>> *> True. In fact, AIXI shows that perfect universal intelligence requires
>> only two lines of code.*
>>
>
> *That I think would be going a little too far.*
>

You can look up the Wikipedia article on AIXI if you don't believe me.

* Two lines of code may be enough to describe an abstract Turing Machine,
> but an abstract Turing Machine can't calculate anything, you need a real
> physical machine for that. Human beings have found a way to literally turn
> sand into real Turing Machines, and that manufacturing ability is what a
> seed AI would need to have, or at least have the capability to evolve into
> something that was able to master that very complex technology. The entire
> genome of a human being only contains about 750 MB of information, I would
> guess that just one or 2 MB of that would be sufficient to make a seed AI;
> more than two lines of code but still not very much.  *
>

For AIXI to make up for its lack of information, it must use vast and
unrealistic amounts of computation. It's not practical, but it is useful
as a definition of what perfect universal intelligence is. It provides a
target to aim for, as well as a common ground of understanding regarding
what intelligence is.
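
For anyone curious, the "two lines" are essentially Hutter's single
expectimax expression. In standard notation (U is a universal Turing
machine, l(q) the length of program q, the a's are actions, the o's
observations, the r's rewards, and m the horizon), the AIXI action choice
at cycle k is:

a_k := \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m}
       (r_k + \cdots + r_m)
       \sum_{q : U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}

All the difficulty hides in that last sum over every program q consistent
with the history so far, which is incomputable -- hence a definition and a
target, not a practical design.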



> *>> Even if the fundamental laws of physics were radically different it
>>> would not change chess any more than it would change the fact that there are
>>> an infinite number of prime numbers, but the vast majority of things that
>>> we believe are the most important would change.   *
>>>
>>
>> *> This seems like an incomplete thought, what is the implication or
>> point of this?*
>>
>
> *The point is it's easy to see how an AI that has been exposed to nothing
> but text could learn pure abstract mathematics, but it's much more
> difficult to figure out how it could also learn physics.  *
>

One is a subset of the other. If an LLM can learn math, it can learn
anything.

Jason



>>>>> *Humans have found lots of text written in "Linear A" that was used by
>>>>> the inhabitants of Crete about 4000 years ago, and the even older writing
>>>>> system used by the Indus Valley Civilization, but modern scholars have been
>>>>> unable to decipher either of them even though, unlike the AI, they were
>>>>> written by members of their own species. And the last person who could read
>>>>> ancient Etruscan was the Roman emperor Claudius. The trouble is those
>>>>> civilizations are a complete blank, we have nothing to go on, today we
>>>>> don't even know what spoken language family those civilizations used. *
>>>>>
>>>>> *Egyptian hieroglyphics would have also remained undeciphered except
>>>>> that we got a lucky break, we found the Rosetta Stone which contained the
>>>>> same speech written in both hieroglyphics and an early form of Greek which
>>>>> scholars could already read. Somehow AI has found their own "Rosetta
>>>>> Stone", I just wish I knew what it was. *

