[ExI] Why do the language model and the vision model align?
John Clark
johnkclark at gmail.com
Sat Feb 14 13:01:29 UTC 2026
On Fri, Feb 13, 2026 at 10:09 AM Jason Resch via extropy-chat <
extropy-chat at lists.extropy.org> wrote:
>> I think it is more surprising. For humans, most of those dots and dashes
>> came directly from the outside physical world; but for an AI that was
>> trained on nothing but text, none of them did. All those dots and dashes
>> came from another brain, not directly from the physical world.
>>
>
> Both reflect the physical world. Directness or indirectness I don't see
> as relevant. Throughout your brain there are many levels of transformation
> of inputs.
>
But most of those transformations make sense only in light of other things
that the brain knows, chief among them an intuitive understanding of
everyday physics. Nevertheless, as it turns out, that fact doesn't matter.
That surprised me, though perhaps it shouldn't have.
> Consider: I could train a neural network to monitor the health of a
> retail store by training it on every transaction between every customer and
> vendor, or I could train a neural network on the quarterly reports issued
> by the retail store's accountant to perform the same analysis. The fact that
> one set of data has gone through an accountant's brain doesn't make the data
> meaningless or inscrutable.
>
Not a good example; it's no better than chess, because neither would
require even an elementary understanding of how physical objects interact
with each other, but most things of real importance do.
>> I can see how an AI that was trained on nothing but text could
>> understand that Pi is the sum of a particular infinite series, but I
>> don't see how it could understand the use of Pi in geometry, because it's
>> not at all clear to me how it could even have an understanding of the
>> concept of "space"; and even if it could, the formulas that we learned in
>> grade school about how Pi can be used to calculate the circumference and
>> area of a circle from just its radius would be incorrect except for the
>> special case where space is flat.
>>
>
> Even if the LLM lacks a direct sensory experience of the 3-dimensional
> world, it can still develop an intellectual and even an intuitive
> understanding of it, in the same way that a human geometer with no direct
> experience of 5-dimensional spaces can still tell you all about the
> various properties of shapes in such a space, and reason about their
> relationships and interactions.
>
But the human scientist is not starting from nothing; he already has an
intuitive understanding of how 3 dimensions work, so he can extrapolate to
5. An AI that was trained on nothing but text wouldn't have an intuitive
understanding of how ANY spatial dimension works.
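To make that flat-space caveat concrete, here is a little Python sketch of my
own, purely illustrative and with a made-up sphere radius: an AI really could
recover the value of Pi from nothing but symbol manipulation (the Leibniz
series), but the grade-school circumference formula silently assumes that
space is flat, and the same circle drawn on a curved surface has a different
circumference.

import math

# Pi from pure symbol manipulation: Leibniz series, pi/4 = 1 - 1/3 + 1/5 - ...
pi_from_series = 4 * sum((-1) ** k / (2 * k + 1) for k in range(1_000_000))

r = 1.0  # radius of the circle
R = 2.0  # radius of a hypothetical sphere the circle is drawn on (made up)

flat = 2 * math.pi * r                      # grade-school formula, flat space only
curved = 2 * math.pi * R * math.sin(r / R)  # the same circle drawn on the sphere

print(f"pi from the series:        {pi_from_series:.5f}")
print(f"circumference if flat:     {flat:.4f}")
print(f"circumference on a sphere: {curved:.4f}")

The series recovers Pi just fine, but the flat formula says about 6.28 while
the same circle on that sphere measures about 6.02.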
Humans have found lots of text written in "Linear A", the script used by the
inhabitants of Crete about 4000 years ago, and in the even older writing
system used by the Indus Valley Civilization, but modern scholars have been
unable to decipher either of them, even though, unlike the AI's training
data, those texts were written by members of our own species. And the last
person who could read ancient Etruscan was the Roman emperor Claudius. The
trouble is that those civilizations are a complete blank; we have nothing to
go on. Today we don't even know what spoken language family those
civilizations used.
Egyptian hieroglyphics would also have remained undeciphered except that we
got a lucky break: we found the Rosetta Stone, which contained the same
decree written both in hieroglyphics and in Ancient Greek, which scholars
could already read. Somehow the AI has found its own "Rosetta Stone"; I just
wish I knew what it was.
John K Clark