[ExI] Why do the language model and the vision model align?
Jason Resch
jasonresch at gmail.com
Mon Feb 9 21:54:01 UTC 2026
On Mon, Feb 9, 2026 at 2:21 PM Stefano Ticozzi via extropy-chat <
extropy-chat at lists.extropy.org> wrote:
> The article you linked here appeared to refer to a convergence toward a
> Platonic concept of the Idea; it therefore seemed relevant to recall that
> Platonic Ideas have been extensively demonstrated to be “false” by science.
>
Hi Stefano,
I think you may be reading too much into the name "Platonic representation
hypothesis". The fact that the word "Platonic" is used is the name of this
hypothesis is not meant as an endorsement or claim to the the truth of
Platonism. Rather, it is used because of it's references to there being
"ideals" which our words and ideas only imperfectly approximate.
So when a vision model and a text model produce similar structures encoded
in their neural networks, it is because both systems are approximating the
"ideal" of the object in question, whether it is a tree, a cat, a house, or
a car.
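
To make "similar structures" a bit more concrete, here is a minimal sketch
(my own illustration, not the researchers' code) of one common way to
quantify representational similarity: embed paired images and captions with
the two models, then score how often the nearest-neighbor structure agrees
across the two embedding spaces. The function names and the choice of a
mutual k-nearest-neighbor overlap score are assumptions I am making for
illustration.

import numpy as np

def knn_indices(embeddings: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k nearest neighbors (cosine similarity, self excluded) per row."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    np.fill_diagonal(sims, -np.inf)          # never count a sample as its own neighbor
    return np.argsort(-sims, axis=1)[:, :k]  # top-k most similar rows

def mutual_knn_alignment(img_emb: np.ndarray, txt_emb: np.ndarray, k: int = 10) -> float:
    """Average overlap of k-nearest-neighbor sets across the two embedding spaces.

    Row i of img_emb and row i of txt_emb must come from the same sample
    (an image and its caption). Returns a score in [0, 1]; higher means the
    vision model and the language model organize the data more similarly.
    """
    nn_img = knn_indices(img_emb, k)
    nn_txt = knn_indices(txt_emb, k)
    overlaps = [len(set(a) & set(b)) / k for a, b in zip(nn_img, nn_txt)]
    return float(np.mean(overlaps))

# Hypothetical usage, with row-aligned embeddings from any two encoders:
# score = mutual_knn_alignment(vision_embeddings, caption_embeddings, k=10)

The finding described in the article is, roughly, that a score of this kind
keeps climbing as both models become more powerful.
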
You can accept or reject their hypothesis for why we see such a
convergence without having to accept Plato's theory of forms existing
independently of the universe. For clarity of thinking on this, I would
keep Plato's theory as a separate hypothesis, not to be confused with these
research findings or with the hypothesis these researchers put forward.
Jason
>
> The convergence of textual and visual models could be explained in various
> ways without resorting to philosophical interpretations. I will suggest one
> such explanation, inspired by a book by an Italian author (Giuseppe Festa)
> that my son has just read at school, 100 passi per volare: human language
> has grown and developed around images, driven almost exclusively by the
> need to emulate the sense of sight. We have devoted very little effort to
> describing tastes, smells, tactile sensations, or sounds, while instead
> ensuring that the expressive power of language became comparable to that of
> images.
>
> Ciao,
> Stefano
>
> On Sun, Feb 8, 2026 at 13:13 John Clark <johnkclark at gmail.com> wrote:
>
>> On Sat, Feb 7, 2026 at 12:14 PM Stefano Ticozzi <
>> stefano.ticozzi at gmail.com> wrote:
>>
>> *> Scientific thought has long since moved beyond Platonism,*
>>>
>>
>> *Philosophical thought perhaps, but scientific thought never embraced
>> Platonism because the most famous of the ancient Greeks were good
>> philosophers but lousy scientists. Neither Socrates, Plato, nor Aristotle
>> used the Scientific Method. Aristotle wrote that women had fewer teeth than
>> men; it's known that he was married, twice in fact, yet he never thought of
>> just looking into his wife's mouth and counting. Today, thanks to AI, for
>> the first time some very abstract philosophical ideas can actually be
>> tested scientifically. *
>>
>> *> 1. Ideas do not exist independently of the human mind. Rather, they
>>> are constructs we develop to optimize and structure our thinking.*
>>>
>>
>> *True but irrelevant. *
>>
>>
>>> *> 2. Ideas are neither fixed, immutable, nor perfect; they evolve over
>>> time, as does the world in which we live—in a Darwinian sense. For
>>> instance, the concept of a sheep held by a human prior to the agricultural
>>> era would have differed significantly from that held by a modern
>>> individual.*
>>>
>>
>> *The meanings of words and of groups of words evolve over the eons in
>> fundamental ways, but camera pictures do not. And yet minds educated by
>> those two very different things become more similar as they become smarter.
>> That is a surprising revelation that has, I think, interesting
>> implications. *
>>
>> *> In my view, the convergence of AI “ideas” (i.e., language and visual
>>> models) is more plausibly explained by a process of continuous
>>> self-optimization, performed by systems that are trained on datasets and
>>> information which are, at least to a considerable extent, shared across
>>> models.*
>>>
>>
>> *Do you claim that the very recent discovery that minds trained
>> exclusively on words and minds trained exclusively on pictures behave
>> similarly, and the discovery that the smarter those two minds become the
>> greater the similarity, has no important philosophical ramifications? *
>>
>> *John K Clark See what's on my new list at Extropolis
>> <https://groups.google.com/g/extropolis>*
>>
>>>
>>> On Sat, Feb 7, 2026 at 12:57 John Clark via extropy-chat <
>>> extropy-chat at lists.extropy.org> wrote:
>>>
>>>> *Why do the language model and the vision model align? Because they’re
>>>> both shadows of the same world*
>>>> <https://www.quantamagazine.org/distinct-ai-models-seem-to-converge-on-how-they-encode-reality-20260107/?mc_cid=b288d90ab2&mc_eid=1b0caa9e8c>
>>>>
>>>> *The following quote is from the above: *
>>>>
>>>> *"More powerful AI models seem to have more similarities in their
>>>> representations than weaker ones. Successful AI models are all alike, and
>>>> every unsuccessful model is unsuccessful in its own particular way.[...] He
>>>> would feed the pictures into the vision models and the captions into the
>>>> language models, and then compare clusters of vectors in the two types. He
>>>> observed a steady increase in representational similarity as models became
>>>> more powerful. It was exactly what the Platonic representation hypothesis
>>>> predicted."*
>>>>