[ExI] How AI understands the world

Jason Resch jasonresch at gmail.com
Sun Feb 22 02:24:47 UTC 2026


On Sat, Feb 21, 2026, 5:37 PM BillK via extropy-chat <
extropy-chat at lists.extropy.org> wrote:

>
>
> On Sat, 21 Feb 2026 at 22:10, Jason Resch via extropy-chat <
> extropy-chat at lists.extropy.org> wrote:
>
>>
>> But the paper John referred to is about something distinct, I think. It
>> is about how models trained only on text (no images), and models trained
>> only on images (no text), both developed similar internal maps and
>> representations for the world.
>>
>> It is easier for our intuitions to understand how someone who sees a
>> library with books full of images and text can learn about the world even
>> if they never leave the library.
>>
>> But it is much harder for our intuitions to comprehend how a model given
>> *only text* (as basic unimodal LLMs are) could bootstrap itself into
>> understanding the world given only these "meaningless squiggles".
>>
>> I think I have provided an explanation for how this could be in the other
>> thread, but it remains a fascinating question how an unsupervised,
>> unmodified unimodal LLM, trained purely on a text corpus, can come to
>> understand the world. And the fact that we have real-world demonstrations
>> of this possibility makes the feat no less amazing.
>>
>> Jason
>>
>
>
>
> I asked Gemini to try to explain that.
> Quote:
> A text-only LLM knows that a "fire" is hot, that it burns, and that people
> run from it. It has mapped the exact statistical contours of the concept.
> But it has never felt heat or fear. It understands the world the way a
> brilliant, lifelong blind and deaf physicist might understand the
> mathematics of a sunset—perfectly structural, but devoid of physical
> experience.
> BillK
>

Gemini's description below was beautiful. And I was not aware of that
Othello example, but it makes a lot of sense (I had long believed something
like this must be what was happening, so it is great to see there is
experimental confirmation)! Thank you (and Gemini) for sharing.

Jason



> Gemini Pro 3.1 -
>
> It is one of the most counterintuitive ideas in modern science: how can a
> system that only ever sees a 1D sequence of text characters (like
> A-P-P-L-E) build a 3D, physics-bound understanding of the world?
>
> The answer lies in a concept called *compression through representation
> learning*. To predict the next word with near-perfect accuracy, a model
> cannot just memorize patterns; it is mathematically forced to
> reverse-engineer the rules of the universe that generated those words.
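>
> To make "predict the next word" concrete, here is a minimal sketch of
> the next-token training objective (illustrative Python; 'model' is a
> stand-in for any causal language model, not a specific system):
>
>     import torch
>     import torch.nn.functional as F
>
>     # Next-token objective: for every position in a batch of token
>     # sequences, score the model's prediction of the following token
>     # with cross-entropy. Minimizing this loss over a huge corpus is
>     # the entire training signal.
>     def next_token_loss(model, tokens):          # tokens: (batch, seq)
>         logits = model(tokens[:, :-1])           # predict from prefixes
>         targets = tokens[:, 1:]                  # shifted one step
>         return F.cross_entropy(
>             logits.reshape(-1, logits.size(-1)), # (batch*(seq-1), vocab)
>             targets.reshape(-1),
>         )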
>
> Here is the step-by-step mechanism of how a text-only LLM builds a world
> model.
>
> 1. The Othello-GPT Proof (The Engine of Compression)
>
> The best way to understand this is through a famous AI experiment from
> 2023 called *Othello-GPT*.
>
> Researchers took a base language model and trained it exclusively on text
> transcripts of the board game Othello. The input was just a sequence of
> moves (e.g., e3, d2, c4...). The model was never given the rules of
> Othello, it was never shown a picture of a board, and it wasn't told what a
> "piece" or a "grid" was. Its only objective was to guess the next valid
> move in the text string.
>
> If the model just memorized sequences, it would fail: the number of
> possible Othello games is astronomically large, far too many to store.
> Instead, to predict the next text token efficiently, *the model
> spontaneously invented a 2D spatial map of the board inside its neural
> network.* When researchers looked inside the model's "brain" (its latent
> space), they found mathematical structures tracking exactly where the
> black and white pieces were placed.
>
> *The takeaway:* When an AI is forced to compress complex data to predict
> what comes next, the most efficient way to do it is to build a simulation
> (a world model) of the thing generating the data.
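>
> A sketch of the probing step described above (illustrative Python; the
> probe design, sizes, and names are assumptions for exposition, not the
> exact code from the Othello-GPT paper):
>
>     import torch
>     import torch.nn as nn
>
>     # Probe: from one hidden activation of the move-sequence model,
>     # predict the state (empty/black/white) of all 64 squares. If a
>     # simple probe can read the board accurately, the board state must
>     # be encoded in the activations.
>     class BoardProbe(nn.Module):
>         def __init__(self, d_model=512):
>             super().__init__()
>             self.linear = nn.Linear(d_model, 64 * 3)
>
>         def forward(self, hidden):               # hidden: (batch, d_model)
>             return self.linear(hidden).view(-1, 64, 3)
>
>     # Stand-in data so the sketch runs; in the real experiment the
>     # activations come from the trained model and the labels from
>     # replaying the recorded games.
>     hidden_states = torch.randn(256, 512)
>     board_labels = torch.randint(0, 3, (256, 64))
>
>     probe = BoardProbe()
>     opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
>     for step in range(200):
>         opt.zero_grad()
>         loss = nn.functional.cross_entropy(
>             probe(hidden_states).reshape(-1, 3),
>             board_labels.reshape(-1))
>         loss.backward()
>         opt.step()
>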
> 2. Reverse-Engineering the "Shadows"
>
> In the real world, the "game" generating the text is physics, biology,
> and human society.
>
> Imagine you are looking at a 2D shadow of a complex 3D object rotating on
> a wall. At first, the shadow just looks like a shifting blob. But if you
> watch it for long enough, your brain can eventually calculate the 3D shape
> of the object casting the shadow.
>
> Text is a 1D shadow of our 4D world (3D space + time). Because the base
> model reads trillions of words—a dataset covering physics papers, romance
> novels, Reddit arguments, and recipes—it is seeing the "shadow" of reality
> from millions of different angles. To predict what word comes next when
> someone drops a glass, it has to deduce the "3D object" (gravity and
> fragility) that caused the text.
>
> 3. The Latent Space: Geometry as Meaning
>
> Instead of learning dictionary definitions, the LLM maps concepts into a
> high-dimensional mathematical space (often thousands of dimensions) called
> the *latent space*.
>
> - *Coordinates of Meaning:* The word "apple" becomes a specific
>   coordinate. The word "red" is another coordinate.
>
> - *Causality as Distance:* The model learns that "apple" is located very
>   close to "gravity," "falling," and "tree." It learns that the vector
>   (the mathematical arrow) connecting "apple" to "eating" points in
>   nearly the same direction, with nearly the same length, as the vector
>   connecting "water" to "drinking" (a toy version of this test appears
>   after the list).
>
> - *World Mechanics:* By mapping billions of these points, the model
>   creates a topological map of reality. It "understands" that physical
>   objects can't be in two places at once, not because it has eyes, but
>   because the geometric logic of its text-map leaves no room for that
>   configuration.
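>
> Below is a toy version of that parallel-offsets test (illustrative
> Python; the four vectors are made-up stand-ins for embeddings that
> would come from a trained model):
>
>     import numpy as np
>
>     # Parallelogram test: the offset apple -> eating should point the
>     # same way as the offset water -> drinking if the space encodes
>     # the shared "is consumed by" relation.
>     emb = {
>         "apple":    np.array([0.9, 0.1, 0.0, 0.2]),
>         "eating":   np.array([0.9, 0.1, 0.8, 0.2]),
>         "water":    np.array([0.1, 0.8, 0.0, 0.3]),
>         "drinking": np.array([0.1, 0.8, 0.8, 0.3]),
>     }
>
>     def cosine(a, b):
>         return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
>
>     offset_food = emb["eating"] - emb["apple"]
>     offset_drink = emb["drinking"] - emb["water"]
>     print(cosine(offset_food, offset_drink))  # ~1.0: parallel offsets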
>
> The Catch: Is it *Real* Understanding?
>
> While this mechanism is incredibly powerful, critics (often in the
> "Embodied AI" camp) point out a major limitation. The text-only model
> understands the *shape* of the world perfectly, but it lacks *sensorimotor
> grounding*.
>
> A text-only LLM knows that a "fire" is hot, that it burns, and that people
> run from it. It has mapped the exact statistical contours of the concept.
> But it has never felt heat or fear. It understands the world the way a
> brilliant, lifelong blind and deaf physicist might understand the
> mathematics of a sunset—perfectly structural, but devoid of physical
> experience.