[ExI] Re: GPT-4 on its inability to solve the symbol grounding problem

Brent Allsop brent.allsop at gmail.com
Tue Apr 18 02:43:57 UTC 2023


Hi Ben,

You said: "*There are no pictures of potatoes being sent back and forth in
the brain*. Instead, there are coded signals, in spike trains travelling
along axons."
I believe Giovanni said something similar, when he said there are no pixels
in the brain.
I think I understand your point that the name "potato" or "apple" is a
referent to a general abstract idea, rather than to a specific potato, which
makes sense.
But what is our subjective knowledge of the potato we see, if not a 3D
model (a picture?), derived through a very complex process from two very
noisy and distorted 2D pixel arrays coming from the eyes?
And when people observe colored "pictures" in the brain (when they look at
potatoes), and display what they see in the brain on a screen, as
reported in these many papers
<https://canonizer.com/topic/603-Current-Observation-Issues/1-Agreement>,
what are they observing, if not pictures?
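
To make concrete the kind of derivation I mean, here is a toy sketch
(my own invention, assuming an idealized pinhole stereo rig; the
numbers are made up and nothing here comes from the papers above):

def depth_from_disparity(x_left, x_right, focal_length_px=800.0,
                         baseline_m=0.065):
    """Depth (in metres) of a feature seen at pixel column x_left in
    the left eye's image and x_right in the right eye's image."""
    disparity = x_left - x_right  # pixels; bigger disparity = closer
    if disparity <= 0:
        raise ValueError("feature at or beyond infinity")
    # Standard pinhole-stereo geometry: depth = f * B / d
    return focal_length_px * baseline_m / disparity

# The same feature lands on slightly different pixels in each eye:
print(depth_from_disparity(412.0, 396.0))  # ~3.25 m away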

On Mon, Apr 17, 2023 at 5:15 PM Ben Zaiboc via extropy-chat <
extropy-chat at lists.extropy.org> wrote:

> On 17/04/2023 22:44, Gordon Swobe wrote:
>
> On Mon, Apr 17, 2023 at 1:58 PM Ben Zaiboc via extropy-chat <
> extropy-chat at lists.extropy.org> wrote:
>
>> I suppose that's what I want, a graphical representation
>> of what you mean by 'grounding', incorporating these links.
>
>
> Not sure how to do it incorporating your links. I started scratching my
> head trying to think of the best way to diagram it, then it occurred to me
> to ask GPT-4. It certainly "understands" the symbol grounding problem and
> why it cannot solve it for itself. Here is its solution.
>
> Start with three main components:
> a. Sensorimotor experience (perception and action)
> b. Symbolic representation (language, symbols)
> c. Grounding (the process that connects symbols to experience)
>
> etc.
>
>
>
> Well, where's the 'problem' then? All this means is that we match words to
> our experiences. And that's not an extension of my diagram, it's making it
> so abstract that it's almost useless.
>
> I'm going to try again.
>
>
> > The LLM has no way to understand the meaning of the symbol "potato," for
> example -- that is, it has no way to ground the symbol "potato"
>
> What I'm trying to understand is how do we 'ground the symbol' for
> "potato"?
>
> I suspect that you think that we look at a potato, and see a potato (and
> call it "a potato"), and that's the 'grounding'?
>
> The problem is, we don't. Once you become aware of how we construct models
> of objects in our minds, you can start to realise, a bit, how things work
> in our brains. The following is a bit long, but I don't really see how to
> condense it and still explain what I'm on about.
>
>
> (Disclaimer: I am no neurologist. All this is just my understanding of
> what I've read and discovered over the years about how our brains work, in
> a simplified form. Some of it may be inaccurate, some of it may be my
> misunderstanding or oversimplification, and some of it may be flat-out
> wrong. But it seems to make sense, at least to me. If any of you know
> better, or can clarify any points, please speak up)
>
>
> The other night, I was in my bedroom and had a dim table-lamp on, and
> wasn't wearing my glasses. I saw a very odd-looking black shape under my
> desk, and just couldn't figure out what it was. It was literally unknowable
> to me. I was racking my brains trying to figure it out. Rather than getting
> up and finding out, I decided to stay in bed and try to figure it out.
> Eventually I realised, from memory mostly, that there wasn't any thing (or
> any ONE thing) there at all. What I was seeing was two black objects and
> their combined shadows from the lamp, looking from my viewpoint like a
> single object that I'd never seen before.
>
> I think this gives a little bit of an insight into how we construct what
> I'm going to call 'object models' in our minds, from a large assortment of
> sensory data. I'm concentrating on visual data, but many other channels of
> sensory input are also involved.
>
> The data (a LOT of data!) all goes into a set of pattern recognisers that
> try to fit what is being perceived into one or more of a large number of
> stored models.
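>
> In cartoon form, that matching step might look something like this (a
> toy sketch of my own; real pattern recognisers are nothing like this
> simple, and the feature vectors and stored models are invented):
>
> import math
>
> # Stored 'object models': prototype feature vectors built up from
> # past experience (the dimensions are invented for illustration).
> stored_models = {
>     "potato": [0.8, 0.3, 0.1],
>     "apple":  [0.9, 0.1, 0.7],
>     "box":    [0.1, 0.2, 0.2],
> }
>
> def best_match(percept):
>     """Return the stored model nearest the incoming feature vector."""
>     return min(stored_models,
>                key=lambda name: math.dist(stored_models[name], percept))
>
> print(best_match([0.75, 0.25, 0.15]))  # -> 'potato'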
>
> My brain was trying to create a new object model, from a bunch of data
> that didn't make sense. Only when I realised it wasn't a single object at
> all, but a combination of two black objects and their (unrecognised)
> combined shadows, did things make sense and my brain found a new way to
> recognise a box next to a sketchbook.
>
> This kind of process goes on at a very detailed level, as well. We know a
> fair bit now about how vision works, with specialist subsystems that
> recognise edges oriented at specific angles, certain degrees of contrast,
> etc. ('feature detectors' I believe they're called), which combine
> together, through many layers, and gradually build up more and more
> specific patterns and higher and higher abstractions. We must have a large
> number of these 'object models' stored away, built up since our earliest
> childhood, against which these incoming patterns are checked to see which
> of them gives a match. Then a kind of Darwinian selection process goes on
> to refine the detection until we finally settle on a single object model,
> and decide that that is what we are seeing. Usually, that is, unless someone
> is fucking with us by making us look at those illusions in a book written
> by a psychologist.
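>
> One of those low-level feature detectors, in cartoon form (a toy
> sketch; real visual neurons are far richer than a 3x3 kernel):
>
> # Toy 'feature detector' that responds to a vertical dark-to-light
> # edge by correlating a 3x3 brightness patch with an oriented kernel.
> VERTICAL_EDGE = [[-1, 0, 1],
>                  [-1, 0, 1],
>                  [-1, 0, 1]]
>
> def edge_response(patch):
>     """Strength of a vertical edge in a 3x3 brightness patch."""
>     return sum(VERTICAL_EDGE[r][c] * patch[r][c]
>                for r in range(3) for c in range(3))
>
> dark_to_light = [[0.1, 0.5, 0.9]] * 3
> print(edge_response(dark_to_light))  # strongly positive: edge found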
>
> We don't look at a potato and 'see a potato'. We look at an area in front
> of us, extract a ton of visual information from the scene, detect thousands
> of features, combine them together, and carry out a very complex set of
> competing matching operations, which settle down into a consensus that
> links to an object model, which links to our language centres, which extract
> a symbol that causes us to utter the word "Kartoffel" if we are German, or
> "potato" if not, etc.
>
> The significant thing here, for our 'grounding' discussion, is the way
> these things are done in the brain. *There are no pictures of potatoes
> being sent back and forth in the brain*. Instead, there are coded
> signals, in spike trains travelling along axons. This is the language of
> the brain, like the language of computers is binary digits sent along
> conductive tracks on circuit boards.
>
> Everything, as far as we currently know, that is transmitted and received
> in all the modules of the brain, is in this 'language' or code, of spike
> trains in specific axons (the exact axon the signal travels along is just
> as important as the actual pattern of action potential spikes. The same
> pattern in a different axon can mean a wildly different thing).
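>
> A crude way to picture that 'labelled line' property (a toy sketch;
> the axon names and meanings are invented):
>
> # Meaning = (which axon, what spike pattern). The identical pattern
> # on a different axon decodes to something entirely different.
> MEANING = {
>     ("axon_17", (1, 0, 1, 1)):
>         "light/dark edge at 50 degrees, right visual field",
>     ("axon_92", (1, 0, 1, 1)):
>         "pressure on the left index fingertip",
> }
>
> spikes = (1, 0, 1, 1)
> print(MEANING[("axon_17", spikes)])  # a visual meaning
> print(MEANING[("axon_92", spikes)])  # same pattern, a tactile meaning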
>
> These signals could come from anywhere. This is very important. The spike
> train that, in this specific axon, means "a strong light/dark transition at
> an angle of 50 degrees, coming from coordinates [x:y] of the right visual
> field" usually comes from the optic nerve, but it could come from
> anywhere. With a bit of technical bio-wizardry, it could be generated from
> a memory location in an array in a computer, or created by a text string
> in a segment of program code or memory address. That would have no effect
> whatsoever on the eventual perception in the brain of a potato*. It
> couldn't. A spike train is a spike train, no matter where it came from or
> how it was generated. The only things that matter are which axon it is
> travelling along, and what the pattern of spikes is.
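>
> Put another way (continuing the toy decoder idea above; again, every
> name is invented): the downstream machinery only ever sees the pair
> (axon, pattern), so the origin of the pattern is invisible to it.
>
> MEANING = {("axon_17", (1, 0, 1, 1)):
>            "light/dark edge at 50 degrees, right visual field"}
>
> def decode(axon, pattern):
>     """Downstream 'perception': sees only the labelled line + spikes."""
>     return MEANING.get((axon, pattern), "unrecognised")
>
> from_retina = (1, 0, 1, 1)           # as if from the optic nerve
> from_memory = tuple([1, 0, 1, 1])    # as if built from a text string
>
> # Identical (axon, pattern) -> identical percept, whatever the source.
> assert decode("axon_17", from_retina) == decode("axon_17", from_memory)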
>
> Not only is the matching to existing object models done with this
> language, but the creation of the models in the first place is done in the
> same way. I experienced the beginnings of this in my bedroom. The process
> was aborted, though, when it was decided there was no need for a new model,
> that a combination of two existing ones would fit the requirement.
>
> What if I hadn't realised, though? I'd have a (weak) model of an object
> that didn't really exist! It would probably have faded away quickly, for
> lack of new data to corroborate, update and refine it. With things like
> apples, though, we are constantly updating and revising our model (or
> models) of them. Every time we see a new object that can be matched against the
> existing 'apple' model (or 'Granny Smith' model, etc.), we shore it up and
> slightly modify it.
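>
> A cartoon of that 'shoring up' (a minimal sketch; the learning rate
> and the feature values are invented):
>
> # Each new sighting nudges the stored prototype a little toward the
> # new example, so the model is revised constantly but never rebuilt.
> apple_model = [0.9, 0.1, 0.7]  # current 'apple' prototype
>
> def update(model, new_example, rate=0.1):
>     return [m + rate * (x - m) for m, x in zip(model, new_example)]
>
> apple_model = update(apple_model, [0.85, 0.15, 0.75])  # another apple
> print(apple_model)  # shifted slightly toward the new observation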
>
> So, what about 'grounding'? These object models in our brains are really
> the 'things' that we are referring to when we say 'potato' or 'apple'. You
> could say that the words are 'grounded' in the object models. But they are
> in our brains! They are definitely not things in the outside world. The
> models are abstractions, generalisations of a type of 'thing' (or really a
> large collection of sensory data) that we've decided makes sense to
> identify as such. They are also changing all the time, as needed.
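>
> In that picture, 'grounding' is nothing more exotic than an
> association between a symbol and an internal model (a toy sketch;
> every name here is invented):
>
> # The word is associated with an internal model, not with anything in
> # the outside world. Different symbols can share one referent.
> object_models = {"model_0042": "abstraction built from years of potato data"}
>
> word_to_model = {
>     "potato":    "model_0042",
>     "Kartoffel": "model_0042",  # different symbol, same internal model
> }
>
> print(object_models[word_to_model["potato"]])
> print(object_models[word_to_model["Kartoffel"]])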
>
> The information from the outside world, that causes us to bring these
> models to mind, talk about them and even create them in the first place, is
> actually just signals in nerve axons (easily represented as digital
> signals, by the way. Look up "Action potentials" and you'll see why). These
> object models have "no eyes, no ears, no senses whatsoever", to use your
> words (about LLMs). They are entirely reliant on signals that could have
> come from anywhere or been generated in any fashion. Including from strings
> of text or morse code. Are they therefore devoid of meaning? Absolutely
> not! Quite the opposite. They ARE meaning, in its purest sense.
>
> So that's my take on things. And that's what I meant, ages ago, when I
> said "there is no apple". What there is, is an object model (or
> abstraction), in our heads, of an 'apple'. Probably several, really,
> because there are different kinds of apple that we want to distinguish.
> Actually, there will be a whole hierarchy of 'apple object models', at
> various levels of detail, used for different purposes. Wow, there's a LOT
> of stuff in our brains!
>
> Anyway, there is no grounding; there are just associations.
>
> (Note I'm not saying anything about how LLMs work. I simply don't know
> that. They may or may not use something analogous to these object models.
> This is just about how our brains work (as far as I know), and how that
> relates to the concept of 'symbol grounding')
>
> Ben
>
>
> * I should have used a comma there. I didn't mean "perception in the brain
> of a potato", I meant "perception in the brain, of a potato"