[ExI] Emily M. Bender — Language Models and Linguistics (video interview)

Jason Resch jasonresch at gmail.com
Sun Mar 26 14:58:16 UTC 2023


On Sun, Mar 26, 2023, 10:05 AM Jason Resch <jasonresch at gmail.com> wrote:

> Hi Gordon,
>
> First I want to thank you again for taking the time to write such a
> thoughtful reply to each of my points below. I include some follow up to
> your responses in-line below.
>
> On Sun, Mar 26, 2023 at 1:01 AM Gordon Swobe <gordon.swobe at gmail.com>
> wrote:
>
>>
>>
>> On Sat, Mar 25, 2023 at 4:49 PM Jason Resch via extropy-chat <
>> extropy-chat at lists.extropy.org> wrote:
>>
>>> Hi Gordon,
>>>
>>> Thanks for sharing this video. I watched it and found the following
>>> points of interest:
>>>
>>> *1. She said they can't possibly be understanding as they are only
>>> seeing a sequence of characters and predicting distributions and what these
>>> models do is not the same thing as understanding language.*
>>> My Reply: These models demonstrate many emergent capabilities that were
>>> not things that were programmed in or planned. They can answer questions,
>>> summarize texts, translate languages, write programs, etc. All these
>>> abilities emerged purely from being trained on the single task of
>>> predicting text. Given this, can we be certain that "understanding" is not
>>> another one of the emergent capabilities manifested by the LLM?
>>>
>>
>> This gets into philosophical debate about what, exactly, are emergent
>> properties. As I understand the term, whatever it is that emerges is
>> somehow hidden but intrinsic prior to the emergence. For example, from the
>> rules of chess there emerge many abstract properties and strategies of
>> chess. To someone naive about chess, it is difficult to imagine from the
>> simple rules of chess how chess looks to a grandmaster, but those emergent
>> properties are inherent in and follow logically from the simple rules of
>> chess.
>>
>
> That is a useful analogy I think. In the same way that an LLM is given a
> corpus of text, and told "Go at it", consider the chess playing AI
> AlphaZero. It was given absolutely no information about chess playing
> strategies, *only the rules of chess*. And yet, within mere hours, it had
> discovered all the common openings:
>
> "In Chess, for example, AlphaZero independently discovered and played
> common human motifs during its self-play training such as openings, king
> safety and pawn structure. But, being self-taught and therefore
> unconstrained by conventional wisdom about the game, it also developed its
> own intuitions and strategies adding a new and expansive set of exciting
> and novel ideas that augment centuries of thinking about chess strategy."
>
> https://www.deepmind.com/blog/alphazero-shedding-new-light-on-chess-shogi-and-go
>
>
> So given only the rules of the game, AlphaZero learned to play chess
> better than any human, and moreover, better than any humans know how to
> program computers to be. To me, this suggests that AlphaZero knows how to
> play chess, and that it understands the game. If you say that AlphaZero
> does not understand chess then I don't know what you mean by "understand,"
> as it must not be the same meaning as the one I use. Would you say
> AlphaZero does not understand how to play chess?
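>
> To make the "rules alone" point concrete, here is a minimal Python sketch.
> It is nothing like AlphaZero's actual method (which uses self-play
> reinforcement learning with a neural network); it is just an exhaustive
> game-tree search on the toy game of Nim, included only to illustrate that
> strategy is already implicit in the bare rules:
>
> from functools import lru_cache
>
> TAKES = (1, 2, 3)   # the complete "rules": remove 1-3 sticks; taking the last stick wins
>
> @lru_cache(maxsize=None)
> def player_to_move_wins(n):
>     """True if the player facing n sticks can force a win."""
>     return any(not player_to_move_wins(n - t) for t in TAKES if t <= n)
>
> # Given nothing but the rules, the classic "leave a multiple of 4" strategy emerges:
> print([n for n in range(1, 22) if not player_to_move_wins(n)])   # [4, 8, 12, 16, 20]
>
> The program is told nothing beyond the legal moves, yet the optimal strategy
> falls out, just as the grandmaster's view of chess is latent in the rules.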
>
>
>>
>> So how does meaning emerge from mere symbols (words)? Sequences of
>> abstract characters in no possible way contain the seeds of their meanings
>>
>
>
> If you are willing to grant that AlphaZero has learned how to play chess
> merely from the rules of the game, then could an LLM, given only a corpus
> of text, learn anything about the language? For example, could it pick up
> on the structure, grammar, and interrelations of words? Could it learn how
> to form proper sentences and compose meaningful replies in response to
> prompts?
>
> I think you may be missing a crucial piece of understanding about how
> neural networks work. They do not see only sequences of characters. As we
> learned from experiments with Google's DeepDream system, during image
> recognition each layer of the network sees and looks for something
> different, and as the layers progress, the patterns they look for become
> higher-level and more complex. For example, at the lowest layer, it looks
> only for edges. The next layer of the network looks for basic shapes built
> from those edges: lines, corners, curves. The next layer looks for and
> recognizes certain combinations of those shapes within a particular region
> to identify parts. The next layer above uses the recognized parts to
> identify what objects are seen. (See:
> https://distill.pub/2017/feature-visualization/ and
> https://blog.google/technology/ai/understanding-inner-workings-neural-networks/
> )
>
> None of this is particularly easy to follow and it took researchers many
> years to even understand what neural networks do when they learn to
> classify images. This is just some of the complex emergent behavior that we
> get when we build networks of millions or billions of neurons and set them
> loose to look for patterns.
>
> I think we could say something similar is happening in these language
> models. True, the lowest layer sees only characters (or tokens). But the
> next layer above that sees collections of tokens and looks for and
> recognizes words. The layer above this might identify particular
> grammatical phrases. The layer above this could recognize and operate on
> sentences. The layer above this on paragraphs, and so on. So it would then
> be incorrect to say that the LLM *only* sees sequences of characters. This
> is true for only a small part of the network, and it ignores the
> higher-level processing done by the later layers and stages of processing.
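>
> As a loose illustration of the idea that only the lowest stage touches raw
> characters (a toy Python pipeline, emphatically not how a transformer is
> actually built), consider:
>
> text = "The cat sat. The dog barked. The cat slept."
>
> def characters_to_words(chars):          # lowest "layer": sees only characters
>     return chars.replace(".", " .").split()
>
> def words_to_sentences(words):           # next layer: operates on words
>     sentences, current = [], []
>     for w in words:
>         if w == ".":
>             sentences.append(current)
>             current = []
>         else:
>             current.append(w)
>     return sentences
>
> def sentences_to_subjects(sentences):    # higher layer: operates on whole sentences
>     return {s[1] for s in sentences}     # crude "subject" pick: second word of each sentence
>
> print(sentences_to_subjects(words_to_sentences(characters_to_words(text))))
> # {'cat', 'dog'}
>
> Each stage consumes the previous stage's output and works at a higher level
> of description, which is the sense in which "sees only characters" describes
> only the bottom of the stack.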
>
>
>> , as we can see by the fact that many different words exist in different
>> languages and in entirely different alphabets for the same meaning.
>>
>
> That different languages can use different strings of characters to
> represent the same concept only means that any single word in isolation is
> insufficient to decipher the meaning of the word. Given a large enough body
> of text, however, the constraints around any particular word's usage are
> often enough to figure out what it means (as we have done for dead
> languages, and as I showed would be possible if I had the barest
> understanding of mathematics/physics and was given a dictionary or
> Wikipedia in another language).
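>
> Here is a minimal sketch of that "constraints around usage" idea (the
> distributional hypothesis), using a made-up miniature corpus; words used in
> similar contexts end up with similar context-count vectors:
>
> import math
> from collections import Counter, defaultdict
>
> corpus = ("the cat chased the mouse . the dog chased the cat . "
>           "the cat ate the fish . the dog ate the bone . "
>           "the king ruled the country . the queen ruled the country .").split()
>
> window = 2
> contexts = defaultdict(Counter)
> for i, word in enumerate(corpus):
>     for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
>         if j != i:
>             contexts[word][corpus[j]] += 1   # count neighbouring words
>
> def cosine(a, b):
>     dot = sum(a[k] * b[k] for k in a)
>     norm = lambda v: math.sqrt(sum(x * x for x in v.values()))
>     return dot / (norm(a) * norm(b))
>
> print(cosine(contexts["cat"], contexts["dog"]))      # higher: used similarly
> print(cosine(contexts["cat"], contexts["country"]))  # lower: used differently
>
> With a real corpus of billions of words (and better vectors than raw
> counts), these usage constraints pin a word's meaning down far more tightly.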
>
> I think your strongest point is how do we bootstrap understanding starting
> from "zero" with absolutely no initial understanding of the world at large.
> This I have difficulty with, because I don't know. But I can try to guess.
> I think the LLM can learn the rules of grammar of a language from enough
> examples, and I think it can also learn to pick out words from the string of
> characters. Then it can focus on patterns of words; think of children's
> books and short, simple, common sentences: "a cat is an animal", "a dog is an
> animal", "a turkey is an animal", "a cat is a mammal", "a dog is a mammal", "a
> turkey is a bird", "a cat has ears", "a cat has whiskers", etc. Given enough
> of these, a relationship map can be constructed and inferred. It could,
> with enough of these examples, build up sets of classifications, for
> example "cat ∈ mammal ∈ animal" and "turkey ∈ bird ∈ animal". Even if it
> doesn't at this stage know what a cat or a turkey is, it knows they are
> both animals. And it knows that cats have ears and turkeys don't, and that
> cats are something called mammals while turkeys are something called birds.
> Its knowledge is woefully incomplete at this stage, but you can see how
> this initial structure lays the foundation for later learning after
> processing more sentences, learning that most birds fly, and therefore
> turkeys probably fly. Learning that mammals nurse their young so cats must
> nurse their young, and so on. The bootstrapping is the hardest part to
> explain, and that it works at all (in humans as we know it does, and in
> LLMs, as it seems they do) is nothing short of a kind of miracle. This is
> not, however, to say it is not explainable or that it is magical, only that
> it's a very complex process we know very little about.
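>
> To make that bootstrapping step concrete, here is a toy Python sketch that
> does nothing but pattern-match sentences of that simple form and build the
> relationship map (the sentences and patterns are invented for illustration;
> an LLM of course learns such structure statistically, not via hand-written
> rules):
>
> import re
> from collections import defaultdict
>
> sentences = [
>     "a cat is a mammal", "a dog is a mammal", "a turkey is a bird",
>     "a mammal is an animal", "a bird is an animal",
>     "a cat has ears", "a cat has whiskers", "a bird has feathers",
> ]
>
> is_a, has = defaultdict(set), defaultdict(set)
> for s in sentences:
>     if m := re.match(r"an? (\w+) is an? (\w+)", s):
>         is_a[m.group(1)].add(m.group(2))
>     elif m := re.match(r"an? (\w+) has (\w+)", s):
>         has[m.group(1)].add(m.group(2))
>
> def ancestors(word):
>     """Follow "is a" links transitively: cat -> {mammal, animal}."""
>     found, frontier = set(), [word]
>     while frontier:
>         for parent in is_a[frontier.pop()]:
>             if parent not in found:
>                 found.add(parent)
>                 frontier.append(parent)
>     return found
>
> print(ancestors("cat"))     # {'mammal', 'animal'}
> print(ancestors("turkey"))  # {'bird', 'animal'}
> print(has["cat"])           # {'ears', 'whiskers'}
>
> Even this crude map already "knows" that cats and turkeys are both animals
> without knowing what either one is, which is the foothold the later learning
> builds on.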
>
>
>
>>
>>
>>> *2. She uses the analogy that the LLM looking at characters would be the
>>> same as a human who doesn't understand Cherokee looking at Cherokee
>>> characters.*
>>> My Reply: This is reminiscent of Searle's Chinese Room. The error is
>>> looking at the behavior of the computer only at the lowest level, while
>>> ignoring the goings-on at the higher levels. She sweeps all possible
>>> behavior of a computer under the umbrella of "symbol manipulation", but
>>> anything computable can be framed under "symbol manipulation" if described
>>> on that level (including what atoms, or neurons in the human brain do).
>>> This therefore fails as an argument that no understanding exists in the
>>> higher-level description of the processing performed by the computer
>>> program.
>>>
>>
>> Yes her argument is similar to Searle's. See above. Sequences of
>> characters (words) in no possible way contain the hidden seeds of their
>> meanings, as we can see from the fact that many different words exist
>> in different languages and alphabets for the same meaning.
>>
>
> I agree a word seen in isolation in no way conveys its meaning. However,
> given enough examples of a word and how it is used, we learn to infer its
> meaning.
>
> Think of the word "wisdom". You know what that word means, but no one has
> ever pointed to a thing and said that thing right there, that's "wisdom".
> Rather, from hundreds or thousands of examples of words or phrases said to
> contain wisdom, you have inferred the meaning of the word. Note that this
> was done merely from the statistical association between the wise words,
> and occasionally seeing the word "wisdom" paired with those words. No
> exemplar of "wisdom" is ever made available to your senses, as "wisdom" is
> an abstract concept which itself exists only in patterns of words.
>
>
>>
>> *3. She was asked what a machine would have to do to convince her they
>>> have understanding. Her example was that if Siri or Alexa were asked to do
>>> something in the real world, like turn on the lights, and if it does that,
>>> then it has understanding (by virtue of having done something in the real
>>> world).*
>>> My Reply: Perhaps she does not see the analogy between turning on or off
>>> a light, and the ability of an LLM to output characters to a monitor as
>>> interacting in the real world (turning on and off many thousands of pixels
>>> on the user's monitor as they read the reply).
>>>
>>
>> I thought that was the most interesting part of her interview. She was
>> using the word "understanding" in a more generous way than I would prefer
>> to use it, even attributing "understanding" to a stupid app like Alexa, but
>> she does not think GPT has understanding. I think she means it in exactly
>> the way I do, which is why I put it in scare-quotes. As she put it, it is a
>> "kind of" understanding. As I wrote to you I think yesterday, I will grant
>> that my pocket calculator "understands" how to do math, but it is
>> not holding the meaning of those calculations in mind consciously, which is
>> what I (and most everyone on earth) mean by understanding.
>>
>
> I agree with you here, that her use of "understand" is generous and
> perhaps inappropriate for things like Siri or Alexa. I also agree with you
> about the calculator: while it can do math, I would not say that it
> understands math. Its understanding, if it could be said to have any at
> all, would rest almost entirely in "understanding" what keys have been
> pressed and which circuits to activate on which presses.
>
>
>>
>>
>> Understanding involves the capacity to consciously hold something in
>> mind.
>>
>
> I agree with this definition. But while we both agree on this usage of the
> word, I think I can explain why we disagree on whether LLMs can understand.
> While I am willing to grant LLMs as having a mind and consciousness you are
> not. So even when we use the same definition of "understand," the fact that
> you do not accept the consciousness of LLMs means you are unwilling to
> grant them understanding. Is this a fair characterization?
>
>
>> Otherwise, pretty much everything understands something and the word
>> loses meaning. Does the automated windshield wiper mechanism in my car
>> understand how to clear the rain off my windows when it starts raining? No,
>> but I will grant that it "understands" it in scare-quotes.
>>
>> The other point I would make here is that even if we grant that turning
>> the pixels off and on your screen makes GPT sentient or conscious, the real
>> question is "how can it know the meanings of those pixel arrangements?"
>> From its point of view (so to speak) it is merely generating meaningless
>> strings of text for which it has never been taught the meanings except via
>> other meaningless strings of text.
>>
>> Bender made the point that language models have no grounding, which is
>> something I almost mentioned yesterday in another thread. The symbol
>> grounding problem in philosophy is about exactly this question. They are
>> not grounded in the world of conscious experience like you and me. Or, if
>> we think so, then that is to me something like a religious belief.
>>
>
> Why is it a religious belief to believe LLMs have consciousness, but it is
> not a religious belief to believe that other humans have consciousness? Or
> is it not also a religious belief to believe that only humans with their
> squishy brains, but not computers, can have minds (*or souls*)?
>
> "I conclude that other human beings have feelings like me, because, first,
> they have bodies like me, which I know, in my own case, to be the
> antecedent condition of feelings; and because, secondly, they exhibit the
> acts, and other outward signs, which in my own case I know by experience to
> be caused by feelings."
> -- John Stuart Mill, "An Examination of Sir William Hamilton's Philosophy" (1865)
>
>
> Our assumption that other humans are conscious rests on only two
> observations: (1) other people are made of the same stuff, and (2) other
> people behave as if they are conscious. In the case of LLMs, they are made
> of similar stuff in one sense (quarks and electrons) though of different
> stuff in another sense (silicon and plastic). In any case, I do not ascribe
> much importance to the material composition; what is important to me is
> whether they show evidence of intelligence, understanding, knowledge, etc.
> If they do, then I am willing to grant that these entities are conscious. I
> hope that if silicon-based aliens came to earth and spoke to us, that you
> would not deny their consciousness on the basis of (1), and would judge
> their potential for consciousness purely on (2). In the end, the only basis
> we have for judging the existence of other minds is by their behavior,
> which is why the Turing test is about as good as we can do in addressing
> the Problem of Other Minds.
>
>
>
>>
>>
>>
>>> *4. She admits her octopus test is exactly like the Turing test. She
>>> claims the hyper-intelligent octopus would be able to send some
>>> pleasantries and temporarily fool the other person, but that it has no real
>>> understanding and this would be revealed if there were any attempt to
>>> communicate about any real ideas.*
>>> My Reply: I think she must be totally unaware of the capabilities of
>>> recent models like GPT-4 to come to a conclusion like this.
>>>
>>
>> Again, no grounding.
>>
>
> I think it is easy to come to a snap judgement and say there is no
> grounding in words alone, but I think this stems from imagining a word, or
> a sentence in isolation, where every word appears only once, where there is
> only a single example of sentence structure. If however, you consider the
> patterns in a large body of text, you can begin to see the internal
> redundancy and rules begin to show themselves. For example, every word has
> a vowel. Every sentence has a verb. Every word is separated by a space.
> Most sentences have a subject, verb, and object. All these ideas are implicit
> in the patterns of the text, so we cannot say there is no grounding
> whatsoever; there is obviously this minimum amount of information implicit
> in the text itself. Now ask yourself, might there be more? Might this
> barest level of grounding provide enough to build up the next stage to
> ground further meaning? For example, observing that certain nouns appear as
> subjects only in association with certain verbs implies that those nouns
> have a limited repertoire, a defined potential for action. Or
> observing the usage of words like "is" to infer the sets of properties or
> classifications of particular things.
>
>
>
>>
>>
>>> *5. The interviewer pushes back and says he has learned a lot about
>>> math, despite not seeing or experiencing mathematical objects. And has
>>> graded a blind student's paper which appeared to show he was able to
>>> visualize objects in math, despite not being sighted. She says the octopus
>>> never learned language: we acquired a linguistic system, but the
>>> hyper-intelligent octopus has not, and that all the octopus has learned is
>>> language distribution patterns.*
>>> My Reply: I think the crucial piece missing from her understanding of
>>> LLMs is that the only way for them to achieve the levels of accuracy in the
>>> text that they predict is by constructing internal mental models of
>>> reality. That is the only way they can answer hypotheticals concerning
>>> novel situations described to them, or for example, to play chess. The only
>>> way to play chess with a LLM is if it is internally constructing a model of
>>> the board and pieces. It cannot be explained in terms of mere patterns or
>>> distributions of language. Otherwise, the LLM would be as likely to guess
>>> any potential move as an optimal move; and since one can readily set up a
>>> chess board position that has never before appeared in the history of the
>>> universe, we can know the LLM is not relying on memory.
>>>
>>
>> I don't dispute that LLMs construct internal models of reality,
>>
>
> I am glad we are in agreement on this. I think this is crucial to explain
> the kinds of behaviors that we have seen LLMs manifest.
>
>
>> but I cough when you include the word "mental," as if they have minds
>> with conscious awareness of their internal models.
>>
>
> What do you think is required to have a mind and consciousness? Do you
> think that no computer program could ever possess it, not even if it were
> put in charge of an android/robot body?
>
>
>
>>
>> I agree that it is absolutely amazing what these LLMs can do and will do.
>> The question is, how could they possibly know it any more than my pocket
>> calculator knows the rules of mathematics or my watch knows the time?
>>
>
> I would say by virtue of having many layers of processing which build up
> to high-level interpretations. Consider that someone could phrase an
> identical sentence:
>
> "I agree that it is absolutely amazing what these *human brains* can do
> and will do. The question is, how could they possibly know it any more than
> a *neuron* knows how to count or a *neocortical column* knows a pattern?"
>
>
> Consider: the Java virtual machine's bytecode has at most 256 distinct
> instructions (each opcode is a single byte). Yet it is possible to string
> these instructions together so as to realize any program that can be
> written. Every program of the roughly 3,000,000 in the Android Google Play
> Store is made from some combination of a similarly small instruction set.
> This is the magic of universality. It takes only a few simple rules,
> combined together, to yield behaviours of unlimited potential. You can
> build any logical operation/circuit using just the Boolean operations AND,
> OR, and NOT (actually it can be done with just the single Boolean operation
> NAND <https://en.wikipedia.org/wiki/NAND_gate>). Likewise, any Boolean
> function can be computed using *ONLY* multiplication and addition (together
> with the constant 1, working modulo 2). This is indeed an incredible
> property, but it has been proven. Similarly, neural networks have been
> proven universal in another sense; see
> the universal approximation theorem
> <https://en.wikipedia.org/wiki/Universal_approximation_theorem>.
>
> This is why some caution is needed for claims that "A neural network could
> never do this" or "A computer could never do that". Because we already know
> that neural networks and computers are architectures that are sufficiently
> flexible to manifest *any possible behavior* that any machine of any kind
> is capable of manifesting.
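>
> A small sketch of the NAND claim in particular (ordinary Python standing in
> for logic gates): every operation below is built out of NAND and nothing
> else.
>
> def NAND(a, b): return not (a and b)
>
> def NOT(a):    return NAND(a, a)
> def AND(a, b): return NOT(NAND(a, b))
> def OR(a, b):  return NAND(NOT(a), NOT(b))
> def XOR(a, b): return OR(AND(a, NOT(b)), AND(NOT(a), b))
>
> # Truth table for XOR, computed using nothing but NAND underneath:
> for a in (False, True):
>     for b in (False, True):
>         print(int(a), int(b), int(XOR(a, b)))
>
> Stack enough of these and you get adders, memory, and eventually a
> general-purpose computer; that is all "universality" requires.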
>
>
>
>
>>
>>
>>
>>>
>>> *6. The Interviewer asks what prevents the octopus from learning
>>> language over time as a human would? She says it requires joint-attention:
>>> seeing some object paired with some word at the same time.*
>>> My Reply: Why can't joint attention manifest as the co-occurrence of
>>> words as they appear within a sentence, paragraph, or topic of discussion?
>>>
>>
>> Because those other words also have no meanings or referents. There is no
>> grounding and there is no Rosetta Stone.
>>
>
> But neither is there grounding or a Rosetta Stone when it comes to
> language acquisition by children. You might say: well, they receive a visual
> stimulus concurrent with a word. But that, too, is ultimately just a
> statistical co-occurrence of ungrounded sensory inputs.
>
>
>>
>> Bender co-authored another paper about "stochastic parrots," which is how
>> she characterizes LLMs and which I like. These models are like parrots that
>> mimic human language and understanding. It is amazing how talented they
>> appear, but they are only parrots who have no idea what they are saying.
>>
>
> I could say Bender is a stochastic parrot, who mimics human language
> understanding, and that I am amazed at how talented she appears, but I am
> willing to attribute to her genuine understanding as evidenced by the
> coherence of her demonstrated thought processes. She should be careful,
> though, not to impose on LLMs a test which she herself could not pass:
> proving that she has genuine understanding and is not herself a stochastic
> parrot.
>
>
>>
>>
>>>
>>> *7. The interviewer asks do you think there is some algorithm that could
>>> possibly exist that could take a stream of words and understand them in
>>> that sense? She answers yes, but that would require programming in from the
>>> start the structure and meanings of the words and mapping them to a model
>>> of the world, or providing the model other sensors or imagery. The
>>> interviewer confirms: "You are arguing that just consuming language without
>>> all this extra stuff, that no algorithm could, just from that, really
>>> understand language?" She says that's right.*
>>> My Reply: We already know that these models build maps of things
>>> corresponding to reality in their head. See, for example, the paper I
>>> shared where the AI was given a description of how rooms were connected to
>>> each other, and the AI was then able to visually draw the layout of the rooms
>>> from this textual description. If that is not an example of understanding,
>>> I don't know what possibly could be. Note also: this was an early model of
>>> GPT-4 before it had been trained on images; it was trained purely on text.
>>>
>>
>> This goes back to the question about Alexa. Yes, if that is what you mean
>> by "understanding" then I am forced to agree that even Alexa and Siri
>> "understand" language. But, again, I must put it in scare quotes. There is
>> nobody out there named Alexa who is actually aware of understanding
>> anything. She exists only in a manner of speaking.
>>
>
> This is far beyond what Alexa or Siri do. This proves that words alone are
> sufficient for GPT-4 to construct a mathematical structure (a graph with
> edges and vertices) which is consistent with the layout of rooms within
> the house, as described *purely using words*. This shows that GPT-4 has
> overcome the symbol grounding problem, as it understands exactly how the
> words map to meaning, creating a mathematical structure consistent with
> the description provided to it.
>
> Please see page 51 of this PDF: https://arxiv.org/pdf/2303.12712.pdf so
> you know what I am talking about. This might be the most important and
> convincing page in the document for the purposes of our discussion.
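>
> To illustrate the kind of structure I mean (this is my own hypothetical
> example in Python, not the one from page 51 of the paper), a graph of rooms
> can be recovered from nothing but sentences:
>
> import re
> from collections import defaultdict
>
> description = [
>     "The front door opens into the hallway.",
>     "The hallway connects to the kitchen.",
>     "The hallway connects to the living room.",
>     "The living room connects to the study.",
> ]
>
> adjacency = defaultdict(set)
> for sentence in description:
>     m = re.match(r"The (.+?) (?:opens into|connects to) the (.+?)\.", sentence)
>     if m:
>         a, b = m.group(1), m.group(2)
>         adjacency[a].add(b)
>         adjacency[b].add(a)
>
> print(dict(adjacency))
> # e.g. {'front door': {'hallway'}, 'hallway': {'front door', 'kitchen', 'living room'}, ...}
>
> The vertices and edges here are "consistent with the layout of rooms" in
> exactly the sense that matters: the words alone determine the structure.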
>
>
>>
>>
>>>
>>> *8. She says, imagine that you are dropped into the middle of the Thai
>>> library of congress and you have any book you could possibly want but only
>>> in Thai. Could you learn Thai? The Interviewer says: I think so. She asks:
>>> What would you first do, where would you start? She adds if you just have
>>> form, that's not going to give you information. She then says she would
>>> have to find an encyclopedia or a translation of a book we know.*
>>> My Reply: We know there is information (objectively) in the Thai
>>> library, even if there were no illustrations or copies of books for which
>>> we had translations. We know the Thai library contains scrutable information
>>> because the text is compressible. If text is compressible it means there
>>> are discoverable patterns in the text which can be exploited to reduce the
>>> amount of bits needed to represent it. All our understanding can be viewed
>>> as forms of compression. For example, the physical laws that we have
>>> discovered "compress" the amount of information we need to store about the
>>> universe. Moreover, when compression works by constructing an internal toy
>>> model of reality, we can play with and permute the inputs to the model to
>>> see how it behaves under different situations. This provides a genuine
>>> understanding of the outer world from which our sensory inputs are based. I
>>> believe the LLM has successfully done this to predict text, it has various
>>> internal, situational models it can deploy to help it in predicting text.
>>> Having these models and knowing when and how to use them, I argue, is
>>> tantamount to understanding.
>>>
>>
>> How could you possibly know what those "discoverable patterns of text"
>> mean, given that they are in Thai and there is no Thai to English
>> dictionary in the Thai library?
>>
>
> Do you agree that the Thai language is compressible? That is to say, if
> you took all the symbols and characters from the Thai library, and let's
> say it came out to 1,000 GB, and we put it into WinZip, would it compress to
> a smaller file of, let's say, 200 GB? If you agree that a compression
> algorithm would succeed in reducing the number of bits necessary to
> represent the original Thai text, then this means there are patterns in the
> text which a simple algorithm can discover and exploit to reduce the size
> of the text. More sophisticated algorithms, which are more capable of
> understanding the patterns, will be able to further compress the Thai text.
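>
> This is easy to check for yourself in a couple of lines of Python (any
> natural-language text will do as a stand-in for the Thai corpus):
>
> import zlib
>
> text = open("any_large_text_file.txt", "rb").read()   # hypothetical file name: substitute any large text
> compressed = zlib.compress(text, 9)
> print(len(text), "bytes ->", len(compressed), "bytes",
>       f"({len(compressed) / len(text):.1%} of the original)")
>
> The compressed size comes out well below the original for natural language,
> precisely because the text contains the discoverable, exploitable patterns I
> am describing; a better model of the language compresses it further.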
>
>
>>
>> As she points out and I mentioned above, there is no Rosetta Stone.
>>
>>
> I agree there is no Rosetta Stone. But my point is one is not necessary to
> recognize patterns in text, and build models to predict text. In the same
> way we humans learn to predict future observations given our current
> ones, an LLM builds a model to predict future text given past text. Its
> model of reality is one of a world of ideas, rather than our world of
> visual and auditory sensations, but to it, it is still a world which it has
> achieved some understanding of.
>
>
>> Thanks for the thoughtful email.
>>
>>
> Likewise. I think even if we do not come to an agreement this is a useful
> discussion in that it helps each of us to clarify our thoughts and
> understanding of these topics.
>
> Jason
>

