[ExI] all we are is just llms was

Sat Apr 22 10:14:53 UTC 2023

On Sat, Apr 22, 2023, 3:06 AM Gordon Swobe via extropy-chat <
extropy-chat at lists.extropy.org> wrote:

> On Fri, Apr 21, 2023 at 5:44 AM Ben Zaiboc via extropy-chat <
> extropy-chat at lists.extropy.org> wrote:
>
>> On 21/04/2023 12:18, Gordon Swobe wrote:
>
> > Yes, still, and sorry no, I haven't watched that video yet, but I will
>> > if you send me the link again.
>>
>>
>> https://www.youtube.com/watch?app=desktop&v=xoVJKj8lcNQ&t=854s
>>
>>
> Thank you to you and Keith. I watched the entire presentation. I think the
> Center for Human Technology is behind the movement to pause AI development.
> Yes? In any case, I found it interesting.
>
> The thing (one of the things!) that struck me particularly was the
>> remark about what constitutes 'language' for these systems, and that
>> made me realise we've been arguing based on a false premise.
>
>
> Near the beginning of the presentation, they talk of how, for example,
> digital images can be converted into language and then processed by the
> language model like any other language. Is that what you mean?
>
> Converting digital images into language is exactly how I might also
> describe it to someone unfamiliar with computer programming. The LLM is
> then only processing more text similar in principle to English text that
> describes the colors and shapes in the image. Each pixel in the image is
> described in symbolic language as "red" or "blue" and so on. The LLM then
> goes on to do what might be amazing things with that symbolic information,
> but the problem remains that these language models have no access to the
> referents. In the case of colors, it can process whatever
> symbolic representation it uses for "red" in whatever programming language
> in which it is written, but it cannot actually see the color red to ground
> the symbol "red."
>

That was not my interpretation of his description. It isn't the LLM that is
used to process other types of signals (sound, video, etc.); it's the
"transformer model," i.e. the 'T' in GPT.

The transformer model is a recent discovery (2017) found to be adept at
learning any stream of data containing discernable patterns: video,
pictures, sounds, music, text, etc. This is why it has all these broad
applications across various fields of machine learning.

When the transformer model is applied to text (e.g., human language) you
get an LLM like ChatGPT. When you give it images and text you get something
not quite a pure LLM, but a hybrid model like GPT-4. If you give it just
music audio files, you get something able to generate music. If you give it
speech-text pairs you get something able to generate and clone speech (has
anyone here checked out ElevenLabs?).

This is the magic that AI researchers don't quite fully understand. It is a
general purpose learning algorithm that manifests all kinds of emergent
properties. It's able to extract and learn temporal or positional patterns
all on its own, and then it can be used to take a short sample of input,
and continue generation from that point arbitrarily onward.
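
To make the "take a short sample and continue from there" idea concrete,
here is a toy sketch in Python. To be clear, this is not a transformer: a
simple table of successor counts stands in for the learned model, and the
names train and continue_from are made up for illustration. The only point
is that the same learn-then-continue loop works on any stream of symbols,
whether they are words or (here, MIDI-like) note numbers.

```python
from collections import Counter, defaultdict

def train(stream):
    """Count which symbol follows each symbol in the training stream."""
    follows = defaultdict(Counter)
    for current, nxt in zip(stream, stream[1:]):
        follows[current][nxt] += 1
    return follows

def continue_from(follows, prompt, steps):
    """Extend the prompt one symbol at a time, always taking the most
    frequent successor of the last symbol (a stand-in for sampling
    from a real model's predicted distribution)."""
    out = list(prompt)
    for _ in range(steps):
        candidates = follows.get(out[-1])
        if not candidates:
            break  # symbol never seen in training; nothing to predict
        out.append(candidates.most_common(1)[0][0])
    return out

# The same code handles two different kinds of data.
text_stream = "to be or not to be".split()
note_stream = [60, 62, 64, 60, 62, 64, 60]  # hypothetical pitch numbers

text_model = train(text_stream)
note_model = train(note_stream)

print(continue_from(text_model, ["not"], 4))
print(continue_from(note_model, [62], 3))
```

A real transformer conditions on the whole preceding context rather than
just the last symbol, which is where the interesting behavior comes from,
but the generate-one-step-and-repeat loop is the same in spirit.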

I think when the Google CEO said it learned translation despite not being
trained for that purpose, this is what he was referring to: the unexpected
emergent capacity of the model to translate Bengali text when prompted to
do so. This is quite unlike how Google Translate (GNMT) was trained, which
required giving it many samples of explicit language translations between
one language and another (much of the data was taken from the U.N. records).

Jason