[ExI] Benchmarking the Singularity
Stuart LaForge
avant at sollegro.com
Fri Jul 19 18:42:07 UTC 2019
In response to the fascinating discussion occurring on the other
thread, I have decided to weigh in with my assessment of where we
stand with regard to achieving human-level general intelligence in a
machine. Since I am not familiar with the inner workings of the poker
algorithm, I will use AlphaGo as a benchmark instead.
Beating the world's best human Go player was an impressive feat for a
deep-learning algorithm, for sure, but we should take care to keep
things in perspective. My recent hands-on experimentation with machine
learning, and my summer project of working on Spike's suggestion of an
Alzheimer's-patient chatbot, have given me some insights into the
potential and the limitations of modern machine learning.
So first off, AlphaGo was written in TensorFlow, a Python-based
open-source neural-network platform that uses high-dimensional (but
thankfully Euclidean) tensors wherein the elements, i.e. the orthogonal
unit-vectors, are the simulated neurons. Each such neuron can store a
decimal number between 0 and 1. It is important to point out that the
neuron activation values are tracked as approximations of real
numbers, often with an activation function that operates as a
threshold to simulate the neuron firing. The important things to
remember are that, first, the neurons are modeled as continuous and
therefore effectively analog systems, and second, the activation
function is usually chosen to be non-linear.
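To make that concrete, here is a minimal sketch of a single simulated
neuron in plain Python/NumPy (just an illustration of the idea, not
AlphaGo's actual code): a weighted sum of the upstream neurons' values,
squashed by a non-linear activation function into a number between 0
and 1.

    import numpy as np

    def sigmoid(z):
        # Non-linear "squashing" activation: maps any real number into (0, 1),
        # acting as a soft threshold for whether the neuron "fires".
        return 1.0 / (1.0 + np.exp(-z))

    inputs = np.array([0.2, 0.9, 0.4])     # activations of upstream neurons
    weights = np.array([1.5, -2.0, 0.7])   # synaptic weights (to be learned)
    bias = 0.1

    activation = sigmoid(np.dot(weights, inputs) + bias)
    print(activation)                      # a single number between 0 and 1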
Each such high-dimensional tensor forms what is referred to as a
layer. The connections between the neurons of adjacent layers are
modeled as high-dimensional matrices (containing what are called
weights); multiplying a layer's tensor by its weight matrix gives the
tensor components of the next layer. That layer/tensor is then
multiplied by another matrix of weights to give the next layer/tensor,
and so on. A deep-learning neural network is one with many such layers.
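In code, going from one layer to the next really is just a matrix
multiplication followed by the activation function, repeated once per
layer. A toy sketch (the layer sizes here are made up for illustration
and are nothing like AlphaGo's real dimensions):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    layer_sizes = [9, 16, 16, 4]    # input layer plus three further layers

    # One weight matrix per pair of adjacent layers, randomly initialized.
    weights = [rng.standard_normal((m, n))
               for n, m in zip(layer_sizes[:-1], layer_sizes[1:])]

    x = rng.random(layer_sizes[0])  # some input tensor (here just a vector)
    for W in weights:
        x = sigmoid(W @ x)          # next layer = activation(weights times previous layer)
    print(x)                        # activations of the final layer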
The weights in the matrices and the neuron values in the tensors
start out randomized but are then trained using gradient descent on
a special function called an error function, which measures the
difference between the current output and the desired training output
for the training inputs. The gradient descent works by tuning the
weights in the gigantic matrices using an algorithm called
back-propagation, which uses the hell out of the distributive property
and the chain rule from calculus to find the gradient of the error
function with respect to those gigantic weight matrices, with their
hundreds of dimensions, and thereby derive a set of weights that
minimizes the error function across all the training inputs.
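Here is roughly what a single training step looks like for a tiny
two-layer net, with the chain rule written out by hand. This is just an
illustrative sketch (squared error as the error function, sigmoid
activations), not anything resembling AlphaGo's real training code:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(1)
    W1 = rng.standard_normal((4, 3))   # weights: input layer (3) -> hidden layer (4)
    W2 = rng.standard_normal((1, 4))   # weights: hidden layer (4) -> output (1)

    x = np.array([0.5, 0.1, 0.9])      # a training input
    y = np.array([1.0])                # the desired training output
    lr = 0.5                           # learning rate (step size of the descent)

    # Forward pass
    h = sigmoid(W1 @ x)                # hidden layer activations
    out = sigmoid(W2 @ h)              # network output
    error = 0.5 * np.sum((out - y) ** 2)

    # Backward pass: the chain rule, applied layer by layer
    d_out = (out - y) * out * (1 - out)   # dE/d(pre-activation of output)
    grad_W2 = np.outer(d_out, h)          # dE/dW2
    d_h = (W2.T @ d_out) * h * (1 - h)    # dE/d(pre-activation of hidden layer)
    grad_W1 = np.outer(d_h, x)            # dE/dW1

    # Gradient descent: nudge every weight downhill on the error surface
    W2 -= lr * grad_W2
    W1 -= lr * grad_W1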
In other words, it operates as what is called a universal learning
function: a literal mathematical function that can be fine-tuned to
map any mathematically definable set of inputs to any such definable
set of outputs. And since we live in a universe that follows
mathematical laws, that means pretty much anything can be finagled to
serve as inputs and outputs for these universal functions.
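As a concrete, if trivial, illustration of that universality: wrap the
training step above in a loop and the same sort of tiny net will learn
a mapping like XOR, which no single linear function can represent. A
self-contained sketch (the sizes, learning rate and iteration count are
arbitrary choices of mine):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # XOR: a simple input->output mapping no purely linear function can learn
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    Y = np.array([[0.], [1.], [1.], [0.]])

    rng = np.random.default_rng(0)
    W1, b1 = rng.standard_normal((2, 8)), np.zeros(8)   # input (2) -> hidden (8)
    W2, b2 = rng.standard_normal((8, 1)), np.zeros(1)   # hidden (8) -> output (1)
    lr = 2.0

    for _ in range(20000):
        H = sigmoid(X @ W1 + b1)                # forward pass over all four inputs
        out = sigmoid(H @ W2 + b2)
        d_out = (out - Y) * out * (1 - out)     # chain rule, as above
        d_H = (d_out @ W2.T) * H * (1 - H)
        W2 -= lr * (H.T @ d_out); b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ d_H);   b1 -= lr * d_H.sum(axis=0)

    print(out.round(2))    # should end up close to [0, 1, 1, 0]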
It seems pretty obvious to me that biological brains likewise embody
such a tunable universal learning function, and that the same
underlying mathematics of tensor transformations explains intelligence
in all its forms. The parallels are too numerous to ignore. For
example, the "knowledge" that a neural network acquires is spread out
throughout its neural architecture. This means that if I take a fully
trained model and start deleting individual neurons, its ability to
recall knowledge is only very gradually affected. Also, both are prone
to biases and errors, because both biological brains and artificial
ones treat their training data as "truth", meaning that if either is
fed misinformation, it will reach the wrong conclusions. Garbage in,
garbage out applies to all brains, probably even Jupiter-sized ones.
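That graceful degradation is easy to demonstrate with the same kind of
toy network: train it to "memorize" a handful of arbitrary input-output
patterns, then zero out a growing fraction of its weights and watch the
recall error creep up gradually rather than collapsing all at once. A
sketch (everything here, from the pattern count to the training
schedule, is an arbitrary choice of mine):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(3)
    X = rng.random((20, 10))     # 20 random input patterns ("memories")
    Y = rng.random((20, 5))      # 20 arbitrary target patterns to recall
    W1 = rng.standard_normal((10, 40)) * 0.5
    W2 = rng.standard_normal((40, 5)) * 0.5

    for _ in range(10000):       # quick-and-dirty training, as in the earlier sketches
        H = sigmoid(X @ W1)
        out = sigmoid(H @ W2)
        d_out = (out - Y) * out * (1 - out) / len(X)
        d_H = (d_out @ W2.T) * H * (1 - H)
        W2 -= 2.0 * (H.T @ d_out)
        W1 -= 2.0 * (X.T @ d_H)

    # Now "lesion" the trained network: delete ever larger fractions of its connections
    for frac in [0.0, 0.1, 0.2, 0.4, 0.6]:
        W1_cut = W1 * (rng.random(W1.shape) > frac)
        W2_cut = W2 * (rng.random(W2.shape) > frac)
        err = np.mean((sigmoid(sigmoid(X @ W1_cut) @ W2_cut) - Y) ** 2)
        print(f"deleted {frac:.0%} of weights -> mean recall error {err:.4f}")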
Now, the actual forms of the biases are different. I will give you an
example of how alien an AI's biases can be. I saw a TED talk recently
where an AI researcher related how he was on a project that trained a
deep-learning neural net to distinguish between various canine species
and dog breeds based on photographs. It seemed to be perfectly capable
of distinguishing visually between various dog breeds, but for some
reason it kept mistaking Alaskan sled-huskies for wolves. It turned out
that every picture of a wolf in the training data featured the wolf in
snow. Likewise, the sled dog was in snow. The AI had made the false
assumption, based on the training data, that a wolf was a "dog in the
snow".
Another limitation is that since these AIs require their inputs to be
in the form of tensors, there is a lot of data preparation that goes on
behind the scenes: human programmers have to devise algorithms to
translate every possible input in the problem domain into tensors to
feed the machine. This is why Dylan Distasio calls machine-learning
algorithms brittle in relation to human intelligence. For a human
brain, all this data preprocessing happens automatically and
subconsciously. The hilarious irony of the situation is that if my
theory is correct, then a human brain has to subconsciously perform
tensor analysis in order to reach the conclusion that it is lousy at
math.
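For example, before a Go position (or a sentence of text) ever reaches
the network, somebody had to decide how to turn it into numbers. Here
is a hypothetical sketch of that kind of hand-written preprocessing;
the encoding scheme is made up for illustration and is not AlphaGo's
actual input format:

    import numpy as np

    # Encode a (tiny 5x5) Go position as a stack of three binary planes:
    # my stones, the opponent's stones, and empty points.
    board = [
        ".X...",
        ".XO..",
        "..O..",
        ".....",
        "X....",
    ]

    planes = np.zeros((3, 5, 5))
    for r, row in enumerate(board):
        for c, point in enumerate(row):
            channel = {"X": 0, "O": 1, ".": 2}[point]
            planes[channel, r, c] = 1.0

    print(planes.shape)   # (3, 5, 5) -- a tensor the network can actually consume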
I think, however, that the observation that noisy data improves the
functioning of neural networks suggests that machine learning is a
lot more robust than Dylan or Dave Sill give it credit for:
https://arxiv.org/abs/1710.05179
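By noisy data I mean something like the following: jitter every
training input a little before each pass, which in practice often acts
as a regularizer and makes the learned function less brittle. Here is a
minimal sketch of the idea, not the specific method of that paper:

    import numpy as np

    rng = np.random.default_rng(7)

    def add_noise(batch, sigma=0.05):
        # Corrupt every training input slightly before each pass; the network
        # then has to learn the underlying pattern rather than the exact values.
        return batch + rng.normal(0.0, sigma, size=batch.shape)

    X = rng.random((32, 10))    # a stand-in batch of training inputs
    X_noisy = add_noise(X)      # what actually gets fed to the network this pass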
I also think that more compute power and memory density will allow
machine-learning algorithms to do more of their data preprocessing on
their own. So how long will this take? Here are the best estimates I
could come up with using AlphaGo as a benchmark.
Here is a schematic of AlphaGo's architecture that I used as a reference:
https://nikcheerla.github.io/deeplearningschool//media/alphago_arch.png
There are some interesting observations to be made about its
architecture. For some reason, AI people don't count the input layer
when counting layers. AlphaGo is composed of 42 layers/tensors of
between 256 and 512 neurons/dimensions each. (42 layers, DeepMind, a
bunch of Brits . . . see what they did there?)
That means that this "superhuman" Go player only has between 10,752
(1.1*10^4) and 21,504 (2.2*10^4) neurons and those neurons are
connected by a mere 56,426,496 (5.6*10^7) to 225,705,984 (2.3*10^8)
synapses.
I say "only" and "mere" because according to wikipedia, the average
adult human brain contains about 8.6*10^10 neurons and 1.5*10^14
synapses. In other words, in terms of total number of neurons, the
human brain is some 4 million times larger than AlphaGo's. In terms of
synapses it is likewise on order 10^6 times smaller than the human
brain.
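For anyone who wants to check my arithmetic, here is how I got those
ratios (the AlphaGo neuron and synapse counts are the ones given above;
the brain figures are the Wikipedia ones):

    layers = 42
    ag_neurons_lo, ag_neurons_hi = layers * 256, layers * 512   # 10,752 and 21,504
    ag_synapses_lo, ag_synapses_hi = 56_426_496, 225_705_984

    brain_neurons, brain_synapses = 8.6e10, 1.5e14

    print(brain_neurons / ag_neurons_hi, brain_neurons / ag_neurons_lo)      # ~4 to ~8 million
    print(brain_synapses / ag_synapses_hi, brain_synapses / ag_synapses_lo)  # roughly 10^6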
But in terms of a metric I will call connectivity, modeled on the
average degree of the vertices of a graph, each human neuron is
connected to approximately 3500 other neurons, whereas AlphaGo has a
connectivity of between 512 and 1024 (each of its neurons connects to
every neuron in the two adjacent layers of 256 to 512 neurons each). So
if connectivity is a measure of "complexity per neuron", then the
average human neuron is only roughly 3.5 to 7 times as complex as one
of AlphaGo's neurons.
https://en.wikipedia.org/wiki/List_of_animals_by_number_of_neurons
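And here is the connectivity arithmetic spelled out. The human figure
is just the average degree of the graph, 2 x synapses / neurons; the
AlphaGo figure assumes each neuron talks to every neuron in its two
neighboring layers of 256 to 512 each, which is how I am reading the
architecture:

    brain_neurons, brain_synapses = 8.6e10, 1.5e14

    human_connectivity = 2 * brain_synapses / brain_neurons     # ~3,488 connections per neuron
    alphago_connectivity_lo, alphago_connectivity_hi = 2 * 256, 2 * 512   # 512 to 1,024

    print(human_connectivity / alphago_connectivity_hi,   # ~3.4 times as "complex" ...
          human_connectivity / alphago_connectivity_lo)   # ... to ~6.8 times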
So how did such a relatively small brain defeat the world's best Go
player? Well, it had several advantages going for it. For one thing,
Google built special processors for the task, called tensor processing
units (TPUs), which are faster than GPUs or CPUs for this kind of work,
and used 4 of them to allow maximum use of the time domain as a
trade-off for its lack of neurons. This allowed AlphaGo to play a
mind-boggling 44 million games against itself in the space of 9 hours
of training.
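To put that training rate in perspective:

    games, hours = 44_000_000, 9
    print(games / (hours * 3600))   # roughly 1,360 self-play games per second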
Secondly, it didn't have to worry about minor nuisances such as
walking, talking, finding food and water, paying the bills, and fitting
into society, all things that human Go players must do. In essence, it
could use 100% of its relatively small brain for nothing but learning
to play Go at superhuman speed, giving it more experience playing Go
than any human could accumulate in an entire lifetime of playing.
But brain size is just a scaling issue, and if Moore's law continues
then it should just be a matter of time, right? Well, if total neuron
number is the important metric, then by extrapolating Moore's law (a
questionable thing to do, I know) we should reach neuron-number parity
in about 45 years. But in terms of parity between the connectivity of
individual human and AI neurons, we are only about 5 years out. Taking
the average of that range of 5 to 45 years gives 25 years. But this
assumes that Moore's law continues unabated. On the other hand, the
emergence of quantum computing stands to disrupt everything, so who is
to say what effect it will have on the timetable until the
Singularity?
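For what it is worth, here is the back-of-the-envelope arithmetic
behind those two numbers. The two-year doubling period is my own
assumption; pick a different doubling time and the dates shift
accordingly:

    import math

    doubling_period_years = 2.0          # assumed Moore's-law doubling time

    neuron_gap = 8.6e10 / 2.15e4         # human neurons / AlphaGo neurons, ~4 million
    connectivity_gap = 3500 / 512        # human / AlphaGo connectivity, ~7

    print(math.log2(neuron_gap) * doubling_period_years)        # ~44 years to neuron parity
    print(math.log2(connectivity_gap) * doubling_period_years)  # ~5.5 years to connectivity parity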
Sorry, I couldn't be more precise in my estimates but to quote Yoda,
"Difficult to see; Always in motion is the future."
Stuart LaForge