[ExI] Benchmarking the Singularity
Stuart LaForge
avant at sollegro.com
Fri Jul 19 18:42:07 UTC 2019
In response to the fascinating discussion occurring on the other
thread, I have decided to weigh in with my assessment of where we
stand with regard to achieving human-level general intelligence in a
machine. Since I am not familiar with the inner workings of the poker
algorithm, I will use AlphaGo as a benchmark instead.
Beating the world's best human Go player was an impressive feat for a
deep-learning algorithm, for sure, but we should take care to keep
things in perspective. My recent hands-on experimentation with machine
learning, and my summer project of working on Spike's suggestion of an
Alzheimer's-patient chatbot, have given me some insights into the
potential and the limitations of modern machine learning.
So first off, AlphaGo was written in TensorFlow, a Python-based
open-source neural-network platform that uses high-dimensional (but
thankfully Euclidean) tensors wherein the elements, i.e. the orthogonal
unit-vectors, are the simulated neurons. Each such neuron can store a
decimal number between 0 and 1. It is important to point out that the
neuron activation values are tracked as approximations of real
numbers, often with an activation function that operates as a
threshold to simulate the neuron firing. The important things to
remember are that, first, the neurons are modeled as continuous and
therefore effectively analog systems, and second, the activation
function is usually chosen to be non-linear.
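To make that concrete, here is a minimal sketch of a single simulated
neuron in plain Python/NumPy (just an illustration of the idea, not
AlphaGo's actual code): a weighted sum of the upstream neurons' values,
squashed by a non-linear activation function into a number between 0
and 1.

    import numpy as np

    def sigmoid(z):
        # Non-linear "squashing" activation: maps any real number into (0, 1),
        # acting as a soft threshold for whether the neuron "fires".
        return 1.0 / (1.0 + np.exp(-z))

    inputs = np.array([0.2, 0.9, 0.4])     # activations of upstream neurons
    weights = np.array([1.5, -2.0, 0.7])   # synaptic weights (to be learned)
    bias = 0.1

    activation = sigmoid(np.dot(weights, inputs) + bias)
    print(activation)                      # a single number between 0 and 1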
Each such high-dimensional tensor forms what is referred to as a
layer. The connections between the neurons of adjacent layers are
modeled as high-dimensional matrices (containing what are called
weights); multiplying a layer's tensor by its weight matrix gives the
tensor components of the next layer. That layer/tensor is then
multiplied by another matrix of weights to give the next layer/tensor,
and so on. A deep-learning neural network is one with many such layers.
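In code, going from one layer to the next really is just a matrix
multiplication followed by the activation function, repeated once per
layer. A toy sketch (the layer sizes here are made up for illustration
and are nothing like AlphaGo's real dimensions):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    layer_sizes = [9, 16, 16, 4]    # input layer plus three further layers

    # One weight matrix per pair of adjacent layers, randomly initialized.
    weights = [rng.standard_normal((m, n))
               for n, m in zip(layer_sizes[:-1], layer_sizes[1:])]

    x = rng.random(layer_sizes[0])  # some input tensor (here just a vector)
    for W in weights:
        x = sigmoid(W @ x)          # next layer = activation(weights times previous layer)
    print(x)                        # activations of the final layer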
The weights in the matrices and the neuron values in the tensors
start out randomized but are then trained using gradient descent on
a special function called an error function, which measures the
difference between the current output and the desired training output
for the training inputs. The gradient descent works by tuning the
weights in the gigantic matrices using an algorithm called
back-propagation, which uses the hell out of the distributive property
and the chain rule from calculus to find the gradient of the error
function with respect to those gigantic weight matrices, with their
hundreds of dimensions, and thereby derive a set of weights that
minimizes the error function across all the training inputs.
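Here is roughly what a single training step looks like for a tiny
two-layer net, with the chain rule written out by hand. This is just an
illustrative sketch (squared error as the error function, sigmoid
activations), not anything resembling AlphaGo's real training code:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(1)
    W1 = rng.standard_normal((4, 3))   # weights: input layer (3) -> hidden layer (4)
    W2 = rng.standard_normal((1, 4))   # weights: hidden layer (4) -> output (1)

    x = np.array([0.5, 0.1, 0.9])      # a training input
    y = np.array([1.0])                # the desired training output
    lr = 0.5                           # learning rate (step size of the descent)

    # Forward pass
    h = sigmoid(W1 @ x)                # hidden layer activations
    out = sigmoid(W2 @ h)              # network output
    error = 0.5 * np.sum((out - y) ** 2)

    # Backward pass: the chain rule, applied layer by layer
    d_out = (out - y) * out * (1 - out)   # dE/d(pre-activation of output)
    grad_W2 = np.outer(d_out, h)          # dE/dW2
    d_h = (W2.T @ d_out) * h * (1 - h)    # dE/d(pre-activation of hidden layer)
    grad_W1 = np.outer(d_h, x)            # dE/dW1

    # Gradient descent: nudge every weight downhill on the error surface
    W2 -= lr * grad_W2
    W1 -= lr * grad_W1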
In other words, it operates as what is called a universal learning
function: a literal mathematical function that can be fine-tuned to
map any mathematically definable set of inputs to any such definable
set of outputs. And since we live in a universe that follows
mathematical laws, that means pretty much anything can be finagled to
serve as inputs and outputs for these universal functions.
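As a concrete, if trivial, illustration of that universality: wrap the
training step above in a loop and the same sort of tiny net will learn
a mapping like XOR, which no single linear function can represent. A
self-contained sketch (the sizes, learning rate and iteration count are
arbitrary choices of mine):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # XOR: a simple input->output mapping no purely linear function can learn
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    Y = np.array([[0.], [1.], [1.], [0.]])

    rng = np.random.default_rng(0)
    W1, b1 = rng.standard_normal((2, 8)), np.zeros(8)   # input (2) -> hidden (8)
    W2, b2 = rng.standard_normal((8, 1)), np.zeros(1)   # hidden (8) -> output (1)
    lr = 2.0

    for _ in range(20000):
        H = sigmoid(X @ W1 + b1)                # forward pass over all four inputs
        out = sigmoid(H @ W2 + b2)
        d_out = (out - Y) * out * (1 - out)     # chain rule, as above
        d_H = (d_out @ W2.T) * H * (1 - H)
        W2 -= lr * (H.T @ d_out); b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ d_H);   b1 -= lr * d_H.sum(axis=0)

    print(out.round(2))    # should end up close to [0, 1, 1, 0]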
It seems pretty obvious to me that biological brains likewise embody
such a tunable universal learning function, and that the same
underlying mathematics of tensor transformations explains intelligence
in all its forms. The parallels are too numerous to ignore. For
example, the "knowledge" that a neural network acquires is spread out
throughout its neural architecture. This means that if I take a fully
trained model and start deleting individual neurons, its ability to
recall knowledge is only very gradually affected. Also, both are prone
to biases and errors, because both biological brains and artificial
ones treat their training data as "truth", meaning that if either is
fed misinformation, it will reach the wrong conclusions. Garbage in,
garbage out applies to all brains, probably even Jupiter-sized ones.
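That graceful degradation is easy to demonstrate with the same kind of
toy network: train it to "memorize" a handful of arbitrary input-output
patterns, then zero out a growing fraction of its weights and watch the
recall error creep up gradually rather than collapsing all at once. A
sketch (everything here, from the pattern count to the training
schedule, is an arbitrary choice of mine):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(3)
    X = rng.random((20, 10))     # 20 random input patterns ("memories")
    Y = rng.random((20, 5))      # 20 arbitrary target patterns to recall
    W1 = rng.standard_normal((10, 40)) * 0.5
    W2 = rng.standard_normal((40, 5)) * 0.5

    for _ in range(10000):       # quick-and-dirty training, as in the earlier sketches
        H = sigmoid(X @ W1)
        out = sigmoid(H @ W2)
        d_out = (out - Y) * out * (1 - out) / len(X)
        d_H = (d_out @ W2.T) * H * (1 - H)
        W2 -= 2.0 * (H.T @ d_out)
        W1 -= 2.0 * (X.T @ d_H)

    # Now "lesion" the trained network: delete ever larger fractions of its connections
    for frac in [0.0, 0.1, 0.2, 0.4, 0.6]:
        W1_cut = W1 * (rng.random(W1.shape) > frac)
        W2_cut = W2 * (rng.random(W2.shape) > frac)
        err = np.mean((sigmoid(sigmoid(X @ W1_cut) @ W2_cut) - Y) ** 2)
        print(f"deleted {frac:.0%} of weights -> mean recall error {err:.4f}")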
Now, the actual forms of the biases are different. I will give you an
example of how alien an AI's biases can be. I saw a TED talk recently
where an AI researcher related how he was on a project that trained a
deep-learning neural net to distinguish between various canine species
and dog breeds based on photographs. It seemed to be perfectly capable
of distinguishing visually between various dog breeds, but for some
reason it kept mistaking Alaskan sled-huskies for wolves. It turned out
that every picture of a wolf in the training data featured the wolf in
snow. Likewise, the sled dog was in snow. The AI had made the false
assumption, based on the training data, that a wolf was a "dog in the
snow".
Another limitation is that since these AIs require their inputs to be
in the form of tensors, there is a lot of data preparation that goes on
behind the scenes: human programmers have to devise algorithms to
translate every possible input in the problem domain into tensors to
feed the machine. This is why Dylan Distasio calls machine-learning
algorithms brittle in relation to human intelligence. For a human
brain, all this data preprocessing happens automatically and
subconsciously. The hilarious irony of the situation is that if my
theory is correct, then a human brain has to subconsciously perform
tensor analysis in order to reach the conclusion that it is lousy at
math.
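For example, before a Go position (or a sentence of text) ever reaches
the network, somebody had to decide how to turn it into numbers. Here
is a hypothetical sketch of that kind of hand-written preprocessing;
the encoding scheme is made up for illustration and is not AlphaGo's
actual input format:

    import numpy as np

    # Encode a (tiny 5x5) Go position as a stack of three binary planes:
    # my stones, the opponent's stones, and empty points.
    board = [
        ".X...",
        ".XO..",
        "..O..",
        ".....",
        "X....",
    ]

    planes = np.zeros((3, 5, 5))
    for r, row in enumerate(board):
        for c, point in enumerate(row):
            channel = {"X": 0, "O": 1, ".": 2}[point]
            planes[channel, r, c] = 1.0

    print(planes.shape)   # (3, 5, 5) -- a tensor the network can actually consume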
I think, however, that the observation that noisy data improves the
functioning of neural networks suggests that machine learning is a
lot more robust than Dylan or Dave Sill give it credit for:
https://arxiv.org/abs/1710.05179
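By noisy data I mean something like the following: jitter every
training input a little before each pass, which in practice often acts
as a regularizer and makes the learned function less brittle. Here is a
minimal sketch of the idea, not the specific method of that paper:

    import numpy as np

    rng = np.random.default_rng(7)

    def add_noise(batch, sigma=0.05):
        # Corrupt every training input slightly before each pass; the network
        # then has to learn the underlying pattern rather than the exact values.
        return batch + rng.normal(0.0, sigma, size=batch.shape)

    X = rng.random((32, 10))    # a stand-in batch of training inputs
    X_noisy = add_noise(X)      # what actually gets fed to the network this pass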
I also think that more compute power and memory density will allow
machine-learning algorithms to do more of their data preprocessing on
their own. So how long will this take? Here are the best estimates I
could come up with using AlphaGo as a benchmark.
Here is a schematic of AlphaGo's architecture that I used as a reference:
https://nikcheerla.github.io/deeplearningschool//media/alphago_arch.png
There are some interesting observations to be made about its
architecture. For some reason, AI people don't count the input layer
when counting layers. AlphaGo is composed of 42 layers/tensors of
between 256 and 512 neurons/dimensions each. (42 layers, DeepMind, a
bunch of Brits . . . see what they did there?)
That means that this "superhuman" Go player only has between 10,752
(1.1*10^4) and 21,504 (2.2*10^4) neurons and those neurons are
connected by a mere 56,426,496 (5.6*10^7) to 225,705,984 (2.3*10^8)
synapses.
I say "only" and "mere" because according to wikipedia, the average
adult human brain contains about 8.6*10^10 neurons and 1.5*10^14
synapses. In other words, in terms of total number of neurons, the
human brain is some 4 million times larger than AlphaGo's. In terms of
synapses it is likewise on order 10^6 times smaller than the human
brain.
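For anyone who wants to check my arithmetic, here is how I got those
ratios (the AlphaGo neuron and synapse counts are the ones given above;
the brain figures are the Wikipedia ones):

    layers = 42
    ag_neurons_lo, ag_neurons_hi = layers * 256, layers * 512   # 10,752 and 21,504
    ag_synapses_lo, ag_synapses_hi = 56_426_496, 225_705_984

    brain_neurons, brain_synapses = 8.6e10, 1.5e14

    print(brain_neurons / ag_neurons_hi, brain_neurons / ag_neurons_lo)      # ~4 to ~8 million
    print(brain_synapses / ag_synapses_hi, brain_synapses / ag_synapses_lo)  # roughly 10^6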
But in terms of a metric I will call connectivity, modeled on the
average degree of the vertices of a graph, each human neuron is
connected to approximately 3500 other neurons, whereas AlphaGo has a
connectivity of between 512 and 1024 (each of its neurons connects to
every neuron in the two adjacent layers of 256 to 512 neurons each). So
if connectivity is a measure of "complexity per neuron", then the
average human neuron is only roughly 3.5 to 7 times as complex as one
of AlphaGo's neurons.
https://en.wikipedia.org/wiki/List_of_animals_by_number_of_neurons
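And here is the connectivity arithmetic spelled out. The human figure
is just the average degree of the graph, 2 x synapses / neurons; the
AlphaGo figure assumes each neuron talks to every neuron in its two
neighboring layers of 256 to 512 each, which is how I am reading the
architecture:

    brain_neurons, brain_synapses = 8.6e10, 1.5e14

    human_connectivity = 2 * brain_synapses / brain_neurons     # ~3,488 connections per neuron
    alphago_connectivity_lo, alphago_connectivity_hi = 2 * 256, 2 * 512   # 512 to 1,024

    print(human_connectivity / alphago_connectivity_hi,   # ~3.4 times as "complex" ...
          human_connectivity / alphago_connectivity_lo)   # ... to ~6.8 times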
So how did such a relatively small brain defeat the world's best Go
player? Well, it had several advantages going for it. For one thing,
Google built special processors for the task, called tensor processing
units (TPUs), which are faster than GPUs or CPUs for this kind of work,
and used 4 of them to allow maximum use of the time domain as a
trade-off for its lack of neurons. This allowed AlphaGo to play a
mind-boggling 44 million games against itself in the space of 9 hours
of training.
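To put that training rate in perspective:

    games, hours = 44_000_000, 9
    print(games / (hours * 3600))   # roughly 1,360 self-play games per second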
Secondly, it didn't have to worry about minor nuisances such as
walking, talking, finding food and water, paying the bills, and fitting
into society, all things that human Go players must do. In essence, it
could use 100% of its relatively small brain for nothing but learning
to play Go at superhuman speed, giving it more experience playing Go
than any human could accumulate in an entire lifetime of playing.
But brain size is just a scaling issue, and if Moore's law continues
then it should just be a matter of time, right? Well, if total neuron
number is the important metric, then by extrapolating Moore's law (a
questionable thing to do, I know) we should reach neuron-number parity
in about 45 years. But in terms of parity between the connectivity of
individual human and AI neurons, we are only about 5 years out. Taking
the average of that range of 5 to 45 years gives 25 years. But this
assumes that Moore's law continues unabated. On the other hand, the
emergence of quantum computing stands to disrupt everything, so who is
to say what effect it will have on the timetable until the
Singularity?
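For what it is worth, here is the back-of-the-envelope arithmetic
behind those two numbers. The two-year doubling period is my own
assumption; pick a different doubling time and the dates shift
accordingly:

    import math

    doubling_period_years = 2.0          # assumed Moore's-law doubling time

    neuron_gap = 8.6e10 / 2.15e4         # human neurons / AlphaGo neurons, ~4 million
    connectivity_gap = 3500 / 512        # human / AlphaGo connectivity, ~7

    print(math.log2(neuron_gap) * doubling_period_years)        # ~44 years to neuron parity
    print(math.log2(connectivity_gap) * doubling_period_years)  # ~5.5 years to connectivity parity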
Sorry, I couldn't be more precise in my estimates but to quote Yoda,
"Difficult to see; Always in motion is the future."
Stuart LaForge