[ExI] alpha zero

Dylan Distasio interzone at gmail.com
Thu Dec 7 17:07:30 UTC 2017


Reinforcement learning has proven very powerful in AI.  I have no reason to
believe that they have not accomplished what they claimed.  These results
flow naturally from where the field has been headed, and are not the leap
of faith they first appear to be.  While I am excited to hear the results,
and am impressed, the approach is still very brittle compared to
generalized intelligence IMO.  This type of program still needs to be
trained on a very specific problem, and while the technique itself
generalizes, there is no thought process going on behind it.

The program is basically using a combination of deep neural nets and
survival of the fittest: during simulated training, the best player is
replaced by any new one that beats it by more than a certain cutoff (I
believe it was 55% in the paper).  You have a massive number of iterations
of simulated gameplay, and the network is trained by minimizing a loss
function via gradient descent (which in a perfect world finds a global
minimum), keeping the better player at the end of each round.
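
The replacement ("gating") step described above can be sketched in a few
lines of Python.  This is a toy under stated assumptions: a "player" is
just a single hypothetical skill number, a game is a coin flip weighted by
the skill difference, and `train_candidate` is a random nudge standing in
for the real gradient-descent updates on self-play data.  The 55% cutoff
is the figure recalled above (it is the threshold used in the AlphaGo Zero
paper; the AlphaZero preprint reportedly drops the gate and updates a
single network continuously).  Only the promote-if-it-wins-by-more-than-
the-cutoff logic is meant to mirror the description.

```python
import math
import random

# Cutoff recalled in the post: promote a challenger only if it wins more
# than 55% of evaluation games against the current best player.
WIN_THRESHOLD = 0.55


def play_game(skill_a: float, skill_b: float, rng: random.Random) -> bool:
    """Toy game (hypothetical): returns True if player A wins.

    A's win probability is a logistic function of the skill difference,
    standing in for actually playing out a game of chess.
    """
    p_a = 1.0 / (1.0 + math.exp(-(skill_a - skill_b)))
    return rng.random() < p_a


def win_rate(candidate: float, best: float, games: int, rng: random.Random) -> float:
    """Fraction of evaluation games the candidate wins against the best player."""
    wins = sum(play_game(candidate, best, rng) for _ in range(games))
    return wins / games


def should_replace(rate: float, threshold: float = WIN_THRESHOLD) -> bool:
    """Gate: a challenger must beat the incumbent by more than the cutoff."""
    return rate > threshold


def train_candidate(best: float, rng: random.Random) -> float:
    """Hypothetical training step: a random nudge to the current best.

    A real system would run many gradient-descent updates on self-play
    games here; this stand-in just perturbs the skill value.
    """
    return best + rng.gauss(0.1, 0.3)


def self_play_loop(rounds: int = 50, eval_games: int = 400, seed: int = 0) -> float:
    """Train a challenger, evaluate it, and keep whichever player is better."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(rounds):
        candidate = train_candidate(best, rng)
        if should_replace(win_rate(candidate, best, eval_games, rng)):
            best = candidate  # survival of the fittest: the challenger takes over
    return best


if __name__ == "__main__":
    print(f"final best skill: {self_play_loop():.3f}")
```

Setting the cutoff above 50% means a challenger has to beat the incumbent
convincingly, not just get lucky over a finite number of evaluation games.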

There have been many remarkable things accomplished with deep/reinforcement
learning.  It's quite startling at first glance to think that an end goal of
minimizing a loss function can generate so much razzle dazzle, but the math
behind these systems is actually not that complex.  It is essentially matrix
multiplication combined with a nonlinear activation function on the forward
pass through a neural network, followed by backpropagation (the chain rule
from calculus) to compute gradients, and gradient descent to update the
weights throughout the network.  Then you have the machine play many, many
matches against itself, rinse and repeat.
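
To make "matrix multiplication plus a nonlinearity, then backpropagation
and gradient descent" concrete, here is a minimal pure-Python sketch of a
one-hidden-layer network.  The tanh activation, squared-error loss, absence
of bias terms, learning rate, and the toy target y = x0 - x1 are all
illustrative assumptions; systems like AlphaZero use much larger networks
and a numerical library rather than hand-written loops.

```python
import math
import random


def matvec(w, x):
    """Matrix-vector product: one row of weights per output unit."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def forward(params, x):
    """Forward pass: matrix multiply, tanh nonlinearity, then a linear output."""
    w1, w2 = params
    h = [math.tanh(z) for z in matvec(w1, x)]
    y = matvec(w2, h)[0]
    return h, y


def loss_and_grads(params, x, target):
    """Squared-error loss and its gradients, computed by backpropagation."""
    _, w2 = params
    h, y = forward(params, x)
    err = y - target  # dL/dy for L = 0.5 * (y - target)^2
    g_w2 = [[err * hj for hj in h]]  # dL/dw2[0][j] = err * h[j]
    # Chain rule back through w2 and tanh, using d tanh(z)/dz = 1 - tanh(z)^2.
    delta = [err * w2[0][j] * (1.0 - h[j] ** 2) for j in range(len(h))]
    g_w1 = [[dj * xk for xk in x] for dj in delta]  # dL/dw1[j][k] = delta[j] * x[k]
    return 0.5 * err ** 2, (g_w1, g_w2)


def sgd_step(params, grads, lr):
    """Gradient descent: move every weight a small step against its gradient."""
    return tuple(
        [[w - lr * g for w, g in zip(wrow, grow)] for wrow, grow in zip(wm, gm)]
        for wm, gm in zip(params, grads)
    )


def train(data, hidden=4, lr=0.05, epochs=1000, seed=0):
    """Per-example gradient descent; returns final weights and mean loss per epoch."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    params = (
        [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)],
        [[rng.uniform(-0.5, 0.5) for _ in range(hidden)]],
    )
    history = []
    for _ in range(epochs):
        total = 0.0
        for x, t in data:
            loss, grads = loss_and_grads(params, x, t)
            params = sgd_step(params, grads, lr)
            total += loss
        history.append(total / len(data))
    return params, history


if __name__ == "__main__":
    # Hypothetical toy target: y = x0 - x1 on a 3x3 grid of inputs.
    data = [([a, b], a - b) for a in (-1.0, 0.0, 1.0) for b in (-1.0, 0.0, 1.0)]
    _, history = train(data)
    print(f"mean loss: first epoch {history[0]:.4f}, last epoch {history[-1]:.4f}")
```

Running it prints the mean loss for the first and last epoch, and the last
should be much smaller.  `loss_and_grads` can also be checked against a
finite-difference estimate, which is a cheap way to confirm the
backpropagation is wired correctly.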

John-

You may find the ideas at this link interesting based on your last sentence:
https://medium.com/@karpathy/software-2-0-a64152b37c35



On Thu, Dec 7, 2017 at 10:42 AM, spike <spike66 at att.net> wrote:

> *From:* extropy-chat [mailto:extropy-chat-bounces at lists.extropy.org] *On
> Behalf Of *John Clark
> *Sent:* Thursday, December 07, 2017 7:16 AM
> *To:* ExI chat list <extropy-chat at lists.extropy.org>
> *Subject:* Re: [ExI] alpha zero
>
> On Wed, Dec 6, 2017 at 9:10 PM, spike <spike66 at att.net> wrote:
>
> > DeepMind, the same outfit which made the learning Go program, is now
> claiming they did the same trick with chess.  I don’t know if I believe it
> (rather I vaguely do not believe it) but it is being reported on a very
> reliable chess site:
>
> https://en.chessbase.com/post/the-future-is-here-alphazero-learns-chess
>
> They are claiming that it learned from only the rules of chess in 24
> hours.  I just don’t see how it could have mastered the collective human
> experience over more than 500 years in 24 hours.
>
> If DeepMind really did this, it’s the most impressive computer learning
> feat I have ever seen.
>
> >…You're right Spike, it's simply amazing!
>
> I still haven’t convinced myself it is true.  I think highly of the source
> that reported it, but they can be fooled.  They played Stockfish, which is
> a very highly respected program with a lotta lotta programmed-in chess
> wisdom.  To figure out all that in a day requires some powerful inference
> activity.  John, I am putting myself in the camp that hopes it’s true,
> but I estimate a 70% chance it isn’t.  I don’t know how the hell they did
> this if it’s true.
>
> >…And if you ever hear that it's starting to treat optimizing computer
> code as a game then you may be hearing the opening notes of the
> Singularity. This is big…John K Clark
>
> Sure, and is there any reason why we shouldn’t treat code optimization as a
> game?  It is a clearly definable goal: we can set the task to give a known
> outcome, give it a time to beat and a memory allocation to beat, may the
> best machine win.  It’s one of those new sports I have been yakking about
> for years, a great example of geek Olympics.
>
> I want robot gymnastics too.  Whooda thunk that would just appear like it
> has?
>
> http://www.cnn.com/videos/cnnmoney/2017/11/17/atlas-boston-dynamics-robot-backflip-cnntech.cnnmoney
>
> We could have a code-athlon, where the game is to write the best and most
> efficient code, then let computers play against each other and against
> humans.
>
> spike
>
> _______________________________________________
> extropy-chat mailing list
> extropy-chat at lists.extropy.org
> http://lists.extropy.org/mailman/listinfo.cgi/extropy-chat
>