[ExI] alpha zero

Sun Dec 10 17:08:24 UTC 2017

On Behalf Of John Clark
Subject: Re: [ExI] alpha zero

On Sun, Dec 10, 2017 at 10:48 AM, spike <spike66 at att.net> wrote:

> >…Ja of course this is impressive, but consider all the ways this could be achieved that would look like it had trained itself from nothing in a day.  An eager press corps could report the program was given nothing but the rules of chess, when in reality it was given the StockFish chess engine with no opening book.  That would constitute being given chess rules only, if the phrase is interpreted broadly.  In fact, that would be a good approach to the problem: StockFish code is highly optimized already, so there is no need to reinvent that wheel. 

>…I don't think that's what they did but if it was it would be just as impressive… John K Clark

On the contrary sir.

Once chess software reached a certain level, it no longer mattered much how good it was: an ordinary consumer-level player cannot challenge it.  I see little point in paying for 3345 Elo software if 3317 Elo software is free and open-source.  However… if DeepMind really has something that can learn from self-play, that software is worth jillions of dollars.  People who don’t care that much about chess would buy it and experiment with it.  I would.

Here’s what I am doing: reading carefully what the DeepMind paper claims they did, and comparing that with what the press is reporting.  If you have time to blow on what could be the most important development in singularity theory since Eliezer left the ExI list, do a Google search on DeepMind Chess AlphaZero and look at the various articles.  Note that they contradict each other in some ways, and many of the articles make claims that the DeepMind paper doesn’t exactly make.  The tech press seems to have engaged in some of what I see here: hopeful thinking.  I did it myself: I hope it is right, I hope a computer figured out how to perform 160 Elo points above the current version of StockFish and did it entirely by self-training.  But I suspect we don’t yet have the whole story.
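
For a sense of scale on the rating gaps mentioned above, here is a minimal sketch using the standard Elo expected-score formula, E = 1 / (1 + 10^(-d/400)), where d is the rating difference.  The function name expected_score is made up for illustration, and the ratings are just the figures quoted in this post, not measurements of any particular engine.

```python
def expected_score(rating_diff: float) -> float:
    """Expected score per game (win = 1, draw = 0.5, loss = 0)
    for the higher-rated side, using the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


if __name__ == "__main__":
    # 3345 vs 3317: the paid-vs-free gap quoted in this post.
    print(f"28 Elo gap:  {expected_score(3345 - 3317):.3f}")
    # The 160-point gap over StockFish quoted in this post.
    print(f"160 Elo gap: {expected_score(160):.3f}")
```

Under this formula a 28-point gap is only a slight edge per game, while a 160-point gap is a much more lopsided matchup.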

That said, I would give DeepMind a few hundred bucks for that software based on what I think it did.  I might give them a couple thousand if they would show me their source code.  If we can come to understand the principles we think they used for self-play learning, those principles should be applicable to any zero-sum game.  It might even be possible to extend the paradigm to optimization games, gambling games and non-zero-sum games such as Diplomacy.

spike
