[ExI] Future of Humanity Institute at Oxford University £1 million grant for AI

J. Andrew Rogers andrew at jarbox.org
Mon Jul 6 18:44:13 UTC 2015


> On Jul 5, 2015, at 12:20 PM, Anders Sandberg <anders at aleph.se> wrote:
> From: J. Andrew Rogers <andrew at jarbox.org> 
> 
> Core computer science is advancing rapidly but little of it is occurring in academia or is being published. There have been many large discontinuities in computer science over the last several years evidenced by their effects that have largely gone unnoticed because it was not formally published. 
> 
> Would you be so kind to mention some of these?


Hi Anders,

Here are three examples that I have experience with:


- Parallel graph traversal that is not based on sparse, synchronous models. The best algorithms are dense, asynchronous models built on topological techniques: local traversals run free yet still produce the same global result as if traversal depth were globally synchronized (a toy sketch of the asynchronous idea follows these examples). This is the organizational basis of some new ultra-fast index-less database kernels that should become common over the next few years.

- Software that can teach itself to design novel, state-of-the-art algorithms. For example, the world’s best hash function engineer is a piece of software I designed in 2012 (see URL below). Its internal model of how to design hash functions was constructed by iterative and extraordinarily expensive high-order algorithmic induction (a toy illustration of the general idea also follows these examples). People are using this technique to construct algorithm designers for other hard problems in commercial software.

http://www.jandrewrogers.com/2015/05/27/metrohash/

- Representation of the physical world on computers, i.e., dynamic indexing of shapes and paths at scale. Until 2007, we had no idea how to do this (see URL below). Few people know how SpaceCurve’s implementation works, but it is multiple orders of magnitude beyond the scalability and performance of any other software for what it does. It is popular for continuous analysis of population behavior because nothing else can do it.

http://www.jandrewrogers.com/2015/03/02/geospatial-databases-are-hard/
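
Since the first example is abstract, here is a minimal single-process sketch of why asynchronous traversal can reproduce a globally synchronized result: the distance is a monotone min quantity, so updates commute and can be applied in any order, or concurrently with atomic min-updates. This is only an illustration of the principle, not the database kernels mentioned above.

# Toy sketch only: a label-correcting traversal where the distance update is a
# monotone "min", so vertices can be processed in any order (or concurrently
# with atomic min-updates) and the result still matches a level-synchronous BFS.
from collections import deque

def async_bfs_distances(graph, source):
    """graph: dict mapping a vertex to an iterable of neighbours."""
    dist = {source: 0}
    work = deque([source])
    while work:
        u = work.popleft()              # any removal order converges to the same answer
        d = dist[u] + 1
        for v in graph.get(u, ()):
            if d < dist.get(v, float("inf")):
                dist[v] = d             # monotone improvement, safe to apply out of order
                work.append(v)          # revisit v so its neighbours see the improvement
    return dist

# Example: same distances as a synchronous breadth-first traversal would give.
g = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []}
assert async_bfs_distances(g, 0) == {0: 0, 1: 1, 2: 1, 3: 2, 4: 3}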
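
For the second example, the linked post describes the real system; the toy below only illustrates the general shape of the idea (machine search over candidate mixing functions scored by a statistical test). The multiply-xor-shift template, the constants, and the avalanche scoring are illustrative choices, not how MetroHash was actually derived.

# Toy illustration only: random search over 64-bit multiply-xor-shift mixers,
# scored by a crude avalanche test (flipping one input bit should flip about
# half of the output bits). Not the actual MetroHash design procedure.
import random

MASK64 = (1 << 64) - 1

def make_mixer(m1, r1, m2, r2):
    def mix(x):
        x = (x * m1) & MASK64
        x ^= x >> r1
        x = (x * m2) & MASK64
        x ^= x >> r2
        return x
    return mix

def avalanche_error(mix, trials=2000, seed=0):
    # Average deviation from the ideal of ~32 flipped output bits per input-bit flip.
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        x = rng.getrandbits(64)
        flipped = mix(x) ^ mix(x ^ (1 << rng.randrange(64)))
        total += abs(bin(flipped).count("1") - 32)
    return total / trials               # lower is better

def search_mixers(iterations=200, seed=1):
    rng = random.Random(seed)
    best_params, best_err = None, float("inf")
    for _ in range(iterations):
        params = (rng.getrandbits(64) | 1, rng.randrange(17, 47),
                  rng.getrandbits(64) | 1, rng.randrange(17, 47))
        err = avalanche_error(make_mixer(*params))
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err

print(search_mixers())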



The first and last examples both fall into the class of algorithm problems that do not have tractable solutions in graph-like software representations. In recent years, a scalable computing model was discovered that solves this class of problems. (In this model, your primitives are hyper-rectangles embedded in an even higher-dimensional surface; algorithms are executed on and between surfaces via logical quasi-homomorphisms. It is extremely efficient and parallelizable.) Really interesting software is being designed this way by a handful of people at a handful of companies, with the qualitative improvements currently being used primarily to arbitrage existing markets.
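
As a minimal sketch of just the hyper-rectangle primitive, and deliberately nothing about the surface embedding or the quasi-homomorphisms, an object can be represented by its per-dimension intervals, with overlap and containment tests whose cost grows only linearly with dimension:

# Minimal sketch of the hyper-rectangle primitive only: per-dimension intervals
# with overlap/containment tests. This illustrates the primitive, not the
# surface embedding or quasi-homomorphism machinery described above.
class HyperRect:
    def __init__(self, lows, highs):
        assert len(lows) == len(highs)
        self.lows, self.highs = tuple(lows), tuple(highs)

    def intersects(self, other):
        return all(lo <= ohi and olo <= hi
                   for lo, hi, olo, ohi in zip(self.lows, self.highs,
                                               other.lows, other.highs))

    def contains(self, other):
        return all(lo <= olo and ohi <= hi
                   for lo, hi, olo, ohi in zip(self.lows, self.highs,
                                               other.lows, other.highs))

# A point, a region, or a trajectory segment can all be modelled this way,
# e.g. a bounding box in space crossed with a time window (x, y, t):
a = HyperRect((0.0, 0.0, 1000.0), (1.0, 1.0, 2000.0))
b = HyperRect((0.5, 0.5, 1500.0), (2.0, 2.0, 1600.0))
assert a.intersects(b) and not a.contains(b)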


Of specific relevance to AI, the core algebra inherently has strongly compressive characteristics that make it ideal for induction, clustering, etc., while being largely insensitive to dimensionality. I do not know if anyone is trying to solve for this specifically at the moment, but a good implementation should put current “deep learning” stacks to shame across every metric that matters.


> Any particular references for this?
> 
> Full disclosure: Owain is actually my flatmate *and* colleague, and we have been discussing this project at some length. What he is actually planning to do with the Stanford team seems to be rather different from current recommender and preference inference systems (yes, there has been a fair bit of literature and tech review involved in writing the grant). While there are certainly behavioural economics models out there, I have not seen any generative modelling.


Commercially, people are already fusing real-time mobile and other telemetry, social media, remote sensing, environmental, and other data (basically everything that can be measured about an entity and its environment) into a single spatiotemporal model logged over several years, limited only by storage budget. The first attempt at population-scale behavioral and preference reconstruction solely from this data was made by a friend in 2012, and the capability has progressed considerably since then. It is both pretty amazing and disconcerting. (A schematic of the kind of record being fused is sketched below.)
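
To make the shape of this concrete, here is a schematic of the kind of record being fused; the field names and structure are illustrative only, not any particular production system's schema:

# Schematic only: normalize heterogeneous sources into (entity, time, position,
# source, payload) observations and merge them into one timeline per entity.
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass(order=True)
class Observation:
    timestamp: float                      # seconds since epoch; the only field used for ordering
    entity_id: str = field(compare=False)
    lat: float = field(compare=False)
    lon: float = field(compare=False)
    source: str = field(compare=False)    # e.g. "mobile", "social", "remote_sensing"
    payload: dict = field(compare=False, default_factory=dict)

def fuse(*streams):
    """Merge any number of observation streams into per-entity, time-ordered timelines."""
    timelines = defaultdict(list)
    for stream in streams:
        for obs in stream:
            timelines[obs.entity_id].append(obs)
    for timeline in timelines.values():
        timeline.sort()                   # sorts by timestamp only
    return dict(timelines)

mobile = [Observation(1000.0, "e1", 59.33, 18.06, "mobile")]
social = [Observation(1005.0, "e1", 59.34, 18.07, "social", {"text": "example post"})]
assert [o.source for o in fuse(mobile, social)["e1"]] == ["mobile", "social"]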

An important outcome has been the realization that the models inductively constructed from population-scale, all-source data do not match the results of virtually all studies on behavior and preferences that use small populations, narrow data sources, short time windows, or aware participants. There are significant technology, data access, and regulatory hurdles to generating good models of humanity, hence my “non-trivial” comment.


Full disclosure: I designed the largest such systems in existence. The biggest single systems today cover ~5% of the human population and could scale to all of it. It mostly blows people’s minds that it is even possible. I believe there is a Wall Street Journal article on the technology slated to be published this summer.


> Doing these types of studies in a way that produces robust and valid results is beyond non-trivial and highly unlikely to be achieved by someone who is not already an expert at real-world behavioral induction, which unfortunately is the case here. 
> 
> Hmm, just checking: are you an expert on judging the expertise of the different teams? How well do you know their expertise areas?


I was careful to comment only on areas where I have deep domain expertise. I do not know the expertise of the teams per se, but the community of people doing the current state-of-the-art work in the areas I am familiar with is not large. If I do not know the people, I am often familiar with the work occurring at their affiliated organization.

One of the reasons I bothered to read the list was to see whether there was any intersection with people I know who are working on technology I suspect will be relevant (there was not).


> The sentence
> 
>  The absence of people doing relevant advanced computer science R&D in the list is going to produce some giant blind spots in the aggregate output.
> 
> seems to suggest that you do not know the CVs of the teams very well.


I know the CVs of some of the people; others, not so much. My comment was more about what was apparently missing from the set of CVs.


> It wouldn't surprise me if the majority of funded projects are duds. Most science is. But the aim is a bit subtle: to actually kickstart the field of beneficial AI, and that involves meshing several disciplines and luring in more standard research too - there is a fair bit of related stuff in other research programs that is not visible from the list. In the end, the real success will be if it triggers long-term research collaborations that can actually solve the bigger problems.


Sure, I was in no way trying to impugn the effort; it obviously has merits. I was making the observation that, based on what I know about some of those domains, the process of acquiring the necessary expertise needs some work. Anecdotally, I had never even heard of this particular RFP until I saw the results of it.

-jar

