[ExI] dna to search

Sat Nov 8 05:58:27 UTC 2014

Extropians, your thoughts on this please?  spike

Google Wants to Store Your Genome

For $25 a year, Google will keep a copy of any genome in the cloud. 

By Antonio Regalado
<http://www.technologyreview.com/contributor/antonio-regalado/> on November
6, 2014

Why It Matters

Genome data on millions of people would lead to new medical discoveries and
improved diagnostics. 

Google is approaching hospitals and universities with a new pitch. Have
genomes? Store them with us.

The search giant's first product for the DNA age is Google Genomics, a cloud
computing service that it launched last March but that went mostly unnoticed
amid a barrage of high-profile R&D announcements from Google, like one late
last month about a far-fetched plan to battle cancer with nanoparticles (see
"Can Google Use Nanoparticles to Search for Cancer?"
<http://www.technologyreview.com/news/532181/reality-check-for-googles-nanoparticle-health-tests/>).

Google Genomics could prove more significant than any of these moonshots.
Connecting and comparing genomes by the thousands, and soon by the millions,
is what's going to propel medical discoveries for the next decade. The
question of who will store the data is already a point of growing
competition between Amazon, Google, IBM, and Microsoft.

Google began work on Google Genomics 18 months ago, meeting with scientists
and building an interface, or API, that lets them move DNA data into its
server farms and do experiments there using the same database technology
that indexes the Web and tracks billions of Internet users.

"We saw biologists moving from studying one genome at a time to studying
millions," says David Glazer, the software engineer who led the effort and
was previously head of platform engineering for Google+, the social network.
"The opportunity is how to apply breakthroughs in data technology to help
with this transition."

Some scientists scoff that genome data remains too complex for Google to
help with. But others see a big shift coming. When Atul Butte, a
bioinformatics expert at Stanford, heard Google present its plans this year,
he remarked that he now understood "how travel agents felt when they saw
Expedia."

The explosion of data is happening as labs adopt new, even faster equipment
for decoding DNA. For instance, the Broad Institute in Cambridge,
Massachusetts, said that during the month of October it decoded the
equivalent of one human genome every 32 minutes. That translated to about
200 terabytes of raw data.
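
As a rough check on those figures, the arithmetic can be sketched in Python. The rate and the 200-terabyte total come from the article. Treating a terabyte as 1,000 gigabytes, and assuming the total covers exactly October's output, are assumptions made here, and the function names are illustrative only.

```python
# Back-of-envelope check of the Broad Institute figures quoted above.
# Assumptions (not from the article): 1 TB = 1,000 GB, and the 200 TB
# figure covers exactly the genomes decoded during October.

MINUTES_PER_GENOME = 32      # from the article
DAYS_IN_OCTOBER = 31
RAW_DATA_TB = 200            # from the article


def genomes_in_month(days: int, minutes_per_genome: float) -> float:
    """Genome-equivalents decoded in a month at a steady rate."""
    return days * 24 * 60 / minutes_per_genome


def gb_per_genome(total_tb: float, genomes: float) -> float:
    """Average raw data per genome, in gigabytes."""
    return total_tb * 1000 / genomes


genomes = genomes_in_month(DAYS_IN_OCTOBER, MINUTES_PER_GENOME)
print(f"{genomes:.0f} genome-equivalents in October")
print(f"{gb_per_genome(RAW_DATA_TB, genomes):.0f} GB of raw data per genome")
```

Under these assumptions the month works out to about 1,395 genome-equivalents at roughly 143 GB each, the same order of magnitude as the 100 gigabytes per genome the article cites further on.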

This flow of data is smaller than what is routinely handled by large
Internet companies (over two months, Broad will produce the equivalent of
what gets uploaded to YouTube in one day) but it exceeds anything biologists
have dealt with. That's now prompting a wide effort to store and access data
at central locations, often commercial ones. The National Cancer Institute
said last month that it would pay $19 million to move copies of the 2.6
petabyte Cancer Genome Atlas into the cloud. Copies of the data, from
several thousand cancer patients, will reside both at Google Genomics and in
Amazon's data centers.

The idea is to create "cancer genome clouds" where scientists can share
information and quickly run virtual experiments as easily as a Web search,
says Sheila Reynolds, a research scientist at the Institute for Systems
Biology in Seattle. "Not everyone has the ability to download a petabyte of
data, or has the computing power to work on it," she says.

Also speeding the move of DNA data to the cloud has been a yearlong price
war between Google and Amazon. Google says it now charges about $25 a year
to store a genome, and more to do computations on it. Scientific raw data
representing a single person's genome is about 100 gigabytes in size,
although a polished version of a person's genetic code is far smaller, less
than a gigabyte. That would cost only about 25 cents a year.
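
The per-genome price implies a per-gigabyte rate, which a short Python sketch makes explicit. It assumes the price scales linearly with size. The article gives only the $25 and 100-gigabyte figures, so the linear scaling and the function name are assumptions here, not Google's published pricing.

```python
# Storage-cost arithmetic from the figures quoted above.
# Assumption (not from the article): price scales linearly with size.

PRICE_PER_GENOME_USD = 25.0   # per year, from the article
RAW_GENOME_GB = 100.0         # from the article


def yearly_cost_usd(size_gb: float) -> float:
    """Yearly storage cost in dollars for size_gb gigabytes of data."""
    price_per_gb = PRICE_PER_GENOME_USD / RAW_GENOME_GB  # dollars per GB per year
    return size_gb * price_per_gb


print(yearly_cost_usd(100))  # raw data for one genome
print(yearly_cost_usd(1))    # polished genome, taking 1 GB as an upper bound
```

On that assumption the rate is $0.25 per gigabyte per year, so a polished genome of under a gigabyte would cost less than 25 cents a year to store.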

Cloud storage is giving a boost to startups like Tute Genomics, Seven
Bridges, and NextCode Health. These companies build "browsers" that
hospitals and scientists can use to explore genetic data. "Google or Amazon
is a back end. They are saying, 'Hey, you can build a genomics company in
our cloud,'" says Deniz Kural, CEO of Seven Bridges, which stores genome
data on behalf of 1,600 researchers in Amazon's cloud.

The bigger point, he says, is that medicine will soon rely on a kind of
global Internet-of-DNA that doctors will be able to search. "Our bird's eye
view is that if I were to get lung cancer in the future, doctors are going
to sequence my genome and my tumor's genome, and then query them against a
database of 50 million other genomes," he says. "The result will be 'Hey,
here's the drug that will work best for you.' "

At Google, Glazer says he began working on Google Genomics as it became
clear that biology was going to move from "artisanal to factory-scale data
production." He started by teaching himself genetics, taking an online
class, Introduction to Biology, taught by Broad's chief, Eric Lander. He
also got his genome sequenced and put it on Google's cloud.

Glazer wouldn't say how large Google Genomics is or how many customers it
has now, but at least 3,500 genomes from public projects are already stored
on Google's servers. He also says there's no link, as of yet, between
Google's cloud and its more speculative efforts in health care, like the
company Google started last year, called Calico, to investigate how to
extend human lifespans. "What connects them is just a growing realization
that technology can advance the state of the art in life sciences," says
Glazer.

Somalee Datta, a physicist who manages Stanford University's largest
computer cluster for genetics data, says that because of recent price cuts,
it now costs about the same to store genomes with Google or Amazon as in her
own data center. "Prices are finally becoming reasonable, and we think they
will keep dropping," she says.

Datta says some Stanford scientists have started using a Google database
system, BigQuery, that Glazer's team made compatible with genome data. It
was developed to analyze large databases of spam, web documents, or
consumer purchases. But it can also quickly perform the very large
experiments comparing thousands, or tens of thousands, of people's genomes
that researchers want to try. "Sometimes they want to do crazy things, and
you need scale to do that," says Datta. "It can handle the scale genetics
can bring, so it's the right technology for a new problem."
