[ExI] The NSA's new data center

David Lubkin lubkin at unreasonable.com
Sun Apr 1 23:40:04 UTC 2012


Eugen wrote:

>I am only talking about what is feasible. NSA is a pretty conservative
>shop, so I don't think they'd be pushing the envelope. They're going to
>have several approaches, including running Oracle on a cluster.
         :
>I wouldn't bother with tape libraries with bulk data. Arguably, I wouldn't
>bother with tape libraries at all but push to remote rotating spindles.

Tape doesn't require energy while it sits on a rack, but it does have
to be exercised or the data will be lost. Our practice when I was at
Livermore was to move the data to a new tape once a year. And tape
sitting on a rack is useless for analysis.

No one has particularly mentioned the world of big data.

(Some of you know all this. Some of you don't. So that we're all on
the same page: )

There is an assortment of projects about, some in heavy production
use, to build massive databases and file systems out of commodity
hard disks, with the expectation that at any time something will be
broken.

Google File System keeps files in 64 MB chunks, spread across a
server cluster so that there are at least three copies of each. Individual
files are commonly over 1 GB. Google has been using this for about a
decade. They won't say precisely how much is in it, but they have
admitted to at least 50,000 servers holding tens of petabytes (PB)
in all. There are rumors it's much larger than that.
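
Here's a minimal sketch of the chunk-and-replicate idea, in Python.
The 64 MB chunk size, threefold replication, and ~50,000-server scale
are the figures above; the server names and the round-robin placement
policy are my illustrative assumptions, not Google's actual algorithm:

    import itertools

    CHUNK_SIZE = 64 * 2**20   # 64 MB chunks, per the GFS design
    REPLICAS = 3              # at least three copies of each chunk

    # Hypothetical cluster roster; the 50,000 figure is from above.
    servers = ["chunkserver-%05d" % i for i in range(50000)]

    def place_chunks(file_size_bytes):
        """Assign each chunk of a file to REPLICAS servers.

        Round-robin placement is a stand-in; the real system also
        weighs things like disk utilization and rack diversity.
        """
        n_chunks = -(-file_size_bytes // CHUNK_SIZE)  # ceiling division
        ring = itertools.cycle(servers)
        return [[next(ring) for _ in range(REPLICAS)]
                for _ in range(n_chunks)]

    # A 1 GB file (typical for GFS workloads) splits into 16 chunks,
    # stored as 48 chunk replicas spread across the cluster.
    placement = place_chunks(2**30)
    print(len(placement), placement[0])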

Then they built Bigtable on top of the file system. Many internal projects
are over 1 TB; at least one of them is over 1 PB.

The Apache Hadoop Distributed File System (HDFS) is an open-source
copy of the Google File System. Facebook uses it, and announced in
July that they had 30 PB of data in it.

Other projects are similar in scope. Amazon isn't saying what the design
or capacity of S3 is, but they do admit to hosting about a trillion data
objects, each of which could be up to 5 TB. IBM's GPFS has been
tested up to 4 PB.

Given the need to redundantly store data against equipment failure,
state-of-the-art commercial sites today can handle about a thousand
person-years at Anders' figure of 10 TB/year. Facebook could handle
about fifteen people's lifetime surveillance with their current capacity.
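
In rough numbers (a sketch in Python; Facebook's 30 PB and Anders'
10 TB/year are the figures above, while the threefold redundancy and
the 70-year lifespan are my assumptions):

    RAW_PB = 30              # Facebook's announced HDFS holdings
    REPLICATION = 3          # assumed threefold redundancy, as in GFS
    TB_PER_PERSON_YEAR = 10  # Anders' surveillance figure
    LIFETIME_YEARS = 70      # assumed lifespan

    usable_tb = RAW_PB * 1000 / REPLICATION        # 10,000 TB usable
    person_years = usable_tb / TB_PER_PERSON_YEAR  # ~1,000 person-years
    lifetimes = person_years / LIFETIME_YEARS      # ~14 lifetimes
    print(round(person_years), round(lifetimes))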


-- David.



