[ExI] The NSA's new data center

Eugen Leitl eugen at leitl.org
Tue Apr 3 07:39:03 UTC 2012

On Sun, Apr 01, 2012 at 07:40:04PM -0400, David Lubkin wrote:

> Tape doesn't require energy sitting on a rack, but it does have to
> be exercised or data will be lost. Our practice when I was at Livermore

Same thing with drives. You have to scrub periodically (I scrub weekly,
with SAS I'd scrub monthly) in order to combat bit rot.
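
As a rough illustration of what a scrub does: re-read every block and
compare it against a checksum recorded at write time. This is a toy model
in Python, not how ZFS or any RAID controller actually implements
scrubbing; the names `split_blocks`, `make_checksums` and `scrub`, and the
4-byte block size, are assumptions chosen for illustration.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical; tiny so the example is easy to follow


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Split data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def make_checksums(blocks: list[bytes]) -> list[str]:
    """Record a SHA-256 digest per block at write time."""
    return [hashlib.sha256(b).hexdigest() for b in blocks]


def scrub(blocks: list[bytes], checksums: list[str]) -> list[int]:
    """Re-read every block; return indices whose digest no longer matches."""
    return [i for i, (b, c) in enumerate(zip(blocks, checksums))
            if hashlib.sha256(b).hexdigest() != c]


blocks = split_blocks(b"extropy-chat")   # three blocks of 4 bytes each
checksums = make_checksums(blocks)
blocks[1] = b"opyX"                      # simulate silent corruption of block 1
bad_blocks = scrub(blocks, checksums)    # indices to repair from a good copy
```

A real scrub would then rewrite the damaged blocks from a redundant copy
(mirror or parity); the sketch only covers the detection step.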

> was to move the data to a new tape once a year. And it's useless for
> analysis sitting on a rack.

Exactly. You'd keep most of your important data in RAM, and everything
else a few tens of ms away. Tape might as well not be there. Besides, it's
hard to beat 4 TByte/3.5" (eventually, at least 60 TByte/3.5").

> No one has particularly mentioned the world of bigdata.
> (Some of you know all this. Some of you don't. So we're all on a
> similar page: )
> There are an assortment of projects about, some in heavy production
> use, to build massive databases and file systems out of commodity
> hard disks, with the expectation that at any time something will be
> broken.

Many people work with PByte-scale data. It's nothing unusual.

> Google File System keeps files in 64 MB chunks, spread across a
> server cluster so that there are at least three copies of each. Individual

Google have specific requirements -- incidentally, some of them could
match the NSA's user profile.

I personally would use a network file system with ZFS as the back end.

> files are commonly over 1 GB. Google has been using this for about a
> decade. They won't say precisely how much is in it, but it's admitted to
> be at least 50,000 servers with tens of petabytes (PB) in all. There are
> rumors it's much larger than that.
> Then they built Bigtable on top of the file system. Many internal projects
> are over 1 TB; at least one of them is over 1 PB.

Meh, we work with 20 TByte data sets, and we're a small shop of 25 people.
My next personal project is a 10 TByte data set.

> The Apache Hadoop Distributed File System (HDFS) is an open source
> copy of the Google File System. Facebook uses it, and announced in
> July they had 30 PB of data in it.

Just 30 racks. Given how many systems they and Google have, they likely
have a lot more than that.
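
A back-of-the-envelope check of the rack figure, assuming roughly
1 PByte per rack (which is what 30 PByte in 30 racks implies) and the
4 TByte drives mentioned earlier, in decimal units. The variable names
are only for illustration, and replication overhead is ignored.

```python
TB = 10**12
PB = 10**15

facebook_data = 30 * PB     # figure quoted above
rack_capacity = 1 * PB      # assumption implied by "30 racks"
drive_capacity = 4 * TB     # 4 TByte per 3.5" drive

racks = facebook_data // rack_capacity
drives_per_rack = rack_capacity // drive_capacity
total_drives = racks * drives_per_rack

print(racks, drives_per_rack, total_drives)
```

Under these assumptions a rack holds 250 drives, so the density per rack
is the part of the estimate most worth questioning.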

> Other projects are similar in scope. Amazon isn't saying what the design
> or capacity of S3 is, but they do admit to hosting about a trillion data
> objects, each of which could be up to 5 TB. IBM's GPFS has been
> tested up to 4 PB.
> Given the need to redundantly store data against equipment failure,
> state-of-the-art commercial sites today can handle about a thousand
> person-years at Anders' figure of 10 TB/year. Facebook could handle
> about fifteen people's lifetime surveillance with their current capacity.
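
One way to reproduce the quoted numbers, as a sketch: it assumes the
30 PB figure is raw capacity and that everything is stored in three
copies, as in the Google File System description above. Neither
assumption is confirmed in the thread, and the variable names are only
for illustration.

```python
TB = 10**12
PB = 10**15

raw_capacity = 30 * PB        # Facebook's announced HDFS figure
replication = 3               # assumption: three copies, as in GFS
per_person_year = 10 * TB     # Anders' figure quoted above
people = 15                   # quoted number of lifetimes

usable = raw_capacity // replication
person_years = usable // per_person_year      # about a thousand
years_per_person = person_years / people      # implied lifetime length

print(person_years, round(years_per_person))
```

With these inputs the "fifteen lifetimes" figure corresponds to a
lifetime of roughly 67 years per person.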

Given that the NSA already dropped the ball on 9/11 because they focused
too much on sigint, they will do nothing like that.
