[ExI] The NSA's new data center

Eugen Leitl eugen at leitl.org
Sun Apr 1 09:21:10 UTC 2012


On Sat, Mar 31, 2012 at 11:37:29PM +0200, Tomasz Rola wrote:

> Keep in mind I am not talking about what is going to be possible. It will 

I am only talking about what is feasible. NSA is a pretty conservative
shop, so I don't think they'd be pushing the envelope. They're going to
have several approaches, including running Oracle on a cluster. 

But if you're pushing the envelope, you can e.g. put plenty of
low-power SATA disks on trays, each with an ARM SoC and a mesh
interconnect (Gbit/s Ethernet or custom), hook them up via InfiniBand,
and fill several tennis courts with these.
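
Back of the envelope, as a sketch below -- every density figure in it
is an assumption of mine, not a spec:

    # Rough capacity estimate for the disk-tray design above. All
    # numbers are illustrative assumptions, not from any actual build.
    DISK_TB = 3            # assumed 2012-era 3.5" SATA drive
    DISKS_PER_TRAY = 16    # one ARM SoC + mesh port per tray, assumed
    TRAYS_PER_RACK = 40    # assumed
    RACKS_PER_COURT = 250  # a tennis court is roughly 260 m^2

    capacity_pb = (DISK_TB * DISKS_PER_TRAY * TRAYS_PER_RACK
                   * RACKS_PER_COURT) / 1000
    print(f"~{capacity_pb:.0f} PB per tennis court")  # ~480 PB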

> be or it will be not. I believe only in shop inventory and numbers. There 
> is exabyte tape library being sold by Oracle and there is about 20Tbps in 

I wouldn't bother with tape libraries for bulk data. Arguably, I wouldn't
bother with tape libraries at all, but would push to remote rotating
spindles instead.

> transatlantic cables combined, with plans to extend this by maybe 300% 

Nobody is going to fill up that space overnight. If you want to fill
things up quickly, you wouldn't bother with fiber but would synchronize
shipping-container data centers locally and ship them. You can fit about
24,000 3.5" disks inside one -- 10 PBytes/day is a lot of bandwidth,
though the latency would suck.
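
The arithmetic (drive size and transit time below are my assumptions):

    # Sneakernet bandwidth of one shipping container, per the figures
    # above.
    DISKS = 24000
    DISK_TB = 3        # assumed 2012-era drive size
    TRANSIT_DAYS = 7   # assumed door-to-door shipping time

    pb_per_day = DISKS * DISK_TB / 1000 / TRANSIT_DAYS
    gbit_s = pb_per_day * 1e15 * 8 / 86400 / 1e9
    print(f"~{pb_per_day:.0f} PB/day, ~{gbit_s:.0f} Gbit/s sustained")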

> *which are plans*:
> 
> http://en.wikipedia.org/wiki/Transatlantic_communications_cable
> 
> http://en.wikipedia.org/wiki/Exabyte
> 
> One exabyte library, which has to be maintained (as you said: energy, 
> housing, but also active checking for errors in data made by bit rot). It 

zfs scrubs handle bit rot fine -- weekly with consumer drives, monthly
with SAS.
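
For instance, a minimal scrub driver to run from cron on that schedule;
the pool name is a placeholder:

    # Kick off a zpool scrub and log pool status. Meant to be invoked
    # from cron -- weekly for consumer drives, monthly for SAS.
    import subprocess

    POOL = "tank"  # placeholder pool name

    # zpool returns non-zero if a scrub is already running; ignore.
    subprocess.run(["zpool", "scrub", POOL], check=False)

    # Record current health and scrub progress.
    out = subprocess.run(["zpool", "status", POOL],
                         capture_output=True, text=True)
    print(out.stdout)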

> will at best store surv data for 100 thousand heads (based on Anders' 
> estimate of 10TB/year/head which is equivalent to 2Mbps a-v stream, if I 
> am right). Two such libraries if you want to have any significant error 
> protection.

You'd do well enough with raidz2 or raidz3 over mirrored pools at the
disk tray level. Remember, these are bulk data. The important hot
spots are all multiply redundant.
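
The space cost is modest; a quick check, assuming 16-disk trays (my
number, not anything specified upthread):

    # Usable capacity after raidz parity, vs. mirroring, per tray.
    def usable_fraction(disks, parity):
        """Fraction of raw capacity left after raidz parity disks."""
        return (disks - parity) / disks

    TRAY = 16  # assumed disks per tray
    for name, parity in (("raidz2", 2), ("raidz3", 3)):
        print(f"{name}: {usable_fraction(TRAY, parity):.0%} usable")
    # a mirrored tray would leave only 50% usable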
 
> Data transmission from sensors in the field to any kind of storage you 

The most abundant data sources would be layer 7 interception in the
telco data centers. That data is collected near the edge of the
network, so it is easy to deal with.

> want, because you do want to store this data somewhere? It is easy to 
> connect cities with hi-speed net, it is easy to create metropolitan 
> network, but it is not easy to deliver hi speed to every place on the map:
> 
> http://en.wikipedia.org/wiki/Last_mile
> 
> The last mile problem remains a problem, no matter if you want to deliver 
> data down to enduser or up from enduser to central hub. There is going to 
> be central hub, either global one or many locals. The problem remains.

You would run the data mining where the data is, and pool the results
where the analysts are. A smart approach would be to cluster data
spatially and temporally, so that you could limit a query's scope to a
particular hardware region.
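
A toy sketch of such partitioning -- grid cell size, time bucket, and
routing function are all assumptions of mine:

    # Key records by coarse location and day, so a query only has to
    # touch the hardware regions holding the buckets it covers.
    import zlib
    from datetime import datetime

    def partition_key(lat, lon, ts):
        cell = f"{round(lat):+03d}{round(lon):+04d}"  # ~1 degree cell
        return f"{cell}/{ts:%Y%m%d}"                  # one bucket/day

    def region_for(key, regions=64):
        """Stable mapping from partition key to a hardware region."""
        return zlib.crc32(key.encode()) % regions

    k = partition_key(52.5, 13.4, datetime(2012, 4, 1))
    print(k, "-> region", region_for(k))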


