[ExI] The NSA's new data center

Tomasz Rola rtomek at ceti.pl
Sun Apr 1 19:27:30 UTC 2012


On Sun, 1 Apr 2012, Eugen Leitl wrote:

> On Sat, Mar 31, 2012 at 11:37:29PM +0200, Tomasz Rola wrote:
> 
> > Keep in mind I am not talking about what is going to be possible. It will 
> 
> I am only talking about what is feasible. NSA is a pretty conservative
> shop, so I don't think they'd be pushing the envelope. They're going to
> have several approaches, including running Oracle on a cluster. 

Actually, I was addressing only the idea of total surveillance. That the 
NSA has been gathering all kinds of info I realised long ago. And I am a 
bit indifferent to it, because I am not a gangster, not even a politician 
or a banker. Spying on me is a waste of resources, but who am I to tell 
them how to spend their money :-). I understand they may be somewhere 
ahead of us mortals, but not necessarily in a "revolutionary ahead" 
manner.

So, all my number juggling relates only to the idea of doing total 24/7 
surveillance, not to some possible data center. If I ever were to do such 
a thing myself, I would probably go with an enormous tape library for 
long-term storage, plus one of those hip supercomputing storage arrays to 
fetch data from the tapes, crunch it, and store the outcomes.

> But, if you're pushing the envelope, you can e.g. put up plenty of
> low-power SATA on trays with an ARM SoC + mesh (GBit/s Ethernet or
> custom), and hook it up via InfiniBand, and fill up several tennis
> courts with these.

You see, I could push the envelope however I wanted, but that does not 
change some simple facts:

- total annual production of hard drives is, say, 650 million units

- if you want to store 10TB/year/head for 8 billion heads, you need:

[16]> (defconstant +zeta+ (expt 10L0 21))
+ZETA+

[17]> (defconstant +exa+ (expt 10L0 18))
+EXA+

[18]> (defconstant +peta+ (expt 10L0 15))
+PETA+

[19]> (defconstant +tera+ (expt 10L0 12))
+TERA+

[20]> (defconstant +million+ (* 1000 1000))
+MILLION+

[21]> (defconstant +billion+ (* 1000 +million+))
+BILLION+

[22]> (/ (* 10 +tera+ 8 +billion+) +exa+)
80000.0L0 ;; EB per year

[23]> (/ (* 10 +tera+ 8 +billion+) +zeta+)
80.0L0 ;; ZB per year

That's 80 ZB every year, and almost a yottabyte after ten years. Or, if 
you prefer, it is 80 billion 1TB discs. Or 20 billion 4TB discs. Or 30 
years of worldwide _official_ hard drive production. Every year. That 
assumes you want 4TB discs, which have 4 or 5 platters and so, AFAIK, are 
more prone to mechanical failure than single-platter 1TB ones.

And if you went with single-platter 1TB discs instead, it is 123 years of 
worldwide hard drive production, which you would have to make and buy 
every year.
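
Punching both figures into the same CLISP session (still using my rough 
guess of 650 million units/year):

[24]> (/ (* 20L0 +billion+) (* 650 +million+))
30.769230769230769231L0 ;; years of production, the 4TB option

[25]> (/ (* 80L0 +billion+) (* 650 +million+))
123.07692307692307692L0 ;; years of production, the 1TB option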

In other words, if you want total surveillance:

- a 3.5'' hard disc measures 101.6 mm x 25.4 mm x 146 mm

- how much volume is needed to store 80 billion 1TB discs?

[28]> (defconstant +millimetre+ 1L-3)
+MILLIMETRE+

[32]> (let ((hddv (* 1016L-1 +millimetre+ 254L-1 +millimetre+
                    146L0 +millimetre+))
            (units1TB (* 10L0 8 +billion+)))
        (* units1TB hddv))

3.0141875199999999995L7 ;; it is this many m^3, 3*10^7 or...

[38]> (let ((hddv (* 1016L-1 +millimetre+ 254L-1 +millimetre+
                    146L0 +millimetre+))
            (units1TB (* 10L0 8 +billion+)))
        (exp (/ (log (* units1TB hddv)) 3)))

311.21230183789613968L0 ;; it means a cube with a 311 m side or...

[39]> (/ 311.212 24) ;; a tennis court is ~24 m long

12.967167

It means you need to fill a cube which is 13 tennis court lengths on a 
side, in every direction. That is quite a bit more than just a few 
courts.

For perspective, it means you need to build the equivalent of

[41]> (/ 3L7 2.5L6) ;; the Great Pyramid of Giza is ~2.5*10^6 m^3

12.0L0

So, twelve Great Pyramids of Giza, filled with discs or storage of a 
similar form factor. Every year. Plus some cabling and power, and some 
staff to keep it all running.

Or, if you wanted to go flat, spreading the same volume in a layer one 
metre thick, you would need

[44]> (* 311.212 (expt 0.311212 2)) ;; the 311 m cube, 1 m thick, in km^2

30.14179

So, 30 km^2 of land, about half of Manhattan Island, every year.

In my previous "calculatron email" I used units based on powers of 2 
instead of 10. The difference is that, for example, 1 zebibyte = 1 ZiB = 
2^70 =~ 1.2*10^21 bytes = 1.2 ZB (zettabytes). This changes the numbers a 
bit, but I don't think it makes any significant difference. In this email 
I decided to go with SI units, and to name 2-based units explicitly where 
I need them.
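
The conversion factor itself is easy to check in the same session:

[45]> (/ (expt 2L0 70) +zeta+)
1.1805916207174113034L0 ;; ZB per ZiB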

> > be or it will be not. I believe only in shop inventory and numbers. There 
> > is exabyte tape library being sold by Oracle and there is about 20Tbps in 
> 
> I wouldn't bother with tape libraries with bulk data. Arguably, I wouldn't
> bother with tape libraries at all but push to remote rotating spindles.

The problem with this is that spindles rotate and eat power even when 
nobody needs them. Unless you implement aggressive power management, in 
which case they will still eat more than resting tapes.
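
To put a number on it (the ~5 W of idle draw per disc is my assumption, 
real figures vary):

[46]> (/ (* 80 +billion+ 5L0) +billion+)
400.0L0 ;; GW of idle draw at 5 W/disc - hundreds of power plants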

> > transatlantic cables combined, with plans to extend this by maybe 300% 
> 
> Nobody is going to fill up that space overnight. If you want to fill up
> things quickly, you wouldn't bother with fiber but synchronize shipping
> container data centers locally and ship them. You can fit about 24 k 3.5"
> disks inside -- 10 PBytes/day is a lot of bandwidth, though the latency
> would suck.

Yes. But most of the time you would just store this data and never need 
to go back to it. In that case, bad latency is OK, I think.
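
Taking the quoted 10 PBytes/day at face value, one container per day as 
a sustained bit rate is:

[47]> (/ (* 10 +peta+ 8) (* 24 3600L0))
9.2592592592592592593L11 ;; bits/s, i.e. roughly 0.93 Tbps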

> > *which are plans*:
> > 
> > http://en.wikipedia.org/wiki/Transatlantic_communications_cable
> > 
> > http://en.wikipedia.org/wiki/Exabyte
> > 
> > One exabyte library, which has to be maintained (as you said: energy, 
> > housing, but also active checking for errors in data made by bit rot). It 
> 
> zfs scrubs fine weekly with consumer drives or monthly with SAS.

Maybe it does, but does it help in case of mechanical failure? With 80 
billion units of anything, mechanical failure is not a possibility but a 
certainty.
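
Just to show the scale: with an annual failure rate of, say, 2% (my 
assumption, consumer discs may be worse), the daily death toll is

[48]> (/ (* 80 +billion+ 2L-2) 365)
4.3835616438356164384L6 ;; ~4.4 million discs to swap, every single day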

> > will at best store surv data for 100 thousand heads (based on Anders' 
> > estimate of 10TB/year/head which is equivalent to 2Mbps a-v stream, if I 
> > am right). Two such libraries if you want to have any significant error 
> > protection.
> 
> You'd do well enough with raidz2 or raidz3 over mirrored pools 
> at disk tray level. Remember these are bulk data. The important hot 
> spots are all multiply redundant.

The idea is, one needs a multiple of the raw storage volume to be safer 
(but never safe). We seem to agree here.

> > Data transmission from sensors in the field to any kind of storage you 
> 
> The most abundant sources of data would be from layer 7 interception in the
> telco data centers. This is collected near the edge of the network, so easy to
> deal with. 

In the tot-sur scenario, one is transmitting 2Mbps from every sensor. One 
spies on physical human bodies, not on their telecommunications (which, I 
think, are already being spied on just fine with existing means).
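
Quick sanity check of that equivalence:

[49]> (/ (* 2 +million+ 86400 365) 8 +tera+)
7.884L0 ;; TB/year from a 2Mbps stream, so 10TB/head is the right ballpark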

> > want, because you do want to store this data somewhere? It is easy to 
> > connect cities with hi-speed net, it is easy to create metropolitan 
> > network, but it is not easy to deliver hi speed to every place on the map:
> > 
> > http://en.wikipedia.org/wiki/Last_mile
> > 
> > The last mile problem remains a problem, no matter if you want to deliver 
> > data down to enduser or up from enduser to central hub. There is going to 
> > be central hub, either global one or many locals. The problem remains.
> 
> You would be running the data mining where the data is, and pool the 
> results where the analysts are. A smart approach would be to cluster 
> data spatially and temporally, so that you could limit the query scope 
> to particular hardware region. 

This spatial-temporal clustering is a very nice idea. However, I am 
afraid you would end up with a multitude of highly sophisticated nodes 
scattered across the face of the Earth. You would have to go around 
repairing failing equipment all the time, so closing one problem opens 
another. And you might also betray the locations of the nodes to local 
mobs.
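
Still, to give the idea its due, the routing itself is trivial. A toy 
version (the grid size and node count are entirely made up by me):

[50]> (defun node-for (lat lon day &key (nodes 65536))
        ;; one bucket per 1x1 degree cell per day, spread over NODES
        ;; machines; a query bounded in space and time then touches
        ;; only a handful of bucket ids
        (mod (+ (floor (+ lat 90)) (* 180 (floor (+ lon 180)))
                (* 64800 day))
             nodes))
NODE-FOR

[51]> (node-for 52.23L0 21.01L0 15430) ;; roughly Warsaw, on day 15430
17570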

But, actually, my whole point is to illustrate something: I don't know if 
tot-sur is feasible (I would have to know classified tech for that), but 
trying to do it with current tech would be very frustrating, IMHO. And I 
don't think we are in "double every year" territory, not anymore. So I 
guesstimate that ten years from now tech will not have improved by a 
factor of 1000, though I can believe it may improve by a factor of 100. 
If I had to bet my own money, though, I would bet on something better 
than a factor of 50 but not much over it. And even that may prove overly 
optimistic; you might want to see this:

[57]> (let ((years 10) (inum 20))
        (dotimes (i inum)
          (let ((x (+ 1 (/ (+ i 1.0) inum))))
            (format t "~A years by ~A growth: ~A~%"
                    years x (expt x years)))))

10 years by 1.05 growth: 1.6288936
10 years by 1.1 growth: 2.593743
10 years by 1.15 growth: 4.045558
10 years by 1.2 growth: 6.1917367
10 years by 1.25 growth: 9.313226
10 years by 1.3 growth: 13.785842
10 years by 1.35 growth: 20.106564
10 years by 1.4 growth: 28.925459
10 years by 1.45 growth: 41.08471
10 years by 1.5 growth: 57.66504
10 years by 1.55 growth: 80.04182
10 years by 1.6 growth: 109.9512
10 years by 1.65 growth: 149.56822
10 years by 1.7 growth: 201.59943
10 years by 1.75 growth: 269.3894
10 years by 1.8 growth: 357.0466
10 years by 1.85 growth: 469.58838
10 years by 1.9 growth: 613.1065
10 years by 1.95 growth: 794.9618
10 years by 2.0 growth: 1024.0

Sorry for the clumsy formatting of the above.

However, a 50 or even 100-fold improvement merely brings us to the edge 
of "technically feasible", nowhere near "effortlessly feasible". So even 
after 10 years, doing this would mean great expenditure, and probably the 
necessity of building whole parallel industries dedicated to this one 
thing. And there are quite a few things that need to improve by a factor 
of 100+. But I can smell that improvement is actually slowing down. Of 
course there are many promising breakthroughs in the labs, but it will 
take years before we can buy in a shop what we read about today.

Translating this into hard drive terms, I think I will be able to buy a 
10TB single-platter disc, maybe even a 20TB one, in 2022, but I do not 
count very much on 1PB or even 60TB discs (even though they are 
"promised" by some web news). Sorry. But I would like to be wrong, you 
can be sure of it.

If we extend the window to twenty years, it is probably a very different 
story. Twenty years from now, I might be delighted by the prospect of 
being constantly watched over. Who knows. In twenty years, tech might 
improve even by a factor of 150, with the speed of improvement still 
slowing down, however. I didn't calculate this carefully, so I may be off 
by a digit, but I don't think I am wrong on the order of magnitude.
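
For what it is worth, a factor of 150 over twenty years corresponds to a 
rather modest sustained growth:

[58]> (expt 150.0 (/ 1 20.0))
1.2847084 ;; ~28% per year, every year, for two decades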

If you have better numbers, I will be happy to see them. Really.

Regards,
Tomasz Rola

--
** A C programmer asked whether computer had Buddha's nature.      **
** As the answer, master did "rm -rif" on the programmer's home    **
** directory. And then the C programmer became enlightened...      **
**                                                                 **
** Tomasz Rola          mailto:tomasz_rola at bigfoot.com             **


