[ExI] The NSA's new data center
Tomasz Rola
rtomek at ceti.pl
Sat Mar 31 19:45:07 UTC 2012
On Sat, 24 Mar 2012, Anders Sandberg wrote:
> I did a little calculation: at what point can governments spy 24/7 on their
> citizens and store all the data?
>
> I used the World Bank World Development Indicators and IMF predictors for
> future GDP growth and the United Nations median population forecasts, the fit
> 10^(-0.2502*(t-1980)+6.304) for the cost (in dollars) per gigabyte (found on
> various pages about Kryder's law) and the assumption that 24/7 video
> surveillance would require 10 TB per person per year.
>
> Now, if we assume the total budget is 0.1% of the GDP and the storage is just
> 10% of that (the rest is overhead, power, cooling, facilities etc), then the
> conclusion is that doing this becomes feasible around 2020. Bermuda,
> Luxembourg and Norway can do it in 2018; by 2019 most of Western Europe plus
> the US and Japan can do it. China gets there in 2022. The last countries to
> reach this level are Eritrea and Liberia in 2028, and finally Zimbabwe in
> 2031. By 2025 the US and China will be able to monitor all of humanity if they
> want to/are allowed.
>
> So at least data storage is not going to be any problem. It would be very
> interesting to get some estimates of the change in cost of surveillance
> cameras and micro-drones, since presumably they are the ones that are actually
> going to be the major hardware costs. Offset a bit because we are helpfully
> adding surveillance capabilities to all our must-have smartphones and smart
> cars. I suspect the hardware part will delay introduction a bit in countries
> that want it, but that just means there will be a hardware overhang once they
> get their smart dust, locators or gnatbots.
Anders, I admire your analysis and, even more, your ability to ask
questions like this (damn, why didn't I... ;-) ). I would like to add a
few points to your answer, of course from my limited "AFAIK" point of view.
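(Before I get to those points - a quick sanity check of your cost curve,
sketched in Common Lisp. The budget fractions are yours; the GDP-per-capita
figure in the example call is my own illustrative guess, not taken from
your data:)

;; Kryder's-law fit quoted above: dollars per gigabyte in a given year.
(defun price-per-gb (year)
  (expt 10 (+ (* -0.2502 (- year 1980)) 6.304)))

;; First year in which 10 TB/person/year of raw storage fits into
;; 0.01% of GDP per capita (0.1% total budget, 10% of it on storage).
(defun feasible-year (gdp-per-capita &key (start 2000))
  (loop for year from start
        when (<= (* 10000 (price-per-gb year))   ; 10 TB = 10000 GB
                 (* 1e-4 gdp-per-capita))
        return year))

;; (feasible-year 50000) => 2019, which matches your estimate for the US.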
First, while the cost of mere storage (a.k.a. price per gigabyte) is
dropping, there is more to running an extra-large database than just
stacking hard drives on top of each other. I think maintenance cost is
going to kill any such project very quickly. The biggest publicly
acknowledged databases nowadays range from a petabyte to somewhere around
10 PB (hard to tell where exactly this "around" is; the news is somewhat
dated) [1] [2]. Anyway, we could extrapolate from facts like "in 1992
Teradata built the first 1 TB system" and in 1997 a 24 TB one [3], which
gives a yearly growth factor of
[19]> (exp (/ (log 24) 5))
1.88
So, almost doubling every year, which should give about 26 PB in 2008:
[24]> (expt (exp (/ (log 24) 5)) 16)
26102.13
However, from the same source [3], they delivered "only" a 1 PB system in
that year. So the brutal "doubling every year" extrapolation doesn't work
in the real world, and what was close to 2 in the past is closer to 1.5
nowadays. I guess the reasons for this vary from technical issues to
maintenance costs.
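(Checking the actual 1997-2008 rate: 24 TB to 1 PB in 11 years, taking
binary prefixes so that 1 PB = 1024 TB, gives

[25]> (exp (/ (log (/ 1024 24)) 11))
1.40

i.e. indeed much closer to 1.4-1.5 per year than to 2.)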
There are also issues with actually processing your data. You can throw
tapes or disks into the basement, no problem; however, pulling anything
useful out of that pile is a totally different thing. The current
technological limit seems to be somewhere between 100 PB and 500 PB
[4] [5] [6], even though there is an exabyte tape library on sale [7] [8].
And this has nothing to do with the various estimates about exabytes of
content per month passing through the net.
Basically, from the maintenance point of view, at the current tech level
it takes a new building every year to house this much data, and it
requires actively checking your data for errors, making backups, and so
on. Which, optimistically, cuts real capacity roughly in half.
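(The +kilo+, +tera+, +peta+ etc. constants used in the REPL snippets below
are not defined anywhere in this post; for the printed results to come out
exactly as shown, they have to be binary prefixes - this is my
reconstruction:)

(defconstant +kilo+    (expt 2 10))
(defconstant +tera+    (expt 2 40))
(defconstant +peta+    (expt 2 50))
(defconstant +million+ (expt 10 6))   ; counts of people stay decimal
(defconstant +billion+ (expt 10 9))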
With the best tech available now (but not really deployed in the field
yet), you could provide total surveillance ("total-sur") for this many
people:
[27]> (/ (* 250 +peta+) (* 10 +tera+))
25600
Assuming it doubles every year (I think it will not), this gives ca. 26
million heads by 2022 (arithmetic below). If you could throw 100 times as
much money at the project, about 1/3 of humanity. However, with every year
into the project, the number of people needed to maintain the data storage
will grow. There will be costs of migrating from one storage
medium/technology to another about every 10-15 years. And some other costs
I am not aware of, because I am not very deep into the subject.
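For the record, the doubling arithmetic behind that figure:

[28]> (* 25600 (expt 2 10))
26214400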
The second problem I see is data transfer. If the current telecom
infrastructure says anything, the limit today is less than 10 Tbps per
cable on intercontinental links [9] [10]. If we assume the "central hub"
is located in the USA, then transferring live coverage of the whole
European population as 300 kbps streams will take
[39]> (floor (* 600 +million+ 300 +kilo+) (* 10 +tera+))
16 ;
8398139555840
So, rounding up, 17 of the best submarine telecom cables thinkable today -
which are not yet deployed; AFAIK the best existing ones run somewhere
around 5 Tbps, with faster ones still under construction. Even worse,
pushing live streams from all of humanity would require
[42]> (floor (* 8 +billion+ 300 +kilo+) (* 10 +tera+))
223 ;
5689070059520
About 224 such 10 Tbps cables, all terminating at one data storage site. I
wouldn't bet any money that this is possible now. Maybe 10 years from now
it will look better, but I won't bet on that either.
In a centralised scenario, it *might* be possible to "serve" about a
million heads per continent. In a decentralised one, maybe 2-10 times as
many. Ten years from now, multiply by 1.5^10 = ~58 (optimistic case) or
1.2^10 = ~6 (more realistic one).
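In the same REPL, for exactness:

[43]> (expt 1.5 10)
57.66
[44]> (expt 1.2 10)
6.19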
Assuming I didn't goof up anywhere - and of course I am using only
official data from public sources.
When it comes to the algorithmic advantage of the "unofficial" guys over
the "public" ones, it is hard to tell, but I wouldn't count on miracles.
Perhaps some problem whose best published algorithm has O(n^2) complexity
has been "solved" with an unpublished O((log2 n)^2) algorithm, but there
are limits on how good the good can be made. With the amounts of data we
are talking about, I guess this doesn't help so much. Or: you can speed up
the analysis of one second of material 1000-fold, yet with so many seconds
to analyse this is not going to be all that helpful.
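(To put "so many seconds" into numbers - my own back-of-envelope, using
the 8 billion figure from above:

[45]> (* 8 +billion+ 365 24 60 60)
252288000000000000

That is ~2.5*10^17 person-seconds of footage per year; divide by 1000 and
you are still left with ~2.5*10^14 to process.)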
Soo... I am sure total-sur might be possible in the future. But at the
same time I think the window of opportunity is probably a bit wider than
10 years.
I mean, we can talk about "intelligent dust" or "intelligent insects" etc.
etc., but AFAIK they are not fielded yet, and besides, even dust will have
to report back its recorded material, so a hub for data storage and
analysis still has to be built, which is going to be hard.
Besides, if you care, I guess a good vacuum cleaner and mosquito nets can
make the life of dust and insects much harder. So many research grants,
only to end up in a trash bin. Or be washed down the drain of a huge
shower cabin. Woo hoo, big deal.
Now, a few words for the supporters of the "down with privacy" side (not
you, Anders, but I am in a bit of a hurry now, so I may as well put it
here).
I don't think life in the past was in any way "normal". Dying from TB or
gangrene is not "normal". Being eaten alive by predators is not "normal".
I am a 20th/21st-century man, not some caveman. By extension, having no
privacy is not "normal" either.
Also, I wonder how many Galileos and Copernicuses we would have in a
tot-sur society ruled by the Inquisition? While we are at it, the
Inquisition might have started for religious reasons, but soon some folks
discovered it could suit their earthly needs very well, too. Hence it was
so cool to denounce a neighbour or a rival merchant, and so cool to
torture naked women, which AFAIK wasn't required by any religion at the
time. As soon as you create a powerful tool like a tot-sur system, expect
it to be taken over by all kinds of psychopathic elements, within about
10-20 years, or maybe even 0 years, really. Good luck living under their
rule for the next 10 thousand years.
BTW, if you think nowadays is any better because we have become civilised,
well, dream on. I wonder, for example, how many planes the Wright brothers
would have built with a mob coming to their workshop every day for a bit
of wooing and joking.
Regards,
Tomasz Rola
[1] http://www.focus.com/fyi/10-largest-databases-in-the-world/
[2] http://gadgetopia.com/post/1730
[3] http://en.wikipedia.org/wiki/Teradata
[4] http://en.wikipedia.org/wiki/Petabyte
[5] http://www.geek.com/articles/chips/blue-waters-petaflop-supercomputer-installation-begins-20120130/
[6] http://www.technologyreview.com/computing/38440/
[7] http://en.wikipedia.org/wiki/Exabyte
[8] http://www.oracle.com/us/corporate/press/302409
[9] http://en.wikipedia.org/wiki/TAT-14
[10] http://en.wikipedia.org/wiki/Transatlantic_communications_cable
--
** A C programmer asked whether computer had Buddha's nature. **
** As the answer, master did "rm -rif" on the programmer's home **
** directory. And then the C programmer became enlightened... **
** **
** Tomasz Rola mailto:tomasz_rola at bigfoot.com **