[ExI] Saving the data
Anders Sandberg
anders at aleph.se
Mon Nov 30 21:00:52 UTC 2009
The real scandal in "climategate" is of course that people throw away
data! Every bit is sacred! To delete anything is to serve entropy!
Only half joking. Clearly any research project should try to ensure that
its data remains accessible for as long as its papers are used in
science - and given that we occasionally refer back to the Almagest and
Sumerian clay tablets to ask questions unimaginable to their
originators, that is a *long* time. But long-term data storage is also a
terribly tricky problem. Formats change, media decay. And after the
initial period, interest in the data wanes, making people less motivated
to save it.
I think there are two kinds of datasets, each with its own problems. One
is the "big" dataset that taxes available resources. These are big enough
that people recognize their importance, but they are hard to move and
copy. They run the risk of being deleted to save space (like the BBC
did with its early TV programmes) and are often stored in just one place -
with plenty of risk of being destroyed by the occasional war, flood or fire.
The other kind is the "small" dataset that does not tax resources much.
Its problem is usually that it is badly documented, and once it becomes
uninteresting it can easily fall victim to format or media decay. How
many projects have been permanently deleted when the research group
repurposed one of its old PCs as a print server or a node in the Beowulf
cluster?
Given the rapid growth of storage capacity (just look at
http://www.mkomo.com/cost-per-gigabyte !) it seems that we could
probably save *all* datasets in the world that are smaller than a
certain fraction of a typical hard drive. Imagine making it a
publication requirement to place the dataset, and the software used to
produce it (in an ideal world with metadata explaining how to run it),
on a distributed server if it is smaller than X gigabytes. There could
be an escrow system limiting access, perhaps making data freely
available after 20 years and before then available on request to the
authors, the journal or a sufficient number of funding bodies. Over time
X would increase (doubling every 14 months?).
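To make the arithmetic concrete, here is a minimal Python sketch of the
two rules above. The starting cutoff of 1 GB, the start date and the
20-year embargo are assumed numbers for illustration, not part of any
real system:

from datetime import date

START = date(2009, 12, 1)    # assumed starting point of the scheme
X0_GB = 1.0                  # assumed initial cutoff in gigabytes
DOUBLING_MONTHS = 14         # cutoff doubles every 14 months
EMBARGO_YEARS = 20           # data becomes freely available after 20 years

def size_cutoff_gb(today: date) -> float:
    """Maximum dataset size accepted into the archive at a given date."""
    months = (today.year - START.year) * 12 + (today.month - START.month)
    return X0_GB * 2 ** (months / DOUBLING_MONTHS)

def may_release(published: date, today: date, approved: bool) -> bool:
    """Escrow rule: free after the embargo, otherwise only with approval
    from the authors, the journal or enough funding bodies."""
    embargo_over = (today - published).days >= EMBARGO_YEARS * 365
    return embargo_over or approved

print(size_cutoff_gb(date(2019, 12, 1)))   # roughly 380 GB ten years on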
This scheme would of course require funding, but also a very stable
long-term organisation that can migrate the archive to new media.
Perhaps allowing forking would be one way around that (with some
cryptographic trickery for the escrow), so that even amateurs might be
able to run their own version holding all data smaller than Y gigabytes.
This sounds very much like something the Long Now Foundation might have
been considering for their 10,000-year library.
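One possible reading of the "cryptographic trickery" (an assumption on
my part, nothing more): forks mirror only encrypted blobs, while the
escrow organisation holds the keys and publishes them once the embargo
lapses or a request is approved. A minimal sketch, assuming the
third-party Python "cryptography" package:

from cryptography.fernet import Fernet

# Escrow side: encrypt the dataset, keep the key, hand out the ciphertext.
key = Fernet.generate_key()                      # held only by the escrow
ciphertext = Fernet(key).encrypt(b"raw dataset bytes")

# Forks, even amateur ones, can mirror the ciphertext freely; without the
# key it is useless, so access policy reduces to when the key is released.

# After 20 years (or on an approved request) the escrow publishes the key
# and anyone holding a mirrored copy can recover the data.
recovered = Fernet(key).decrypt(ciphertext)
assert recovered == b"raw dataset bytes"

The nice property is that the bulk-copying problem and the access-control
problem get separated: the bits can spread as widely as possible, while
the policy lives only in the much smaller key store.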
--
Anders Sandberg,
Future of Humanity Institute
Philosophy Faculty of Oxford University