[Paleopsych] BBC: Visionaries outline web's future

Premise Checker checker at panix.com
Sun Oct 10 22:41:11 UTC 2004


Visionaries outline web's future
http://newsvote.bbc.co.uk/mpapps/pagetools/print/news.bbc.co.uk/1/hi/technology/3725884.stm
2004.10.8

[$260 million is just $10 per book! A friend of Sarah's took us to a floor 
at the National Library of Medicine several years ago where human workers 
opened books at a 90-degree angle and placed two facing pages on a wedge. 
With suitable mirrors the images were scanned and turned into bitmaps or 
whatever, not files transformed by optical character recognition. This 
was so long ago that I wondered about the bandwidth costs of transmitting 
such files over the net. Today, bandwidth has greatly increased, so this 
should not be a problem, though searching for text will be. Furthermore, 
humans have been replaced by robots that, allegedly, can pull stuck pages 
apart better.
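
[A quick back-of-envelope on the bandwidth worry; the page count, image 
size, and connection speeds below are my assumptions, not figures from 
the article:

    # Rough transfer-time estimate for one scanned book.
    pages = 300                      # assumed typical book length
    bytes_per_page = 100 * 1024      # ~100 KB per compressed page image
    book_bytes = pages * bytes_per_page

    for label, bits_per_sec in [("56k modem", 56_000),
                                ("1.5 Mbit DSL", 1_500_000)]:
        minutes = book_bytes * 8 / bits_per_sec / 60
        print(f"{label}: {minutes:.0f} minutes per book")

A dial-up line would need over an hour per book; broadband cuts that to 
a few minutes, which is why bandwidth no longer seems the obstacle.]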

[Also, $60,000 to store a terabyte of scanned images sounds way too much. 
You can buy hard disks right now for about a dollar a gigabyte, or roughly 
$1,000 per terabyte. Is this a case of governments driving up costs by a 
factor of 60, or is there something I don't know about?
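
[Making the factor-of-60 arithmetic explicit, with the disk price as my 
assumption:

    # Storage-cost check: quoted figure vs. commodity disks.
    quoted_cost_per_tb = 60_000           # dollars, from the article
    disk_dollars_per_gb = 1.0             # assumed retail price
    diy_cost_per_tb = disk_dollars_per_gb * 1_000
    print(diy_cost_per_tb)                        # 1000.0 dollars
    print(quoted_cost_per_tb / diy_cost_per_tb)   # 60.0

Even allowing for redundancy and administration, raw disk is a small 
fraction of the quoted figure.]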

[Note that the article speaks of scanned images only. I have no idea 
what it would cost to convert the images to text or HTML; the cost 
would presumably be a function of the quality demanded.

[Of course, copyrights would have to be bought up or just seized by the 
feds. I think copyrights should last only 20 years, on the grounds that 
there is little difference between a 20-year annuity, the 75-year one 
that exists now, and a perpetual one, and that the benefits of putting 
works into the public domain after 20 years exceed this slight difference.
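
[The annuity claim can be checked with the standard present-value 
formula PV = (1 - (1+r)^-n) / r; the 7% discount rate below is my 
assumption:

    # Present value of a $1/year annuity vs. a perpetuity (PV = 1/r).
    r = 0.07                              # assumed discount rate
    pv = lambda n: (1 - (1 + r) ** -n) / r
    perpetuity = 1 / r                    # about 14.29
    for years in (20, 75):
        share = pv(years) / perpetuity
        print(f"{years}-year term captures {share:.0%} of a perpetuity")

At that rate a 20-year term is worth about 74% of a perpetual one and a 
75-year term about 99%, so most of the present value arrives early.]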

[Does anyone have an estimate of the value earned every year from the 
monopoly granted by copyright in this country? What percent of books 
are less than 20 years old?

[Yes, yes, things get very complicated very quickly. I basically just want 
to pass along a news item. You should realize that the Library of Congress 
is unreasonably conservative when it comes to reproducing things: the 
sound recordings it has placed on its site avoid everything that still 
*might* be under State copyright.]

    Universal access to all human knowledge could be had for around $260m,
    a conference about the web's future has been told.

    The idea of access for all was put forward by visionary Brewster
    Kahle, who suggested starting by digitally scanning all 26 million
    books in the US Library of Congress.

    His idea was just one of many presented at the Web 2.0 conference in
    San Francisco that aims to give a glimpse of what the net will become.

    Experts at the event said the next generation of the web will come out
    of the creative and programming communities starting to tinker with
    the vast pool of data the net has become.

    Small start

    Despite the hype surrounding the dotcom era, many believe that the
    vast potential of the net to change society and business remains
    largely untapped.

    The last few years have been more about making a working
    infrastructure and making it useable with browsers, search engines,
    blogs and a variety of other programming tools.

    The future will build on this basic infrastructure in ways that "grow
    in the telling", said Tim O'Reilly, co-organiser of the Web 2.0
    conference.

    Web 2.0 will also build on the groups that are springing up around
    well-known net companies such as Google, Amazon, eBay and many others.

    Speaking about what this future will be like, Jeff Bezos, boss of
    e-commerce firm Amazon, said it will be about making the web useable
    for computers rather than people.

    This will revolve around tools and programs that re-work the
    information collected by firms like Amazon that will help create new
    services and businesses.

    One such is MusicPlasma, which mines Amazon data to produce a visual
    search engine to let people find other music that resembles the stuff
    they already listen to.
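
[A minimal sketch of the kind of mining such a service might do, 
assuming access to co-purchase records; the data and album names here 
are invented:

    # Hypothetical co-purchase similarity: albums bought together often
    # become the strongest edges in a visual map of related music.
    from collections import Counter
    from itertools import combinations

    orders = [                            # made-up purchase histories
        {"Kind of Blue", "A Love Supreme"},
        {"Kind of Blue", "A Love Supreme", "Mingus Ah Um"},
        {"Kind of Blue", "Mingus Ah Um"},
    ]
    pair_counts = Counter()
    for order in orders:
        for a, b in combinations(sorted(order), 2):
            pair_counts[(a, b)] += 1

    for pair, n in pair_counts.most_common(2):
        print(pair, n)

The real service presumably works from far richer signals, but 
co-purchase counts alone already give a usable similarity graph.]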

    Another is the Scoutpal service, which lets people scan book bar codes
    to find out the price of the title on Amazon.
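
[Book bar codes are EAN-13 numbers that wrap a 978 prefix around the 
ISBN, so a scan-to-price service must convert one to the other before 
querying a catalogue. A sketch of that conversion (the price lookup 
itself is omitted):

    # Convert a "Bookland" EAN-13 bar code to a ten-digit ISBN:
    # drop the 978 prefix and the EAN check digit, keep the nine core
    # digits, and recompute the ISBN-10 check digit (mod 11, X for 10).
    def ean13_to_isbn10(ean: str) -> str:
        assert len(ean) == 13 and ean.startswith("978")
        core = ean[3:12]
        total = sum(w * int(d) for w, d in zip(range(10, 1, -1), core))
        check = (11 - total % 11) % 11
        return core + ("X" if check == 10 else str(check))

    print(ean13_to_isbn10("9780596000486"))   # -> 0596000480

With the ISBN in hand, the price is one catalogue query away.]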

    Amazon already has 65,000 developers who are working on ways to
    plunder information on its site for their own ends. The payback for
    Amazon is the selling of more stuff through its site.

    Big ideas

    Another glimpse of how the web is changing was given with the
    unveiling of new search engine Snap by net veteran Bill Gross.

    Snap lets people find web pages related to a keyword query but also
    produces lots of extra information.

    For instance, a search for digital cameras produces a table detailing
    popular models that others have looked for.

    Mr Gross said Snap was a precursor of what the net will become as it
    tries to encourage interaction and builds on the data trails that
    earlier visitors leave behind.
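
[One simple reading of "building on data trails" is re-ranking results 
by how often earlier searchers clicked them. A toy sketch with invented 
click counts:

    # Toy re-ranking: order results for a query by historical clicks.
    clicks = {                            # invented data-trail counts
        ("digital cameras", "shop.example"): 310,
        ("digital cameras", "review-site.example"): 120,
        ("digital cameras", "forum.example"): 45,
    }

    def rerank(query, results):
        return sorted(results,
                      key=lambda url: clicks.get((query, url), 0),
                      reverse=True)

    print(rerank("digital cameras",
                 ["forum.example", "shop.example", "review-site.example"]))

Whatever Snap actually does is surely more elaborate, but the principle 
is the same: each visit leaves data that sharpens the next one.]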

    As well as talking about what the web will become, the conference also
    gave a platform to people with big ideas for how the potential of the
    net can be harnessed.

    Brewster Kahle's idea is to scan as many books as possible and put
    them online so everyone has access to that huge amount of knowledge.

    In his speech, Mr Kahle pointed out that most books are out of print
    most of the time and only a tiny proportion are available on bookshop
    shelves.

    Using a robotic scanner, Mr Kahle said the job of scanning the 26
    million volumes in the US Library of Congress, the world's biggest
    library, would cost only $260m (£146m).

    He estimated that the scanned images would take up about a terabyte of
    space and cost about $60,000 (£33,000) to store. Instead of needing a
    huge building to hold them, the entire library could fit on a single
    shelf.
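
[The per-book figures behind those totals are easy to check:

    # Back-of-envelope per-book numbers from the article's totals.
    books = 26_000_000
    print(260_000_000 / books)            # $10 to scan each book
    print(1e12 / books)                   # ~38,000 bytes per book

Ten dollars a book matches the scanning total, but a terabyte spread 
over 26 million volumes leaves only about 38 KB each, which sounds more 
like compressed text than page images.]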

    The Web 2.0 conference was held in San Francisco from 5-7 October.


More information about the paleopsych mailing list