[extropy-chat] how does google find out everything?

BillK pharos at gmail.com
Fri Aug 18 20:06:31 UTC 2006


On 8/18/06, spike wrote:
>
> Internet gurus, how does Google get its info?  If anyone puts up any
> website, Google knows somehow, right?  It doesn't take any action on the
> part of the author.  Google knows of every post on ExI as well as any other
> chat group too?  It knows when you are sleeping, it knows when you're awake.
>
> How does it do that?


Google doesn't know everything, nowhere near. It just knows a lot. It
is estimated that Google indexes only about 25% of all web pages.
The web is just too big (and growing fast).  You can very likely
browse to pages within web sites that don't appear in any search
index.

Russell's description of search engine spiders is correct, but much simplified.
How technical do you want to get?   :)

Different search engines have different algorithms and priorities.
Some only index the home pages of web sites, some dig deeper. Sites
already in the index get rechecked sooner than new sites get
discovered. And some sites, like news sites, get rechecked very frequently.
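To make the rechecking idea concrete, here is a minimal sketch of a revisit scheduler built on a priority queue. The site names and the recheck intervals are made up for illustration; real search engines don't publish theirs, and this is not a description of how any particular engine works.

```python
import heapq

# Hypothetical recheck intervals in hours (assumed values, not real ones).
RECHECK_HOURS = {"news-site": 1, "known-blog": 24, "rarely-updated-site": 168}

def visit_schedule(intervals, horizon_hours):
    """Return (due_time, site) visits up to horizon_hours, earliest first."""
    # Each heap entry is (time the next visit is due, site name).
    heap = [(hours, site) for site, hours in intervals.items()]
    heapq.heapify(heap)
    visits = []
    while heap and heap[0][0] <= horizon_hours:
        due, site = heapq.heappop(heap)
        visits.append((due, site))
        # Re-queue the site for its next visit one interval later.
        heapq.heappush(heap, (due + intervals[site], site))
    return visits

visits = visit_schedule(RECHECK_HOURS, 48)
counts = {site: sum(1 for _, s in visits if s == site) for site in RECHECK_HOURS}
```

Over a 48-hour window the frequently changing site is visited far more often than the others, and the slow one is not revisited at all.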

Metasearch engines that combine many search engine results are useful
for getting an alternative view. I use clusty and ixquick.
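One simple way a metasearch engine could combine results is to sum reciprocal ranks across engines. This is only a sketch under that assumption; I don't know how clusty or ixquick actually merge their results, and the URLs below are placeholders.

```python
from collections import defaultdict

def merge_results(result_lists):
    """Merge ranked URL lists by summing reciprocal ranks (1/1, 1/2, ...)."""
    scores = defaultdict(float)
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / rank
    # Highest combined score first; ties are broken alphabetically.
    return sorted(scores, key=lambda url: (-scores[url], url))

# Placeholder result lists standing in for two different engines.
engine_a = ["a.example", "b.example", "c.example"]
engine_b = ["b.example", "d.example", "a.example"]
merged = merge_results([engine_a, engine_b])
```

A page that several engines rank highly floats to the top, while a page only one engine found still appears lower down, which is where the "alternative view" comes from.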

Specialised search engines are also useful, e.g. PubMed, which covers medical data only.

By implication, search engines can only index pages that can be
reached by following links. But there are databases and catalogues
whose contents are not linked in this way. These are called the
Invisible Web, and you need special techniques to get at them.
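A toy example shows why unlinked pages stay invisible. The sketch below crawls a made-up in-memory "web" (a dict of page names and their outgoing links, all hypothetical) breadth-first from a seed page. A real spider fetches pages over HTTP and parses the HTML for links, but the reachability logic is the same.

```python
from collections import deque

# Hypothetical toy web: each page maps to the pages it links to.
TOY_WEB = {
    "home": ["about", "news"],
    "about": ["home"],
    "news": ["story1"],
    "story1": [],
    # A database result page that nothing links to (assumed example).
    "catalogue?q=widgets": [],
}

def crawl(web, seed):
    """Breadth-first crawl: return the set of pages reachable from seed."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        for link in web.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen

indexed = crawl(TOY_WEB, "home")
invisible = set(TOY_WEB) - indexed
```

Here the catalogue page exists but ends up in `invisible`, because no link leads to it.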

<http://websearch.about.com/library/tableofcontents/blsearchenginetableofcontents.htm>
gives you 100 search engines with descriptions of each one.

Have fun!

BillK


