[Paleopsych] Gary North: Google's Free Desktop Search Engine: E-Mails Only

Premise Checker checker at panix.com
Tue Oct 19 16:45:54 UTC 2004


Gary North's REALITY CHECK
Issue 388, October 19, 2004

[For myself, I definitely want to be able to index WordPerfect files, esp. 
good old WP 5.1 for DOS, and files without extensions, as so many WP files 
are, including many I download from my UNIX shell account. Using Windows' own 
search engine, I can find any string in any file, but opening them is a 
horrible chore. Actually, WP 5.1 for DOS allows you to rapidly peek at 
files one after another by pressing *one* key, but only in alphabetical 
order by file name within a single directory. Power Desk lets you do this 
but converts files first, so you are not looking at the raw ASCII 
characters, and is thus a bit slow.]

[And I'd love to be able to find files with words near one another, as you 
can in Nexis, and then jump to specific parts of those files. For example, 
I have a long WP file of my cassette tapes and would like to find all of 
those that have Beethoven conducted by Mengelberg.]
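The Nexis-style proximity search wished for above can be sketched in a few lines of Python. This is a minimal illustration, not how Nexis or Google does it: it assumes the WordPerfect catalog has already been exported to plain text with one tape per line, and the helpers `near()` and `matching_lines()`, the five-word window, and the sample catalog are all made up for the example.

```python
import re

def tokenize(text):
    """Split text into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())

def near(text, word_a, word_b, window=5):
    """Hypothetical NEAR/n test: True if the two words occur within
    `window` words of each other anywhere in the text."""
    words = tokenize(text)
    a, b = word_a.lower(), word_b.lower()
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)

def matching_lines(text, word_a, word_b, window=5):
    """Return the lines (assumed: one tape per line) where both words
    appear close together."""
    return [line for line in text.splitlines() if near(line, word_a, word_b, window)]

# Made-up sample catalog, one tape per line.
catalog = """Tape 12: Beethoven Symphony 3, Mengelberg, Concertgebouw
Tape 13: Brahms Symphony 1, Mengelberg, Concertgebouw
Tape 14: Beethoven Symphony 5, Toscanini, NBC SO"""

for line in matching_lines(catalog, "Beethoven", "Mengelberg"):
    print(line)  # prints only the Tape 12 line
```

Restricting the test to a single line is what lets the search land on "specific parts of files" rather than merely reporting that a whole file contains both names somewhere.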

      You are hereby authorized to send this report to
      anyone, or post it on-line.

         GOOGLE'S FREE DESKTOP SEARCH ENGINE: E-MAILS ONLY

      I have good news and bad news.  The good news is that
Google's new Desktop Search program works great on a hard disk
full of old e-mails.  The bad news is that it doesn't work on
downloaded Web pages after day one.

      You can download the program here:

                     http://desktop.google.com

      Downloading it takes only a couple of minutes.  It offers
you a choice: let the program communicate with Google to identify
problems -- this is a beta version -- or override this feature.
I overrode it.  I am writing this report instead.

      As you know, I think the best way to make more money is to
increase your productivity.  That's why, from time to time, I
report on freeware that can improve the way we do our work.  This
is one of those reports.

      After 24 years of dreaming, one of my data storage and data
retrieval dreams has come true.  Well, not quite.  It came true
for one day.  Then I woke up.  Reality intruded.

      I assume that you have the same dream: locating that lost
file, e-mail, or article.


HERE IS MY PROBLEM

      I am a writer.  I write all sorts of stuff -- e-mails,
articles, and books.  Like everyone else, I forget where I have
filed all this stuff on my hard drive.  I lose track of where
certain items are.

      I also do a lot of research.  I save links in my
"Favorites" section, but this feature has severe limitations.
The main one is that Web links go bad all the time.  When I click
on a link, I often get this page instead: "This page cannot be
displayed."  Bad news.  A second limitation is that I must be on-
line for a Web link to work.  The Web document is not on my hard
drive.  A third limitation is that I can search only for keywords
in the name I assigned to the link.  Because the page itself is
not on my hard drive, I cannot search for words in the original
article.

      Google has begun to solve my problem.  The company has just
released a search program for desktop computers.  It looks and
works much like Google's on-line search engine does.  It works
instantaneously.  It is in beta-testing stage.  It still has
glitches.  The company is going to enlist a "team" of several
million beta-testers who will help identify problems free of
charge.  I am one of them.

      I downloaded the program.  This took under two minutes.  I
then had it index my hard drive.  I have about six gigabytes of
files.  This process took under ten minutes.

      If you have a huge collection of wordy HTML pages on your
hard disk, it may take overnight for the program to index your
hard drive the first time.  Anyway, that's what Google says.

      Here has been my long-term problem.  In my office, I have
eleven 4-drawer filing cabinets stuffed with clippings, which are
filed under hundreds of categories.  I have no electronic filing
system for these clippings.  So, I forget about them.  Also, when
I do remember an article, I may forget which article was filed
under which category: the keyword problem.  Ideally, it ought to
be filed under half a dozen keywords.  It amazes me how many I
can still find, even though I stopped filing in 1996 when I went
on the Web full-time.

      I don't trust my brain to do this work indefinitely, and
besides, I would prefer to create a data base of clippings that
others can use after I'm gone.  Filing cabinets full of articles
filed according to my classification system are not easily used
by third parties.  In any case, I rarely use those files these
days because of the Web and Google.  I don't read physical
newspapers any more.  I read on-line newspapers instead.  I don't
need to print them out.

      With Google's search tool, I will now assemble a very large
collection of digital documents.  I will be able to locate them.
If I forget one, I may still be able to retrieve it through the
use of keywords.


HOW WELL DOES IT WORK?

      On the first day, it worked fairly well for a beta product.
I would give it a B-minus.  On the second day, it went to a D.
Something in the system died.  It worked only on e-mails.  Google
has a big problem.  But the e-mail feature is so good that I
recommend it as-is.

      Click on the program's desktop icon.  A search page pops up
that looks like Google's regular search page.  It offers a
"Search Desktop" button and a "Search the Web" button, which you
can use if you're on-line.

      It searches these file types, any of which you may choose or
reject: Outlook mail, Outlook Express mail, AOL IM, Word, Excel,
PowerPoint, text, Web history, and Web pages (HTML).  It does not
search PDF files.  Too bad.

      First, I wanted to see if I could retrieve a specific e-
mail.  I decided to search for the word "gatekeepers."  I have
used this word to describe those people and institutions that
filter information so that the general public cannot gain easy
access to it.  I have argued that the Internet has created a
society in which gatekeepers can no longer perform this function
effectively.  This fact is re-shaping society.  (For evidence,
search the Web for "Monica Lewinsky," "Drudge," "Newsweek,"
"spiked," and "impeached.")

      I typed in "gatekeepers."  As soon as I clicked the "Search
Desktop" button -- in the twinkling of an eye, to use St. Paul's
language -- there were half a dozen e-mail links on my screen.
Every one of them had "gatekeepers" in it.  I could see a brief
extract of each e-mail on-screen.  I clicked the first link.  Up
popped the complete e-mail, nicely formatted.

      This takes care of what has been a major retrieval problem:
e-mail clutter.

      Note: I use Outlook Express.  The beta version of Google's
Desktop Search works only on Microsoft Outlook and Outlook
Express.  I hope the programmers add other formats later on.

      Second, I have always wanted to fill my disk with Web pages:
a digital filing cabinet with 500 drawers.  I do lots of research
on the Web.  I want to be able to do the following:

      (1) Download a Web page.
      (2) Enter as many keywords as I can think of.
      (3) Search months later for any or all of these keywords.
      (4) Have the search program pull up the link fast.
      (5) Avoid pulling up links to 50 unrelated Web pages.

      I went to Lew Rockwell's site and downloaded an old article
of mine, "Why the Job Market Is Slanted in Favor of College
Graduates."  This article discusses some of the myths of college
education and ways to get around the corporate career barriers
that are placed against non-graduates.

      Next, I saved this page to my hard disk -- not
"Favorites," but "Save As."  To save it, I had to type in a title
in the "File name" box.  Instead of typing in a name, I typed in
keywords that I think I may possibly recall if I ever go looking
for this article or related articles.  I typed in these words:

      discount college degree graduates money salary business
      hire apprenticeship boredom.

I clicked the Save button.  In an instant, the article was saved
to my hard disk.

      This long title widens the spaces in between the rows of
article titles in my filing system.  This is a small price to
pay.

      I then disconnected from the Web.  I wanted to test the
program's disk-searching capabilities.

      Again, I clicked the Google icon.  Up came the search page.
I typed in two words: "discount college" (without quotation
marks).  I then clicked the "Search Desktop" button.  Immediately,
I got a list of files.  The first two were obviously useless:
.gif files, which are component parts of the saved page.  The third one
was the right one: "Why the Job Market Is Slanted in Favor of
College Graduates."  (http://snipurl.com/8tqi)

      I clicked it.  Bad news.  I got "This page cannot be
displayed."  For some reason, pages from LewRockwell.com do not
download properly, as I was to discover in subsequent tests.

      Then I noticed the word "cached" at the end of the link.  I
clicked it.  Up popped the original article, with the words
"college" and "discount" highlighted in yellow.  Because of the
highlighting, this cached format is actually more valuable to me
than the original page would have been.  I can easily find the
keywords I'm looking for.

      Warning: If your keyword is not in the original
      article, there will be no highlighting.
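The yellow highlighting in the "cached" copy can be imitated by wrapping each search word in an HTML `<mark>` tag, which browsers display with a yellow background by default. This is a minimal sketch, assuming plain text in and HTML out; `highlight()` is a hypothetical helper, not part of Google's program. As in the cached view, a keyword that is not in the text simply gets no highlighting.

```python
import html
import re

def highlight(text, keywords):
    """Hypothetical helper: wrap every whole-word, case-insensitive match
    of any keyword in <mark> tags. Keywords absent from the text are left
    unhighlighted, and the rest of the text is HTML-escaped."""
    escaped = html.escape(text)
    words = [re.escape(html.escape(k)) for k in keywords if k.strip()]
    if not words:
        return escaped
    pattern = re.compile(r"\b(" + "|".join(words) + r")\b", re.IGNORECASE)
    # \1 keeps the original capitalization of each matched word.
    return pattern.sub(r"<mark>\1</mark>", escaped)

print(highlight("Why the college discount works", ["college", "discount"]))
```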

      The Google program searches both for keywords in the "File
name" and words in the original article.  This is good.
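Here is a minimal sketch of that two-sided matching, assuming the saved pages sit in one folder as .htm or .html files and that a page counts as a hit when every query word appears in either its file name or its text. The function names `search_saved_pages()` and `demo()` and the all-words matching rule are illustrative assumptions, not a description of Google's actual logic.

```python
import re
import tempfile
from pathlib import Path

def words_in(text):
    """Lowercase word set for simple keyword matching."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def search_saved_pages(folder, query):
    """Hypothetical helper: return saved pages whose file name (where the
    keywords were typed into the "File name" box) or body text contains
    every query word."""
    terms = words_in(query)
    hits = []
    for path in sorted(Path(folder).glob("*.htm*")):  # .htm and .html
        body = path.read_text(encoding="utf-8", errors="ignore")
        body = re.sub(r"<[^>]+>", " ", body)  # crude HTML tag stripping
        vocabulary = words_in(path.stem) | words_in(body)
        if terms <= vocabulary:
            hits.append(path.name)
    return hits

def demo():
    """Build a throwaway folder with one saved page and run three searches."""
    with tempfile.TemporaryDirectory() as tmp:
        name = "discount college degree graduates money salary.htm"
        (Path(tmp) / name).write_text(
            "<html><body><p>Why the Job Market Is Slanted</p></body></html>",
            encoding="utf-8",
        )
        by_name = search_saved_pages(tmp, "discount college")  # matches the file name
        by_text = search_saved_pages(tmp, "job market")        # matches the page text
        missing = search_saved_pages(tmp, "derrida")           # matches nothing
    return by_name, by_text, missing

print(demo())
```

Matching on the file name as well as the text is what makes the long keyword "title" pay off: the page can be found by words the author typed in, even when those words never appear in the article itself.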

      In downloading other Web pages, I did not have the "problem"
of the message, "This page cannot be displayed."  But all
downloaded files offer "cached," and these cached pages have
highlighted all of my keywords that are in the article.  So, I
intend to use "cached" as my original choice.  It makes rapid
skimming so much more efficient.

      This takes care of the second-biggest storage/retrieval
problem I have had since 1996: how to save a Web page to my hard
disk and find it later.  The use of a long file name solves my
old clippings-based problem: the need for multiple categories for
the same clipping.

      Or should I say, it took care of my problem for one day.
Then it died.  When I came back the next day to retrieve the same
article on college, the program failed to locate it.  It is still
on my hard disk.  I checked.  I searched for discount college
degree slanted.  I was told on-screen that this did not match any
items.  So, I extracted a phrase and put quotation marks around
it: "dirty little secrets."  Nothing.

      Google Desktop Search had completely lost track of the
article.

      This happened twice.  I quit trying after that.  I came
across an article on the French philosopher, Jacques Derrida.
Derrida taught that words and reality are not really linked.  He
was famous as a deconstructionist philosopher.  The article is a
short, brilliant satire on Derrida, who is dead.  Or is he?  We
can't be sure if we follow Derrida's philosophy.  You can read it
here: http://snipurl.com/9sr0.

      Deconstructionism is taught at the best and most expensive
universities that charge parents $140,000.  It is rarely taught
at junior colleges or lower-tuition universities.  This is
another reason why I recommend discount colleges rather than Ivy
League universities.  For my instant-reply free report on this,
send an e-mail to

                    discount-colleges at kbot.com

      In order to test my keyword system, I saved the Web page to
my disk and inserted these words in the "File name" box:
"goofball French philosopher."

      If I ever download a page on Jean-Paul Sartre, I will have
to use a different file name, so as not to overwrite the article
on Derrida.  I will add "commie" to "goofball French
philosopher."

      Again, I disconnected from the Web.  Then I typed in
"goofball."  Up came the article.  This time when I clicked on
the link, there was no announcement: "This page cannot be
displayed."  I did not have to click "cached."

      Nevertheless, I will always click "cached," because I like
the word-highlighting feature.

      The Google program also pulled up a bunch of irrelevant
links: "background.gif," "article_submenu_r1_c1.gif," etc.  This
is a beta version.  I trust that the programmers will fix this
glitch.  If not, the venture will fail.  People will not use the
program.

      Will I remember "goofball French philosopher" as an article
on Derrida?  Probably.  But if I don't, I can still search for
"Derrida."  Google will search the text on the Web page, not just
my name/keyword entries.

      Searching will become a problem only if I have lots of
entries on Derrida, which I won't.  A little Derrida goes a long
way.

      But, like my article on college, on day two the article
disappeared.  I searched for Derrida.  No article.

      So, I give the program a D: an A for e-mail retrieval, but
an F for Web page retrieval.

      Back to the digital drawing board, guys!

      Let's assume that they get this fixed.  I think they will.
What happens if I want to find an article labeled "Greenspan"?  I
will have a problem.  Now that I have the Google Desktop Search
program, I will file lots and lots of Web pages about or by
Greenspan.  I may even have to buy a 200-gigabyte hard disk.  How
will I locate the Greenspan article I am looking for?  "Goofball
central banker" won't be sufficiently narrow.  Besides, I can
only use it once.

      I will use the "File name" box to enter key words that I
think will help me retrieve a particular Greenspan article.  That
will take some creativity on my part, but every detailed filing
system does.  I won't type in Greenspan.  That word will be in
the article's text.  I'll file the page in a "Greenspan" folder.
If the article is about his obtuse language, I will add
"esperanto."  If it's on his denial of a bubble, I will use
"bubble."  If it's an admission of a previous bubble that he
failed to warn against or even denied was happening at the time,
I can use "retroactive bubble."  And so on.

GLITCHES: .GIF FILES

      The program retrieves .gif files along with the original Web
page.  This is a major flaw.  The program should retrieve only
the Web page, not clutter up the screen with useless .gif files
that make up the Web page.
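Screening out those component files is, at bottom, a filter on file extensions. This is a minimal sketch, assuming search results arrive as a plain list of file names; the extension list and the helper `drop_page_components()` are illustrative assumptions, not features of Google Desktop Search.

```python
from pathlib import PurePath

# Assumed list of extensions for the images, style sheets, and scripts a
# browser's "Save As" (complete page) writes alongside the page itself.
COMPONENT_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png", ".css", ".js"}

def drop_page_components(results):
    """Hypothetical helper: keep only results that are not component files."""
    return [r for r in results
            if PurePath(r).suffix.lower() not in COMPONENT_EXTENSIONS]

results = [
    "background.gif",
    "article_submenu_r1_c1.gif",
    "goofball French philosopher.htm",
]
print(drop_page_components(results))  # ['goofball French philosopher.htm']
```

A real fix would also need to recognize which folder of components belongs to which page, so that deleting the page removes its images from the index too; this sketch only hides them from the result list.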

      Another glitch is also quite serious.  I downloaded a Web
page with lots of graphic-heavy ads in it.  All I wanted was the
text of a particular columnist.  It took a long time to save this
page to my disk.  Then, when I used Google Desktop Search, all I
got was a .gif file -- a small image.  No text.

      So, I removed the file.  Blip.

      I then went to a "print" page of the columnist's article and
downloaded it.  I named it similarly, but not identically.

      Google Desktop Search now cannot find the new file.  It
keeps retrieving the original .gif file.  It says: "The following
file cannot be found . . . This may happen if you renamed or
deleted this file." But I deleted a different file/name.

      Worse: I now cannot delete the .gif file.  I cannot even
find it on my hard drive in order to delete it.  Yet Google keeps
retrieving it, while ignoring the replacement.

      The program apparently cannot differentiate between the
quick and the dead.  They had better fix this problem.

      So, it's a beta program.  Don't expect it to solve your data
retrieval problems yet.  But Google is serious about developing a
useful piece of software.  This is surely useful for e-mails now.
It will get better.


WHAT NEXT?

      You can use quotation marks to limit your search to a
phrase.  This is helpful, but only if you recall the exact
phrase.  The program should someday imitate Google's "Advanced
Search" option, which allows a search for several specific words
that need not appear together as a single phrase.  This would let
us retrieve only those documents that contain all of these words.
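The difference between a quoted-phrase search and an "all these words" search can be shown with two small functions. This is a minimal sketch on plain text, assuming simple word tokenization; `has_phrase()` and `has_all_words()` are hypothetical names, and this is not Google's query parser.

```python
import re

def tokens(text):
    """Lowercase words, punctuation dropped."""
    return re.findall(r"[a-z0-9']+", text.lower())

def has_phrase(text, phrase):
    """Quoted-phrase search: the words must appear consecutively, in order."""
    words, target = tokens(text), tokens(phrase)
    n = len(target)
    return n > 0 and any(words[i:i + n] == target
                         for i in range(len(words) - n + 1))

def has_all_words(text, query):
    """"All these words" search: every word must appear, in any order."""
    return set(tokens(query)) <= set(tokens(text))

doc = "The dirty little secret of college: most graduates are bored."
print(has_phrase(doc, "dirty little secret"))          # True
print(has_phrase(doc, "little dirty secret"))          # False
print(has_all_words(doc, "secret graduates college"))  # True
```

The all-words test is the forgiving one: it still finds the document when you remember a few of its words but not the exact phrase.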

      It should search PDF files someday.

      Google should create an on-line manual of tips on how to use
this program efficiently.  To do this, the company should hire a
skilled instruction manual writer to work with a skilled user of
the program.  (These are never the same people.)


PRIVACY

      A competing company, whose product, Copernic, I used to use,
says that Google's product could become a privacy threat.  The
company has said that the threat is potential.  Users can opt out
of sharing information, which I did.

      If your computer is not secured by a password, then someone
can access your e-mail, with or without Google Desktop Search.
Your privacy problem is your lack of computer security.  Don't
confuse the issue.

      Google wants to protect its image as non-coercive.  I have
few fears that the company is going to use my files against me,
especially when I opted out of data sharing.  In any case, my
files are sufficiently boring that I'm not worried about MP3-
tracking by the RIAA, which was one example that the critic
provided.  I don't have any MP3 files.  If I ever do, they will
be lectures and sermons, which I am happy to share.


MY SECOND DREAM

      I have a second dream regarding data storage and retrieval.
I want to take notes verbally on a digital recorder that allows
me to use a high quality external microphone like the Sennheiser
835e ($99), and then upload my voice files into speech-
recognition dictation software: NaturallySpeaking 7.  Then I
want to be able to retrieve these notes with Google Desktop Search.  I'm still
looking for the right digital recorder, although I think it will
be a Sony.

      These notes will be long, especially if I'm summarizing a
500-page book.  So, when I call up a file by means of keywords,
the file may be very long.  I want to be able to go right to the
section I need to recall.  I will therefore need keywords.

      I will dictate key words at any relevant point in a
document.  After verbally summarizing whatever I have been
reading, I will tell the program "paragraph."  It will create a
new paragraph.  Then I will dictate a string of keywords.  Then I
will say "paragraph."

      When it comes time to retrieve documents using these
keywords, Google Desktop Search highlights in yellow the search
words.  This means that I can type in the search words and then
rapidly scroll down the file, which will probably be in ASCII or
e-mail text format.  I will be able to skim rapidly.

      I can make skim reading faster by breaking up my notes into
shorter files, such as summaries of just one chapter, and filing
all of these chapters in one folder.  This way, I can skim read a
shorter file.

      The problem is, I may want the folder to summarize one book,
or I may want the folder to relate to a single research project,
which means notes taken from many sources.  Or maybe I want both
kinds of folders.

      This is not much of a problem any more.  It was when I used
physical folders.  Google Desktop Search will spot the keywords.
It pays no attention to folders.

      I will probably create one notes folder per verbally
summarized book.  If I summarize several of one author's books, I
will create a large folder with his name, and then create sub-
folders with each book's summary.  I can also download book
reviews from the Web.

      Now I'm going to let you in on a little-known fact.
Universities today subscribe to many newspapers and journals on-
line.  These are library-paid subscriptions, not Web articles.
If you go to a local university, you may be able to log on.  Not
all of them screen out visitors.  Not all require passwords.  You
can read a full-text article on-line and then send the document
to your e-mail box.  It depends on school policy.  When you can do
this kind of research at a local university, free of charge, you
can assemble a huge data base.  Google Desktop Search lets you
manage a large data base.

      This program, when it's out of beta stage, will be a godsend
for college students.  Here is a tool to use in researching term
papers.  For graduate students, it will become indispensable.


CONCLUSION

      If you want to retrieve lost e-mails, this program will help
solve your problem.

      Someday, they will fix it, so that it will work well with
downloaded Web pages.  Then it will be a terrific tool.


