[Paleopsych] New Scientist: Automated web-crawler harvests resume info
Premise Checker
checker at panix.com
Tue Mar 29 15:50:07 UTC 2005
Automated web-crawler harvests resume info
http://www.newscientist.com/article.ns?id=dn7181&print=true
* 18:34 21 March 2005
A new search engine focused on people can automatically identify
online information on individuals and weave it into detailed
summaries.
Just like Google and Yahoo, ZoomInfo crawls and indexes the web. But
instead of serving up the pages in response to a query, it attempts to
identify and extract specific information on people.
After entering a name into the search box, a user is presented with a
list of matching individuals. Clicking through to their resume-like
summaries can reveal their job title, company name, past jobs and
universities attended.
The site is free to use and went live on Monday. It will be
particularly useful to head-hunters, recruiters, journalists and
networkers, says ZoomInfo's chief scientist Michel Décary, based in
Cambridge, Massachusetts, US. In future it may serve up paid
advertising as well as query responses, but right now, he says, its
purpose is to find more subscribers for the company's premium search,
which charges recruiters $1000 a month.
"I don't know anyone that has tried to do people search in an
automated fashion," says Danny Sullivan, the UK-based editor of search
industry news website Searchenginewatch.com. Existing people-finding
search tools such as Yahoo! People Search and Intelius are indexed
manually, but automating the process means that far more information
can be searched and presented to users, says Décary.
Verbs and nouns
Shopping websites such as Froogle and Shopping.com already extract
prices automatically from online stores. But this is much easier than
figuring out what someone does and where they work from a mixture of
company websites, news articles and press releases, says Décary.
ZoomInfo deploys algorithms that pick out verbs and proper nouns to
home in on names, he says. The algorithms also infer context to weed
out phrases that only look like the names of real people, such as
Penny Lane and Harry Potter. Potential new information is also
compared to databases of known names, job titles, degrees and
universities.
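The filtering described above can be illustrated with a minimal
sketch. It is not ZoomInfo's actual method, which the article does
not detail. It assumes that a pair of capitalised words is a candidate
name, that a small blocklist stands in for context inference, and that
tiny hypothetical lookup tables (KNOWN_FIRST_NAMES, NOT_PEOPLE,
JOB_TITLES) stand in for the databases of known names and job titles.

```python
import re

# Hypothetical lookup tables for illustration only; a real system
# would rely on far larger databases.
KNOWN_FIRST_NAMES = {"Danny", "Michel", "Harry", "Penny"}
NOT_PEOPLE = {"Penny Lane", "Harry Potter"}  # look like names, are not
JOB_TITLES = {"chief scientist", "editor", "engineer"}

# Two consecutive capitalised words: a crude stand-in for
# proper-noun detection.
CANDIDATE = re.compile(r"\b([A-Z][\w'-]+) ([A-Z][\w'-]+)\b")


def extract_people(text):
    """Return (name, job_title_or_None) pairs found in text."""
    people = []
    for match in CANDIDATE.finditer(text):
        first, last = match.groups()
        name = f"{first} {last}"
        # Keep only candidates with a known first name that are not
        # on the blocklist of name-like phrases.
        if first not in KNOWN_FIRST_NAMES or name in NOT_PEOPLE:
            continue
        # Search a 60-character window on each side for a known title.
        start = max(0, match.start() - 60)
        window = text[start:match.end() + 60].lower()
        title = next((t for t in sorted(JOB_TITLES) if t in window), None)
        people.append((name, title))
    return people


sample = ("Danny Sullivan is the editor of a search news site. "
          "Harry Potter fans met on Penny Lane.")
print(extract_people(sample))
```

A fixed blocklist is the simplest possible filter; inferring from the
surrounding words that "Penny Lane" is a street would need a richer
model than this sketch provides.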
Inferring context also enables ZoomInfo to aggregate information found
in several places that applies to the same person, and to separate out
different people who share the same name, says Décary.
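As a rough illustration of this aggregation step, the sketch below
groups mentions by name plus one piece of context. Using the employer
as the only context signal is an assumption made here for brevity, and
the function name group_mentions and the sample people and companies
are hypothetical; the article does not say which signals ZoomInfo
actually uses.

```python
from collections import defaultdict


def group_mentions(mentions):
    """Group (name, company, fact) mentions into per-person profiles.

    Mentions with the same name and the same company are merged;
    the same name at a different company is treated as a different
    person. Company is the only context signal in this sketch.
    """
    profiles = defaultdict(set)
    for name, company, fact in mentions:
        # Normalise case and whitespace so trivial variants still match.
        key = (name.strip().lower(), company.strip().lower())
        profiles[key].add(fact)
    return dict(profiles)


# Hypothetical mentions gathered from three different pages.
mentions = [
    ("Jane Doe", "Acme", "engineer"),
    ("Jane Doe", "ACME ", "MIT graduate"),
    ("Jane Doe", "Globex", "editor"),
]
for person, facts in group_mentions(mentions).items():
    print(person, sorted(facts))
```

Keying on a single field keeps the example short, but it would wrongly
split one person who has changed employers, which hints at why the
summaries can come out incomplete or inaccurate.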
Presidential confusion
Although the search engine currently boasts an index of 25 million
people, many of their summaries are incomplete and inaccurate, says
Sullivan. President George W Bush is listed as the British Prime
Minister, the governor of Florida and the governor of Massachusetts,
as well as the president of the US, while some New Scientist employees
are listed as editors at a company called "Scientist".
"The mistakes you see point to the difficulty of the task and not the
sloppiness of the technology," explains Décary. He says that actors,
celebrities and journalists are much harder to index than CEOs and
engineers, because their names appear on many different pages and in a
variety of different contexts.
Privacy experts have criticised the technology for aggregating
information about people without their consent. But Décary says that
the information collected only relates to employment and education and
is freely available online to a determined searcher anyway. "I think a
nefarious person will find more juicy stuff on Google," he says.
Weblinks
* http://www.zoominfo.com
* http://searchenginewatch.com/
* http://people.yahoo.com/
* http://froogle.google.com/