[Paleopsych] New Scientist: Automated web-crawler harvests resume info

Premise Checker checker at panix.com
Tue Mar 29 15:50:07 UTC 2005


Automated web-crawler harvests resume info
http://www.newscientist.com/article.ns?id=dn7181&print=true
      * 18:34 21 March 2005

    A new search engine focused on people can automatically identify
    online information on individuals and weave it into detailed
    summaries.

    Just like Google and Yahoo, ZoomInfo crawls and indexes the web. But
    instead of serving up the pages in response to a query, it attempts to
    identify and extract specific information on people.

    After entering a name into the search box, a user is presented with a
    list of matching individuals. Clicking through to their resume-like
    summaries can reveal their job title, company name, past jobs and
    universities attended.

    The site is free to use and went live on Monday. It will be
    particularly useful to head-hunters, recruiters, journalists and
    networkers, says ZoomInfo's chief scientist Michel Décary, based in
    Cambridge, Massachusetts, US. In future it may serve up paid
    advertising alongside query responses, but for now, he says, its
    purpose is to attract subscribers to the company's premium search,
    which charges recruiters $1000 a month.

    "I don't know anyone that has tried to do people search in an
    automated fashion," says Danny Sullivan, the UK-based editor of search
    industry news website Searchenginewatch.com. Existing people-finding
    search tools such as Yahoo! People Search and Intelius are indexed
    manually, but automating the process means that far more information
    can be searched and presented to users, says Décary.

Verbs and nouns

    Shopping websites such as Froogle and Shopping.com already extract
    prices automatically from online stores. But this is much easier than
    figuring out what someone does and where they work from a mixture of
    company websites, news articles and press releases, says Décary.

    ZoomInfo deploys algorithms that pick out verbs and proper nouns to
    home in on names, he says. The algorithms also infer context to weed
    out phrases that merely look like the names of real people, such as
    Penny Lane and Harry Potter. Potential new information is also
    compared to databases of known names, job titles, degrees and
    universities.
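
    The article gives no implementation details, so the following is only
    a minimal sketch of the general idea: find capitalised word pairs,
    require a nearby verb, discard look-alike phrases and check the first
    name against a list of known names. Every table, pattern and function
    name here is a hypothetical stand-in, and a fixed exclusion list is
    used in place of the context inference Décary describes.

```python
import re

# Hypothetical lookup tables; a real system would use far larger databases.
KNOWN_FIRST_NAMES = {"michel", "danny", "george", "harry", "penny"}
NON_PERSON_PHRASES = {"penny lane", "harry potter"}
CONTEXT_VERBS = {"says", "said", "joined", "works", "worked", "founded"}

# Two consecutive capitalised words, e.g. "Danny Sullivan".
CANDIDATE = re.compile(r"\b([A-Z][a-z]+) ([A-Z][a-z]+)\b")


def extract_people(sentence):
    """Return candidate person names found in one sentence."""
    words = {w.lower().strip('.,"') for w in sentence.split()}
    # Crude stand-in for context: the sentence must contain a verb
    # that typically accompanies a person.
    has_verb = bool(words & CONTEXT_VERBS)
    people = []
    for match in CANDIDATE.finditer(sentence):
        first, last = match.group(1), match.group(2)
        name = f"{first} {last}"
        if name.lower() in NON_PERSON_PHRASES:
            continue  # looks like a name but is a street, character, etc.
        if first.lower() not in KNOWN_FIRST_NAMES:
            continue  # first word is not a known given name
        if not has_verb:
            continue  # no supporting verb in the sentence
        people.append(name)
    return people
```

    For example, extract_people("Danny Sullivan says the index is
    incomplete.") keeps "Danny Sullivan", while a sentence about Harry
    Potter yields nothing because the phrase is on the exclusion list.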

    Inferring context also enables ZoomInfo to aggregate information found
    in several places that applies to the same person, and to separate out
    different people who share the same name, says Décary.
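
    One simple way to picture this aggregation step, assuming that two
    mentions refer to the same person when they share both a name and an
    employer, is sketched below. The matching rule, the Profile class and
    the sample data are illustrative assumptions, not ZoomInfo's method.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    name: str
    companies: set = field(default_factory=set)
    titles: set = field(default_factory=set)


def aggregate(mentions):
    """Merge (name, company, title) mentions into per-person profiles.

    Mentions with the same name and a company already seen for that
    profile are merged; otherwise a new profile is started, so that
    namesakes at different companies stay separate.
    """
    profiles = []
    for name, company, title in mentions:
        for profile in profiles:
            if profile.name == name and company in profile.companies:
                profile.titles.add(title)
                break
        else:
            profiles.append(Profile(name, {company}, {title}))
    return profiles


# Hypothetical mentions gathered from different pages.
mentions = [
    ("John Smith", "Acme", "Engineer"),
    ("John Smith", "Acme", "Senior Engineer"),
    ("John Smith", "Globex", "CEO"),
]
profiles = aggregate(mentions)
```

    With this sample input the two Acme mentions collapse into one
    profile and the Globex mention becomes a second, separate John Smith.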

Presidential confusion

    Although the search engine currently boasts an index of 25 million
    people, many of their summaries are incomplete and inaccurate, says
    Sullivan. President George W Bush is listed as the British Prime
    Minister, the governor of Florida and the governor of Massachusetts,
    as well as the president of the US, while some New Scientist employees
    are listed as editors at a company called "Scientist".

    "The mistakes you see point to the difficulty of the task and not the
    sloppiness of the technology," explains Décary. He says that actors,
    celebrities and journalists are much harder to index than CEOs and
    engineers, because their names appear on many different pages and in a
    variety of different contexts.

    Privacy experts have criticised the technology for aggregating
    information about people without their consent. But Décary says that
    the information collected only relates to employment and education and
    is freely available online to a determined searcher anyway. "I think a
    nefarious person will find more juicy stuff on Google," he says.

Weblinks

      * http://www.zoominfo.com
      * http://searchenginewatch.com/
      * http://people.yahoo.com/
      * http://froogle.google.com/

