[ExI] How Watson works, a guess

Thu Feb 17 09:01:50 UTC 2011

There have been a number of guesses on list as to how Watson works. I
have spent a fair amount of time looking at everything I can find on
the topic, and here is my guess as to how it works based on what I've
heard and read weighted somewhat by how I would approach the problem.
If I were going to try and duplicate Watson, this is more or less how
I would proceed.

To avoid confusion, I won't reverse question/answer like Jeopardy.

First, keywords from the question are used to find a large number of
potential good answers. This is what is meant when people say "a
search engine is part of Watson". This is likely based on proximity as
Richard pointed out, and this is clearly just the first step. There
are probably some really interesting indexing techniques in use that
are a bit different from Google's. I was fascinated by the report on this
list that Watson had more RAM than hard drive space. Can someone
verify that this is the case? It seems counter-intuitive. What happens
if you turn the power off? Do you have to connect Watson to a network
to reload all that RAM? Watson's database consists of a reported
200,000,000 documents including Wikipedia, IMDb, other encyclopedias,
etc.
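
As a rough illustration of that first step only, here is a minimal
sketch of keyword lookup over a tiny in-memory inverted index. Nothing
here is Watson's actual code: the function names, the three toy
documents and the "count shared keywords" ranking are all assumptions
made for the example.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    # Inverted index: word -> set of document ids containing that word.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def candidates(index, question, limit=10):
    # Count how many distinct question keywords each document shares.
    hits = defaultdict(int)
    for word in set(tokenize(question)):
        for doc_id in index.get(word, ()):
            hits[doc_id] += 1
    # Most shared keywords first; ties broken alphabetically.
    ranked = sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc_id for doc_id, _ in ranked[:limit]]

# Toy data, invented for illustration.
docs = {
    "Paris": "Paris is the capital of France",
    "Berlin": "Berlin is the capital of Germany",
    "Everest": "Mount Everest is the highest mountain on Earth",
}
index = build_index(docs)
top = candidates(index, "What is the capital of France?")
print(top)
```

A real system would weight rare words more heavily and look at
proximity, but even this crude version shows why the search step
produces many candidates rather than one answer.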

Second, a very large set of heuristic algorithms (undoubtedly, these
are the majority of the reported 1,000,000 lines of code) analyze the
question, the category and/or each potential answer in combination and
come up with a "score" indicating whether by this heuristic measure
the answer is a good one. I would suspect that each heuristic also
generates a "confidence" measurement.
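
To make the shape of that second step concrete, here is a small
sketch in which every heuristic is a function that takes the question,
the category and a candidate answer and returns a score plus a
confidence. The two heuristics and their numbers are hypothetical,
invented for this example, and are not taken from anything reported
about Watson.

```python
from typing import NamedTuple

class Evidence(NamedTuple):
    name: str
    score: float       # 0.0 (bad answer) .. 1.0 (good answer)
    confidence: float  # how much this heuristic trusts its own score

def length_plausible(question, category, answer):
    # Hypothetical heuristic: quiz answers tend to be short.
    words = len(answer.split())
    score = 1.0 if words <= 4 else 0.2
    return Evidence("length_plausible", score, 0.3)

def not_in_question(question, category, answer):
    # Hypothetical heuristic: an answer that already appears in the
    # question text is probably not what is being asked for.
    repeated = answer.lower() in question.lower()
    return Evidence("not_in_question", 0.0 if repeated else 1.0, 0.8)

# New heuristics are added simply by appending to this list.
HEURISTICS = [length_plausible, not_in_question]

def evaluate(question, category, answer):
    # Run every heuristic on one candidate answer.
    return [h(question, category, answer) for h in HEURISTICS]

question = "This city is the capital of France"
good = evaluate(question, "WORLD CAPITALS", "Paris")
bad = evaluate(question, "WORLD CAPITALS", "France")
for evidence in good + bad:
    print(evidence)
```

Because each candidate is evaluated independently of the others, this
kind of work spreads naturally across many processors.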

Third, a learning algorithm generates "weights" to apply to each
heuristic result and perhaps a different weight for each confidence
measurement. This may be the "tuning" that is specific to Jeopardy.
Another part of tuning is adding more heuristic tests. For example, on
the NOVA show, two of the programmers talk about the unfinished
"gender" module that comes up after Watson misses. There is also a
module referred to as the "geographical" element. One could assume it
tries to determine, by a variety of algorithms, whether what is being
proposed as an answer makes spatial sense.
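
One simple way a program could learn such weights from past questions
is plain logistic regression trained by gradient steps. This is only a
stand-in to show the idea; the learning method Watson really uses is
not known to me, and the training data below is made up.

```python
import math

def predict(weights, features):
    # Weighted sum of heuristic scores squashed into a 0..1 confidence.
    z = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, n_features, rate=0.5, epochs=200):
    # Nudge each weight in the direction that reduces the error on
    # every labeled example, over and over.
    weights = [0.0] * n_features
    for _ in range(epochs):
        for features, label in examples:
            error = label - predict(weights, features)
            weights = [w + rate * error * x
                       for w, x in zip(weights, features)]
    return weights

# Each row: [bias term, heuristic A score, heuristic B score], label.
# Label 1 means the candidate was the right answer. Invented data in
# which heuristic A is informative and heuristic B is mostly noise.
examples = [
    ([1.0, 0.9, 0.5], 1),
    ([1.0, 0.8, 0.4], 1),
    ([1.0, 0.2, 0.6], 0),
    ([1.0, 0.1, 0.5], 0),
]
weights = train(examples, 3)
print(weights)
print(predict(weights, [1.0, 0.9, 0.5]), predict(weights, [1.0, 0.1, 0.5]))
```

Adding a new heuristic module then means adding one more column of
scores and retraining, which fits the picture of tuning as an ongoing
process.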

The heuristic algorithms no doubt include elements of natural language
processing, statistical analysis, hard coded things that were noted by
some programmer or other based on a failed answer during testing, etc.
The reason the reports of how Watson works seem so complex and
contradictory is, IMO, that someone talks about a particular
heuristic, and that makes that heuristic seem a bit more important
than the overall architecture.

The combination of all the weighted scores probably follows some kind
of statistical (probably Bayesian) approach, which is quite amenable
to learning feedback.
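
If the combination is Bayesian in flavor, the simplest version would
be a naive Bayes update: start from a prior probability that a
candidate is correct and multiply the odds by a likelihood ratio for
each piece of evidence. The sketch below assumes the heuristics are
independent, which real ones would not be, and the numbers are
invented.

```python
import math

def posterior(prior, likelihood_ratios):
    # Work in log-odds so that evidence simply adds up.
    log_odds = math.log(prior / (1.0 - prior))
    for ratio in likelihood_ratios:
        # ratio > 1 supports the answer, ratio < 1 counts against it.
        log_odds += math.log(ratio)
    # Convert log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))

# Prior of 10%, two supporting heuristics and one mildly against.
p = posterior(0.1, [4.0, 3.0, 0.5])
print(round(p, 3))  # 0.4
```

Learning feedback fits in easily here: the likelihood ratio for each
heuristic can be re-estimated from how often it fired on right and
wrong answers in past games.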

An open source project, SpamAssassin, takes a similar approach to
separating spam from good email. Hundreds of heuristic tests are run
on each email, and the results are combined to form a confidence about
whether the email is spam. A cutoff point is determined, and anything
above the cutoff is considered spam. It can "learn" to distinguish new
spam by changing the weights used for each heuristic test. It also has
an extensible plug-in architecture, in that new heuristics can be
added, and the weights can be tweaked over time as the nature of spam
changes. I would not be surprised if Watson takes a similar approach,
based on what people have said.
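
The general pattern (many small tests, each worth some points, summed
and compared to a cutoff) can be sketched in a few lines. This is not
SpamAssassin's code or its rule format; the rule names, point values
and the cutoff of 5.0 are all made up for illustration.

```python
import re

RULES = {}

def rule(name, points):
    # Decorator that registers a test, so new heuristics plug in
    # without touching the scoring code.
    def register(func):
        RULES[name] = (func, points)
        return func
    return register

@rule("SHOUTING_SUBJECT", 2.0)
def shouting_subject(msg):
    subject = msg.get("subject", "")
    return subject.isupper() and len(subject) > 5

@rule("MENTIONS_LOTTERY", 2.5)
def mentions_lottery(msg):
    body = msg.get("body", "")
    return re.search(r"\blottery\b", body, re.I) is not None

@rule("MANY_EXCLAMATIONS", 1.5)
def many_exclamations(msg):
    return msg.get("body", "").count("!") >= 3

def total_score(msg):
    # Sum the points of every rule that fires on this message.
    return sum(points for func, points in RULES.values() if func(msg))

def is_spam(msg, cutoff=5.0):
    # Messages scoring at or above the cutoff are treated as spam.
    return total_score(msg) >= cutoff

spam = {"subject": "YOU HAVE WON", "body": "Claim your lottery prize now!!!"}
ham = {"subject": "Lunch on Friday?", "body": "Does noon work for you?"}
print(total_score(spam), is_spam(spam))
print(total_score(ham), is_spam(ham))
```

"Learning" in this picture just means adjusting the point values in
the rule table, and extending the system means registering one more
function.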

I suspect that each potential answer is evaluated by these heuristic
algorithms on the 2800 processors, and that good answers from multiple
sources (the multiple sources thing could be part of the heuristics)
are given credence. This may be why questions about terrorism can
lead to incorrect answers about 9/11.

All the results are put together, and all the confidences are
combined, and the winning answer is chosen. If the confidence in the
best answer is not above a threshold, Watson does not push the button.
When it does push, it pushes quickly. In fact, one of Watson's
advantages may be its ability to push the button very quickly. I
haven't done an analysis, but it seemed that there weren't many times
that Watson had a high-confidence answer and didn't get the first
chance at answering the question.
This is an area where a computer has a serious (and somewhat unfair)
advantage over humans. I understand from a long ago interview that Ken
Jennings basically tries to push the button as fast as he can if he
thinks he might know the answer, even if he hasn't yet fished up the
whole answer. He wasn't the first to press the button a lot of the
time in this tournament. I bet that both the carbon-based guys knew a
lot of answers that they didn't get a chance to answer because Watson
is a fast button pusher.
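
The buzz-or-stay-silent rule described above (pick the candidate with
the highest combined confidence and answer only if it clears a
threshold) is simple enough to sketch directly. The threshold of 0.5
and the confidence numbers are assumptions for the example, not
reported values.

```python
def choose(candidates, threshold=0.5):
    # candidates maps each potential answer to its combined confidence.
    if not candidates:
        return None
    answer, confidence = max(candidates.items(), key=lambda kv: kv[1])
    # Returning None means "do not push the button".
    return answer if confidence >= threshold else None

print(choose({"Paris": 0.9, "Lyon": 0.2}))  # Paris
print(choose({"Paris": 0.4, "Lyon": 0.2}))  # None
```

A machine can apply this check the instant the confidences are ready,
which is where the speed advantage on the button would come from.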

There seems to be another subsystem that determines what to bet. The
non-round numbers are funny, but I would bet that's one of the more
solid elements of Watson's game. I don't think there is any AI in this
part.

Again, this is all just a wild semi-educated guess.

If you have gotten this far, which do you think is more intelligent,
Google or Watson? Why? Which leverages human intelligence better?

-Kelly