[ExI] Character Recognition (was Re: AI milestones)

Kelly Anderson kellycoinguy at gmail.com
Tue Mar 6 20:04:57 UTC 2012


2012/3/2 Stefano Vaj <stefano.vaj at gmail.com>:
> On 1 March 2012 06:51, Kelly Anderson <kellycoinguy at gmail.com> wrote:
>>
>> There are commercial OCR engines that achieve well over 99% accuracy
>> for typeset text...
>
>
> This is an impressive achievement, and sounds great, but let us keep in
> mind that 99% OCR accuracy means more than one mistype every two lines,
> some of which cannot be resolved on the basis of the context alone.

I know. At roughly 80 characters per line, about ten of which are
spaces, a 99% recognition rate is unusable for many (though not
all) applications.

Little-known fact: the title of Kelly Anderson's absolutely
fantastic, but never finished/published/defended, Master's Thesis was
"Character Recognition in the Context of Forms". So while I am
hopelessly out of date (my last published paper in character
recognition was in SPIE in 1991 or so), this is an area that I am
pretty familiar with.

> I was once a testimonial, as a semi-famous lawyer, for IBM Voice Type
> Dictation, and the dictation system in comparison has the advantage of
> recognising not single phonemes, but basically entire words, and on the
> basis of the preceding and subsequent ones ("trigrams"). OTOH, mistakes
> accordingly become harder to identify, especially if one does not do so
> on the fly.

This might have been using Markov chains, a really primitive kind of
limited context. When I say primitive, I mean simple to implement, not
ineffective. Any kind of context is absolutely critical to getting
good character recognition rates, and Markov chains are an easy way
to put in a little context.

To see how this works, consider that in English the letter Q is
nearly always followed by the letter U. You can process a large
amount of text of the kind you are recognizing and build a
probability table for every letter digram in English. The Q->U entry
in the table (Q followed by U) would contain a number like 97%; that
is, far more likely than QA (Qadi, Qat, Qabala...). QW occurs only in
the word Qwerty, and QK doesn't happen at all in English, so the Q->K
entry contains 0%.

With this information, you can now compute an additional probability
function that helps you decide whether you are recognizing FORM, F0RM,
or FQRM, so you don't need a full dictionary to know which word is
probably right. (+1 if you guessed FORM.) Markov chains can be
computed separately for the first two letters of English words, the
last two letters, and the middle of words, for even finer control.
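A sketch of how a recognizer might use that table to rank its candidate
readings (this reuses the hypothetical build_digram_table from the
sketch above):

    def digram_score(candidate, table):
        """Multiply digram probabilities along the candidate string.
        A zero or near-zero score means the letter sequence is un-English."""
        score = 1.0
        for first, second in zip(candidate, candidate[1:]):
            score *= table.get(first, {}).get(second, 0.0)
        return score

    # Assuming 'table' was built from a large English sample with
    # build_digram_table() above:
    candidates = ["FORM", "F0RM", "FQRM"]
    best = max(candidates, key=lambda c: digram_score(c, table))  # -> "FORM"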

Now, the next level of context is a dictionary, but as anyone who has
done a reCAPTCHA lately knows, there are a lot of words in old
newspapers that won't be in the typical dictionary, especially proper
names.

You can also do Markov chains for word pairs. Some pairs are far more
likely to occur than others, and an improbable pair is a hint that one
of the words was misrecognized.
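The same counting trick works one level up, at the word level; a rough
sketch:

    from collections import defaultdict

    def build_word_pair_table(corpus):
        """Count which words tend to follow which other words."""
        counts = defaultdict(lambda: defaultdict(int))
        words = corpus.lower().split()
        for first, second in zip(words, words[1:]):
            counts[first][second] += 1
        return counts

    # A recognized pair that never (or almost never) occurs in the
    # training text is a hint that one of the two words was misread.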

Above that, you have the context of grammar. Running a grammar checker
on recognized text to make sure it's good English would probably be
fairly effective on old newspapers; the incidence of grammar errors in
the old New York Times is probably low compared to general text. See a
grammar error, take it up to the next level. For example, in Italian
text recognition you can check for gender agreement between articles
and the words that follow them. Does every sentence have a verb and a
subject, for example?
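As a toy illustration of the Italian agreement idea (a crude heuristic
with hand-picked word lists, not a real grammar checker):

    # Very rough heuristic: Italian nouns ending in -o are usually
    # masculine and those ending in -a usually feminine, with plenty of
    # exceptions; good enough to flag suspicious recognitions.
    MASCULINE_ARTICLES = {"il", "lo", "un", "uno", "i", "gli"}
    FEMININE_ARTICLES = {"la", "una", "le"}

    def agreement_suspicious(article, noun):
        """Return True if the article and the noun ending look mismatched."""
        article, noun = article.lower(), noun.lower()
        if article in MASCULINE_ARTICLES and noun.endswith("a"):
            return True
        if article in FEMININE_ARTICLES and noun.endswith("o"):
            return True
        return False

    # "la libro" looks wrong (feminine article, -o noun), so the
    # recognizer should reconsider one of the two words.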

The highest level of context for text is doing actual text
understanding. That is, trying to tell what the text means, and does
it make sense. And, would it make more sense if this word were
actually this other word... We do that all the time in processing
misspelled words. We know what they meant because of the high level of
redundancy in written English (or any human language).

We humans are very good at text recognition because of the context. If
you present humans with text out of context, they do little better
than computers with no context. Give humans the context, and they are
much better than computers.

So, yes, context is critical to high-performance text recognition by
computers and people alike. You would not be a very good recognizer of
handwritten Chinese, even though you are a human.

The recent discussions of Go, and of Watson last year, as well as
autonomous vehicles, make me wonder if it isn't time to bump text
recognition up to the next level. I think it is very possible. Someone
is probably doing it.

In my master's thesis, I was recognizing numbers on 1040 forms. If the
numbers didn't add up, the probability that something was
misrecognized went up, and further processing was done. With the right
context, I got up to 100% accurate recognition of one field of
hand-printed text. Granted, it was a special case. :-) But that's the
point.
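As a rough illustration of that kind of arithmetic context (the field
values here are made up; only the consistency check matters):

    def fields_consistent(line_items, reported_total, tolerance=0.0):
        """If the recognized line items don't sum to the recognized total,
        at least one field was probably misrecognized and should be
        re-examined with lower-confidence candidate digits."""
        return abs(sum(line_items) - reported_total) <= tolerance

    # Example: three recognized amounts and a recognized total.
    items = [1250.00, 300.00, 75.00]
    total = 1625.00
    if not fields_consistent(items, total):
        # Re-run recognition on the least confident field, try the
        # next-best digit hypotheses, and re-check the sum.
        pass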

-Kelly



