[extropy-chat] Re: vision research, was: T-shirts? Intro post

Neil Halelamien neuronexmachina at gmail.com
Sat Dec 11 23:37:01 UTC 2004


On Sat, 11 Dec 2004 12:00:19 -0700,
extropy-chat-request at lists.extropy.org
<extropy-chat-request at lists.extropy.org> wrote:
> Hmm.  Well, I can invite you over to
> http://groups.yahoo.com/group/howtobuildaspacehabitat/
> for the latter.  

Thanks for the link!

> As to the former...have you heard
> much about the work being done on neural net modelling
> of human vision, including higher-level processes like
> object recognition, and if so could you give a digest
> of the current state of the art?  This would seem to
> be one of the things likely to lead towards computer
> modelling of the entire human brain, not to mention
> its more immediate applications.

I'm only beginning to familiarize myself with the vast literature on
the subject, but I can take a stab at it. In general, we have a
reasonable understanding of the overall architecture, and have many
good models of individual circuits for things like attention and
certain aspects of object recognition.

One particular interest of mine recently is David Lowe's SIFT
(scale-invariant feature transform), an object recognition algorithm
loosely based on our understanding of visual cortex and the inferior
temporal (IT) area. In a nutshell, the algorithm works by extracting
scale-invariant features from an image, then recognizing objects (along
with their relative position and orientation) by detecting memorized
constellations of those features. The algorithm is fast and invariant
to scale, rotation, and a certain amount of affine transformation. One
major problem, though, is that it doesn't do any object segmentation;
instead it identifies objects based on the entire image.
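To make the matching step a bit more concrete, here's a toy sketch in
Python of the nearest-neighbor ratio test Lowe uses to match query
descriptors against a memorized set. This is only the matching stage --
descriptor extraction (difference-of-Gaussian keypoint detection,
orientation histograms) and the constellation/pose voting are omitted,
and the function names are my own:

```python
import numpy as np

def match_descriptors(query, memorized, ratio=0.8):
    """Match query feature descriptors against a memorized set using a
    nearest-neighbor ratio test: accept a match only when the closest
    memorized descriptor is clearly closer than the second-closest."""
    matches = []
    for i, d in enumerate(query):
        # Euclidean distance from this descriptor to every memorized one
        dists = np.linalg.norm(memorized - d, axis=1)
        order = np.argsort(dists)
        best, second = dists[order[0]], dists[order[1]]
        if best < ratio * second:  # unambiguous -- keep the match
            matches.append((i, int(order[0])))
    return matches

# Toy data: four well-separated memorized descriptors, and one noisy
# query that should match memorized descriptor 2.
memorized = np.eye(4) * 10.0
query = memorized[[2]] + 0.1
print(match_descriptors(query, memorized))  # [(0, 2)]
```

The ratio test is what keeps the matching robust: a feature that is
almost equally close to two different memorized features is ambiguous,
so it's simply thrown away rather than risked as a false match.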

On a related note, here's a neat utility which uses SIFT to find
common feature keypoints between images, in order to construct
panoramas: http://user.cs.tu-berlin.de/~nowozin/autopano-sift/

However, even though we have good models of many individual areas of
the visual system, our understanding of how those areas interact with
each other is still limited. People are slowly but
surely making progress with this, though. For example, some students
in my department (actually in the lab I'm rotating in next term) have
done some great work by using attentional maps to select relevant
parts of an image, then using SIFT to learn and identify objects in
attended areas. This results in a system which gets less distracted by
clutter and is better at learning individual objects:
http://www.klab.caltech.edu/cgi-bin/publication/reference-view.pl?refdbname=paper&paper_id=492
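The basic idea -- compute a saliency map, attend to the most salient
region, and hand only that region to the object-recognition stage --
can be sketched very roughly in Python. This is my own toy
center-surround saliency operator, not the actual model those students
used, but it shows the control flow:

```python
import numpy as np

def saliency_map(image, size=5):
    """Crude center-surround saliency: how much each pixel differs from
    the mean of its local neighborhood (a stand-in for a real
    attentional/saliency map)."""
    pad = size // 2
    padded = np.pad(image, pad, mode='edge')
    h, w = image.shape
    sal = np.empty((h, w), dtype=float)
    for y in range(h):
        for x in range(w):
            window = padded[y:y + size, x:x + size]
            sal[y, x] = abs(image[y, x] - window.mean())
    return sal

def attend(image, patch=8):
    """Return the patch around the most salient point -- the part of
    the image that would be passed on to SIFT for learning, instead of
    the whole cluttered scene."""
    sal = saliency_map(image)
    y, x = np.unravel_index(np.argmax(sal), sal.shape)
    half = patch // 2
    y0 = max(0, min(y - half, image.shape[0] - patch))
    x0 = max(0, min(x - half, image.shape[1] - patch))
    return image[y0:y0 + patch, x0:x0 + patch], (y0, x0)

# Toy scene: a blank image with one bright "object" at (20, 30).
scene = np.zeros((40, 40))
scene[20, 30] = 1.0
window, corner = attend(scene)
print(corner)  # (16, 26) -- an 8x8 window centered on the bright spot
```

Because only the attended window reaches the recognition stage, clutter
elsewhere in the image never gets a chance to contaminate the learned
object model -- which is essentially the benefit reported above.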

Another open problem is that most research so far deals only with
feedforward connections in the visual system, simply because they're
more tractable. However, in many (most?) parts of the visual system,
there are actually more feedback connections than feedforward
connections. I'm not too familiar with the research done on them so
far, but I get the impression that we know fairly little about what
these feedback connections are doing. Hopefully progress in
multielectrode recordings and simulation capabilities will help make
feedback connections more tractable.

Of course, it's impossible for me to really give a full overview of
current research -- I've only touched on a few things at the
forefront. If you have questions about any particular aspects of
current research, feel free to ask. I'm not sure if I'm knowledgeable
enough yet to adequately answer such questions, but I can try. :)

-- Neil Halelamien
