[Paleopsych] NS: Why your brain has a Jennifer Aniston cell
Premise Checker
checker at panix.com
Thu Jun 23 14:31:59 UTC 2005
Why your brain has a Jennifer Aniston cell
http://www.newscientist.com/article.ns?id=dn7567&print=true
[The articles from Nature are appended. I can supply the PDFs.]
* 19:00 22 June 2005
* Anna Gosline
Obsessed with reruns of the TV sitcom Friends? Well, then you probably
have at least one Jennifer Aniston cell in your brain, suggests
research on the activity patterns of single neurons in memory-linked
areas of the brain. The results lend new support to a decades-old,
long-dismissed theory tying single neurons to individual concepts, and
could help neuroscientists understand the elusive workings of human memory.
"For things that you see over and over again, your family, your
boyfriend, or celebrities, your brain wires up and fires very
specifically to them. These neurons are very, very specific, much more
than people think," says Christof Koch at the California Institute of
Technology in Pasadena, US, one of the researchers.
In the 1960s, neuroscientist Jerry Lettvin suggested that people have
neurons that respond to a single concept, such as their grandmother.
The notion of these hyper-specific neurons, dubbed "grandmother cells",
was quickly rejected by psychologists as laughably simplistic.
But Rodrigo Quiroga, at the University of Leicester, UK, who led the
new study, and his colleagues have found some very grandmother-like
cells. Previous unpublished findings from the team showed tantalising
results: a neuron that fired only in response to pictures of former US
president Bill Clinton, or another to images of the Beatles. But for
such grandmother cells to exist, they must invariably respond to the
concept of Bill Clinton, not just similar pictures.
Wired up, fired up
To investigate further, the team turned to eight patients currently
undergoing treatment for epilepsy. In an attempt to locate the brain
areas responsible for their seizures, each patient had around 100 tiny
electrodes implanted in their brain. Many of the wires were placed in
the hippocampus - an area of the brain vital to long-term memory
formation.
They first gave each subject a screening test, showing them between 71
and 114 images of famous people, places, and even food items. For each
subject, the researchers measured the electrical activity or firing of
the neurons connected to the electrodes. Of the 993 neurons sampled,
132 fired to at least one image.
The team then went back for a testing phase, this time showing
participants three to seven different pictures of each of the people,
places and objects whose initial photos had elicited responses. For
example, one woman saw seven different photos of Jennifer Aniston
alongside 80 other photos of animals, buildings or additional famous
people such as Julia Roberts. The neuron all but ignored the other
photos, but fired steadily each time Aniston appeared on screen.
Conceptual connections
The team found similar results with another woman, who had a neuron for
pictures of Halle Berry, including a drawing of her face and an image
of just the words of her name. "This neuron is responding to the
concept, the abstract entity, of Halle Berry," says Quiroga. "If you
show a line drawing or a profile, it's the same response. We also
showed pictures of her as Catwoman, and you can hardly see her because
of the mask. But if you know it is Halle Berry then the neurons still
fire."
Given more time and an exhaustive list of images, the team may well
have landed upon other images that spiked the activity of the Halle
Berry neuron. In one participant, the Jennifer Aniston neuron also fired in
response to a picture of her former Friends cast-mate, Lisa Kudrow.
The pattern suggests that the actresses are tied together in the
memory associations of this particular woman, says Charles Connor, a
neuroscientist at Johns Hopkins University in Baltimore, US.
These object-specific neurons may be at the core of how we make
memories, says Connor. "I think that's the excitement of these results,"
he says. "You are looking at the far end of the transformation from
metric, visual shapes to conceptual memory-related information. It is
that transformation that underlies our ability to understand the
world. It's not enough to see something familiar and match it. It's the
fact that you plug visual information into the rich tapestry of memory
that brings it to life."
Journal reference: Nature (vol 435 p 1102)
Weblinks
Rodrigo Quiroga, University of Leicester
http://www.vis.caltech.edu/~rodri/
Christof Koch's Lab, California Institute of Technology
http://www.klab.caltech.edu/
Charles E. Connor's Lab, Johns Hopkins University
http://www.mb.jhu.edu/connor.asp
Nature http://www.nature.com
--------
News and Views
Nature 435, 1036-1037 (23 June 2005) | doi: 10.1038/4351036a
Neuroscience: Friends and grandmothers
Charles E. Connor1
Abstract
How do neurons in the brain represent movie stars, famous buildings and other
familiar objects? Rare recordings from single neurons in the human brain
provide a fresh perspective on the question.
'Grandmother cell' is a term coined by J. Y. Lettvin to parody the simplistic
notion that the brain has a separate neuron to detect and represent every
object (including one's grandmother)1. The phrase has become a shorthand for
invoking all of the overwhelming practical arguments against a one-to-one
object coding scheme2. No one wants to be accused of believing in grandmother
cells. But on page 1102 of this issue, Quiroga et al.3 describe a neuron in the
human brain that looks for all the world like a 'Jennifer Aniston' cell. Ms
Aniston could well become a grandmother herself someday. Are vision scientists
now forced to drop their dismissive tone when discussing the neural
representation of matriarchs?
A more technical term for the grandmother issue is 'sparseness' (Fig. 1). At
earlier stages in the brain's object-representation pathway, the neural code
for an object is a broad activity pattern distributed across a population of
neurons, each responsive to some discrete visual feature4. At later processing
stages, neurons become increasingly selective for combinations of features5,
and the code becomes increasingly sparse; that is, fewer neurons are activated
by a given stimulus, although the code is still population-based6. Sparseness
has its advantages, especially for memory, because compact coding maximizes
total storage capacity, and some evidence suggests that 'sparsification' is a
defining goal of visual information processing7, 8. Grandmother cells are the
theoretical limit of sparseness, where the representation of an object is
reduced to a single neuron.
Figure 1: Sparseness and invariance in neural coding of visual stimuli.
The blue and yellow pixel plots represent a hypothetical neural population.
Each pixel represents a neuron with low (blue) or high (yellow) activity. In
distributed coding schemes (left column), many neurons are active in response
to each stimulus. In sparse coding schemes (right column), few neurons are
active. If the neural representation is invariant (top row), different views of
the same person or object evoke identical activity patterns. If the neural
representation is not invariant (bottom row), different views evoke different
activity patterns. The implication of Quiroga and colleagues' results3, at
least as far as vision is concerned, is that neural representation is extremely
sparse and invariant.
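[To make the idea of sparseness concrete, here is a rough Python sketch, not
taken from either paper: it computes two toy sparseness measures for a
hypothetical population, the fraction of active neurons and the Treves-Rolls
index. All numbers are invented for illustration.]

    import numpy as np

    def activity_fraction(rates, threshold=0.0):
        """Fraction of neurons firing above threshold for one stimulus
        (lower values correspond to a sparser code)."""
        rates = np.asarray(rates, dtype=float)
        return float(np.mean(rates > threshold))

    def treves_rolls_sparseness(rates):
        """Treves-Rolls index: close to 1 for a dense code and close to
        1/N when a single neuron out of N carries the whole response."""
        rates = np.asarray(rates, dtype=float)
        return float(rates.mean() ** 2 / np.mean(rates ** 2))

    # Hypothetical population of 1,000 neurons responding to one image.
    rng = np.random.default_rng(0)
    dense = rng.exponential(5.0, 1000)           # most neurons somewhat active
    sparse = np.zeros(1000)
    sparse[:10] = 40.0                           # only 10 neurons active

    print(activity_fraction(dense), activity_fraction(sparse))
    print(treves_rolls_sparseness(dense), treves_rolls_sparseness(sparse))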
Quiroga and colleagues3 report what seems to be the closest approach yet to
that limit. They recorded neural activity from structures in the human medial
temporal lobe that are associated with late-stage visual processing and
long-term memory. The structures concerned were the entorhinal cortex, the
parahippocampal gyrus, the amygdala and the hippocampus, and the recordings
were made in the course of clinical procedures to treat epilepsy.
The first example cell responded significantly to seven different images of
Jennifer Aniston but not to 80 other stimuli, including pictures of Julia
Roberts and even pictures of Jennifer Aniston with Brad Pitt. The second
example cell preferred Halle Berry in the same way. Altogether, 44 units (out
of 137 with significant visual responses) were selective in this way for a
single object out of those tested.
The striking aspect of these results is the consistency of responses across
different images of the same person or object. This relates to another major
issue in visual coding, 'invariance' (Fig. 1). One of the most difficult
aspects of vision is that any given object must be recognizable from the front
or side, in light or shadow, and so on. Somehow, given those very different
retinal images, the brain consistently invokes the same set of memory
associations that give the object meaning. According to 'view-invariant'
theories, this is achieved in the visual cortex by some kind of neural
calculation that transforms the visual structure in different images into a
common format9, 10, 11. According to 'view-dependent' theories, it is achieved
by learning temporal associations between different views and storing those
associations in the memory12, 13, 14.
Quiroga and colleagues' results3 set a new benchmark for both sparseness and
invariance, at least from a visual perspective. Most of the invariant
structural characteristics in images of Jennifer Aniston (such as relative
positions of eyes, nose and mouth) would be present in images of Julia Roberts
as well. Thus, any distributed visual coding scheme would predict substantial
overlap in the neural groups representing Aniston and Roberts; cells responding
to one and not the other would be rare. The clean, visually invariant
selectivity of the neurons described by Quiroga et al. implies a sparseness
bordering on grandmotherliness.
However, as the authors discuss, these results may be best understood in a
somewhat non-visual context. The brain structures that they studied stand at
the far end of the object-representation pathway or beyond, and their responses
may be more memory-related than strictly visual. In fact, several example cells
responded not only to pictures but also to the printed name of a particular
person or object. Clearly, this is a kind of invariance based on learned
associations, not geometric transformation of visual structure, and these cells
encode memory-based concepts rather than visual appearance.
How do you measure sparseness in conceptual space? It's a difficult
proposition, requiring knowledge of how the subject associates different
concepts in memory. The authors did their best (within the constraints of
limited recording time) to test images that might be conceptually related. In
one tantalizing example, a neuron responded to both Jennifer Aniston and Lisa
Kudrow, her co-star on the television show Friends. What seems to be a sparse
representation in visual space may be a distributed representation in sitcom
space! In another example, a neuron responded to two unrelated stimuli commonly
used by Quiroga et al.: pictures of Jennifer Aniston with Brad Pitt and
pictures of the Sydney Opera House. This could reflect a new memory association
produced by the close temporal proximity of these stimuli during the recording
sessions, consistent with similar phenomena observed in monkey temporal
cortex15.
Thus, Quiroga and colleagues' findings may say less about visual representation
as such than they do about memory representation and how it relates to visual
inputs. Quiroga et al. have shown that, at or near the end of the
transformation from visual information about object structure to memory-related
conceptual information about object identity, the neural representation seems
extremely sparse and invariant in the visual domain. As the authors note, these
are predictable characteristics of an abstract, memory-based representation.
But I doubt that anyone would have predicted such striking confirmation at the
level of individual neurons.
References
1. Rose, D. Perception 25, 881-886 (1996).
2. Barlow, H. B. Perception 1, 371-394 (1972).
3. Quiroga, R. Q., Reddy, L., Kreiman, G., Koch, C. & Fried, I. Nature 435, 1102-1107 (2005).
4. Pasupathy, A. & Connor, C. E. Nature Neurosci. 5, 1332-1338 (2002).
5. Brincat, S. L. & Connor, C. E. Nature Neurosci. 7, 880-886 (2004).
6. Young, M. P. & Yamane, S. Science 256, 1327-1331 (1992).
7. Olshausen, B. A. & Field, D. J. Nature 381, 607-609 (1996).
8. Vinje, W. E. & Gallant, J. L. Science 287, 1273-1276 (2000).
9. Biederman, I. Psychol. Rev. 94, 115-147 (1987).
10. Marr, D. & Nishihara, H. K. Proc. R. Soc. Lond. B 200, 269-294 (1978).
11. Booth, M. C. & Rolls, E. T. Cereb. Cortex 8, 510-523 (1998).
12. Bulthoff, H. H., Edelman, S. Y. & Tarr, M. J. Cereb. Cortex 5, 247-260 (1995).
13. Vetter, T., Hurlbert, A. & Poggio, T. Cereb. Cortex 5, 261-269 (1995).
14. Logothetis, N. K. & Pauls, J. Cereb. Cortex 5, 270-288 (1995).
15. Sakai, K. & Miyashita, Y. Nature 354, 152-155 (1991).
1. Charles E. Connor is in the Department of Neuroscience and the Zanvyl
Krieger Mind/Brain Institute, Johns Hopkins University, Baltimore, Maryland
21218, USA.
Email: connor at jhu.edu
---------------
Letter
Nature 435, 1102-1107 (23 June 2005) | doi: 10.1038/nature03687
Invariant visual representation by single neurons in the human brain
R. Quian Quiroga1,2,5, L. Reddy1, G. Kreiman3, C. Koch1 and I. Fried2,4
Abstract
It takes a fraction of a second to recognize a person or an object even when
seen under strikingly different conditions. How such a robust, high-level
representation is achieved by neurons in the human brain is still unclear1, 2,
3, 4, 5, 6. In monkeys, neurons in the upper stages of the ventral visual
pathway respond to complex images such as faces and objects and show some
degree of invariance to metric properties such as the stimulus size, position
and viewing angle2, 4, 7, 8, 9, 10, 11, 12. We have previously shown that
neurons in the human medial temporal lobe (MTL) fire selectively to images of
faces, animals, objects or scenes13, 14. Here we report on a remarkable subset
of MTL neurons that are selectively activated by strikingly different pictures
of given individuals, landmarks or objects and in some cases even by letter
strings with their names. These results suggest an invariant, sparse and
explicit code, which might be important in the transformation of complex visual
percepts into long-term and more abstract memories.
The subjects were eight patients with pharmacologically intractable epilepsy
who had been implanted with depth electrodes to localize the focus of seizure
onset. For each patient, the placement of the depth electrodes, in combination
with micro-wires, was determined exclusively by clinical criteria13. We
analysed responses of neurons from the hippocampus, amygdala, entorhinal cortex
and parahippocampal gyrus to images shown on a laptop computer in 21 recording
sessions. Stimuli were different pictures of individuals, animals, objects and
landmark buildings presented for 1 s in pseudo-random order, six times each. An
unpublished observation in our previous recordings was the sometimes surprising
degree of invariance inherent in the neuron's (that is, unit's) firing
behaviour. For example, in one case, a unit responded only to three completely
different images of the ex-president Bill Clinton. Another unit (from a
different patient) responded only to images of The Beatles, another one to
cartoons from The Simpsons television series and another one to pictures of
the basketball player Michael Jordan. This suggested that neurons might encode
an abstract representation of an individual. We here ask whether MTL neurons
can represent high-level information in an abstract manner characterized by
invariance to the metric characteristics of the images. By invariance we mean
that a given unit is activated mainly, albeit not necessarily uniquely, by
different pictures of a given individual, landmark or object.
To investigate further this abstract representation, we introduced several
modifications to optimize our recording and data processing conditions (see
Supplementary Information) and we designed a paradigm to systematically search
for and characterize such invariant neurons. In a first recording session,
usually done early in the morning (screening session), a large number of images
of famous persons, landmark buildings, animals and objects were shown. This set
was complemented by images chosen after an interview with the patient. The mean
number of images in the screening session was 93.9 (range 71-114). The data
were quickly analysed offline to determine the stimuli that elicited responses
in at least one unit (see definition of response below). Subsequently, in later
sessions (testing sessions) between three and eight variants of all the stimuli
that had previously elicited a response were shown. If not enough stimuli
elicited significant responses in the screening session, we chose those stimuli
with the strongest responses. On average, 88.6 (range 70-110) different images
showing distinct views of 14 individuals or objects (range 7-23) were used in
the testing sessions. Single views of random stimuli (for example, famous and
non-famous faces, houses, animals, etc) were also included. The total number of
stimuli was determined by the time available with the patient (about 30 min on
average). Because in our clinical set-up the recording conditions can sometimes
change within a few hours, we always tried to perform the testing sessions
shortly after the screening sessions in order to maximize the probability of
recording from the same units. Unless explicitly stated otherwise, all the data
reported in this study are from the testing sessions. To hold their attention,
patients had to perform a simple task during all sessions (indicating with a
key press whether a human face was present in the image). Performance was close
to 100%.
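[A trivial sketch, not the authors' code, of how a pseudo-random presentation
order of this kind can be built: each image appears a fixed number of times
(six in the study) and the full sequence is shuffled. The image labels below
are placeholders.]

    import random

    def build_presentation_order(image_ids, repetitions=6, seed=None):
        """Return a pseudo-random order in which each image appears
        `repetitions` times."""
        order = [img for img in image_ids for _ in range(repetitions)]
        random.Random(seed).shuffle(order)
        return order

    # e.g. a screening session with about 90 stimuli, each shown six times
    stimuli = ["img_%03d" % i for i in range(90)]
    order = build_presentation_order(stimuli, repetitions=6, seed=1)
    print(len(order), order[:5])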
We recorded from a total of 993 units (343 single units and 650 multi-units),
with an average of 47.3 units per session (16.3 single units and 31.0
multi-units). Of these, 132 (14%; 64 single units and 68 multi-units) showed a
statistically significant response to at least one picture. A response was
considered significant if it was larger than the mean plus 5 standard
deviations (s.d.) of the baseline and had at least two spikes in the
post-stimulus time interval considered (300-1,000 ms). All these responses were
highly selective: for the responsive units, an average of only 2.8% of the
presented pictures (range: 0.9-22.8%) showed significant activations according
to this criterion. This high selectivity was also present in the screening
sessions, where only 3.1% of the pictures shown elicited responses (range:
0.9-18.0%). There was no significant difference between the relative number of
responsive pictures obtained in the screening and testing sessions (t-test, P =
0.40). Responses started around 300 ms after stimulus onset and had mainly
three non-exclusive patterns of activation (with about one-third of the cells
having each type of response): the response disappeared with stimulus offset, 1
s after stimulus onset; it consisted of a rapid sequence of about 6 spikes
(s.d. = 5) between 300 and 600 ms after stimulus onset; or it was prolonged and
continued up to 1 s after stimulus offset. For this study, we calculated the
responses in a time window between 300 and 1,000 ms after stimulus onset. In a
few cases we also observed cells that responded selectively only after the
image was removed from view (that is, after 1 s). These are not further
analysed here.
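[The response criterion just described can be captured in a few lines. The
following Python sketch is illustrative only; the spike times, baseline values
and window boundaries are placeholders, not data from the study.]

    import numpy as np

    def is_significant_response(trial_spike_times, baseline_mean, baseline_sd,
                                window=(0.300, 1.000), min_spikes=2):
        """A picture counts as eliciting a response if the median spike count
        across trials in the post-stimulus window exceeds the baseline mean
        plus 5 s.d. and is at least `min_spikes`."""
        counts = [int(np.sum((t >= window[0]) & (t < window[1])))
                  for t in trial_spike_times]
        median_count = np.median(counts)
        return (median_count > baseline_mean + 5 * baseline_sd
                and median_count >= min_spikes)

    # Six trials of one picture; spike times in seconds after stimulus onset.
    trials = [np.array([0.32, 0.35, 0.41, 0.55]), np.array([0.33, 0.48]),
              np.array([0.31, 0.39, 0.52]), np.array([]),
              np.array([0.36, 0.44, 0.61]), np.array([0.34, 0.50])]
    print(is_significant_response(trials, baseline_mean=0.02, baseline_sd=0.16))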
Figure 1a shows the responses of a single unit in the left posterior
hippocampus to a selection of 30 out of the 87 pictures presented to the
patient. None of the other pictures elicited a statistically significant
response. This unit fired to all pictures of the actress Jennifer Aniston
alone, but not (or only very weakly) to other famous and non-famous faces,
landmarks, animals or objects. Interestingly, the unit did not respond to
pictures of Jennifer Aniston together with the actor Brad Pitt (but see
Supplementary Fig. 2 ). Pictures of Jennifer Aniston elicited an average of
4.85 spikes (s.d. = 3.59) between 300 and 600 ms after stimulus onset. Notably,
this unit was nearly silent during baseline (average of 0.02 spikes in a 700-ms
pre-stimulus time window) and during the presentation of most other pictures
(Fig. 1b). Figure 1b plots the median number of spikes (across trials) in the
300-1,000-ms post-stimulus interval for all 87 pictures shown to the patient.
The histogram shows a marked differential response to pictures of Jennifer
Aniston (red bars).
Figure 1: A single unit in the left posterior hippocampus activated exclusively
by different views of the actress Jennifer Aniston.
a, Responses to 30 of the 87 images are shown. There were no statistically
significant responses to the other 57 pictures. For each picture, the
corresponding raster plots (the order of trial number is from top to bottom)
and post-stimulus time histograms are given. Vertical dashed lines indicate
image onset and offset (1 s apart). Note that owing to insurmountable copyright
problems, all original images were replaced in this and all subsequent figures
by very similar ones (same subject, animal or building, similar pose, similar
colour, line drawing, and so on). b, The median responses to all pictures. The
image numbers correspond to those in a. The two horizontal lines show the mean
baseline activity (0.02 spikes) and the mean plus 5 s.d. (0.82 spikes).
Pictures of Jennifer Aniston are denoted by red bars. c, The associated ROC
curve (red trace) testing the hypothesis that the cell responded in an
invariant manner to all seven photographs of Jennifer Aniston (hits) but not to
other images (including photographs of Jennifer Aniston and Brad Pitt together;
false positives). The grey lines correspond to the same ROC analysis for 99
surrogate sets of 7 randomly chosen pictures (P < 0.01). The area under the red
curve is 1.00.
Next, we quantified the degree of invariance using a receiver operating
characteristic (ROC) framework15. We considered as the hit rate (y axis) the
relative number of responses to pictures of a specific individual, object,
animal or landmark building, and as the false positive rate (x axis) the
relative number of responses to other pictures. The ROC curve corresponds to
the performance of a linear binary classifier for different values of a
response threshold. Decreasing the threshold increases the probability of hits
but also of false alarms. A cell responding to a large set of pictures of
different individuals will have a ROC curve close to the diagonal (with an area
under the curve of 0.5), whereas a cell that responds to all pictures of an
individual but not to others will have a convex ROC curve far from the
diagonal, with an area close to 1. In Fig. 1c we show the ROC curve for all
seven pictures of Jennifer Aniston (red trace, with an area equal to 1). The
grey lines show 99 ROC surrogate curves, testing invariance to randomly
selected groups of pictures (see Methods). As expected, these curves are close
to the diagonal, having an area of about 0.5. None of the 99 surrogate curves
had an area equal to or larger than that of the original ROC curve, implying that it is
unlikely (P < 0.01) that the responses to Jennifer Aniston were obtained by
chance. A responsive unit was defined to have an invariant representation if
the area under the ROC curve was larger than the area of the 99 surrogate
curves.
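[The ROC and surrogate procedure described above amounts to a standard
permutation test on the area under the curve. Below is a minimal Python sketch
of that logic, not the authors' code; the response values and picture labels
are invented.]

    import numpy as np

    def roc_area(responses, is_target):
        """Area under the ROC curve obtained by sweeping a response threshold;
        1.0 means the target pictures are perfectly separated from the rest."""
        responses = np.asarray(responses, dtype=float)
        is_target = np.asarray(is_target, dtype=bool)
        pos, neg = responses[is_target], responses[~is_target]
        # Equivalent to the threshold-swept area: the probability that a random
        # target picture evokes a larger response than a random non-target.
        greater = np.sum(pos[:, None] > neg[None, :])
        ties = np.sum(pos[:, None] == neg[None, :])
        return (greater + 0.5 * ties) / (len(pos) * len(neg))

    def surrogate_test(responses, is_target, n_surrogates=99, seed=0):
        """Compare the observed area with areas for randomly chosen picture
        sets of the same size (the 99 surrogate curves in the text)."""
        rng = np.random.default_rng(seed)
        observed = roc_area(responses, is_target)
        n_targets = int(np.sum(is_target))
        n_exceeding = 0
        for _ in range(n_surrogates):
            fake = np.zeros(len(responses), dtype=bool)
            fake[rng.choice(len(responses), n_targets, replace=False)] = True
            if roc_area(responses, fake) >= observed:
                n_exceeding += 1
        return observed, (n_exceeding + 1) / (n_surrogates + 1)

    # Toy unit: 7 'Jennifer Aniston' pictures out of 87 evoke strong responses.
    rng = np.random.default_rng(1)
    responses = rng.poisson(0.05, 87).astype(float)
    responses[:7] = rng.poisson(5.0, 7)
    is_aniston = np.arange(87) < 7
    print(surrogate_test(responses, is_aniston))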
Figure 2 shows another single unit located in the right anterior hippocampus of
a different patient. This unit was selectively activated by pictures of the
actress Halle Berry as well as by a drawing of her (but not by other drawings;
for example, picture no. 87). This unit was also activated by several pictures
of Halle Berry dressed as Catwoman, her character in a recent film, but not by
other images of Catwoman that were not her (data not shown). Notably, the unit
was selectively activated by the letter string 'Halle Berry'. Such an invariant
pattern of activation goes beyond common visual features of the different
stimuli. As with the previous unit, the responses were mainly localized between
300 and 600 ms after stimulus onset. Figure 2c shows the ROC curve for the
pictures of Halle Berry (red trace) and for 99 surrogates (grey lines). The
area under the ROC curve was 0.99, larger than that of the surrogates.
Figure 2: A single unit in the right anterior hippocampus that responds to
pictures of the actress Halle Berry (conventions as in Fig. 1).
a-c, Strikingly, this cell also responds to a drawing of her, to herself
dressed as Catwoman (a recent movie in which she played the lead role) and to
the letter string 'Halle Berry' (picture no. 96). Such an invariant response
cannot be attributed to common visual features of the stimuli. This unit also
had a very low baseline firing rate (0.06 spikes). The area under the red curve
in c is 0.99.
Figure 3 illustrates a multi-unit in the left anterior hippocampus responding
to pictures of the Sydney Opera House and the Baha'i Temple. Because the
patient identified both landmark buildings as the Sydney Opera House, all these
pictures were considered as a single landmark building for the ROC analysis.
This unit also responded to the letter string 'Sydney Opera' (pictures no. 2
and 8) but not to other letter strings, such as 'Eiffel Tower' (picture no. 1).
More examples of invariant responses are shown in the Supplementary Figs 2-11.
Figure 3: A multi-unit in the left anterior hippocampus that responds to
photographs of the Sydney Opera House and the Baha'i Temple (conventions as in
Fig. 1).
a-c, The patient identified all pictures of both of these buildings as the
Sydney Opera, and we therefore considered them as a single landmark. This unit
also responded to the presentation of the letter string 'Sydney Opera'
(pictures no. 2 and 8), but not to other strings, such as 'Eiffel Tower'
(picture no. 1). In contrast to the previous two figures, this unit had a
higher baseline firing rate (2.64 spikes). The area under the red curve in c is
0.97.
Out of the 132 responsive units, 51 (38.6%; 30 single units and 21 multi-units)
showed invariance to a particular individual (38 units responding to Jennifer
Aniston, Halle Berry, Julia Roberts, Kobe Bryant, and so on), landmark building
(6 units responding to the Tower of Pisa, the Baha'i Temple and the Sydney
Opera House), animal (5 units responding to spiders, seals and horses) or
object (2 units responding to specific food items), with P < 0.01 as defined
above by means of the surrogate tests. A one-way analysis of variance (ANOVA)
yielded similar results (see Methods). Eight of these units (two single units
and six multi-units) responded to two different individuals (or to an
individual and an object). Figure 4 presents the distribution of the areas
under the ROC curves for all 51 units that showed an invariant representation
to individuals or objects. The areas ranged from 0.76 to 1.00, with a median of
0.94. These units were located in the hippocampus (27 out of 60 responsive
units; 45%), parahippocampal gyrus (11 out of 20 responsive units; 55%),
amygdala (8 out of 30 responsive units; 27%) and entorhinal cortex (5 out of 22
responsive units; 23%). There were no clear differences in the latencies and
firing patterns among the different areas. However, more data are needed before
making a conclusive claim about systematic differences between the various
structures of the MTL.
Figure 4: Distribution of the area under the ROC curves for the 51 units (out
of 132 responsive units) showing an invariant representation.
Of these, 43 responded to a single individual or object and 8 to two
individuals or objects. The dashed vertical line marks the median of the
distribution (0.94).
As shown in Figs 2 and 3, one of the most extreme cases of an abstract
representation is the one given by responses to pictures of a particular
individual (or object) and to the presentation of the corresponding letter
string with its name. In 18 of the 21 testing sessions we also tested responses
to letter strings with the names of the individuals and objects. Eight of the
132 responsive units (6.1%) showed a selective response to an individual and
its name (with no response to other names). Six of these were in the
hippocampus, one was in the entorhinal cortex and one was in the amygdala.
These neuronal responses cannot be attributed to any particular movement
artefact, because selective responses started around 300 ms after image onset,
whereas key presses occurred at 1 s or later, and neuronal responses were very
selective. About one-third of the responsive units had a response localized
between 300 and 600 ms. This interval corresponds to the latency of
event-related responses correlated with the recognition of 'oddball' stimuli in
scalp electroencephalogram, namely, the P300 (ref. 16). Some studies argue for
a generation of the P300 in the hippocampal formation and amygdala17, 18,
consistent with our findings.
What are the common features that activate these neurons? Given the great
diversity of distinct images of a single individual (pencil sketches,
caricatures, letter strings, coloured photographs with different backgrounds)
that these cells can selectively respond to, it is unlikely that this degree of
invariance can be explained by a simple set of metric features common to these
images. Indeed, our data are compatible with an abstract representation of the
identity of the individual or object shown. The existence of such high-level
visual responses in medial temporal lobe structures, usually considered to be
involved in long-term memory formation and consolidation, should not be
surprising given the following: (1) the known anatomical connections between
the higher stages of the visual hierarchy in the ventral pathway and the MTL19,
20; (2) the well-characterized reactivity of the cortical stages feeding into
the MTL to the sight of faces, objects, or spatial scenes (as ascertained using
functional magnetic resonance imaging (fMRI) in humans21, 22 and
electrophysiology in monkeys2, 4, 7, 8, 9, 10, 11); and (3) the observation
that any visual percept that will be consciously remembered later on will have
to be represented in the hippocampal system23, 24, 25. This is true even
though patients with bilateral loss of parts of the MTL do not, in general,
have a deficit in the perception of images25. Neurons in the MTL might have a
fundamental role in learning associations between abstract representations26.
Thus, our observed invariant responses probably arise from experiencing very
different pictures, words or other visual stimuli in association with a given
individual or object.
How neurons encode different percepts is one of the most intriguing questions
in neuroscience. Two extreme hypotheses are schemes based on the explicit
representations by highly selective (cardinal, gnostic or grandmother) neurons
and schemes that rely on an implicit representation over a very broad and
distributed population of neurons1, 2, 3, 4, 6. In the latter case,
recognition would require the simultaneous activation of a large number of
cells and therefore we would expect each cell to respond to many pictures with
similar basic features. This is in contrast to the sparse firing we observe,
because most MTL cells do not respond to the great majority of images seen by
the patient. Furthermore, cells signal a particular individual or object in an
explicit manner27, in the sense that the presence of the individual can, in
principle, be reliably decoded from a very small number of neurons. We do not
mean to imply the existence of single neurons coding uniquely for discrete
percepts for several reasons: first, some of these units responded to pictures
of more than one individual or object; second, given the limited duration of
our recording sessions, we can only explore a tiny portion of stimulus space;
and third, the fact that we can discover in this short time some images, such
as photographs of Jennifer Aniston, that drive the cells suggests that each
cell might represent more than one class of images. Yet, this subset of MTL cells is
selectively activated by different views of individuals, landmarks, animals or
objects. This is quite distinct from a completely distributed population code
and suggests a sparse, explicit and invariant encoding of visual percepts in
MTL. Such an abstract representation, in contrast to the metric representation
in the early stages of the visual pathway, might be important in the storage of
long-term memories. Other factors, including emotional responses towards some
images, could conceivably influence the neuronal activity as well. The
responses of these neurons are reminiscent of the behaviour of hippocampal
place cells in rodents28 that only fire if the animal moves through a
particular spatial location, with the actual place field defined independently
of sensory cues. Notably, place cells have been found recently in the human
hippocampus as well29. Both classes of neurons, place cells and the cells in
the present study, have a very low baseline activity and respond in a highly
selective manner. Future research might show that this similarity has
functional implications, enabling mammals to encode behaviourally important
features of the environment and to transition between them, either in physical
space or in a more conceptual space13.
Methods
The data in the present study come from 21 sessions in 8 patients with
pharmacologically intractable epilepsy (eight right handed; 3 male; 17-47 years
old). Extensive non-invasive monitoring did not yield concordant data
corresponding to a single resectable epileptogenic focus. Therefore, the
patients were implanted with chronic depth electrodes for 7-10 days to
determine the seizure focus for possible surgical resection13 . Here we report
data from sites in the hippocampus, amygdala, entorhinal cortex and
parahippocampal gyrus. All studies conformed to the guidelines of the Medical
Institutional Review Board at UCLA. The electrode locations were based
exclusively on clinical criteria and were verified by MRI or by computer
tomography co-registered to preoperative MRI. Each electrode probe had a total
of nine micro-wires at its end13 , eight active recording channels and one
reference. The differential signal from the micro-wires was amplified using a
64-channel Neuralynx system, filtered between 1 and 9,000 Hz. We computed the
power spectrum for every unit after spike sorting. Units that showed evidence
of line noise were excluded from subsequent analysis14. Signals were sampled at
28 kHz. Each recording session lasted about 30 min.
Subjects lay in bed, facing a laptop computer. Each image covered about 1.5°
and was presented at the centre of the screen six times for 1 s. The order of
the pictures was randomized. Subjects had to respond, after image offset,
according to whether the picture contained a human face or something else by
pressing the 'Y' and 'N' keys, respectively. This simple task, on which
performance was virtually flawless, required them to attend to the pictures.
After the experiments, patients gave feedback on whether they recognized the
images or not. Pictures included famous and unknown individuals, animals,
landmarks and objects. We tried to maximize the differences between pictures of
the individuals (for example, different clothing, size, point of view, and so
on). In 18 of the 21 sessions, we also presented letter strings with names of
individuals or objects.
The data from the screening sessions were rapidly processed to identify
responsive units and images. All pictures that elicited a response in the
screening session were included in the later testing sessions. Three to eight
different views of seven to twenty-three different individuals or objects were
used in the testing sessions with a mean of 88.6 images per session (range
70-110). Spike detection and sorting was applied to the continuous recordings
using a novel clustering algorithm30 (see Supplementary Information). The
response to a picture was defined as the median number of spikes across trials
between 300 and 1,000 ms after stimulus onset. Baseline activity was the
average spike count for all pictures between 1,000 and 300 ms before stimulus
onset. A unit was considered responsive if the activity to at least one picture
fulfilled two criteria: (1) the median number of spikes was larger than the
average number of spikes for the baseline plus 5 s.d.; and (2) the median
number of spikes was at least two.
The classification between single unit and multi-unit was done visually based
on the following: (1) the spike shape and its variance; (2) the ratio between
the spike peak value and the noise level; (3) the inter-spike interval
distribution of each cluster; and (4) the presence of a refractory period for
the single units (that is, less than 1% of inter-spike intervals shorter
than 3 ms).
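[Criterion (4), the refractory-period check, is the easiest of these to
express in code. A hypothetical Python sketch, with an invented spike train:]

    import numpy as np

    def passes_refractory_check(spike_times_s, min_isi_s=0.003,
                                max_violation_frac=0.01):
        """A putative single unit should have fewer than 1% of inter-spike
        intervals shorter than 3 ms."""
        spikes = np.sort(np.asarray(spike_times_s, dtype=float))
        if spikes.size < 2:
            return True
        isis = np.diff(spikes)
        return float(np.mean(isis < min_isi_s)) < max_violation_frac

    # Invented spike train: roughly 30 min of firing at about 1 Hz.
    rng = np.random.default_rng(0)
    spike_times = np.cumsum(rng.exponential(1.0, 1800))
    print(passes_refractory_check(spike_times))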
Whenever a unit had a response to a given stimulus, we further analysed the
responses to other pictures of the same individual or object by a ROC analysis.
This tested whether cells responded selectively to pictures of a given
individual. The hit rate (y axis) was defined as the number of responses to the
individual divided by the total number of pictures of this individual. The
false positive rate (x axis) was defined as the number of responses to the
other pictures divided by the total number of other pictures. The ROC curve was
obtained by gradually lowering the threshold of the responses (the median
number of spikes in Figs 1b, 2b and 3b). Starting with a very high threshold
(no hits, no false positives, lower left-hand corner in the ROC diagram), if a
unit responds exclusively to an image of a particular individual or object, the
ROC curve will show a steep increase when lowering the threshold (a hit rate of
1 and no false positives). If a unit responds to a random selection of
pictures, it will have a similar relative number of hits and false positives
and the ROC curve will fall along the diagonal. In the first case, for a highly
invariant unit, the area under the ROC curve will be close to 1, whereas in the
latter case it will be about 0.5. To evaluate the statistical significance, we
created 99 surrogate curves for each responsive unit, testing the null
hypothesis that the unit responded preferentially to n randomly chosen pictures
(with n being the number of pictures of the individual for which invariance was
tested). A unit was considered invariant to a certain individual or object if
the area under the ROC curve was larger than the area of all of the 99
surrogates (that is, with a confidence of P < 0.01). Alternatively, the ROC
analysis can be done with the single trial responses instead of the median
responses across trials. Here, responses to the trials corresponding to any
picture of the individual tested are considered as hits and responses to trials
to other pictures as false positives. This trial-by-trial analysis led to very
similar results, with 55 of the 132 responsive units showing an invariant
representation. A one-way ANOVA also yielded similar results. In particular, we
tested whether the distribution of median firing rates for all responsive units
showed a dependence on the factor identity (that is, the individual, landmark
or object shown). The different views of each individual were the repeated
measures. As with the ROC analysis, an ANOVA test was performed on all
responsive units. Overall, the results were very similar to those obtained with
the ROC analysis: of 132 responsive units, 49 had a significant effect for
factor identity with P < 0.01, compared to 51 units showing an invariant
representation with the ROC analysis. The ANOVA analysis, however, does not
demonstrate that the invariant responses were very selective, whereas the ROC
analysis explicitly tests the presence of an invariant as well as sparse
representation.
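[For completeness, a sketch of the ANOVA check on a single responsive unit,
using a plain one-way test rather than the repeated-measures design described
above; the spike counts are invented and scipy is assumed to be available.]

    import numpy as np
    from scipy.stats import f_oneway

    # Invented median-like spike counts for different views, grouped by identity.
    rng = np.random.default_rng(2)
    aniston = rng.poisson(5.0, 7).astype(float)    # seven views of one person
    roberts = rng.poisson(0.1, 5).astype(float)
    landmark = rng.poisson(0.1, 6).astype(float)

    f_stat, p_value = f_oneway(aniston, roberts, landmark)
    print(f_stat, p_value)   # the unit would be flagged if p < 0.01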
Images were obtained from Corbis and Photorazzi, with licensed rights to
reproduce them in this paper and in the Supplementary Information.
Acknowledgments
We thank all patients for their participation; P. Sinha for drawing some faces;
colleagues for providing pictures; I. Wainwright for administrative assistance;
and E. Behnke, T. Fields, E. Ho, E. Isham, A. Kraskov, P. Steinmetz, I.
Viskontas and C. Wilson for technical assistance. This work was supported by
grants from the NINDS, NIMH, NSF, DARPA, the Office of Naval Research, the W.M.
Keck Foundation Fund for Discovery in Basic Medical Research, a Whiteman
fellowship (to G.K.), the Gordon Moore Foundation, the Sloan Foundation, and
the Swartz Foundation for Computational Neuroscience.
Competing interests statement: The authors declared no competing interests.
Supplementary information accompanies this paper.
References
1. Barlow, H. Single units and sensation: a neuron doctrine for perception. Perception 1, 371-394 (1972).
2. Gross, C. G., Bender, D. B. & Rocha-Miranda, C. E. Visual receptive fields of neurons in inferotemporal cortex of the monkey. Science 166, 1303-1306 (1969).
3. Konorski, J. Integrative Activity of the Brain (Univ. Chicago Press, Chicago, 1967).
4. Logothetis, N. K. & Sheinberg, D. L. Visual object recognition. Annu. Rev. Neurosci. 19, 577-621 (1996).
5. Riesenhuber, M. & Poggio, T. Neural mechanisms of object recognition. Curr. Opin. Neurobiol. 12, 162-168 (2002).
6. Young, M. P. & Yamane, S. Sparse population coding of faces in the inferior temporal cortex. Science 256, 1327-1331 (1992).
7. Logothetis, N. K., Pauls, J. & Poggio, T. Shape representation in the inferior temporal cortex of monkeys. Curr. Biol. 5, 552-563 (1995).
8. Logothetis, N. K. & Pauls, J. Psychophysical and physiological evidence for viewer-centered object representations in the primate. Cereb. Cortex 5, 270-288 (1995).
9. Perrett, D., Rolls, E. & Caan, W. Visual neurons responsive to faces in the monkey temporal cortex. Exp. Brain Res. 47, 329-342 (1982).
10. Schwartz, E. L., Desimone, R., Albright, T. D. & Gross, C. G. Shape recognition and inferior temporal neurons. Proc. Natl Acad. Sci. USA 80, 5776-5778 (1983).
11. Tanaka, K. Inferotemporal cortex and object vision. Annu. Rev. Neurosci. 19, 109-139 (1996).
12. Miyashita, Y. & Chang, H. S. Neuronal correlate of pictorial short-term memory in the primate temporal cortex. Nature 331, 68-71 (1988).
13. Fried, I., MacDonald, K. A. & Wilson, C. Single neuron activity in human hippocampus and amygdala during recognition of faces and objects. Neuron 18, 753-765 (1997).
14. Kreiman, G., Koch, C. & Fried, I. Category-specific visual responses of single neurons in the human medial temporal lobe. Nature Neurosci. 3, 946-953 (2000).
15. Macmillan, N. A. & Creelman, C. D. Detection Theory: A User's Guide (Cambridge Univ. Press, New York, 1991).
16. Picton, T. The P300 wave of the human event-related potential. J. Clin. Neurophysiol. 9, 456-479 (1992).
17. Halgren, E., Marinkovic, K. & Chauvel, P. Generators of the late cognitive potentials in auditory and visual oddball tasks. Electroencephalogr. Clin. Neurophysiol. 106, 156-164 (1998).
18. McCarthy, G., Wood, C. C., Williamson, P. D. & Spencer, D. D. Task-dependent field potentials in human hippocampal formation. J. Neurosci. 9, 4253-4268 (1989).
19. Saleem, K. S. & Tanaka, K. Divergent projections from the anterior inferotemporal area TE to the perirhinal and entorhinal cortices in the macaque monkey. J. Neurosci. 16, 4757-4775 (1996).
20. Suzuki, W. A. Neuroanatomy of the monkey entorhinal, perirhinal and parahippocampal cortices: Organization of cortical inputs and interconnections with amygdala and striatum. Semin. Neurosci. 8, 3-12 (1996).
21. Kanwisher, N., McDermott, J. & Chun, M. M. The fusiform face area: A module in human extrastriate cortex specialized for face perception. J. Neurosci. 17, 4302-4311 (1997).
22. Haxby, J. V. et al. Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science 293, 2425-2430 (2001).
23. Eichenbaum, H. A cortical-hippocampal system for declarative memory. Nature Rev. Neurosci. 1, 41-50 (2000).
24. Hampson, R. E., Pons, P. P., Stanford, T. R. & Deadwyler, S. A. Categorization in the monkey hippocampus: A possible mechanism for encoding information into memory. Proc. Natl Acad. Sci. USA 101, 3184-3189 (2004).
25. Squire, L. R., Stark, C. E. L. & Clark, R. E. The medial temporal lobe. Annu. Rev. Neurosci. 27, 279-306 (2004).
26. Miyashita, Y. Neuronal correlate of visual associative long-term memory in the primate temporal cortex. Nature 335, 817-820 (1988).
27. Koch, C. The Quest for Consciousness: A Neurobiological Approach (Roberts, Englewood, Colorado, 2004).
28. Wilson, M. A. & McNaughton, B. L. Dynamics of the hippocampal ensemble code for space. Science 261, 1055-1058 (1993).
29. Ekstrom, A. D. et al. Cellular networks underlying human spatial navigation. Nature 425, 184-187 (2003).
30. Quian Quiroga, R., Nadasdy, Z. & Ben-Shaul, Y. Unsupervised spike detection and sorting with wavelets and super-paramagnetic clustering. Neural Comput. 16, 1661-1687 (2004).
1. Computation and Neural Systems, California Institute of Technology,
Pasadena, California 91125, USA
2. Division of Neurosurgery and Neuropsychiatric Institute, University of
California, Los Angeles (UCLA), California 90095, USA
3. Brain and Cognitive Sciences, Massachusetts Institute of Technology,
Cambridge, Massachusetts 02142, USA
4. Functional Neurosurgery Unit, Tel-Aviv Medical Center and Sackler Faculty
of Medicine, Tel-Aviv University, Tel-Aviv 69978, Israel
5. Present address: Department of Engineering, University of Leicester, LE1
7RH, UK
Correspondence to: R. Quian Quiroga1,2,5. Correspondence and requests for
materials should be addressed to R.Q.Q. (Email: rodri at vis.caltech.edu).
Received 1 December 2004; Accepted 3 February 2005