<html><head></head><body><div><span data-mailaddress="firstname.lastname@example.org" data-contactname="BillK" class="clickable"><span title="email@example.com">BillK</span><span class="detail"> &lt;firstname.lastname@example.org&gt;</span></span>, 1/6/2014 9:22 PM:<br><blockquote class="mori" style="margin:0 0 0 .8ex;border-left:2px blue solid;padding-left:1ex;"><br>And, of course, as sure as night follows day......
<br>&lt;<a href="http://mobile.nytimes.com/2014/06/01/us/nsa-collecting-millions-of-faces-from-web-images.html" target="_blank" title="http://mobile.nytimes.com/2014/06/01/us/nsa-collecting-millions-of-faces-from-web-images.html">http://mobile.nytimes.com/2014/06/01/us/nsa-collecting-millions-of-faces-from-web-images.html</a>&gt;
<br>The National Security Agency is harvesting huge numbers of images of
<br>people from communications that it intercepts through its global
<br>surveillance operations for use in sophisticated facial recognition
<br>programs, according to top-secret documents. </blockquote></div><div><br></div>Of course. They have been doing it for a while now. <div><br></div><div>However, facial recognition has different uses. Face verification, looking for one person in a database of images (or across many cameras), is very different from face recognition, identifying who is who. </div><div><br></div><div>The first is in principle doable: if you have a 98.52% chance of correct identification and a thousand images, of which the person shows up in ten, you should expect nearly ten out of ten hits. The false positive rate, given the ROC curve in the paper, is about 1%. So across the other 990 images you would also get about 10 false positives. This is manageable for this example, as some other characteristic or a human could separate them. For a lot of pictures things get worse: if the target appears in a fraction f of N pictures there will be about 0.9852fN correct hits, but about 0.01N false positives. If f is smaller than about one in a hundred, the false positives will outnumber the true positives - and potentially by a huge factor (just imagine Facebook: N=300 million pictures per day). </div><div><br></div><div>The second case, face recognition, is worse: now you have to repeat this for every person in the set N. In 1.48% of the cases there will be no match, and in 1% a false positive as person A is identified as B. So in the end about 2.48% of the identifications will be errors: roughly 25 of those 1000 pictures will be missed or wrongly assigned. In general, recognition is also far harder when you have large probe sets; looking for person A alone is more accurate than telling A to Z apart.</div><div><div><br></div><div>This will not stop NSA, Facebook or anybody else from trying. In many applications a few false positives are not a big deal - advertisers can handle noisy data. However, sending SWAT teams to every place where the Most Wanted du Jour appears is problematic. 
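<br><br>As a rough check on the base-rate arithmetic above, here is a minimal Python sketch. The function names and the second scenario (f = one in a million at N = 300 million) are my own illustrative assumptions; the 98.52% hit rate and 1% false positive rate are the figures quoted above, and treating them as independent per-image rates is a simplification.<br>
<pre>
```python
def expected_hits(n_images, target_fraction, tpr=0.9852, fpr=0.01):
    """Expected true hits, false hits and precision when searching for one person.

    tpr and fpr are the per-image hit and false positive rates quoted in the text.
    """
    n_target = target_fraction * n_images   # images that really show the target
    n_other = n_images - n_target           # images that do not
    true_hits = tpr * n_target
    false_hits = fpr * n_other
    total = true_hits + false_hits
    precision = true_hits / total if total else 0.0
    return true_hits, false_hits, precision


def break_even_fraction(tpr=0.9852, fpr=0.01):
    """Target fraction f at which true and false positives are equally common."""
    # Solve tpr * f = fpr * (1 - f) for f.
    return fpr / (tpr + fpr)


if __name__ == "__main__":
    # First case: the thousand-image example from the text.
    # Second case: a hypothetical rare target in a Facebook-sized daily upload.
    for n, f in [(1000, 0.01), (300_000_000, 1e-6)]:
        t, fp, p = expected_hits(n, f)
        print(f"N={n}, f={f}: true={t:.1f}, false={fp:.1f}, precision={p:.4f}")
    print(f"break-even f = {break_even_fraction():.4f}")
```
</pre>
For the thousand-image case this gives roughly equal numbers of true and false hits, so only about half of the flagged images are right; for a rare target in a very large set the precision collapses towards zero.<br><br>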
Same thing with false negatives: no problem for the advertiser, a big problem when trying to enter your high-security Lair of Doom. The real solution is data fusion: combine the images with gait analysis, keyboard rhythm, stylometrics, voiceprints and whatever sensors you have, do a Bayesian estimate, and you have something fairly robust. I fully expect NSA to do the 21st century version of Stasi archival: try to get as much data as possible, since one day it will be possible to weigh it all into a probability map. Shame about those errors that cause false positives even in such systems...</div><div><br>Anders Sandberg, Future of Humanity Institute, Philosophy Faculty of Oxford University</div></div></body></html>