This blog shares some brief thoughts on machine learning accuracy and bias.
Let’s start with some comments about a recent ACLU blog post describing a facial recognition trial they ran. Using Rekognition, the ACLU built a face database from 25,000 publicly available arrest photos and then performed facial similarity searches on that database using public photos of all current members of Congress. They found 28 incorrect matches among the 535 members, using an 80% confidence threshold; this works out to roughly a 5% misidentification (sometimes called ‘false positive’) rate and a 95% accuracy rate. The ACLU has not published its data set, methodology, or results in detail, so we can only go on what they’ve publicly said.
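For concreteness, those rates follow from simple division over the reported numbers:

```python
# Rates implied by the ACLU's reported numbers: 28 incorrect matches
# among searches for 535 members of Congress.
incorrect_matches, total_searches = 28, 535
print(f"misidentification rate: {incorrect_matches / total_searches:.1%}")  # ~5.2%
print(f"accuracy rate: {1 - incorrect_matches / total_searches:.1%}")       # ~94.8%
```

With that context, here are some thoughts on their claims: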
- The default confidence threshold for facial recognition APIs in Rekognition is 80%, which is good for a broad set of general use cases (such as identifying celebrities on social media or family members who look alike in photo apps), but it’s not the right setting for public safety use cases. The 80% confidence threshold used by the ACLU is far too low to ensure the accurate identification of individuals; we would expect to see false positives at this level of confidence. We recommend 99% for use cases where highly accurate face similarity matches are important (as indicated in our public documentation). To illustrate the impact of the confidence threshold on false positives, we ran a test where we created a face collection using a dataset of over 850,000 faces commonly used in academia. We then used public photos of all members of the US Congress (the Senate and House) to search against this collection in a similar way to the ACLU blog. When we set the confidence threshold at 99% (as we recommend in our documentation), our misidentification rate dropped to zero, despite the fact that we were comparing against a much larger corpus of faces (30x larger than the ACLU’s test). This illustrates how important it is for those using the technology for public safety issues to pick appropriate confidence levels, so they have few (if any) false positives. (A short code sketch after this list shows where this threshold is set in the API.)
- In real-world public safety and law enforcement scenarios, Amazon Rekognition is almost exclusively used to help narrow the field and allow humans to expeditiously review and consider options using their judgment (rather than to make fully autonomous decisions); in these scenarios, it can help find lost children, fight human trafficking, or prevent crimes. Rekognition is generally only the first step in identifying an individual. In other use cases (such as social media), there isn’t the same need to double-check, so confidence thresholds can be lower.
- In addition to setting the confidence threshold far too low, the Rekognition results can be significantly skewed by using a facial database that is not appropriately representative and is itself skewed. In this case, the ACLU used a facial database of mugshots, which may have had a material impact on the accuracy of the Rekognition findings.
- The advantage of a cloud-based machine learning application like Rekognition is that it is constantly improving. As we continue to improve the algorithm with more data, our customers immediately get the benefit of those improvements. We continue to focus on our mission of making Rekognition the most accurate and powerful tool for identifying people, objects, and scenes – and that certainly includes ensuring that the results are free of any bias that impacts accuracy. We’ve already been able to add a lot of value for customers and the world at large with Rekognition in the fight against human trafficking, reuniting lost children with their families, reducing fraud for mobile payments, and improving security, and we’re excited about continuing to help our customers and society with Rekognition in the future.
- There is a general misconception that people can match faces to photos better than machines. In fact, the National Institute of Standards and Technology (“NIST”) recently shared a study of facial recognition technologies (technologies that are at least two years behind the state of the art used in Rekognition) and concluded that even those older technologies can outperform human facial recognition abilities.
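As referenced in the first bullet above, here is a minimal sketch of where the confidence threshold enters the Rekognition API, using boto3 (the AWS SDK for Python). This is illustrative only: the collection name, S3 bucket, and file names are hypothetical placeholders, and a single reference photo is indexed rather than a full 850,000-face corpus.

```python
# Minimal sketch: build a face collection and search it at a 99% threshold.
import boto3

rekognition = boto3.client("rekognition")

COLLECTION_ID = "demo-face-collection"  # hypothetical collection name

# One-time setup: create a collection and index a reference face into it.
rekognition.create_collection(CollectionId=COLLECTION_ID)
rekognition.index_faces(
    CollectionId=COLLECTION_ID,
    Image={"S3Object": {"Bucket": "demo-bucket", "Name": "reference/person_001.jpg"}},
    ExternalImageId="person_001",
    MaxFaces=1,
)

# Search a probe photo against the collection. FaceMatchThreshold is the
# confidence threshold discussed above: 80 is the service default used in
# the ACLU test; 99 is what we recommend for public safety use cases.
response = rekognition.search_faces_by_image(
    CollectionId=COLLECTION_ID,
    Image={"S3Object": {"Bucket": "demo-bucket", "Name": "probes/member_photo.jpg"}},
    FaceMatchThreshold=99,
    MaxFaces=5,
)

for match in response["FaceMatches"]:
    print(match["Face"]["ExternalImageId"], f'{match["Similarity"]:.1f}%')
```

The only change between an ACLU-style test and the recommended configuration is the FaceMatchThreshold value passed to the search call.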
A final word about the misinterpreted ACLU results. When there are new technological advances, we all have to clearly understand what’s real and what’s not. There’s a difference between using machine learning to identify a food object and using machine learning to determine whether a face match should warrant considering any law enforcement action. The latter is serious business and requires much higher confidence levels. We continue to recommend that customers use a confidence threshold of no less than 99% for law enforcement matches, and even then treat the matches as only one input among others that make sense for each agency. Machine learning is a very valuable tool to help law enforcement agencies, and while we should be concerned that it is applied correctly, we should not throw away the oven because the temperature could be set wrong and burn the pizza. It is a very reasonable idea, however, for the government to weigh in and specify what temperature (or confidence levels) it wants law enforcement agencies to meet to assist in their public safety work.
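To make the “99% plus human judgment” recommendation concrete, here is a small, self-contained sketch of the filtering step; the sample matches below mimic the shape of Rekognition’s FaceMatches output, and the IDs are hypothetical.

```python
# Keep only matches at or above 99% similarity, and treat them as leads
# for human review rather than as automated decisions.
LAW_ENFORCEMENT_THRESHOLD = 99.0

sample_matches = [  # shaped like Rekognition's FaceMatches; IDs are made up
    {"Similarity": 99.4, "Face": {"ExternalImageId": "record_1041"}},
    {"Similarity": 87.2, "Face": {"ExternalImageId": "record_2203"}},
]

def candidate_leads(matches, threshold=LAW_ENFORCEMENT_THRESHOLD):
    """Return only the matches strong enough to hand to a human reviewer."""
    return [m for m in matches if m["Similarity"] >= threshold]

for lead in candidate_leads(sample_matches):
    # One input among others: a person weighs each lead with other evidence.
    print(f"Review candidate: {lead['Face']['ExternalImageId']} "
          f"({lead['Similarity']:.1f}% similarity)")
```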