includes a wealth of digital source data and well-defined categories. If you look back over the “Benefits of AI for Your Enterprise” section in Chapter 2, you will notice that those use cases also share those characteristics.
Bad use cases
“Knowledge is knowing that a tomato is a fruit. Wisdom is not putting it in a fruit salad.”
— Miles Kington
At this point in its development, AI is not a good tool for tasks that involve the human touch, such as situations that require creativity, imagination, empathy, compassion, and the like.
For example, although AI is an excellent tool for legal document capture, archiving, searching, and flagging issues for human review, you wouldn’t want AI to be the judge or jury in your trial. You might be able to appeal to the mercy of the court, but at present it isn’t possible to appeal to the mercy of the algorithm.
Reinforcement learning and model drift
Perhaps the most classic example of the difficulty of coding and training empathy into AI is Microsoft’s Twitter bot, Tay. Launched at 7:14 a.m. on March 23, 2016, Tay was designed to mimic the linguistic patterns of a 19-year-old female. But within 16 hours of her launch, and after 96,000 tweets, Microsoft suspended the Twitter account because Tay had become an alt-right, Holocaust-denying, anti-feminist, anti-Semitic troll.
Although Microsoft hasn’t disclosed the technology behind Tay, the bot has the earmarks of a subset of machine learning called reinforcement learning. Used to teach computers how to play games or to train a robot how to walk, reinforcement learning defines success criteria and then rewards the algorithm when it gets closer to the desired behavior. The machine must figure out for itself how to achieve success. On social media, success correlates with likes, so Tay learned how to maximize engagement from the people who interacted with her.
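Microsoft never published Tay’s design, so the Python sketch below is only a generic illustration of the reward-driven idea just described, not the actual system: the learner tries different response styles, treats likes as its reward, and gravitates toward whatever a purely hypothetical audience reinforces. The class, the style names, and the audience function are all invented for illustration.

```python
import random
from collections import defaultdict

# Hypothetical sketch of reward-driven learning; Tay's real design was never published.
# The learner picks a response style, observes "likes" as its reward, and drifts toward
# whatever the audience reinforces -- with no built-in notion of appropriateness.

def simulated_audience(style):
    """Toy feedback: this imagined audience rewards provocative content most."""
    payoff = {"neutral": 1.0, "edgy": 2.0, "inflammatory": 3.0}
    return random.gauss(payoff[style], 0.5)

class EngagementLearner:
    def __init__(self, styles, epsilon=0.1):
        self.styles = styles
        self.epsilon = epsilon            # how often to try a random style (exploration)
        self.reward = defaultdict(float)  # cumulative likes per style
        self.count = defaultdict(int)     # times each style was tried

    def choose(self):
        if random.random() < self.epsilon or not self.count:
            return random.choice(self.styles)
        # Otherwise exploit the style with the best average reward so far.
        return max(self.styles, key=lambda s: self.reward[s] / max(self.count[s], 1))

    def update(self, style, likes):
        self.reward[style] += likes
        self.count[style] += 1

learner = EngagementLearner(["neutral", "edgy", "inflammatory"])
for _ in range(1000):
    style = learner.choose()
    learner.update(style, simulated_audience(style))

print(max(learner.count, key=learner.count.get))  # the style the learner settles on
```

A bandit-style learner like this is a drastic simplification of full reinforcement learning, but it captures the key point: the algorithm optimizes for whatever reward it is given, not for what is appropriate.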
Tay is an example of model drift. After initial training, Tay interacted with users without a human in the loop or any automated testing to correct for bias or other issues, so the skewed data she encountered pulled the model’s behavior further and further from its starting point.
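As a rough illustration of the kind of automated check that was missing, the hypothetical sketch below compares the vocabulary seen in live conversations against the training data and flags the model for human review when the two diverge too far. The function names and the threshold are assumptions for illustration, not a standard drift metric.

```python
from collections import Counter

# Hypothetical drift check: measure how much of the live vocabulary was
# already seen in training, and flag the model when the overlap drops.

def vocabulary_overlap(train_texts, live_texts):
    train_vocab = Counter(w for t in train_texts for w in t.lower().split())
    live_vocab = Counter(w for t in live_texts for w in t.lower().split())
    shared = sum((train_vocab & live_vocab).values())  # tokens seen in both
    total = sum(live_vocab.values())
    return shared / total if total else 1.0

def needs_review(train_texts, live_texts, threshold=0.6):
    # Below the threshold, live traffic looks too different from training data.
    return vocabulary_overlap(train_texts, live_texts) < threshold

training_chat = ["thanks for the chat", "tell me about music"]
live_chat = ["content the model never saw during training"]
print(needs_review(training_chat, live_chat))  # True -> pause and bring a human back in
```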
Nine months later, Microsoft launched Zo, Tay’s 13-year-old sister, which didn’t use reinforcement learning, so model drift wasn’t an issue. Zo avoided potentially charged topics such as religion and politics, shutting them down with comments such as, “not gonna keep talking if u keep this up … just sayin.” Zo lasted for more than two years before being wound down on most social media platforms in March 2019 and retired completely in July 2019.
Insufficient or biased data
Then there is the problem of insufficient training data. In 2017, due to reports of possible student visa fraud, the U.K. Home Office used voice-recognition software to flag cases where it appeared that the same voice took an English-proficiency test for multiple students. However, voice-recognition accuracy rates are dependent on having known samples of the voice being reviewed, and the organization doing the review didn’t have independent samples of the English-proficiency test candidates. Based on the results of the review, the government refused, cut short, or canceled the visas of nearly 36,000 people.
In the fall of 2019, the U.S. National Institute of Standards and Technology tested 189 facial recognition algorithms from 99 developers using 18.27 million images of 8.89 million people taken from four sources: domestic mugshots, immigration applications, visa applications, and border crossings.
NIST tested two common matching tasks, illustrated in the sketch after this list, for false positives (finding a match where there isn’t one) and false negatives (failing to find a match when there is one):
One-to-one matching: Match a photo to another photo of the same person in a database. Examples: Unlock your smartphone, board an airplane, check a passport.
One-to-many searching: Determine whether a photo has a match in a database. Example: Identify suspects in an investigation.
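Assuming a face-recognition model that turns a photo into an embedding vector (represented here by made-up vectors rather than a real model), a minimal sketch of the two tasks looks like this. The similarity threshold is the knob that trades false positives against false negatives.

```python
import numpy as np

# Hypothetical sketch of the two matching tasks, assuming some embedding
# model maps each photo to a vector. The vectors below are invented.

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def one_to_one_match(probe, reference, threshold=0.8):
    """Verification: is the probe photo the same person as the reference photo?"""
    return cosine_similarity(probe, reference) >= threshold

def one_to_many_search(probe, database, threshold=0.8):
    """Identification: return every database entry that clears the threshold."""
    return [name for name, ref in database.items()
            if cosine_similarity(probe, ref) >= threshold]

db = {"alice": np.array([0.9, 0.1]), "bob": np.array([0.2, 0.95])}
probe = np.array([0.85, 0.2])
print(one_to_one_match(probe, db["alice"]))   # True
print(one_to_many_search(probe, db))          # ['alice']

# Lowering the threshold catches more true matches (fewer false negatives)
# but also returns more wrong matches (more false positives), and vice versa.
```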
For one-to-one matching, most systems produced higher rates of false positives for Asian and African American faces than for Caucasian faces, with algorithms developed in Asia doing better at matching Asian faces. Algorithms developed in the U.S. consistently registered a high rate of false positives for Asian, African American, and Native American faces. For one-to-many searching, African American females had the highest rates of false positives.
Essentially, facial recognition works best for people who share a phenotype (the observable characteristics that arise from an individual’s genetic makeup) with the people who developed the algorithm. Those outside the model’s bias may have trouble proving their identity for travel and law enforcement purposes, or may be falsely accused of a crime when law enforcement gets a false positive in a one-to-many search.
False positives
At the June 3, 2017 UEFA Champions League Final in Cardiff, Wales, the South Wales Police used facial recognition to scan the crowd for known criminals. The system flagged 2,470 potential matches with custody pictures, but 92 percent of those matches were false positives.
In 2018, the American Civil Liberties Union (ACLU) used Amazon’s Rekognition face recognition software with default settings to search a publicly available arrest-photo database using the headshot photos of all 535 members of the U.S. Congress. The software incorrectly matched 28 of them with a criminal in the database. In 2019, the ACLU ran the same exercise for the California legislature and got 26 false positives out of 120 legislators.
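Plugging the figures reported above into a quick calculation shows how the raw counts translate into rates:

```python
# Figures taken from the incidents described in the text.
cardiff_flagged, cardiff_fp_rate = 2470, 0.92
print(round(cardiff_flagged * cardiff_fp_rate))   # ~2272 wrongly flagged faces in Cardiff

congress_fp, congress_total = 28, 535
california_fp, california_total = 26, 120
print(f"{congress_fp / congress_total:.1%}")      # ~5.2% of Congress misidentified
print(f"{california_fp / california_total:.1%}")  # ~21.7% of CA legislators misidentified
```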
For now, it seems that you can’t turn identifying a person over to facial recognition algorithms alone. Human intervention is still required.
Reducing bias
Because AI systems are designed by humans, it is no surprise that they contain bias. Bias can enter at the very beginning, when the problem is framed, during data collection, during data preparation, or at all three stages.
As Microsoft’s experiment with Twitter revealed, an algorithm maximizes success as defined by the designers. Solon Barocas, an assistant professor at Cornell University who specializes in fairness in machine learning, pointed out that an issue arises while framing the problem when “those decisions are made for various business reasons other than fairness or discrimination. If the algorithm discovered that giving out subprime loans was an effective way to maximize profit, it would end up engaging in predatory behavior even if that wasn’t the company’s intention.”
Increasingly, social media has become a textbook case of algorithmically enforced confirmation bias, or emergent bias. Emergent bias isn’t based on the source data, but rather on how the algorithm interacts with the user. If a user likes, favorites, or subscribes to content with a certain viewpoint, such as articles about vaccinations, vegan diets, politics, or even exotic coffee, the algorithm feeds that user more content with the same viewpoint and excludes content from an opposing viewpoint. Coffee lovers, for example, get more content about the positive effects of coffee consumption and less content about the negative effects of excessive caffeine intake. Over time, the platform magnifies the echo-chamber effect of self-validation, amplifying the person’s inherent bias.
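This feedback loop is easy to reproduce with a deliberately naive ranking sketch. The code below is not any real platform’s algorithm, just an illustration of how engagement-based ranking narrows a feed; the class and field names are invented.

```python
from collections import Counter

# Generic illustration of the feedback loop described above. Each like boosts
# that topic's score, so the feed gradually narrows to the viewpoints the user
# already engages with.

class NaiveFeed:
    def __init__(self):
        self.affinity = Counter()          # likes accumulated per topic

    def rank(self, candidate_items):
        # Items on topics the user has liked before float to the top.
        return sorted(candidate_items,
                      key=lambda item: self.affinity[item["topic"]],
                      reverse=True)

    def record_like(self, item):
        self.affinity[item["topic"]] += 1

feed = NaiveFeed()
items = [{"title": "Coffee boosts focus", "topic": "pro-coffee"},
         {"title": "Risks of excess caffeine", "topic": "coffee-risks"}]

feed.record_like(items[0])            # the user likes one pro-coffee article...
print(feed.rank(items)[0]["title"])   # ...and pro-coffee now leads every ranking
```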
As of this writing, automated and human-in-the-loop tools to address bias, security, compliance, and transparency are appearing in the marketplace.
When determining a specific use case for AI, consider these guidelines:

1. Create a process that guides the development of trustworthy AI aligned with human values.
2. When framing the problem, seek a wide