By Katarina Athens-Miller, Anna Olecka & Jason Otte
In the recent KDnuggets article Holy Grail of AI for Enterprise — Explainable AI [1], Saurabh Kaushik, an AI Product Engineering Lead at Genpact, wrote:
Trustworthy Production Deployment of an AI System. Yes, it is the Holy Grail of AI and for the right reason; whether it is about losing a High-Value customer due to a wrong Churn Prediction or losing dollars due to incorrect classification of a financial transaction. In reality, Customers are the least bothered about the accuracy of an AI model; their concern is the Cluelessness of the Data Scientist to explain "How do I trust its decision making?"
Data scientists building credit risk models in the consumer space have faced the transparency requirement for probably as long as the field has existed, due to the regulatory compliance that governs consumer risk. Marketers, too, have been bound by rules that disallow protected categories such as gender or race from entering the models. These regulations were created in the US to protect consumers; they go even further in Europe. But the need for transparency goes far beyond consumer protection.
As AI models get deployed in the Industrial IoT space, for example in predictive maintenance, the operators who act on the models' outputs ask to be given the reasons behind alerts. If they are supposed to react to those alerts, and in some situations react fast, they need to know where specifically to focus their attention.
For a long time there was an unspoken agreement within the AI community that the most sophisticated models, such as image recognition, should be exempt from explainability requirements. But as these models are deployed in wider social settings such as building security or the criminal justice system, it is becoming clear that they are often built on biased data and therefore produce biased results. For example, several studies have shown that when face recognition algorithms are trained on significantly smaller data samples of dark-skinned individuals, those algorithms are less accurate on people with darker skin. According to Joy Buolamwini of the M.I.T. Media Lab [2] [3], as many as 35% of dark-skinned women can be misidentified by AI, while the error rate for white males is 1%.
In October 2016 the Center on Privacy & Technology at Georgetown Law published an extensive study, The Perpetual Line-Up: Unregulated Police Face Recognition in America [4]. Among many findings, the researchers found a systemic bias in the face recognition software used in law enforcement.
Police face recognition will disproportionately affect African Americans. Many police departments do not realize that. In a Frequently Asked Questions document, the Seattle Police Department says that its face recognition system “does not see race.” Yet an FBI co-authored study suggests that face recognition may be less accurate on black people. Also, due to disproportionately high arrest rates, systems that rely on mug shot databases likely include a disproportionate number of African Americans. Despite these findings, there is no independent testing regime for racially biased error rates. In interviews, two major face recognition companies admitted that they did not run these tests internally, either.
A recent incident, in which Amazon built an AI tool to automate hiring decisions only to discover a few years later that the model was producing biased results because the underlying data under-represented women, was widely publicized in the media. Reuters [5] commented: "The company's experiment, which Reuters is first to report, offers a case study in the limitations of machine learning. It also serves as a lesson to the growing list of large companies including Hilton Worldwide Holdings Inc. (HLT.N) and Goldman Sachs Group Inc. (GS.N) that are looking to automate portions of the hiring process."
There are still situations where transparency is not operationally or regulatorily critical. Recipients of recommendation engines, for example, do not have to know how Netflix chose the next movie to recommend. But even in those situations the question "why" is common on consumers' minds, and satisfying that curiosity goes a long way toward building users' confidence in the recommendation.
Information on AI bias spreads almost as fast as AI applications themselves. Recent examples include major respected venues such as TED [3] as well as major publications such as Harvard Business Review [6], Quartz [7], and the NY Times [8]. Having recognized the issue early on, the ML community has been organizing a conference on Fairness, Accountability, and Transparency in Machine Learning since 2014, with invited speakers including top ML researchers and practitioners [9] (https://fatconference.org/2019/cfp.html).
Still, some in the AI community doubt the need for explainable AI and prefer to focus instead on furthering model accuracy.
As AI and ML models keep gaining popularity and are applied to ever wider use cases, it is important to keep the risks in mind. Let us not fall into the trap where accuracy trumps explainability. If we, the data scientists, fall into that trap, we will lose the confidence of the public and of those who are to deploy our models.
In support of the explainable AI cause, DataWisen and our collaborators present some prominent use cases. They are just a small sample of the need for interpretable ML/AI. Readers are encouraged to share their own examples in the comments.
We divide this set of use cases into three sections:
- Operational needs,
- Regulatory compliance, and
- Public trust and social acceptance.
A. Operational Needs

Use Case #1: Oil Refinery Asset Reliability: Furnace Flooding Predictions

- The Situation: Maintaining stable combustion is critical to the safe operation of a furnace in an oil refinery. If the combustion process becomes unstable and this condition is not recognized and acted upon, the furnace may flood and eventually an explosion may occur. Flooding occurs 12 to 20 times per year in oil refinery furnaces. In each case the furnace needs to be shut down to avoid an explosion, and the cost in disrupted production is estimated at millions of dollars annually.
- The ML/AI Solution: A reliable on-line prediction of furnace flooding that enables a timely operator response. Data for this ML project includes all sensor signals from the furnace as well as external data such as weather, humidity, etc. The solution should predict the likelihood of furnace flooding 20 minutes prior to the event. Operators requested that the alerts include the most likely reason(s) for the alert, i.e., which sensors point to the risk of flooding.
- Why Explainable AI: There can be different reasons for an increase in flooding risk: a burner could be snuffed, a plenum damper may become faulty, etc. The operator needs to react quickly to the alert issued by the ML model. Knowing which sensors are acting out of order enables the operator to take preventive steps without wasting time on additional investigation. A minimal illustrative sketch of this pattern follows this list.
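The article itself contains no code, but to make the pattern concrete, here is a minimal sketch of attaching per-alert sensor attributions to a predictive model. Everything in it is assumed for illustration: the sensor names, the synthetic data, and the choice of a gradient-boosted classifier with SHAP attributions. It is not the deployed refinery system.

```python
# Hypothetical sketch: per-alert sensor attribution for a flooding-risk model.
# Sensor names and data are invented; assumes the shap library is installed.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
import shap

rng = np.random.default_rng(0)
sensors = ["burner_temp", "draft_pressure", "plenum_damper_pos", "fuel_flow", "o2_level"]
X = pd.DataFrame(rng.normal(size=(2000, len(sensors))), columns=sensors)
# Toy label: flooding risk driven mostly by draft pressure and damper position.
y = ((X["draft_pressure"] + 0.8 * X["plenum_damper_pos"]
      + rng.normal(scale=0.5, size=2000)) > 1.5).astype(int)

model = GradientBoostingClassifier().fit(X, y)

# Local attributions: which sensors pushed this particular reading toward "flooding"?
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

def alert_reasons(i, top_k=3):
    """Return the top_k sensors contributing to the alert for reading i."""
    contrib = pd.Series(shap_values[i], index=sensors)
    return contrib.sort_values(ascending=False).head(top_k)

risk = model.predict_proba(X)[:, 1]
flagged = int(np.argmax(risk))          # most at-risk reading in this batch
print(f"P(flooding) = {risk[flagged]:.2f}")
print(alert_reasons(flagged))
```

The alert then carries both a probability and the handful of sensors driving it, which is the form of output the operators asked for.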
Use Case #2: Utility Revenue Protection: Energy Theft Detection in the Smart Grid

- The Situation: Commercial losses from energy theft are crippling utilities around the world, totaling an estimated $89B annually and driving up energy prices for customers. Energy thieves use multiple means of stealing energy: they can tap into a line between the transformer and a house, tap into a neighbor's meter, reverse their own meter, etc.
- The ML/AI Solution: To minimize theft, the Revenue Protection Manager needs a prioritized list of the most likely theft cases to investigate. Such a list can be generated by an ML model trained on smart meter data and external factors such as weather, geographic risk in the area, etc. The tool must be flexible enough to adjust to evolving theft methods, and it needs to identify the factors elevating the risk of theft as well as output the geographic location of the offense.
- Why Explainable AI: Different types of theft require different actions by the investigators. A reversed meter needs to be disconnected; a household whose meter a thief has tapped into needs to be alerted and the altered meter replaced; in the case of transformer line tapping, a truck needs to be sent to the proper location, etc. A sketch of a prioritized, reason-annotated list follows this block.
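Again only an illustrative sketch, not the utility's actual tool: the meter features and data are invented, and an unsupervised anomaly detector stands in for a model trained on confirmed theft cases. It shows the shape of the output the Revenue Protection Manager needs, a ranked list with a crude "reason" per meter.

```python
# Hypothetical sketch of a prioritized, reason-annotated theft shortlist.
# Feature names and data are invented; a real system would also use labeled
# historical theft cases and geographic risk factors.
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
features = ["usage_drop_pct", "night_day_ratio", "neighbor_usage_gap", "reverse_flow_events"]
meters = pd.DataFrame(rng.normal(size=(500, len(features))), columns=features)
meters["meter_id"] = [f"M{i:04d}" for i in range(len(meters))]

# Unsupervised anomaly score as a stand-in for a trained theft model.
detector = IsolationForest(random_state=1).fit(meters[features])
meters["risk_score"] = -detector.score_samples(meters[features])  # higher = riskier

# Crude per-meter "reason": the feature deviating most from the population.
zscores = (meters[features] - meters[features].mean()) / meters[features].std()
shortlist = meters.nlargest(10, "risk_score").copy()
shortlist["top_factor"] = zscores.loc[shortlist.index].abs().idxmax(axis=1)

print(shortlist[["meter_id", "risk_score", "top_factor"]])
```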
Use Case #3: Power Utility: Line Maintenance

- The Situation: Maintaining power infrastructure is critical to the reliability of power delivery systems. If power lines or transformers age or deteriorate due to weather conditions, and this condition is not recognized and acted upon, equipment failure may cause power disruption and outages.
- The ML/AI Solution: The traditional solution relies on scheduled maintenance, but with the growth of sensors built into the equipment there is a growing trend toward maintaining equipment "just in time," when the line or transformer sensors signal an upcoming maintenance need. A reliable on-line prediction of potential equipment failure uses all sensor signals combined with external data such as weather, humidity, etc. The solution should predict the likelihood of an equipment failure in real time.
- Why Explainable AI: Power delivery infrastructure spans a vast geographic area. Knowing which sensors are signaling enables the maintenance crews to be dispatched to the right spot.
B. Regulatory Compliance and Legal Implications
Use Case #4: Credit Risk Scores

- The Situation: All financial institutions, along with many other businesses, make consumer credit decisions based on a consumer-level risk score. A credit scoring model is a tool typically used in the decision-making process of accepting or rejecting a loan.
- The ML/AI Solution: ML consumer-level models that create the risk scores are based on information about the borrower's financial behavior, past payment history, credit utilization, etc. They allow credit providers and others to make rational decisions based on the estimated risk of future default.
- Why Explainable AI: The Fair Credit Reporting Act (FCRA) is a federal law that regulates credit reporting agencies and compels them to ensure that the information they gather and distribute is a fair and accurate summary of a consumer's credit history. … The law is intended to protect consumers from misinformation being used against them. If a consumer is denied credit, insurance or employment because of the credit report, he or she has a legal right to ask for the specific reason for the denial. Consequently, any risk model used in credit decisions needs to produce the key reasons behind the score. Furthermore, all risk-calculating models in the US banking sector today are scrutinized internally by the bank's compliance team and externally by the regulators. These bodies require full transparency of the data being fed into the model and of the impact of individual features, to prevent potential discrimination against protected classes. A sketch of per-applicant reason codes follows this block.
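A hedged sketch of how per-applicant reason codes might be derived from a simple, transparent model. The feature names, data, and scoring logic are invented; real scorecards and adverse-action reason codes follow audited, regulator-approved procedures.

```python
# Hypothetical sketch: adverse-action "reason codes" from a logistic-regression
# risk score. Features, data and thresholds are invented for illustration only.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
features = ["utilization", "delinquencies_24m", "inquiries_6m", "age_of_file_months"]
X = pd.DataFrame(rng.normal(size=(5000, len(features))), columns=features)
# Toy label: 1 = default, driven mostly by utilization and delinquencies.
y = ((0.9 * X["utilization"] + 0.7 * X["delinquencies_24m"]
      + rng.normal(scale=0.8, size=5000)) > 1.0).astype(int)

scaler = StandardScaler().fit(X)
model = LogisticRegression().fit(scaler.transform(X), y)

def reason_codes(applicant: pd.Series, top_k=2):
    """Features pushing this applicant's score most toward 'default'."""
    z = (applicant - scaler.mean_) / scaler.scale_
    contrib = pd.Series(model.coef_[0] * z.values, index=features)
    return contrib.sort_values(ascending=False).head(top_k).index.tolist()

applicant = X.iloc[0]
p_default = model.predict_proba(scaler.transform(applicant.to_frame().T))[0, 1]
print(f"P(default) = {p_default:.2f}, key reasons: {reason_codes(applicant)}")
```

Because the model is linear, each feature's contribution to the score is explicit, which makes producing the legally required "specific reasons" straightforward.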
Use Case #5: Video Threat Detection

- The Situation: Facilities need security to protect human and physical assets. A typical solution has been a combination of security personnel and video feeds. Personnel cannot monitor every entry point and every feed at all times, and due to human error may miss some threats.
- The ML/AI Solution: AI and deep learning models evaluate the video feeds to detect threats, which are then flagged for the security personnel. For example, AI image recognition models are trained to flag high-risk individuals approaching an entrance (such as someone likely to be carrying a weapon or a known criminal), while ignoring legitimate employees.
- Why Explainable AI: Flagging an individual as a threat has potentially significant legal implications. If the AI has been trained on a sample containing a disproportionate number of cases in some categories (say, certain minorities), a bias toward those categories is introduced. An individual humiliated by being intercepted, blocked or searched by security may challenge the legality of the incident. The company may be forced to justify its action in court and disclose the reasons for singling out that individual. Without the ability to explain why the model flagged the individual, there would be no way to defend the company in a discrimination case. A sketch of one way to localize what drove a flag follows this block.
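One common way to show where in a frame a model's "threat" score comes from is occlusion sensitivity: blank out patches of the image and measure how much the score drops. The sketch below uses a toy stand-in classifier so it runs on its own; a real system would wrap a trained deep network, and the technique shown is only one of several possible explanation methods.

```python
# Hypothetical sketch: occlusion sensitivity to localize what drives a flag.
# The classifier is a toy stand-in, not a real threat-detection model.
import numpy as np

def threat_score(image: np.ndarray) -> float:
    """Toy stand-in for a trained classifier: responds to a bright region."""
    return float(image[20:30, 20:30].mean())

def occlusion_map(image, patch=8, stride=8):
    """Score drop when each patch is blanked; a high drop marks an influential region."""
    base = threat_score(image)
    h, w = image.shape
    heat = np.zeros(((h - patch) // stride + 1, (w - patch) // stride + 1))
    for i, y in enumerate(range(0, h - patch + 1, stride)):
        for j, x in enumerate(range(0, w - patch + 1, stride)):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch] = 0.0
            heat[i, j] = base - threat_score(occluded)
    return heat

frame = np.zeros((64, 64))
frame[20:30, 20:30] = 1.0                # the "suspicious" region in this toy frame
heat = occlusion_map(frame)
iy, ix = np.unravel_index(np.argmax(heat), heat.shape)
print(f"Most influential region around pixel ({iy * 8}, {ix * 8})")
```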
C. Public Trust and Social Acceptance
Use Case #6: Pricing Model for the Equine Sport Industry

- The Situation: In the equine sport industry, prices and valuations of horses lack standards and transparency. The industry also suffers from the widespread practice of undisclosed profit-taking by agents facilitating horse sales and purchases. The most important factors that should drive the value and price of a horse are its pedigree, physical conformation and ability for a particular discipline (jumping, dressage, eventing, etc.), level of education/training and performance record, age and health. A horse's visual features also affect its value. Understanding the data that drives the value of a sport horse is essential for fair and equitable pricing, which in turn is critical to keeping the sport affordable and viable. Modeling and transparency around valuation and pricing would help prevent kickbacks, excessive commissions and price fixing.
- The ML/AI Solution: Machine learning models can crunch millions of historical records of past sales transactions to build a reliable pricing model. AI visual recognition models can recognize a horse's features and enable the pricing model to be applied. Together these can provide an invaluable and objective tool for pricing sport horses.
- Why Explainable AI: The price indicated by the AI models by itself reveals very little about the features and ability of the horse. A potential buyer must see the underlying factors and the horse's features to make the right decision for his or her needs. A model that explains the factors driving the price is necessary to satisfy the market's needs and instill users' confidence in the price. A sketch of surfacing price drivers follows this block.
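A final illustrative sketch, with invented horse attributes and data: a learned pricing model plus permutation importance to surface which factors drive the price. It is meant only to show that the transparency asked for here is technically attainable, not to suggest an actual pricing methodology.

```python
# Hypothetical sketch: surfacing the attributes that drive a learned price model.
# Horse attributes, coefficients and data are invented for illustration.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(3)
features = ["pedigree_index", "jump_record_score", "training_level", "age", "vet_score"]
X = pd.DataFrame(rng.normal(size=(3000, len(features))), columns=features)
# Toy price: mostly performance record and pedigree, with noise.
price = (50_000 + 20_000 * X["jump_record_score"] + 12_000 * X["pedigree_index"]
         - 3_000 * X["age"] + rng.normal(scale=5_000, size=3000))

model = GradientBoostingRegressor().fit(X, price)

# Global view of which attributes the model actually relies on.
imp = permutation_importance(model, X, price, n_repeats=5, random_state=3)
drivers = pd.Series(imp.importances_mean, index=features).sort_values(ascending=False)
print("Price drivers (permutation importance):")
print(drivers)

horse = X.iloc[[0]]
print(f"Estimated price for this horse: ${model.predict(horse)[0]:,.0f}")
```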
References
[1] Holy Grail of AI for Enterprise — Explainable AI; Saurabh Kaushik; KDnuggets, October 2018; https://www.kdnuggets.com/2018/10/enterprise-explainable-ai.html
[2] Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification; Joy Buolamwini, MIT Media Lab & Timnit Gebru, Microsoft Research; Proceedings of Machine Learning Research, Conference on Fairness, Accountability, and Transparency, January 2018
[3] How I'm Fighting Bias in Algorithms; Joy Buolamwini, TED Talk, November 2016; https://www.ted.com/talks/joy_buolamwini_how_i_m_fighting_bias_in_algorithms?utm_campaign=tedspread&utm_medium=referral&utm_source=tedcomshare
[4] The Perpetual Line-Up: Unregulated Police Face Recognition in America; Center on Privacy & Technology at Georgetown Law; October 2016
[5] Amazon Scraps Secret AI Recruiting Tool That Showed Bias Against Women; Jeffrey Dastin; Reuters, October 2018; https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G
[6] Auditing Algorithms for Bias; Rumman Chowdhury & Narendra Mulani; Harvard Business Review, October 2018; https://hbr.org/2018/10/auditing-algorithms-for-bias
[7] Algorithms Can't Fix Societal Problems […]; Dave Gershgorn; Quartz, October 2018; https://qz.com/1427159/algorithms-cant-fix-societal-problems-and-often-amplify-them/
[8] Facial Recognition Is Accurate, if You're a White Guy; Steve Lohr; NY Times, February 2018; https://nyti.ms/2BNurVq
[9] This past April, the Council of Europe adopted the European Ethical Charter on the Use of Artificial Intelligence in Judicial Systems; https://www.coe.int/en/web/human-rights-rule-of-law/-/council-of-europe-adopts-first-european-ethical-charter-on-the-use-of-artificial-intelligence-in-judicial-systems