
The only thing worse than no explanation is a wrong explanation. When attempting to explain the predictions of machine learning (ML) models in the real world, it is critical to be precise. One of the main motivations behind explainability techniques is to build trust in ML models. If the explanations are wrong, this trust is broken.

This three-part blog series on ‘How to get explainability right’ will cover whether to use an interpretable model or post-hoc explanations, the dangers of approximations in explainability methods, and the benefits of Shapley values and how to calculate them accurately.

The consequences of wrong explanations can be severe: engineers might ignore an anomaly detection system if the explanations keep sending them to the wrong parts of a machine, where they cannot find a fault. Even worse, incorrect explanations used as seemingly irrefutable evidence of fairness can cement unfair bias. For example, a bank could inadvertently be using a biased credit risk prediction system because its explainability tool suggested the system was fair. Explainability is important, but it is equally important to get it right.

A key decision to make early in a project is whether to use an interpretable model or a black-box model together with post-hoc explanations. A model is interpretable if the way it makes decisions can be explained to a human so that the human can exactly reproduce what the model does. An example of this is a shallow decision tree: once the decision tree is trained, the decision rules can be explained to a human such that they can understand and reproduce every decision made by the model. Other examples of interpretable models include linear regression and explainable boosting machines.
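
To make "exactly reproduce" concrete, here is a minimal sketch of a depth-2 decision tree written out as plain rules. The feature names, thresholds and the credit-decision setting are hypothetical and chosen only for illustration; a real tree would be learned from data. The point is that every prediction comes with the complete list of rules that produced it, so a human can follow the same path by hand.

```python
# A hand-written depth-2 decision tree for a hypothetical credit decision.
# Feature names and thresholds are invented for illustration only.

def predict_with_explanation(applicant):
    """Return (decision, rules followed) for one applicant."""
    path = []
    if applicant["income"] >= 30000:
        path.append("income >= 30000")
        if applicant["missed_payments"] <= 1:
            path.append("missed_payments <= 1")
            return "approve", path
        path.append("missed_payments > 1")
        return "decline", path
    path.append("income < 30000")
    if applicant["existing_debt"] < 5000:
        path.append("existing_debt < 5000")
        return "approve", path
    path.append("existing_debt >= 5000")
    return "decline", path


applicant = {"income": 42000, "missed_payments": 3, "existing_debt": 1000}
decision, path = predict_with_explanation(applicant)
print(decision, "because", " and ".join(path))
# prints: decline because income >= 30000 and missed_payments > 1
```

The explanation here is not an approximation of the model: the rules are the model.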

Requiring inherent interpretability restricts modelling and can lead to lower accuracy. However, there is not always a trade-off between interpretability and accuracy: for some applications interpretable models are just as good, while for others black-box models can achieve higher accuracy.

The alternative to interpretable models is to train a black-box model and use post-hoc explainability techniques. These techniques give access to simplified information such as the importance of each feature input to the model. It is important to stress that ‘simplified’ does not have to imply ‘inaccurate’: feature importance can be defined rigorously and calculated without making inappropriate assumptions (the subject of the next blog post); as we published recently, it is even possible to incorporate causality into post-hoc explanations!
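
As a rough illustration of what a post-hoc technique does, the sketch below computes permutation importance for a toy black-box function: it treats the model as an opaque callable, shuffles one feature at a time, and measures how much the error grows. This is a stand-in chosen for brevity, not the method this series advocates. The `black_box` function, the synthetic data and all parameter names are assumptions made for the example. Note also that shuffling features independently is itself an assumption that can mislead when features are correlated; the toy features here are independent by construction.

```python
import random


def black_box(row):
    """Stand-in for an opaque model: we only call it, never inspect it."""
    x0, x1, x2 = row
    return 3.0 * x0 + x1 * x1  # x2 is deliberately ignored


def mean_squared_error(model, rows, targets):
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, n_repeats=20, seed=0):
    """Average increase in error when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mean_squared_error(model, rows, targets)
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild each row with feature j replaced by a shuffled value.
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mean_squared_error(model, shuffled, targets) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Synthetic, independent features (an assumption of this toy setup).
data_rng = random.Random(1)
rows = [tuple(data_rng.uniform(-1, 1) for _ in range(3)) for _ in range(200)]
targets = [black_box(r) for r in rows]

importances = permutation_importance(black_box, rows, targets)
for name, value in zip(["x0", "x1", "x2"], importances):
    print(f"{name}: {value:.3f}")
```

In this setup the ignored feature `x2` receives an importance of exactly zero, and `x0` ranks above `x1`, which matches how the toy function was built.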

When should interpretable models be chosen? 

1. If an interpretable model is just as good as a black-box model
In this case, complex black-box models are not required and an interpretable model with few parameters should be chosen instead. Not only will it probably be easier to train, but it also gets all the benefits of complete interpretability for free*. While ‘just as good’ mainly refers to the accuracy of the model, there can be other considerations too, such as inference speed.

2. If complete explanations are more important than model accuracy
Even if black-box models would perform better for a specific application, they should not be chosen if complete transparency is required. This can sometimes be a matter of debate: for example, some people might choose a black-box model (together with good post-hoc explanations) to make medical decisions for them if its accuracy is significantly higher than that of an interpretable model or a doctor. However, others might disagree, especially given the recent revelations about bias in medical algorithms.
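
The two cases above can be sketched as a simple decision rule. This is only an illustration: the `tolerance` that defines "just as good", the function name and the example scores are all hypothetical, and in practice the comparison may also weigh factors such as inference speed.

```python
from statistics import mean


def choose_model(interpretable_scores, black_box_scores,
                 tolerance=0.01, needs_complete_explanations=False):
    """Pick a model family from held-out accuracy scores (one per fold)."""
    # Case 2: complete transparency outweighs any accuracy gain.
    if needs_complete_explanations:
        return "interpretable"
    # Case 1: the interpretable model is "just as good" within the tolerance.
    gap = mean(black_box_scores) - mean(interpretable_scores)
    if gap <= tolerance:
        return "interpretable"
    return "black-box with post-hoc explanations"


# Hypothetical cross-validation accuracies.
print(choose_model([0.90, 0.91, 0.89], [0.90, 0.92, 0.90]))
print(choose_model([0.80, 0.82, 0.81], [0.93, 0.95, 0.94]))
print(choose_model([0.80, 0.82, 0.81], [0.93, 0.95, 0.94],
                   needs_complete_explanations=True))
```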

Many of the real-world ML applications Faculty has developed do not fall into either of the two categories above. For many of our projects, black-box models together with rigorous, accurate post-hoc explanations are the right choice. For example, an anomaly detection system for a complicated machine should be as accurate as possible and likely has to be a black-box model to achieve this. However, if we can use feature importance to highlight what exactly is wrong with the machine, engineers will be able to address the problem much faster and will trust the output of the model much more.

If post-hoc explanations are the right approach for your application, the next step is to use the appropriate technique and to make sure the technique does not make any inadequate approximations. This will be the subject of our next blog post: stay tuned!

If you want to find out more about our AI safety R&D programme, you can visit our research page.


* There are some cases in which an interpretable model can be made more accurate by lengthy feature engineering. These cases require a commercial decision on whether the need for complete explanations is high enough to outweigh the costs associated with the feature engineering process.