This is part two of a series on the new principles that are shaping tomorrow’s AI and how companies are adapting their technology for use outside of the lab, in the real world, today. You can read part one here.
In this blog, I’ll discuss the rise in demand for explainable AI, and how my team at Faculty made sense of the landscape and chose an approach that works for us. I’ll go on to discuss some of the limitations and the need for standardised approaches. You can find out more about AI explainability – including the business case for explainability and how to build explainability into your own models – by searching for ‘explainability’ on our website.
AI explainability was one of the buzz phrases of 2019. There has been a sudden proliferation in open source tooling for explaining model outputs, with many companies – new and old – now building model explanations into their products and platforms. This is hardly surprising, considering the broad range of benefits provided by explainability: everything from meeting governance and regulatory requirements, to building trust in model outputs for consumers, and supporting iterative model development. You can read more about use cases for explainability here.
At Faculty, we too found the case for model explainability compelling. We wanted to find an approach that worked for us, and also contribute to a still relatively nascent field of research. It quickly became clear that there were several approaches to explainability out there, and no real consensus on best practices, so our starting point was to try and figure out what we wanted from an explainability algorithm.
A taxonomy for explainability
We have found it helpful to organise our thinking on explainability using the following taxonomy, which highlights some key differences between different approaches and is independent of what it actually means to explain a model.
Specifically, we distinguish between:
• Explanations that rely on models being intrinsically interpretable and explanations that can be generated post-hoc for non-interpretable models.
• Model-specific methods and model-agnostic methods.
• Global explanations that explain how a model makes decisions in general, and local explanations that explain how a model made a particular decision.
We’re most interested in approaches that fall on the second side of each of these three distinctions: post-hoc, model-agnostic and local. Firstly, intrinsically interpretable models are generally interpretable because they are simple and have few parameters, and as such they are usually much less powerful than more complex models. If we limit ourselves to these models, we are simply trading performance for interpretability. We’re much more interested in whether it’s possible to explain models without compromising on performance. As it turns out, in this context we can have our cake and eat it: there are ways to explain complex models in a post-hoc fashion.
Secondly, model-agnostic methods allow us to always use the best model available for any given task, without worrying about whether the model can be interpreted by our explainability tool. We could expand the range of models available to us by cobbling together several model-specific explainability approaches, but this adds unnecessary complexity; with a single model-agnostic approach, our workflow is always the same, and the explanations are much more likely to be comparable.
Finally, the choice between local and global explanations is not really a case of either/or. If you can explain each individual decision, you can aggregate those explanations into an explanation of the model’s overall behaviour. The reverse is not true: the reasons behind a single decision could be completely different from the model’s average behaviour, so a global explanation cannot be broken down into local ones. As a result, we always favour local explanations.
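To make the aggregation point concrete, here is a minimal sketch in Python. It assumes we already have one local explanation per decision, expressed as a signed attribution for each input feature, and it takes the mean absolute attribution as the global summary. That aggregation rule is just one common convention rather than the only option, and the numbers are invented purely for illustration.

```python
def global_importance(local_attributions):
    """Aggregate per-decision attributions into one score per feature.

    Each row is one decision; each column is one feature. The global score
    is the mean absolute attribution (an assumed, illustrative convention).
    """
    n_rows = len(local_attributions)
    n_features = len(local_attributions[0])
    return [
        sum(abs(row[j]) for row in local_attributions) / n_rows
        for j in range(n_features)
    ]


# Illustrative, made-up local explanations for four decisions, two features.
local_attributions = [
    [0.5, -0.1],
    [0.4, 0.0],
    [0.6, -0.2],
    [0.1, 0.9],  # an unusual decision driven mainly by the second feature
]

importance = global_importance(local_attributions)
print(importance)
```

In this toy data the first feature matters most on average, yet the last decision is dominated by the second feature. The global summary alone would not reveal that, which is why we cannot recover local explanations from a global one.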
This analysis of what we want from an explainability algorithm is what led us to Shapley values, a decades-old tool from cooperative game theory that has more recently been applied to model explanations and that is post-hoc, model-agnostic and local. It has its limitations, and addressing some of those limitations has been the foundation of our AI safety research programme. You can read about some of the improvements we’ve made to Shapley values for model explanations elsewhere on our blog.
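For readers who have not met Shapley values before, the sketch below computes them exactly for a single prediction of a tiny model. It is a textbook illustration, not the method we use in practice. It makes one simplifying assumption: a feature that is "absent" from a coalition is replaced by a fixed baseline value. The names `shapley_values`, `toy_model` and `baseline` are hypothetical, and because the loop visits every subset of features, it is only practical for a handful of inputs.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley attributions for one input x of an arbitrary callable.

    Assumption: features outside a coalition are set to their baseline value.
    """
    n = len(x)

    def value(subset):
        # Evaluate the model with only the features in `subset` taken from x.
        mixed = [x[j] if j in subset else baseline[j] for j in range(n)]
        return model(mixed)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Standard Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i to this coalition.
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def toy_model(features):
    # A stand-in "black box": one additive term and one interaction term.
    a, b, c = features
    return 2.0 * a + b * c


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phis = shapley_values(toy_model, x, baseline)
print(phis)
```

The function only ever calls `model` on inputs, so it is model-agnostic and post-hoc, and it explains one decision at a time, so it is local. The attributions also sum to the difference between the model’s output at `x` and at the baseline, and the interaction between the second and third features is shared equally between them.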
The need for consensus
As is normal in such a new field, there is a huge variety of alternative approaches to the one we use at Faculty, each with its own advantages and disadvantages.
The analysis above that led us to Shapley values also doesn’t really address the question of what it means to explain a model. Indeed, the requirements we have settled on have naturally guided us towards explaining a model in terms of its input features, but it’s not obvious that this is the best way to explain a model – even if it is currently the most common.
Agreement on what constitutes an explanation would allow practitioners to meaningfully compare explanations of different models. This is important, as we can’t agree on an explanation of how a model works if we don’t see eye to eye on what an explanation is in the first place. Another benefit to a unified approach is that consumers of explanations don’t need to learn about multiple different methods in order to understand different explanations. This is important too, because an explanation is only useful if you are able to interpret and understand it.
We must be careful if we attempt to standardise the approach; we don’t want to end up in a situation where explanations of how a black box AI model reached its decision are so complex that they themselves are impossible for all but a few to understand.
There is value in trying to overcome such issues. In addition to some of the challenges outlined above, without a common language for understanding how models work, the task of regulating AI is made that much more difficult. So it seems increasingly likely that the AI industry will need to converge on a common approach.