New principles for AI: Generalising AI models
In the last few years, AI has undeniably moved out of the lab and become an integral part of our infrastructure, business operations, and everyday lives.
As always when a technology moves from the theoretical sphere into the real world, some adjustments have been necessary. Unexpected challenges have arisen, ripple effects have been spotted and (of course) human error has reared its head.
The result is a new focus for AI researchers as they begin to ask: what does real, applied AI look like? Which obstacles must it overcome? What works outside of the lab, and what falls apart as soon as it puts a toe over the lab threshold?
At Faculty, we’ve worked on hundreds of data science projects for our clients. So we’ve learnt a lot, sometimes the hard way, about what does and doesn’t work in the real world. In this blog series, I’ll be reflecting on three of the trends we’ve observed, and how the use of AI in the real world might evolve over the next few years – beginning with the need for generalisable AI models.
What happens when we apply models to real-world data?
Adoption of AI has grown rapidly, and with it the number of models being put into production. As practitioners will be acutely aware, a model that performs well when it is first deployed can degrade over time, sometimes seriously.
A common reason for this is “distributional shift”: the live data we’re feeding into a model gradually starts to look less and less like the data it was trained on. For example, let’s say a clothing retailer creates a model to predict which clothes a customer might be interested in purchasing – but all the data used to train the model was collected in the summer. As the year progresses and the weather gets colder, the customer might be less interested in t-shirts and shorts, and more interested in warm jumpers. The model keeps making predictions as if nothing has changed, and those predictions become less and less accurate.
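One way to catch this kind of shift is to compare the distribution of the live inputs with the training data. The sketch below is a minimal illustration, assuming categorical data and using the population stability index (PSI), one common drift metric. The function names and purchase counts are invented for the clothing example, and the 0.2 alert threshold is a widely used rule of thumb rather than a standard.

```python
import math
from collections import Counter


def category_proportions(items, categories, eps=1e-6):
    """Share of each category in `items`, floored at `eps` to avoid log(0)."""
    counts = Counter(items)
    return {c: max(counts[c] / len(items), eps) for c in categories}


def population_stability_index(expected_items, actual_items):
    """PSI = sum over categories of (actual - expected) * ln(actual / expected)."""
    categories = sorted(set(expected_items) | set(actual_items))
    expected = category_proportions(expected_items, categories)
    actual = category_proportions(actual_items, categories)
    return sum(
        (actual[c] - expected[c]) * math.log(actual[c] / expected[c])
        for c in categories
    )


# Hypothetical purchases: training data from summer, live data from winter.
summer = ["t-shirt"] * 50 + ["shorts"] * 40 + ["jumper"] * 10
winter = ["t-shirt"] * 15 + ["shorts"] * 5 + ["jumper"] * 80

psi = population_stability_index(summer, winter)
print(f"PSI = {psi:.2f}")
if psi > 0.2:  # rule-of-thumb threshold for a significant shift
    print("Live data has drifted away from the training data; consider retraining.")
```

A drift alert like this only says that the inputs have changed. It does not say whether the model's predictions have become wrong, or why.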
Increasingly, we need to ensure that our models will generalise once they are deployed on live data, and to understand how they will do so. Ultimately, the best (but also the hardest) way to do this is to focus on causality.
The importance of causality
People often marvel at the almost magical capabilities of machine learning, but much of the time it comes down to complicated pattern recognition. The proliferation of machine learning libraries and frameworks has made it very easy to train powerful models quickly on a wide variety of data. The algorithms make these ‘magical’ predictions by learning sophisticated correlations between features and the target variable.
Although it has become a cliché, it remains true that “correlation is not causation”. Just because the model has learnt to exploit certain correlations to achieve good predictive performance, it does not mean that the model understands the underlying cause.
Consequently, when the model is deployed in the real world and the underlying data it’s working from starts to shift, it can start to break down. Imagine that an e-commerce company is building an algorithm to predict when it should run a price promotion. The company has a huge amount of customer data, but suppose a large proportion of its sales and website visits take place on Black Friday, when the company is, of course, running huge discounts. If your model is merely picking up correlations and doesn’t understand the underlying reasons why customers make purchases, it might easily conclude that any Friday is a good day to run a price promotion.
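The trap can be shown with a toy calculation. All numbers below are invented: eight weeks of daily sales in which one Friday is Black Friday. Grouping by weekday alone stands in for a model that only sees the day of the week, and it makes Friday look like the best day. Separating Black Friday from ordinary days tells a different story. The names `simulate_sales` and `mean_sales_by` are hypothetical helpers for this sketch.

```python
from collections import defaultdict
from statistics import mean

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
# Invented baseline daily sales for an ordinary week.
BASELINE = {"Mon": 90, "Tue": 95, "Wed": 100, "Thu": 105, "Fri": 110, "Sat": 120, "Sun": 100}


def simulate_sales(n_weeks=8, black_friday_week=7, black_friday_sales=1500):
    """Toy daily sales where one Friday in the period is Black Friday."""
    records = []
    for week in range(n_weeks):
        for day in WEEKDAYS:
            is_black_friday = day == "Fri" and week == black_friday_week
            sales = black_friday_sales if is_black_friday else BASELINE[day]
            records.append({"day": day, "black_friday": is_black_friday, "sales": sales})
    return records


def mean_sales_by(records, key):
    """Average sales for each group defined by key(record)."""
    groups = defaultdict(list)
    for r in records:
        groups[key(r)].append(r["sales"])
    return {k: mean(v) for k, v in groups.items()}


records = simulate_sales()

# Correlation only: a single Black Friday inflates the average for every Friday.
by_weekday = mean_sales_by(records, key=lambda r: r["day"])
print("Best day by weekday average:", max(by_weekday, key=by_weekday.get))

# Separating Black Friday from ordinary days changes the picture.
by_day_and_event = mean_sales_by(records, key=lambda r: (r["day"], r["black_friday"]))
ordinary = {d: by_day_and_event[(d, False)] for d in WEEKDAYS}
print("Best ordinary day:", max(ordinary, key=ordinary.get))
```

Splitting out Black Friday works here only because we already know it matters. Knowing which distinctions like this one to make is what an understanding of the underlying causes provides.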
If, however, the model could incorporate causality, then it would know that Black Friday, the day after US Thanksgiving in late November, is a particular event on which customers expect prices to be reduced. Conversely, it would see that it might make no sense to cut prices on an ordinary Friday, because customers will be expecting to pay full price and the company would be forgoing revenue needlessly.
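For a model to tell these two kinds of Friday apart, the phenomenon has to be represented in its inputs. The sketch below assumes the US definition of Black Friday, the day after Thanksgiving on the fourth Thursday of November (usually, but not always, the last Friday of November). The function names are hypothetical. An explicit indicator like this encodes domain knowledge about the cause; it does not discover causal structure on its own.

```python
from datetime import date, timedelta


def black_friday(year: int) -> date:
    """Day after US Thanksgiving, which falls on the fourth Thursday of November."""
    nov1 = date(year, 11, 1)
    # date.weekday(): Monday == 0, ..., Thursday == 3, Friday == 4
    days_to_first_thursday = (3 - nov1.weekday()) % 7
    fourth_thursday = nov1 + timedelta(days=days_to_first_thursday + 21)
    return fourth_thursday + timedelta(days=1)


def promotion_features(d: date) -> dict:
    """Hypothetical features: a plain Friday flag plus an explicit Black Friday flag."""
    return {
        "is_friday": d.weekday() == 4,
        "is_black_friday": d == black_friday(d.year),
    }


print(promotion_features(date(2023, 11, 24)))  # Black Friday 2023
print(promotion_features(date(2023, 12, 1)))   # an ordinary Friday
```

With both flags available, a model can attribute the sales spike to Black Friday itself rather than to Fridays in general.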
An attractive trade-off
There are costs to incorporating causality into models. Proving causal relationships between variables is hard, and the best known methods are generally computationally intensive. As a result, we often have to use simpler models so that these expensive computations don’t get too out of hand. Nevertheless, the benefits in terms of robustness – and increased understanding of the underlying causes of quantities of interest – can make this an attractive trade-off.
As long as AI models are kept in the lab, there is little incentive to do the hard and expensive work of building causality into them. But as AI is incorporated into production systems, ensuring that models generalise will become increasingly important, and the robustness that comes from a better understanding of the underlying causes will become increasingly attractive.