When assessing their ability to develop a new data science product, many organisations start with data. Do we have enough? Is it high quality? Can we access it? These are all sensible questions and, obviously, having data upfront is useful. But are they the most important questions? And is this really the best approach?
Having worked on over 500 projects, we’ve learnt that data is rarely the limiting factor. You may immediately be suspicious that we’ve only worked in organisations with beautiful data, like cutting-edge start-ups and tech giants. And whilst we have worked with our fair share of such companies (who often have their own data problems), we’ve also worked with many of the world’s biggest and most important organisations, like the NHS, national governments and Fortune 500 companies; organisations certainly not renowned for having large volumes of clean, structured data.
Then how is it – given our experience of the messy state of data in many organisations – that we still claim that data is rarely the limiting factor?
Charlie Munger, the billionaire investor and business partner of Warren Buffett, has a great expression: “Like the great Jacobi says ‘Invert, always Invert’”. What would happen if we inverted how we think about data science, moving the emphasis from ‘DATA science’ to ‘data SCIENCE’?
Putting the ‘science’ back in data science
Science (or the scientific method) is the process by which we build an ever more accurate, causal model of the world through experimentation. Thanks to Judea Pearl and his academic colleagues, we now understand that you can only extract causation through experimentation. In a business, you experiment by performing an action in the world. So, if you spend your time solely examining static datasets, you can only really access correlations.
A good example of why correlations alone aren’t helpful is the simultaneous increase of sunscreen sales and shark attacks during the summer. A static dataset might suggest the former causes the latter, an obviously incorrect conclusion: both simply rise when warm weather brings more people to the beach. That means the limiting step in your ‘data SCIENCE’ understanding is your ability to conduct experiments or, put another way, to take action in the world.
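To make the sunscreen and shark example concrete, here is a minimal Python sketch. It assumes a made-up world in which temperature drives both sunscreen sales and shark attacks, and sunscreen has no effect on sharks at all. Every number and name in it (`simulate`, `pearson`, the coefficients) is invented for illustration, not taken from real data. It compares what a static dataset shows with what you see once you set sunscreen sales yourself, as you would in an experiment.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(days, randomise_sunscreen, rng):
    """Return the sunscreen/shark-attack correlation over simulated days.

    Hypothetical model: temperature drives both quantities; sunscreen never
    influences shark attacks.
    """
    sunscreen, attacks = [], []
    for _ in range(days):
        temperature = rng.uniform(5, 35)
        if randomise_sunscreen:
            # Experiment: we choose the sales level ourselves, ignoring the weather.
            sales = rng.uniform(0, 100)
        else:
            # Passive observation: sales follow the weather.
            sales = 3 * temperature + rng.gauss(0, 5)
        # Attacks depend only on temperature (more swimmers), never on sales.
        shark_attacks = 0.2 * temperature + rng.gauss(0, 1)
        sunscreen.append(sales)
        attacks.append(shark_attacks)
    return pearson(sunscreen, attacks)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
observed_corr = simulate(2000, randomise_sunscreen=False, rng=rng)
experimental_corr = simulate(2000, randomise_sunscreen=True, rng=rng)

print(f"static dataset correlation: {observed_corr:.2f}")
print(f"randomised experiment correlation: {experimental_corr:.2f}")
```

In this toy setup the static dataset shows a strong correlation, while the randomised version shows essentially none, because taking the action yourself breaks the link to the weather.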
‘DATA science’ starts from the data, thinking about what’s possible given a data set. ‘Data SCIENCE’, on the other hand, starts from experimentation, thinking about what’s possible given a set of actions.
Ultimately, our preference for ‘data SCIENCE’ over ‘DATA science’ boils down to a simple comparison. If you have data but can’t action the output, there’s no value; if you can take an action but have no data, you can start experimenting and capture data, driving improvements in performance.
Despite sounding promising, this is all terrible nomenclature. Capitalising various words to get a point across is bad on a page and even worse verbally. I suggest we refer to ‘data SCIENCE’ as ‘action-oriented data science’ since it captures the shift in emphasis towards action and has a nice forward momentum to it. Conversely, we’ll call ‘DATA science’ ‘passive data science’.
Passive vs. action-oriented data science
To concretely compare these approaches, let’s start by thinking about passive data science. Imagine that by looking at your existing data, I could tell you the perfect offer to make to each of your customers. I could uncover the right product and message combination that would delight them. The caveat is that every single offer is completely different and requires you to hand-deliver the packages yourself. When you have tens of thousands of customers, this is impossible to execute, so it has zero impact on the bottom line.
Now, let’s invert. In a world where we do action-oriented data science, we start with the actions. Immediately, I start by thinking about the operational things that I can actually do. For instance, I recognise that my ability to reach customers is limited. Perhaps I can only email them all with the same mailout? Limited testing here shows that I can’t make much difference, so instead, I concentrate my attention on how to ensure that we’re doing sufficient demand forecasting so products are never out-of-stock. This adds a percentage point to my revenue.
This is, of course, a fictional example – but only just. Many companies have told us stories that follow a similar plot, and I’m sure you’ll have equivalents in your organisation. The thing is, this is the harder case in which to demonstrate the value of action-oriented data science. The easier one is far more commonly faced: where there is poor or no data available. Here, passive data science can’t even get started. Action-oriented data science, on the other hand, can immediately begin testing and learning, and driving performance improvements.
The table below compares the two approaches more thoroughly.
In no circumstances does action-oriented data science under-perform passive data science. Most of the time, it exceeds it. Crucially, data with no action has no impact – but available actions with no data are a learning opportunity that can generate impact immediately.
Our decision intelligence software, Frontier, is founded on the principles of action-oriented data science. It builds a causal model of your organisation so you can understand what’s happening in your business and why, enabling you to make truly informed, optimal decisions.