The emergency responses of the London Fire Brigade (LFB) are complex operations involving a variety of specialised skills.
Not all skills are shared by all firefighters, so unanticipated disruptions to the team composition (for instance due to sickness) frequently render teams unable to respond to emergencies without appropriate substitutes for absent colleagues.
This makes it necessary for the LFB to reallocate a significant number of firefighters across London on an ad hoc basis every day. This is a resource-intensive process for the LFB and a major inconvenience for firefighters.
My Fellowship project focused on working together with the Insights team at the LFB, led by Apollo Gerolymbos and Andrew Mobbs, to understand how machine learning could be applied to their rich dataset and help improve this process.
The ability to anticipate disruptive staff shortages enables the LFB to prevent problems or at least be better prepared, and hence my challenge was to establish how well machine learning could predict the unavailability of fire engines due to understaffing.
To get under the skin of the project, I met with many senior analysts, senior emergency operations staff, and various members of the human resources department, as well as firefighters themselves.
We identified a variety of ways in which machine learning could either improve processes or inform stakeholders within the organisation to make better-informed decisions. We decided to focus on predicting disruptive staff absences.
So-called “off-the-run” events (when a fire engine is unable to operate due to understaffing) occur in about 11% of all station day shifts. This can be formulated as a supervised learning problem, one that requires adjustments for imbalanced classes.
The overall procedure I used is as follows. First, the data was randomly split into training, validation and test sets. The splitting was stratified so as to keep the fraction of off-the-run events similar in each split. On the training data, I addressed the class imbalance by randomly oversampling the minority class.
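The stratified split and random oversampling steps can be sketched roughly as follows. This is a minimal illustration on synthetic stand-in data (the real LFB features are not described here), and it shows only a train/test split for brevity; the validation split works the same way:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in data: 1,000 station day shifts with 5 dummy features
# and roughly 11% positive ("off-the-run") labels, as in the post.
X = rng.normal(size=(1000, 5))
y = (rng.random(1000) < 0.11).astype(int)

# Stratified split keeps the off-the-run fraction similar in each part.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Random oversampling on the training data only: draw minority-class rows
# with replacement until both classes are equally represented.
minority = np.flatnonzero(y_train == 1)
majority = np.flatnonzero(y_train == 0)
extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
idx = np.concatenate([majority, minority, extra])
X_bal, y_bal = X_train[idx], y_train[idx]
```

Oversampling only the training set matters: duplicating minority rows before splitting would leak copies of the same events into the test data and inflate the measured performance.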
I then fit classification algorithms to the resulting training data. I tested a variety of classifiers, with the main contenders being Random Forest and Gradient Boosted Decision Trees. I used a grid-search approach to tune the estimators’ hyperparameters.
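A grid search over a tree-ensemble classifier might look like the sketch below, shown here for the gradient-boosted model. The parameter grid and scoring choice are illustrative assumptions, not the ones actually used in the project:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic imbalanced stand-in data (~11% positives).
X, y = make_classification(n_samples=500, weights=[0.89, 0.11], random_state=0)

# Illustrative hyperparameter grid; the post does not state the actual grid.
grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [2, 3]},
    scoring="recall",  # recall is the metric reported in the post
    cv=3)
grid.fit(X, y)
best = grid.best_estimator_
```

The same `GridSearchCV` call works unchanged for a `RandomForestClassifier`, which makes comparing the two contenders straightforward.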
In the end, Gradient Boosted Decision Trees outperformed all other estimators. The recall of the model on the test data is 72%, which highlights the enormous potential of Data Science and Machine Learning to predict even seemingly random or hard-to-predict outcomes.
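Recall is the fraction of actual off-the-run events that the model catches, which is why it is the natural headline metric here: a missed event means an unprepared station. A tiny worked example with hypothetical labels:

```python
import numpy as np
from sklearn.metrics import recall_score

# Hypothetical ground truth and predictions, just to show the metric.
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 1])
y_pred = np.array([1, 1, 0, 0, 0, 1, 0, 0, 0, 1])

# Recall = true positives / all actual positives.
# Here 3 of the 4 real events are caught, so recall is 0.75.
recall = recall_score(y_true, y_pred)
```

A 72% recall on held-out data therefore means roughly seven in ten off-the-run events were flagged in advance.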
André Richter took part in the ASI Fellowship May 2017. Prior to the Fellowship he completed a PhD in Economics from Stockholm University.