We built a recommendation system for a luxury car manufacturer that wanted to recommend its new SUV to its customers. Our model identifies the customers who are most likely to be interested in an SUV and who would be further influenced by an invitation to an exclusive launch event.
Customer

A luxury car manufacturer.


Problem

In summer 2019, a world-leading car company will launch its first model in the luxury SUV market segment. There will be exclusive launch events around the world and the company is planning to limit the number of people invited to attend.

As a result, it is vital that each dealer choose the best candidates to invite, based on recommendations drawn from the information stored in the company's customer relationship management (CRM) system. This database has around 40 different feature fields, none of which alone gives a comprehensive picture of the customer.

Faculty was charged with developing a model that made the best recommendations possible using every category in the customer database.


Solution

We created two different scoring metrics to rank the global contacts database according first to the likelihood that a given contact would be interested in the new SUV and, second, to the likelihood that they would purchase a car after attending a launch event. We also worked with the company to identify additional data sources outside the customer database to incorporate into the scoring metrics.

We used a classification algorithm, random forests, for this problem for two reasons. First, the data contained a mixture of continuous variables (time and date fields) and categorical variables (such as previously owned cars). Second, random forests are robust to overfitting on small data sets with highly imbalanced classes.

For each scoring metric, we divided the data set into separate groups of individuals with known ‘perfect scores’ (who had already expressed interest in the new SUV and who had previously bought after attending launch events, respectively) and groups of individuals who were very likely to have low scores. Two separate random forests were then trained on these groups to identify previously overlooked candidates in the CRM who were highly likely to be interested in the new SUV and to be influenced by an invitation to a launch event.
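The training setup described above can be sketched roughly as follows. This is a minimal illustration rather than the model delivered to the client: it assumes scikit-learn and NumPy, uses synthetic data, and invents two hypothetical fields (days since last dealer contact and body style of the previously owned car) in place of the real CRM features. The `class_weight="balanced"` setting is one common way of handling imbalanced classes and is an assumption here, not something the project is known to have used.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)


def make_contacts(n, interested):
    """Generate synthetic contacts with one continuous and one categorical field."""
    # Hypothetical continuous field: days since last dealer contact.
    days = np.clip(rng.normal(60 if interested else 400, 30, size=n), 0, None)
    # Hypothetical categorical field: body style of the previously owned car.
    probs = [0.7, 0.2, 0.1] if interested else [0.1, 0.3, 0.6]
    prev = rng.choice(["suv", "saloon", "coupe"], size=n, p=probs)
    return days.reshape(-1, 1), prev.reshape(-1, 1)


# Small group with known 'perfect scores' and a larger group of likely low scorers.
pos_num, pos_cat = make_contacts(40, interested=True)
neg_num, neg_cat = make_contacts(400, interested=False)

train_num = np.vstack([pos_num, neg_num])
train_cat = np.vstack([pos_cat, neg_cat])
y_train = np.array([1] * len(pos_num) + [0] * len(neg_num))

# One-hot encode the categorical field; the continuous field is used as-is.
encoder = OneHotEncoder(handle_unknown="ignore").fit(train_cat)


def featurise(num, cat):
    """Stack the continuous column next to the one-hot encoded categorical columns."""
    return np.hstack([num, encoder.transform(cat).toarray()])


forest = RandomForestClassifier(
    n_estimators=200,
    class_weight="balanced",  # assumption: reweight the rare positive class
    random_state=0,
)
forest.fit(featurise(train_num, train_cat), y_train)

# Score contacts that belong to neither group: first 5 resemble the positives,
# last 5 resemble the negatives.
a_num, a_cat = make_contacts(5, interested=True)
b_num, b_cat = make_contacts(5, interested=False)
X_unlabelled = featurise(np.vstack([a_num, b_num]), np.vstack([a_cat, b_cat]))

scores = forest.predict_proba(X_unlabelled)[:, 1]  # estimated probability of class 1
ranking = np.argsort(-scores)  # contact indices, highest score first
```

A second forest for the launch-event metric would follow the same pattern with a different choice of positive and negative groups.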

When tested on previously unseen data, the models we provided for the client achieved a precision of 40% and a recall of over 60%. In other words, 40% of the contacts our models identified as good prospects had either already expressed interest in the new SUV or purchased a car after attending a previous launch event, and our models identified over 60% of all contacts in these two groups.
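The two figures above correspond to what are usually called precision and recall. The sketch below shows how each is computed from a set of flagged contacts and a set of known positives. The contact IDs are toy values chosen for illustration only and are not client data; the function name `precision_recall` is likewise hypothetical.

```python
def precision_recall(flagged, known_positive):
    """Return (precision, recall) for a set of flagged contacts.

    precision: share of flagged contacts that are known positives.
    recall: share of known positives that were flagged.
    """
    flagged = set(flagged)
    known_positive = set(known_positive)
    hits = flagged & known_positive
    precision = len(hits) / len(flagged) if flagged else 0.0
    recall = len(hits) / len(known_positive) if known_positive else 0.0
    return precision, recall


# Toy example: 10 contacts flagged by the model, 6 known positives, 4 in common.
flagged = {f"c{i:02d}" for i in range(1, 11)}            # c01 .. c10
known_positive = {"c01", "c02", "c03", "c04", "c11", "c12"}

precision, recall = precision_recall(flagged, known_positive)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

In a setting like this one, contacts that are flagged but not yet known positives count against precision, even though they are exactly the previously overlooked candidates the models are meant to surface.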


Impact

The method we developed has been incorporated into the customer database. The company is sending local dealers around the world recommendations on whom to invite to launch events, with a far higher probability that these invitations will generate sales of the new model.