The client is the world’s third-largest retailer, operating in 12 countries and employing close to 500,000 people.


The retailer’s inventory comprises more than 800,000 products. Data is held electronically in a structured tree, five layers deep, with over 4,000 unique nodes. Adding a new product to the inventory is laborious: a text-only description of the product (including brand and ingredient lists) must be input manually. The user must then decide into which category to place the item, often facing more than 10 possible choices at each layer. With five layers to work through, this complex, time-consuming process is prone to error and creates a major bottleneck.

Faculty was asked to develop a model that could significantly improve the categorisation of the client’s products.


We worked on products in three major categories (clothing and footwear, fresh groceries and dry groceries), which between them account for nearly two-thirds of all products sold by the retailer.

First, we automatically collated the available text for each product and presented it as a single paragraph. We then converted that paragraph into a word-frequency vector, in which each element is the number of times a given word appears in the paragraph. The vectors for all products were stacked into a matrix, which we used to train a Support Vector Machine classifier.
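The word-frequency step described above can be sketched in a few lines of Python. This is an illustration only, not the code used in the project: the function names, the whitespace tokeniser and the two example paragraphs are all assumptions made for the sketch.

```python
from collections import Counter


def tokenize(text):
    """Lowercase the text and split on whitespace (a deliberately naive tokeniser)."""
    return text.lower().split()


def build_vocabulary(paragraphs):
    """Map every distinct word to a column index, in alphabetical order."""
    words = sorted({word for p in paragraphs for word in tokenize(p)})
    return {word: index for index, word in enumerate(words)}


def to_count_vector(paragraph, vocabulary):
    """Return one row: element i counts how often word i occurs in the paragraph."""
    counts = Counter(tokenize(paragraph))
    # Dicts preserve insertion order, so columns follow the sorted vocabulary.
    return [counts.get(word, 0) for word in vocabulary]


# Hypothetical product paragraphs; the client's real data is not shown here.
paragraphs = [
    "acme cotton t-shirt blue cotton",
    "acme tomato soup tomato water salt",
]

vocabulary = build_vocabulary(paragraphs)
# One row per product, one column per word: the matrix a classifier is trained on.
matrix = [to_count_vector(p, vocabulary) for p in paragraphs]
```

Each row of `matrix`, paired with the product's known category, would then form one training example for the classifier. Training the Support Vector Machine itself is left out of this sketch.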

The inventory has five layers with up to ten branches at each level.

Results for items in the clothing and footwear category were strong, with the algorithm achieving 98% accuracy as it matched items all the way to the last layer of classification. For fresh and dry groceries, the first two layers achieved above 90% accuracy, dropping to between 70% and 80% in the final layer. Further analysis, which we passed on to the client, revealed that this was in part owing to ambiguous and overlapping categories at the last level.
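One way to compute accuracy layer by layer is sketched below. It assumes that an item counts as correct at layer k only if every label from the top of the tree down to layer k matches. That definition, the function name and the example category paths are assumptions for illustration, not details taken from the project.

```python
def accuracy_by_layer(true_paths, predicted_paths, depth=5):
    """For each layer k, return the share of items whose first k labels all match."""
    total = len(true_paths)
    accuracies = []
    for k in range(1, depth + 1):
        correct = sum(
            1
            for truth, guess in zip(true_paths, predicted_paths)
            if truth[:k] == guess[:k]
        )
        accuracies.append(correct / total)
    return accuracies


# Hypothetical five-layer category paths for four grocery items.
true_paths = [
    ("grocery", "dry", "tinned", "soup", "tomato"),
    ("grocery", "dry", "tinned", "soup", "chicken"),
    ("grocery", "fresh", "fruit", "citrus", "lemon"),
    ("grocery", "fresh", "fruit", "citrus", "lime"),
]
predicted_paths = [
    ("grocery", "dry", "tinned", "soup", "tomato"),          # fully correct
    ("grocery", "dry", "tinned", "soup", "tomato"),          # wrong only at layer 5
    ("grocery", "fresh", "fruit", "citrus", "lemon"),        # fully correct
    ("grocery", "fresh", "fruit", "berries", "strawberry"),  # wrong from layer 4
]

layer_accuracy = accuracy_by_layer(true_paths, predicted_paths)
```

Under this definition accuracy can only stay level or fall as the layers get deeper, which is consistent with the pattern reported for groceries.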


For clothing and footwear, the 98% accuracy is an improvement on the 95–98% accuracy achieved with manual input. Most importantly, the automation saves significant time, and therefore money, in the categorisation process. For fresh and dry groceries, the algorithm was developed to support the user instead, ranking the candidate categories and presenting them in order.

In over 70% of cases the user need click only one button to confirm the selection made by the algorithm.
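The ranking step and the one-click figure can be illustrated with a short sketch. It assumes the classifier returns a numeric score per candidate category, with higher meaning a better match. The scores, category names and function names below are hypothetical.

```python
def rank_categories(scores, top_k=3):
    """Return the top_k category names, highest score first."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ordered[:top_k]]


def one_click_rate(ranked_lists, true_labels):
    """Share of items whose top-ranked suggestion is the correct category."""
    hits = sum(
        1 for ranked, truth in zip(ranked_lists, true_labels) if ranked[0] == truth
    )
    return hits / len(true_labels)


# Hypothetical per-category scores for four items at one layer of the tree.
scores_per_item = [
    {"soup": 1.4, "sauce": 0.3, "stock": -0.2, "beans": -1.1},
    {"soup": 0.1, "sauce": 0.9, "stock": -0.5, "beans": -0.8},
    {"soup": 0.6, "sauce": -0.4, "stock": 0.2, "beans": -0.9},
    {"soup": -0.3, "sauce": -0.1, "stock": 0.5, "beans": 0.4},
]
true_labels = ["soup", "sauce", "stock", "stock"]

ranked_lists = [rank_categories(scores) for scores in scores_per_item]
rate = one_click_rate(ranked_lists, true_labels)
```

When the top suggestion is wrong, as for the third item here, the correct category may still appear lower in the ranked list, so the user picks from a short list rather than searching the whole layer.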