Disentangled representations applied to audio data

Classical machine learning methods often cannot be applied to complex modern datasets, such as those found in voice recording files, without extensive feature engineering. Traditionally, feature engineering requires deep domain knowledge in order to extract the key components of the data.

The advent of deep learning means that we can now train models to extract the key components, or representations, of these data points, for example separating the speaker from the speech. This in turn provides data scientists with smaller, more manageable datasets. But given that the representations we learn significantly affect the performance of the downstream model, how can we ensure that they are fit for the task?

In this webinar, Faculty Data Scientists Scott Stevenson, Laurence Cowton and Kim Moore will discuss representation learning and its application to the complex task of speaker classification, covering:

  • An introduction to representation learning and Variational Autoencoders.
  • An introduction to the EquiVAE architecture, which uses labelled data to create representations well suited to classification tasks.
  • An overview of deep learning methods applied to audio modelling.
  • An example application of the EquiVAE architecture to a speaker classification problem.

Date & time

Thursday 3 September, 2020
11.00 – 12.00 BST



This event has finished