Attendees and panelists at the Copenhagen Democracy Summit listened to a surprise video conference call from US president Donald Trump last week – or, at least, so they thought. In fact, the video was a ‘deepfake’ created by Faculty, as part of its work with the conference hosts, the Alliance of Democracies, to raise awareness of the threat this technology poses to our democracies.
The Copenhagen Democracy Summit is dedicated to strengthening the resolve of the world’s democracies by providing a high-level strategic forum exclusively focused on the cause of democracy. Among the panelists at the plenary session on Democracy and Technology was former UK deputy prime minister Nick Clegg, who is now Head of Global Policy and Communications at Facebook.
‘Deepfakes’, an umbrella term for fake videos produced with deep learning, have the potential to disrupt our democracy in dramatic ways. Recent advances in technology mean that well-organised groups with a small amount of funding can produce compelling deepfakes that are hard to distinguish from reality. The 2020 elections in the United States are firmly in the crosshairs. Faculty has been working with the Alliance of Democracies to raise awareness of deepfake technology among governments, the public and the media, and to explore detection methods that could help prevent the spread of this material.
We believe the most powerful defence against this technology is to raise awareness of what is now possible. The fake video call shown in Copenhagen was created by bringing together two state-of-the-art machine learning models – one model that produces sentences of the user’s choosing in Trump’s voice and a second that can create video of Trump that matches the movements of an actor. Both models employ artificial neural networks.
How we created the AI deepfake of Donald Trump
The first model learned how to speak like Donald Trump by “listening” to many hours of publicly available audio recordings of the president’s speeches, matched with transcripts. From this data the model learned what Trump’s voice sounds like and how he pronounces different words. Training took many weeks on a powerful computer; once it was complete, the model was able to “say” any piece of text in Trump’s voice.
An actor was then filmed lip-syncing to the audio. The resulting video, together with real videos of Donald Trump downloaded from the internet, was given to a second AI model. The model learns the characteristic way each face moves when speaking – for example, how a furrowed brow is articulated or how a mouth looks when open. This allows it to generate a video of Donald Trump’s head and torso that matches the movements of the actor.
The striking final result is a video of Donald Trump saying the words generated by the first AI model.
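For readers who think in code, the flow of data through the two models can be sketched roughly as follows. This is an illustration only, not Faculty's implementation: every name here (`synthesize_speech`, `film_actor`, `reenact_face`, `make_deepfake`) is hypothetical, and the functions are simple stand-ins that record what each stage consumes and produces rather than running any neural network.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Audio:
    speaker: str  # whose voice the audio sounds like
    text: str     # the words being spoken


@dataclass(frozen=True)
class Video:
    face: str         # whose face appears on screen
    motion_from: str  # whose movements drive that face
    audio: Audio      # the soundtrack the video is synced to


def synthesize_speech(text: str, voice: str) -> Audio:
    """Stand-in for the first model: text in, audio in the target voice out."""
    return Audio(speaker=voice, text=text)


def film_actor(audio: Audio, actor: str) -> Video:
    """Stand-in for the filming step: the actor lip-syncs to the audio."""
    return Video(face=actor, motion_from=actor, audio=audio)


def reenact_face(driving: Video, target: str) -> Video:
    """Stand-in for the second model: keep the actor's movements and the
    soundtrack, but render the target person's face instead."""
    return Video(face=target, motion_from=driving.motion_from, audio=driving.audio)


def make_deepfake(text: str, target: str, actor: str) -> Video:
    """Chain the three steps in the order the article describes."""
    audio = synthesize_speech(text, voice=target)
    actor_video = film_actor(audio, actor=actor)
    return reenact_face(actor_video, target=target)


result = make_deepfake("Hello, Copenhagen.", target="Donald Trump", actor="actor")
print(result)
```

The point of the sketch is the ordering: the audio is produced first, the actor's performance is recorded against it, and only then is the face replaced, so the final video inherits the voice from the first model and the movements from the actor.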
Here’s the video, recorded in the room, of ‘Trump’ conversing with session moderator Jeanne Meserve.