Lesson 01
Home Office
The UK Home Office wanted to take down online propaganda that was inciting terrorist attacks around the world, but the tech giants said it couldn’t be done. In proving them wrong, Faculty learned how to develop AI to meet the most demanding operational requirements.
Tucked away behind a nondescript door, at the end of a long corridor in the UN building in New York, is a suite of rooms rarely glimpsed by the outside world. This is the UK Room, a perk afforded only to permanent members of the Security Council, where British diplomats can withdraw for discreet conversations and private negotiations, or top up their coffee from the always-on machine. Here, on a muggy September day, UK Prime Minister Theresa May huddled with her aides and went over her speech one last time. Dressed in a crisp navy blue jacket and white blouse, and wearing a chunky silver chain necklace, she was about to go onstage at the General Assembly.
She was going to throw down the gauntlet.
It was 2018, and the world was grappling with a wave of terrorist attacks inspired by Daesh (also known as ISIL, Islamic State or ISIS). In the previous year, 18 attacks had been launched against civilian targets in Western countries; over a hundred people had been murdered and many more injured.
In each case, an important inflammatory role was played by the slick online propaganda that Daesh was flooding onto the internet. From glossy jihadist videos to practical bomb-building tutorials, Daesh was able to radicalise its recruits, school them in violence, and ultimately move them to commit terrible acts.
After four attacks in the UK, Theresa May had had enough. In front of the eyes of the world, she demanded that tech companies make a paradigm shift in their ability to stop terrorist propaganda. If they could not identify it and take it down within two hours of it being posted - the crucial window of opportunity - then her government would legislate to force their hand.
The tech companies said it wasn’t possible. The sheer volume of content Daesh pumped out would overwhelm any human moderators, while automated solutions were out of the question. The social media giants had the best AI and software engineering teams in the world, the largest-scale digital operations ever built, and they could confidently say that the technology May was demanding didn’t and couldn’t exist yet.
But the British Prime Minister knew otherwise.
The terror of Daesh
The Office for Security and Counter-Terrorism (OSCT) is home to one of the smartest and most impressive teams in the British government. Few among the general public know the team exists, but they’re lucky it does. Located in a highly secure area of the Home Office headquarters in Westminster, the staff who work there are deeply expert in analysing and understanding terrorist threats. A typical example is Tom Drew OBE, who worked there for seven years (and later joined Faculty). A softly-spoken thirty-something, with a dark brown beard and a penetrating gaze, Tom has dedicated his career to keeping the public safe.
In 2017, he and his team were troubled by an emerging threat. In the Middle East, Daesh’s self-styled ‘Caliphate’ was in retreat: it had suffered significant territorial losses and been driven out of its de facto capitals of Mosul and Raqqa. In response, the group changed its tactics. Instead of encouraging supporters in the West to travel to Syria and Iraq, it urged them to stay at home and carry out ‘single-actor’ attacks against unprotected civilian targets. Acting alone, outside existing networks and often with no previous history of extremist activity, such attackers would be almost impossible to stop.
The result was dramatic. An attack on the Houses of Parliament ended with five people killed; a bomber at an Ariana Grande performance in Manchester killed 22 concertgoers; a mass stabbing in London killed eight more.
The resulting coroners’ inquests showed that almost all the attackers had been radicalised by the propaganda they encountered online. Daesh was using the West’s own social media networks against it, recruiting its killers in plain sight on platforms like Facebook, Twitter and YouTube. The platforms had teams of human moderators trying to find and remove the content, but they were far too few and far too slow to have any meaningful impact.
And when the government asked the social media companies to do more, they were stonewalled. The web was too big. There were too many videos. The technology to automate content moderation didn’t exist yet. ‘They were quite passive,’ recalls Matt Collins, who is now the Deputy National Security Advisor for Intelligence, Defence and Security. At the time, he was Director of Prevent, with overall responsibility for the Home Office team looking at the problem. ‘They were still working to manual checks, and we had to prove to them that machine learning could help.’
Even if you tried it, the tech companies argued, no system would be 100% accurate, and even a tiny failure rate would create enormous problems. ‘We review over one hundred million pieces of content every month,’ said Mark Zuckerberg in February 2017, ‘and even if our reviewers get 99% of the calls right, that’s still millions of errors over time.’ Bottom line: counter-terrorism experts might be able to identify individual pieces of content with a high degree of certainty, but applying that kind of rigorous analysis at scale, in a real-time operational setting, was flat-out impossible.
Tom Drew wasn’t buying it. As Head of Data and Innovation, he had a responsibility to do everything in his power to stop the attacks by cutting off the torrent of Daesh propaganda. He suspected that advances in machine learning might have changed the equation. If he could prove that an AI model could replicate the Home Office team’s expertise and apply it on the necessary scale, the government would have the ammunition it needed to force the tech giants to adopt new standards.
The social media platforms had made it very clear that anyone who thought there was a technological solution didn’t understand how technology works. So Tom went to find people who did. He started asking around, talking to a range of experts to see if there was a way to do what he needed. One of the places he came to was Faculty.
Key distinctions between analytics & operations
At the time, Faculty was a tiny startup operating out of an Edwardian townhouse in Marylebone. ‘Even the building sort of made a statement,’ recalls Angie Ma, one of the co-founders. ‘It didn’t look like a classic tech office. We wanted to make the point that we were a different sort of company, that tech didn’t have to be in your face.’ Still using its original name, Advanced Skills Initiative (ASI), the company had barely 20 employees. But Tom felt they had potential.
‘Their approach was (and is) to be collegiate problem-solvers,’ he recalls. ‘They treated it as an experiment to develop a new model, not an opportunity to just resell an existing process or product.’ Matt Collins concurs. ‘There was a can-do attitude when we presented the problem, an enthusiasm to just get stuck in and really see what the art of the possible was.’
Of course, the nature of an experiment is you don’t know ahead of time if it’ll succeed. John Gibson, who at the time led ASI’s consulting business, remembers fielding the first call from Tom. ‘We thought it could be done,’ says John, looking back, ‘but the only way we could prove that it would work was to actually build the technology.’
The heart of the challenge was one that comes up surprisingly often in the AI world: the distinction between analytics and operations. In many organisations, Data Science and AI teams have typically been siloed off in Analytics or Business Intelligence functions, well away from the messy business of the shop floor. Analytical teams provide insight for understanding the world; operational teams act on it and make things happen.
The danger, of course, is that a gap develops so that insights never actually get put into practice. And in this case, that gap was a chasm. The analysts were at the OSCT, while the people who could implement their recommendations were at the tech companies. Not only were they not on the same page, they didn’t even believe what the other was saying was possible.
Enter AI. The power of the technology is that it can take insights and weave them into actual workflows - but it’s not a quick fix. When you’re building technology for operational processes, there are always requirements you need to account for. They might be user needs, existing workflows, infrastructure requirements, policies or regulations - all the existing rules and constraints of a workplace. Even if the technology works flawlessly, it will never be implemented if it can’t deal with these real-life issues.
In this case, there were three key operational requirements for the solution: it needed to be 1) accurate, 2) quick, and 3) discerning. And these requirements were exacting: not just reasonably accurate or fairly quick, but orders of magnitude more accurate than the 99% that Mark Zuckerberg had dismissed, and fast enough to apply that accuracy at web speed.
The tech companies may have been overstating the case when they said a technical solution was impossible, but they weren’t wrong about the scale of the challenge.
Operational requirements
The first requirement was for speed. Research produced by the UK Home Office and the European Union indicated that the first two hours after the release of new propaganda was the crucial window for disruption. In that time, videos were amassing millions of views, and more than 90% of all the links to that content that would ever exist had already been created. If you couldn’t find it straight away, you were already too late.
For the second requirement - accuracy - the key metric was ‘false positives’. If the software labelled a video as propaganda when in fact it was benign, the content creator would probably challenge the call. That meant unhappy customers, users denied videos they might want to see, and most likely a review by a human moderator. Beyond a certain threshold, too many wrongly flagged videos would drown the moderators and anger the platforms’ users - exactly the problem Zuckerberg had pointed to.
And that threshold for false positives was low. Exceptionally low. Engineers at YouTube told the Home Office that they would only consider the technology feasible if the false positive rate fell to 0.005%. To put that in context, it meant that for every 100,000 videos the software analysed, it couldn’t flag more than five incorrectly. This became the gold standard that the Faculty team worked towards.
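To put the threshold in concrete terms, the short sketch below simply converts the 0.005% rate into the maximum number of wrongly flagged videos per day at a few round-number upload volumes. The volumes are illustrative assumptions, not figures from the project.

```python
# Illustrative arithmetic only: translating a 0.005% false-positive rate
# into the maximum number of wrongly flagged videos per day at different
# (round-number) upload volumes.
FP_RATE = 0.005 / 100  # 0.005% expressed as a fraction

for daily_uploads in (100_000, 1_000_000, 5_000_000):
    allowed = FP_RATE * daily_uploads
    print(f"{daily_uploads:>9,} uploads/day -> at most {allowed:,.0f} wrongly flagged")
```

At 100,000 videos that is five mistakes; at the roughly five million videos then being uploaded to YouTube every day, it is about 250, a figure that reappears when the results are described later.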
The third requirement was more subtle, but no less challenging. The software had to be sensitive to very fine nuances in the content it was examining. The Home Office team had a profound understanding of every aspect of the terrorists’ content, and the machine-learning model had to replicate that. But it also needed to recognise what Tom Drew and his colleagues knew about content that wasn’t terrorist messaging, but might be mistaken for it.
Some types of entirely legitimate content resembled terrorist propaganda in specific ways that might trip up an algorithm. Worse, the content that was hardest to classify was also the content that Tom’s team least wanted to remove. Islamic prayer videos and news reports of events in the Middle East, for example, might share certain features with jihadist content: censoring them would be not only controversial but also counterproductive, because much of that material actually served to highlight the flaws and hypocrisies in Daesh’s messaging.
So the system Faculty were being asked to build had to be able to discriminate between superficially similar ‘good’ and ‘bad’ content; with a false positive rate of less than five in 100,000; and all within two hours of the content being posted.
And the team could never forget that the clock was ticking. In October, a man drove a pickup truck into a group of pedestrians on a bike path in Lower Manhattan. In Marseille, a man stabbed two women at the train station. Every month brought more grim reminders of the stakes involved.
Extracting multi-modal signals from video files
John’s team at Faculty had a hunch that the reason the social media companies had failed to crack the problem was that they weren’t looking at the content broadly enough. Rather than examining any single aspect of the videos, Faculty wanted to build an ‘ensemble classifier’, a model that would incorporate not only the relationships between particular attributes, but the relationships between the relationships. To do that, they’d have to wring every scrap of data they could out of the jihadist content.
But only from the content itself. The social media companies had a trove of data on individual users and who they were connected to, their networks and how they behaved online: all invaluable information that would help establish whether the content was terrorist-related. But the companies wouldn’t share that data, and even if they had been willing to, Tom didn’t want Faculty’s algorithm using it. ‘We wanted a solution that made a classification purely on the content itself,’ he says, ‘to prove that this could be done with the bare minimum data a government or third-party could capture - the media files themselves.’ It also avoided any issues with accessing users’ personal data, which would have tripped all sorts of ethical and regulatory safeguards. So all that Faculty had to work with was what any YouTube or Facebook user anywhere in the world could access: the videos themselves.
To train a model, you need a lot of data. The Faculty team worked with Tom and his Home Office colleagues, alongside leading academics and security analysts, to trawl the darker corners of the web to scrape up the target videos. As John recalls, ‘By the end of the process, we had copies of pretty much every known piece of content that Daesh had produced.’
The details of what exactly Faculty did with that content are, for obvious reasons, secret. But in broad terms, they extracted everything they could from the data contained within the video. Were there certain types of song that were likely to feature on the soundtrack? Certain types of imagery or iconography, even specific people who could be identified? When the Home Office team were analysing content, they didn’t overlook a single detail. So neither could the model.
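To make the ‘ensemble classifier’ idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the three feature blocks standing in for audio, visual and textual signals, the synthetic data, and the use of logistic regression as both base and meta model. It is a sketch of the general technique, not a description of Faculty’s actual (and confidential) system.

```python
# Minimal, purely illustrative stacked-ensemble sketch. The feature blocks,
# data and model choices are invented; Faculty's real classifier is not public.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Pretend every video has been reduced to three blocks of features:
# audio (soundtrack descriptors), visual (imagery/iconography scores)
# and text (transcript features). The numbers here are synthetic.
n_videos = 2000
features = {
    "audio": rng.normal(size=(n_videos, 16)),
    "visual": rng.normal(size=(n_videos, 32)),
    "text": rng.normal(size=(n_videos, 64)),
}
labels = rng.integers(0, 2, size=n_videos)  # 1 = propaganda, 0 = benign

# Level one: a classifier per modality, each emitting a probability.
base_scores = np.column_stack([
    LogisticRegression(max_iter=1000).fit(X, labels).predict_proba(X)[:, 1]
    for X in features.values()
])

# Level two: a meta-classifier learns how the per-modality signals combine.
# (A real system would train it on held-out predictions to avoid overfitting.)
meta = LogisticRegression().fit(base_scores, labels)

# A video is only flagged if the combined score clears a very high bar,
# chosen to keep the false-positive rate extremely low.
THRESHOLD = 0.99  # illustrative; in practice tuned on a validation set
flagged = meta.predict_proba(base_scores)[:, 1] >= THRESHOLD
print(f"flagged {flagged.sum()} of {n_videos} synthetic videos")
```

The second-level model is what captures the ‘relationships between the relationships’: it learns how the per-modality verdicts combine, so a video that looks only mildly suspicious in any single channel can still be caught when several channels agree.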
Most excitingly, the Faculty team managed to extract the spoken word audio from the videos, transcribe it, and run natural language classifiers on the resulting signal. The rapid developments in generative AI and natural language processing seen in the last two years have now made this task much easier, but in 2017 this was a game changer. As Tom explains, ‘In any detection effort like this, you end up in what we call a “recursive adversarial dynamic”: in other words, chasing a moving target. You detect a type of content, they figure out what you’re detecting, they change it, and so you have to update your approach because it doesn’t work any more.
‘But if your model bases its classification on what the terrorists are actually saying in their propaganda - not just the words but the underlying tenets of their ideology and call to action - then it’s extremely hard for them to escape that without changing what they’re saying. And if you’re forcing them to change what they’re saying, it turns the game of cat and mouse into a strategic victory.’
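As a sketch of what a speech-based pipeline can look like, the example below strings together audio extraction, transcription and text classification. The specific tools (ffmpeg, the open-source Whisper model, scikit-learn), the file names and the training snippets are assumptions for illustration; they are modern stand-ins rather than what Faculty actually used in 2017.

```python
# Illustrative speech-to-classification pipeline with modern stand-in tools.
import subprocess
import whisper  # pip install openai-whisper
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def extract_audio(video_path: str, audio_path: str = "audio.wav") -> str:
    """Strip the audio track to 16 kHz mono WAV with ffmpeg."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-vn", "-ar", "16000", "-ac", "1", audio_path],
        check=True,
    )
    return audio_path

def transcribe(audio_path: str) -> str:
    """Turn the spoken audio into text with a small Whisper model."""
    model = whisper.load_model("base")
    return model.transcribe(audio_path)["text"]

# Toy classifier over transcripts (a real system would use a large labelled
# corpus and far richer text features than TF-IDF).
train_transcripts = [
    "ordinary news report about events in the region",   # benign
    "call to action urging violence against civilians",  # propaganda
]
train_labels = [0, 1]
text_model = make_pipeline(TfidfVectorizer(), LogisticRegression())
text_model.fit(train_transcripts, train_labels)

# Score a new upload on what is actually being said in it.
transcript = transcribe(extract_audio("new_upload.mp4"))
score = text_model.predict_proba([transcript])[0, 1]
print(f"speech-based propaganda score: {score:.3f}")
```

The design point is the one Tom describes: a score built from the transcript tracks what the propagandists are saying, not just how their videos look, which makes it much harder to evade by cosmetic changes.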
Running the tests
After nine months, the model was becoming more sophisticated. John’s team were optimistic that it could meet the exacting operational requirements they’d been set - but they needed to prove it. In particular, they had to demonstrate that the all-important false-positive rate was below the 0.005% threshold. To do that, they calculated, they needed a sample set of hundreds of thousands of randomly-chosen videos from across the global internet. So for several weeks they downloaded the first few thousand videos that were uploaded to the internet every hour of every day, avoiding any bias that might emerge from times of day or days of the week. Ed Sheeran, ping pong trick shots, cat videos, jihadist content… it all got scooped up and run through the model. Even the validation approach itself got tested and validated. The Chief Scientific Adviser to the UK Home Office vetted the process and pronounced himself satisfied.
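For a feel for the statistics behind that validation exercise, here is a rough sketch (not the Home Office’s vetted protocol) of how a large random sample translates into confidence that the true false-positive rate sits below 0.005%. The sample size and error count below are hypothetical; only the 0.005% target comes from the story.

```python
# Rough statistical sketch of the validation maths (hypothetical numbers).
from scipy.stats import beta

TARGET_FP_RATE = 0.005 / 100   # 0.005% as a fraction
CONFIDENCE = 0.95

def fp_rate_upper_bound(false_positives: int, sample_size: int,
                        conf: float = CONFIDENCE) -> float:
    """Exact (Clopper-Pearson) upper confidence bound on the true FP rate."""
    if false_positives >= sample_size:
        return 1.0
    return float(beta.ppf(conf, false_positives + 1, sample_size - false_positives))

# Example: 300,000 randomly sampled uploads, 5 of them wrongly flagged.
n, k = 300_000, 5
bound = fp_rate_upper_bound(k, n)
print(f"observed rate {k / n:.5%}; {CONFIDENCE:.0%} upper bound {bound:.5%}")
print("clears the 0.005% bar" if bound <= TARGET_FP_RATE
      else "needs a bigger sample or a better model")
```

The intuition is that demonstrating a rate as low as five in 100,000 requires a sample big enough for even a handful of mistakes to be statistically meaningful, which is why the team needed hundreds of thousands of randomly chosen videos.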
The results were conclusive. The technology was able to detect 94% of the propaganda, with only a 0.005% false-positive rate. It could tell the difference between terrorist messaging and legitimate prayer videos or news coverage. And it could do it all in almost real-time, processing each video in the time it took to play. This meant that at the scale of YouTube’s then five million uploads a day, only 250 would be incorrectly flagged - enough for a single human moderator to check. It was entirely operationally viable. And they could prove it.
A new standard for technology platforms
Thanks to OSCT’s refusal to take no for an answer, Theresa May went onstage at the UN General Assembly armed with the knowledge that what she was demanding was possible. After she laid down her ultimatum to the tech companies, the world took notice.
Some of the attention was welcome. The Faculty offices hosted newspaper journalists and TV news crews in droves over the next few weeks, and the coverage they generated only heaped more pressure on the social networks to take the issue seriously. Within weeks, Mark Zuckerberg was publicly stating that companies like Facebook should be subject to more regulation, not less.
Some of the attention was less desirable, though flattering, in a way. Daesh had been following the news coverage too, no doubt trying to figure out why their content wasn’t hitting its audience as well as before. Security insiders revealed that Daesh had nicknamed Faculty the ‘Dogs of Deletion’ in their internal conversations about the technology. The company took it as a compliment.
‘I went to the west coast nine times in two and a half years,’ says Matt. ‘Each time, we had to put evidence on the table to further the conversation. And when we pitched them to say, “Well, you know you’ve been saying you can’t do this, but we think you can, and here’s what we’ve done,” immediately the conversation went from a policy conversation, which was a bit binary, into a technical conversation, where they wanted to look underneath the bonnet and really understand what we’d done and how we’d done it.’
In the year following the release of Faculty’s classifier, YouTube reported it was using AI to remove more than 80% of violent extremist content before it was flagged by users. Twitter was able to block 96% of terrorist accounts before they could even send their first tweet, and Facebook had implemented AI to take down 99% of terrorist content within 24 hours of its first release, much of it within the crucial first two hours. The Home Office also made the service free to a host of smaller social media platforms who lacked the resources of the bigger players. As Theresa May put it, it was ‘a major step forward in reclaiming the internet from those who would use it to do harm.’
For Faculty, it meant the small startup was suddenly on the map. ‘Serious people in government were telling their colleagues, “This is a company you should be speaking to,”’ recalls Angie. But the real significance of the project was felt in the wider world: for anyone who wanted to go to a concert, enjoy a drink outside, or just walk down the street without fear of being attacked with a knife or rammed with a car.
‘It seems a long time ago now,’ says Matt. ‘But terrorism was the number one national security threat in that moment. We know the importance of social media in our lives, and some of the harm, unfortunately, that it can help facilitate. So demonstrating exactly what capabilities could be brought to bear to reduce the dissemination of that content definitely had a part to play.’
‘A lot of people in different countries were looking at the problem, trying to get the tech companies to tackle it,’ says Tom. ‘And we subsequently heard independently from international partners that this was the big thing that really shifted the needle.’
The lesson in summary
AI is an operational discipline, not an analytical one.
- Many data science teams have their roots in analytics or business intelligence teams. In these circumstances, a culture change is often needed to shift from an analytical to an operational mindset. AI is not being used to its potential where the output is a series of charts and dashboards that describe the world without acting upon it.
- The objective of analytics is to understand the world. The great power of AI is that it can go a step further and operationalise that understanding, by turning it into action.
- In cases where there are high volumes of low-value actions, this can be achieved by automating processes directly. Where individual actions have higher value attached to them, AI should be built into the tools that people use to run business processes. In particular, it should be integrated to support the decision points at which people intervene in those processes to better achieve their objectives.
- Where analytics supports human decision-making passively, by visualising trends, operational AI systems can provide active decision support. This can be by allowing people to test assumptions and scenarios or by running optimisations that result in a recommended path forward.
- AI systems that integrate with live business processes typically have to account for a more demanding set of operating requirements than analytics tools. These come in many forms: user needs, workflows to integrate with, infrastructure, latency and security requirements, policies and regulations.