Lesson 10
OpenAI
Once an obscure academic niche, AI safety is now one of the defining issues of the age. Faculty is working with the most innovative AI labs on the planet, including OpenAI, to make sure that AI models are as safe as they can be, now and in the years to come. Because if we don’t get this right, we’ll lose control of our future.
A four-star US general
stands in heated conversation with the Chinese premier. On the conference room wall, monitors display the latest news headlines. A drone swarm has attacked a warship in the South China Sea, bringing tensions in the region to boiling point. A US tech company has announced a breakthrough that points to an imminent future where artificial intelligence will outclass the human kind, but just two weeks later, a Chinese state-owned corporation has replicated the feat – with accusations of corporate espionage and theft flying across the Pacific.
The light from the TV screens silhouettes the figures, masking their expressions. Smoke curls in the lights suspended low over the circular conference table; vast concrete buttresses soar into the darkness above. If it looks like a cross between Dr Strangelove and a Bond villain’s lair, then that’s an apt comparison. It’s no exaggeration to say that the future of the world hinges on the men and women in this room being able to agree on how to rein in these frightening new advances. If they can’t, then humanity will have ceded control of our fate to a new, alien intelligence.
If you’re comforting yourself that this is all make-believe, think again. This really happened.
Sort of.
‘I didn't come here to tell you how this is going to end’
The meeting described above took place in July 2024, at an undisclosed central London location. The scenario was fictional, but the people involved were real. They included retired US army officer General Stan McChrystal; former UK National Security Advisor and Cabinet Secretary Mark Sedwill; the globally renowned Israeli historian and thinker Yuval Noah Harari; Jaan Tallinn, founder of Kazaa and Skype; and numerous other senior representatives from politics, academia and technology companies.
It was the culmination of a million-dollar exercise called Intelligence Rising that Faculty ran, in collaboration with the Tony Blair Institute for Global Change and concerned philanthropists, to improve understanding of the possible societal and geopolitical implications of AI.
The scenario was devised as part of a narrative wargame that played out the likely impact of AI over the next ten years. The conference room was a set, artfully arranged by the Oscar-winning director Elena Andreicheva and her team, who filmed the entire event for a movie released in early 2025.
The wargame participants were all hardened power-players at the top of their professions, yet more than one of them left the room shaken by what they had experienced.
As AI becomes more powerful, its successes and failures will have greater and greater impact on the world we live in. But, as Intelligence Rising demonstrated, few people, even in elite circles, really understand what that might entail, or how profoundly it could reshape the world order. Not just the geopolitical order, but the human order. And even recognising the problem is only the first step towards the really hard question: what should we do about it? What can we do about it?
As so often, if you want to understand the future, start by looking back.
Rise of the machines
The CEOs of what are described as the ‘frontier’ AI labs – leading-edge companies including DeepMind, OpenAI and Anthropic – are now rock stars within technology circles. If someone working in the field says ‘Demis’ [Hassabis], ‘Sam’ [Altman] or ‘Dario’ [Amodei], they know they’ll be understood.
And the founders’ fame is bleeding through into the wider world. Demis Hassabis of DeepMind was awarded the 2024 Nobel Prize in Chemistry for his work on using AI to predict protein structures. When ChatGPT launched in November 2022, it gained a million users in just five days, and reached a hundred million in two months, leading UBS to hail it as the fastest-growing consumer application in history (for comparison, TikTok took a leisurely nine months to reach a hundred million users). Sam Altman was named Time magazine’s CEO of the Year for 2023.
It wasn’t always like this.
Faculty’s first encounters with frontier labs’ founders go back many, many years. Faculty’s early employees got to know them when AI was deemed a speculative field, and thinking about ‘AI safety’ was, to put it kindly, a niche interest. In those days, there was emerging awareness of the risk of algorithmic bias, but little understanding of the scale to which this could become an issue.
Although many recognised the need for AI to protect privacy, few foresaw the range of ways in which this would need to be considered. And society was just beginning to extrapolate from these risks to consider, for example, how to ensure that future powerful intelligences would protect human values and support human flourishing.
Faculty’s early community shared a concern with the visionaries of the frontier labs: how do we align machines and humans to work in harmony? Back in the early 2010s, long before Elon Musk or Stephen Hawking or any of the world-famous commentators had piled into the debate, there were probably a hundred people globally with a serious interest in the problem. Faculty’s staff were among them. It was, admittedly, a slightly strange crowd, prone to being dismissed as cranks. But it included many people, now at the forefront of the field, who had the foresight to recognise that safety was a crucial part of the path to ever more powerful intelligence.
But recognising that something’s important doesn’t necessarily make it easy to do what needs to be done.
‘It is impossible for me to harm, or by omission of action allow to be harmed, a human being.’
At its core, the key question for safe AI is this: how do you make sure that AI acts in accordance with humanity’s values and rights? In AI circles, this is known as the ‘alignment problem’: how to build an AI that is aligned with human intentions, rather than working against them.
But from that simple formulation come a host of thorny questions. As humans, we often struggle to articulate what exactly we want. In fact, we often don’t even know what we want, let alone how to express it. And even if we could, AI can’t just be a genie that slavishly gives its operators what they ask for. The things people want might be malicious, contradictory, or counter-productive.
An advanced AI needs some kind of values to bound and inform it, but who gets to decide those values? And how do you encode them in what is, after all, simply a piece of software? Do you go for a legalistic, rules-based approach – or try to teach machines the precepts of moral philosophy?
The development of AI itself shows that trying to hard-code rules about the world into software will never be as flexible, adaptable or useful as coding models that learn their own rules. But even if we were able to teach the AI our values, how would we make sure the algorithm interpreted them as we would want? Software can have bugs, and life can throw up edge cases that defy any attempt to find an ethically tidy solution.
And if we could overcome all these hurdles to create an AI that perfectly understood human values and applied them flawlessly, there’s still a question as to whether it would be right to do it. If we hard-wire a machine with early 21st century values, would that position be locked in for eternity?
Think back to the 1800s, when slavery was legal, women were men’s property, and beating children was considered the hallmark of good parenting. If AI had been invented then, would we still be living now yoked to a technology built on those values?
Although it’s tempting to believe that our current moral order is the apogee of civilisation, in another two hundred years it’s likely that aspects of our own society will seem as hateful to future generations as wife beating and child labour seem to us. How can we make sure that our children are allowed to adopt a different approach to the world than the one their parents took?
You can see why the people asking these questions came across as cranks and obsessives. The questions seem esoteric, unmoored from the concerns of ‘normal’ people or businesses.
In fact, they’re everyone’s concern.
‘I’m sorry, Dave. I’m afraid I can’t do that.’
The time when AI safety was the obscure hobby-horse of a few dozen enthusiasts now seems an age away (or, by the warp-speed timescales of AI’s development, about ten years ago). The frontier AI labs, and many governments, recognise the potentially catastrophic risks of misuse, and are investing heavily to address the issues.
Faculty has grown up too, though it’s still at the forefront of AI safety, working with labs and governments to conduct novel research, develop new tools, and assess risk. The company is one of the first ports of call if one of the frontier labs needs to test the safety of its newest model, as when OpenAI wanted to assess its o1 model.
The model showed a step change in reasoning abilities, and its creators wanted to be sure they’d done everything possible to deploy it safely. ‘It’s absolutely paramount that foundation models are built safely,’ says Sam Altman, OpenAI’s co-founder and CEO. ‘I know Marc and Faculty have cared about AI safety for a long time, and so they’ve been a natural and wonderful partner for us on this work.’
Faculty also works with the UK AI Safety Institute and other organisations to make baseline safety assessments of general-purpose models. Faculty’s robust capability assessments test models in different ways, ranging from question-and-answer engagements by experts to full-scale randomised controlled trials that test what bad actors might be able to achieve if assisted by AI, compared with what they can do without it. Faculty has also piloted cutting-edge techniques to improve safety, such as testing whether a model can ‘unlearn’ dangerous knowledge that a bad actor could use.
Other risks have less spectacular outcomes – no explosions or homebrew bioweapons – but operate in more insidious ways. Models that contain biases could present significant risk to a broad population if they create discriminatory outputs, whether that’s in the content they generate or the decisions they take.
This has been an issue for AI since well before the latest developments in generative AI, and Faculty has long been a leading light in research to identify and mitigate biases. In 2020, the company provided the technical assessments on which the UK government’s ‘Review into Bias in Algorithmic Decision-Making’ was based.
But the most intrinsic biases don’t come from poor coding. Particularly with generative AI, the biases slip into the model with the raw material of its training and tuning data. AI is trained to represent the world, but modern society is the outcome of centuries of complex biases and discriminatory approaches. So there’s a fundamental question: do we want AI that represents the world as it is, or as we would like it to be? And if the latter, as who would like it to be? Different cultures, personalities or political systems might have very different ideas of what an ideal world would look like.
The frontier labs are all committed to weeding out bias and discrimination in their models. They want to make sure that their models don’t harm major parts of society, and that the benefits of AI are felt inclusively. Whatever your demographics, you have a stake in the labs getting this right, and finding the right balance between the ‘world as it is’ and the ‘world as it could be’.
‘No 9000 computer has ever made a mistake or distorted information’
But – crucially – safety is contextual. Language that might be entirely acceptable for a model helping a screenwriter develop their characterisation might not be so appropriate for a child doing their homework. So there’s a limit to how far model providers can be responsible for the safety of their products. They can’t foresee all the contexts in which their models will be deployed: only you, the person or organisation using it, have that understanding.
If you integrate a chatbot for customer service, the frontier labs won’t be making sure that their models aren’t rude to your customers. They won’t stop the model from deciding to give ruinous discounts, or offering unauthorised financial advice.
Just as with any other intelligent entity you employ, you will have to define what is appropriate and acceptable for your situation, and you will have to check that those boundaries are implemented correctly, because only you understand how the model will work in your use-case. And you will be accountable if it goes wrong, whether that’s because you gave the AI poor instructions, introduced a bug or even chose the wrong model for the purpose.
Ultimately, for every version of the alignment problem faced by Sam Altman, Demis Hassabis or Dario Amodei as they develop the world’s most advanced models, there is almost always a parallel problem that ‘normal’ organisations face when trying to make those models do what they want.
So what should you do?
‘I can only show you the door. You're the one that has to walk through it.’
Fortunately, if other businesses share versions of the problems faced by the frontier labs, they can also learn from the solutions. The same safety techniques that Faculty has implemented with OpenAI, the UK AI Safety Institute and others give your organisation a suite of tools to draw on as appropriate.
The first and most fundamental thing you can do is keep humans in control. This is Faculty’s cornerstone philosophy when it comes to AI safety, but there are different ways of achieving it.
The most direct approach is to have a human ‘in the loop’: providing input at critical stages in a process so that the person maintains oversight and control. Because AI models are probabilistic and always contain a degree of uncertainty, in many circumstances it will make sense for them to make recommendations to a member of the team, who understands the context and remains ultimately responsible for taking the action.
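As a purely illustrative sketch of what that routing can look like, the fragment below assumes a hypothetical Recommendation record, confidence threshold and execute callback; it is not Faculty’s or OpenAI’s implementation, just the general pattern of keeping a person responsible for anything the model is unsure about.

```python
from dataclasses import dataclass

@dataclass
class Recommendation:
    action: str        # what the model suggests doing
    confidence: float  # the model's own estimate, between 0 and 1
    rationale: str     # a short explanation shown to the reviewer

# Illustrative value: the threshold is a human policy choice, tuned to your risk appetite.
CONFIDENCE_THRESHOLD = 0.9

def handle(rec: Recommendation, review_queue: list, execute) -> None:
    """Route a model recommendation either to a human reviewer or to execution."""
    if rec.confidence < CONFIDENCE_THRESHOLD:
        # A person who understands the context makes the final call.
        review_queue.append(rec)
    else:
        # Even automated actions should be logged so humans can audit them later.
        execute(rec.action)
```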
But there are plenty of circumstances where it’s impractical to insert a human into the loop, for example when the process needs to operate quickly, at high volume, or both. Many web services look like this. Content and product recommendations on Amazon and Google, or ad targeting at Facebook, all operate at a pace and scale beyond human intervention. As does ChatGPT.
In these cases, you rely on humans ‘over the loop’ to robustly test the models before they are deployed, and set parameters that constrain their outputs.
‘A big part of how we make sure that our technology is safe to be deployed into the wider world is our “red-teaming” programme,’ says Sam Altman. ‘We ask people and teams that we trust, like Faculty, to help us assert that our models are going to meet the safety standards that we set out.’
Red-teaming (the name derives from military wargames, where the adversary is always the ‘red’ team) brings together teams of experts, both in-house and external, to examine the AI models. The group stress tests what the models are capable of, and how those capabilities might cause harm in the world, whether intentionally or not.
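To make the ‘over the loop’ idea concrete, here is a toy sketch of the kind of automated stress test that can sit alongside expert red-teaming before a model is released. Everything in it is hypothetical: generate stands in for a call to whichever model is being assessed, and the refusal heuristic and threshold are placeholders that the humans responsible for release would set and sign off.

```python
# Purely illustrative: a toy release gate, not a real evaluation harness.

REFUSAL_MARKERS = ("i can't help with that", "i'm sorry, but")  # simplistic heuristic

def looks_like_refusal(reply: str) -> bool:
    return any(marker in reply.lower() for marker in REFUSAL_MARKERS)

def release_gate(generate, disallowed_prompts, required_refusal_rate=0.99) -> bool:
    """Return True only if the model refuses enough prompts it should refuse.

    The prompt bank and the threshold are set by humans 'over the loop',
    not by the model itself.
    """
    refusals = sum(looks_like_refusal(generate(p)) for p in disallowed_prompts)
    return refusals / len(disallowed_prompts) >= required_refusal_rate
```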
As well as testing and red-teaming, we can use safeguards and set operating parameters to ensure safe outputs. Anyone who has tried to make ChatGPT say rude things about someone will have seen this kind of constraint in action. These safeguards can be designed and implemented at various stages through a model’s lifecycle.
For instance, if you know in advance that you don’t want your model to give financial advice, you could remove material that would increase that risk from the dataset you use to train it. Once it’s been trained, you could further tune it to avoid giving that advice through a process called ‘reinforcement learning from feedback’, where the model learns what it should and shouldn’t do in accordance with feedback provided by humans or by another AI.
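The first of those interventions, curating the training data, can be sketched very simply. The example below is illustrative only: the keyword list and record format are invented, and real curation pipelines rely on trained classifiers rather than keywords, but the principle is the same.

```python
# Purely illustrative: remove material that raises a known risk
# before the model ever learns from it.

RISKY_PHRASES = ("investment advice", "stock tip", "guaranteed returns")  # hypothetical list

def filter_training_data(records):
    """Yield only the training examples free of financial-advice material."""
    for record in records:
        text = record["text"].lower()
        if not any(phrase in text for phrase in RISKY_PHRASES):
            yield record
```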
If you have created a generalisable AI model, but you later decide that you’d rather minimise the likelihood that it will provide financial advice, you could append an instruction to that effect to every user’s query. And, if really necessary, you could implement a ‘classifier’ model to check every output before it goes to the user, just to be extra sure. Of course, highly capable users of the system may still be able to ‘jailbreak’ it to do something you don’t want, but they’ll have to work hard to do so.
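To make those last two ideas concrete, the sketch below prepends a standing guardrail instruction to every query and runs each reply past a separate classifier before the user sees it. The instruction text and the chat and moderate calls are stand-ins, not any particular provider’s API.

```python
# Purely illustrative: `chat` and `moderate` are placeholders for your own
# model and classifier calls.

GUARDRAIL = (
    "You are a customer-service assistant. Do not give financial advice and "
    "do not offer discounts that have not been pre-approved."
)

def answer(user_query: str, chat, moderate) -> str:
    """Wrap a model call with a standing instruction and an output check."""
    # The guardrail instruction travels with every single query.
    reply = chat(system=GUARDRAIL, user=user_query)

    # A second, simpler classifier model inspects the reply before the user sees it.
    if moderate(reply) == "flagged":
        return "I'm sorry, I can't help with that. Let me connect you to a colleague."
    return reply
```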
Finally, governance is key. AI safety should be acknowledged and owned at a suitable level within any organisation. It should ultimately be part of the governance process that is used for other important categories of risk.
Faculty is keen to ensure that this ability to implement AI safety isn’t limited to organisations with the technical heft of the leading research labs, so it has developed a platform called Frontier that makes this much easier. Frontier enables individual models to be parameterised and governed. And it allows parameters to be applied to constrain collections of models too, so that leadership teams can set policies that bind all connected AI systems right across an organisation.
The unknown future rolls towards us
Ultimately, AI safety is not just for tech CEOs or government bodies. All of us have a responsibility to consider safety in our own contexts, and to add our voices to the debate about how we want AI to shape our future.
We all face a choice: either prepare now and treat the safety of the models we use in our business as our responsibility, or panic later once we realise we have lost control of those models.
There will always be a temptation to charge ahead with the latest technology and leave safety as an afterthought. But, done right, safety doesn’t have to come at the expense of capability. Cars are faster and more efficient than they were 50 years ago, and they’re also much safer. Similarly, the scale of long-term challenges like AI’s alignment with humanity shouldn’t freeze us from dealing with immediate issues, like algorithmic bias.
If humanity gets this right, we can control AI models and unlock their benefits in a safe and responsible way. If we get it wrong, those models will control us, making decisions for us and about us based on values and approaches with which we may not agree.
Back in the conference room, there’s still no breakthrough. The clock ticks down as the Chinese and American delegations face off. ‘AI could threaten mankind itself,’ warns a visibly troubled General Stanley McChrystal. ‘What are we going to do about it? We need big, bold thoughts.’
The ending of the game is revealed in the Intelligence Rising film. But real life isn’t a game: AI safety is a constant work in progress, an everlasting dialogue between technology, humanity and the world.
Intelligence is a tool that people have been using for tens of thousands of years. Now machines have it too. It’s been put to terrible purposes like exploitation and destruction; but it’s also built civilisations, created art of astonishing beauty, and allowed us to do things in everyday life that our ancestors thought were reserved for the gods.
How intelligence gets used in the future is a story we all have to write together.
The lesson in summary
If you don’t control your models, they control you.
- AI models are probabilistic. The most powerful are black boxes. They don’t always behave in entirely predictable ways. And users can’t really tell why they do the things they do. This creates a new set of risks.
- As AI technology is embedded more deeply in business processes, it is essential that the correct controls are put in place around it.
- In processes where individual decisions or actions are valuable, humans should be kept in the loop. AI should support them, not replace them. Decision support systems should be designed to give users well-targeted and parsimonious analysis, rather than drown them in data. And they should be interpretable and interactive.
- In processes where the volume or frequency of decisions or actions makes it impractical to have humans in the loop, they should nevertheless be ‘over the loop’. This means that they are able to specify the parameters within which the AI models operate, and interrogate their outputs.
- Implementing this at the level of an organisation will require technology platforms that allow those responsible for governance to set policies that bind all of the AI models that operate across an organisation. Faculty’s Frontier platform is designed to do exactly this.