Safety from the start: Our journey to building some of the world’s best tooling to make AI safe
I first learnt about AI safety from a blog.
I don’t remember how I found my way to the article; like anyone, I’m prone to disappearing down internet rabbit holes of continuous consumption, but as a PhD student I had more opportunity than most.
The blog – Overcoming Bias – was revelatory. Every post stretched my brain. I couldn’t believe that one of the authors, Eliezer Yudkowsky, was only a few years older than me. It felt like being in the presence of a qualitatively different level of intelligence.
I was a physics PhD student studying quantum mechanics, and Eliezer had a deeper understanding of the foundations of the subject than I did, on top of all his knowledge of game theory, decision theory and AI.
So when he claimed that powerful artificial intelligence was a greater threat to the human race than either global warming or biological pandemics, I didn’t dismiss it out of hand. Obviously, I was sceptical; how could a bunch of internet weirdos have stumbled onto something so important? Surely the UN, the WHO, or some other three-letter agency would be the first to see a threat this serious?
Over a decade later, it’s clear that the weirdos got something right (although they got other things wrong). Whether very powerful AI is the biggest threat, or one threat among a few (asteroids, nukes, bioweapons, global warming), the three-letter agencies are starting to wake up to the very real potential concerns around AI.
Superintelligent AI isn’t a danger now – but it will be one day
At this point, it’s worth adding a dose of realism. Current AI techniques seem a long way from true superintelligence (although, truthfully, no one actually knows how far). Many experts quote timescales of decades to centuries for superintelligent AI. There is zero chance of the techniques we use today becoming superintelligent without further development. When we ‘train’ AI models, it’s the equivalent of putting clay into a kiln; training effectively ‘bakes in’ the ability to do only one task. Expecting these models to become superintelligent is no less ridiculous than modelling clay into the rough shape of a bird, firing it, then expecting it to take off and fly. Not only is it locked in shape; even if it were flexible, it still wouldn’t be a bird.
A sensible critic might then ask, “Why worry now?” If we’re decades away from superintelligent AI, don’t we have more pressing problems? I think that depends on how you’d answer a different question. If we found out an asteroid was on a collision course with Earth, but wasn’t going to hit for a few decades, would you get scientists and engineers working on the problem straight away? Or would you wait 39 years, and just hope we can all pull together at the last moment?
Now, there are some subtleties to that comparison*, but on balance I think we should start now. That way, we can work through some of the foundational questions in plenty of time. That’s particularly valuable in the context of superintelligence, because there are many ethical questions that should be the responsibility of civil society, and not just of technologists. But we need time to have those debates, to understand the consequences, and to figure out how to constrain our algorithms so that the benefits of this technology are widely shared. And that only happens if we start early. That’s why, at Faculty, we originally built an AI safety research group. We decided to get started on this journey a few years back, so that we’re well prepared when the time comes.
Faculty’s AI safety strategy
Our strategy was to start our work on concrete short- and medium-term problems, and work our way up to the more esoteric long-term safety concerns. That seemed sensible because 1) it plays to our organisational strengths as an applied AI company and 2) looking at the history of science, there are plenty of times when working on concrete problems has progressed a field more effectively than the most high-falutin’ philosophy. We thought that in five or so years we might start to get some commercial benefit from the work, which we could reinvest, and that would drive a positive flywheel to fund more research.
Well, that turned out to be wrong, in a wonderful way. As we learnt more about near-term AI safety, we realised that it was profoundly helpful for building regular AI. We now divide the landscape into four areas:
- Fairness (doing the right thing)
- Robustness (continuing to do the right thing, even in the complicated real world)
- Privacy (not surfacing private information)
- Explainability (understanding and interpreting your model)
Regardless of whether you want the most ethical model or the most performant model, explainability and robustness are vital. And if you’re concerned about being sued (and most businesses are!), then ensuring the fairness and privacy of your model is vital to you too.
So now we’re on a fascinating journey of discovery. We have built some of the best tools for ensuring that our algorithms are fair, robust, private and explainable (get in touch if you’d like to hear more!), but we’re going ever deeper. We’re pushing the scientific boundaries of the field, and we’ll be building this work into products and making it widely available.
If you’re interested in responsible and ethical AI, or legal and performant AI, or all four, then drop us a line.
* Humanity has a good understanding of the physics of asteroids, but we have no idea what the foundational technologies of a superintelligence will be. In principle, this might be a reason to wait. Imagine the difficulty of trying to make cars safe before they’ve been invented: you’d probably spend your time building high-friction horseshoes. But in practice, you’ll never know that you’re working on the right foundational technology until you have built a superintelligence, and by then it’s too late to uninvent it and build the safety systems. And we have very good reasons to think that some of the tooling (explainability and robustness particularly) generalises to much more complex algorithms.