Steering the ship on the future of AI

Our Head of Product, Tom Oliver, discusses the 'alignment problem' and how we might move towards a more 'balanced' approach for the future.

2025-09-22
AI Development

In previous instalments, we've explored how AI is reshaping our economy and society. Now we turn to perhaps the most urgent question: how do we steer these increasingly powerful systems towards beneficial outcomes? 

This challenge lies at the heart of what AI researchers call the "alignment problem": ensuring that artificial intelligence systems pursue goals aligned with human values and intentions.

In this article, I’ll discuss the problems with the current alignment approach, and what a more ‘balanced’ approach could look like instead.

The limitations of current alignment approaches

Despite growing recognition of alignment's importance, the status quo suffers from several critical limitations:

Technical myopia: Most alignment research accepts the current paradigm – what Stuart Russell calls "the standard model" of AI development [1]. The standard model is the dominant approach to building AI systems, wherein we optimise a learning routine to achieve some pre-specified objective. This approach essentially assumes you already know the "right" goal – for example, predicting the next word in a sequence by minimising errors (or "loss") on a test set. Once that goal is fixed, you throw data, compute, and model architecture at the problem, optimising as hard as possible to achieve it. This sidesteps the deeper question: how do we determine which goals are worth pursuing in the first place? Do we need to look beyond the current paradigm – transformer-based architectures, and even deep learning itself – if we are to land on an approach with the right balance of performance and safety? These, unfortunately, remain open research questions.
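To make this concrete, here is a deliberately minimal sketch of the standard model in code – a toy next-token predictor written in PyTorch with made-up data, not any lab's actual training setup. The point is simply that the objective is fixed before training starts, and all the effort goes into minimising it:

```python
import torch
import torch.nn as nn

# Toy "standard model" loop: the objective (next-token cross-entropy)
# is pre-specified, and everything that follows is optimisation.
VOCAB, DIM = 100, 32

model = nn.Sequential(nn.Embedding(VOCAB, DIM), nn.Linear(DIM, VOCAB))
loss_fn = nn.CrossEntropyLoss()                  # the fixed, pre-specified goal
optimiser = torch.optim.Adam(model.parameters())

tokens = torch.randint(0, VOCAB, (64, 16))       # stand-in training data
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # predict the next token

for step in range(100):                          # optimise as hard as possible
    logits = model(inputs)                       # shape: (batch, seq, vocab)
    loss = loss_fn(logits.reshape(-1, VOCAB), targets.reshape(-1))
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()

# Nothing in this loop ever asks whether next-token prediction is the right goal.
```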

Expertise capture: Despite progress in recent years, research into both advancing AI capability and making breakthroughs on safety remains dominated by a small, homogeneous group of technical experts. As Inioluwa Deborah Raji and others have pointed out, the field suffers from a significant lack of diversity in who gets to decide what "aligned" actually means [2]. When a narrow group (however defined) gets to pick both the problems and the solutions, blind spots are inevitable. The highly capital-intensive nature of large-scale foundation models only deepens this effect.

Preference uncertainty: Humans are notoriously bad at articulating what we truly want. As behavioural economists have demonstrated repeatedly, what we say we want often differs from what we choose, and both may differ from what actually improves our wellbeing [3]. This creates a fundamental challenge: if we cannot reliably identify our own values and preferences, how can we align AI with them?

Collective action problems: Even if individual preferences could be perfectly captured, aggregating them raises further complications. Kenneth Arrow's impossibility theorem shows that no ranked voting system can translate individual preferences into a collective decision while still meeting the most basic requirements of fairness [4]. And as we've seen with social media algorithms optimised for engagement, individual preferences can aggregate into societal outcomes that nobody wants.
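To see how aggregation can break down even in the simplest case, consider the classic Condorcet cycle – a close cousin of the problem Arrow formalised. The short Python sketch below (illustrative only, not a proof of the theorem) shows three voters with perfectly coherent individual rankings whose pairwise majority votes nonetheless produce a cycle, leaving no consistent group ranking:

```python
from itertools import combinations

# Three voters, each with a perfectly coherent individual ranking.
ballots = [
    ["A", "B", "C"],   # voter 1: A > B > C
    ["B", "C", "A"],   # voter 2: B > C > A
    ["C", "A", "B"],   # voter 3: C > A > B
]

def majority_prefers(x, y):
    """True if a majority of voters rank option x above option y."""
    wins = sum(ballot.index(x) < ballot.index(y) for ballot in ballots)
    return wins > len(ballots) / 2

for x, y in combinations(["A", "B", "C"], 2):
    winner, loser = (x, y) if majority_prefers(x, y) else (y, x)
    print(f"a majority prefers {winner} over {loser}")

# The pairwise results form a cycle: A beats B, B beats C, and C beats A,
# so majority voting yields no coherent collective "best" option.
```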

Who decides the future of AI safety and development?

The debate about how to allocate research effort between capabilities and safety often frames the two as sides of the same coin. Many leaders in the field argue that the technical expertise and knowledge needed to advance safety go hand in hand with developing and understanding capabilities. There is plenty of truth to this position. But the framing serves a convenient purpose: it reinforces the idea that a small, select group should govern both domains. To me, this consolidation of authority and control is a concern. I don't doubt the leaders' intentions. But power corrupts. And these technologies could turn out to be the most powerful ever developed.

While capabilities and safety do share technical dimensions, they lead to fundamentally different ideas about what the 'right' thing to do is. The question isn't simply whether an AI system accurately executes a given goal, but which goals it executes and what end that serves in the world. These aren't merely technical questions. They are moral, political, philosophical, social, and economic ones.

As Andrew Critch and Stuart Russell have argued, the problem becomes even more complex in a multi-stakeholder world [5]. Different individuals, organisations, and societies have legitimately different values that deserve representation. By claiming ownership over both capabilities and safety discourse, leading AI and technology companies can effectively monopolise decisions that should belong to democratic discourse and diverse stakeholders.

In short, this concentration of power represents a profound challenge to liberal democratic values. We wouldn't accept pharmaceutical companies having exclusive authority to regulate drug safety, or weapons manufacturers determining arms control policy. Why, then, should we permit AI leaders to be the primary architects of how AI is governed?

Beyond technical solutions

Alignment isn't just a technical problem – it's a societal one. As Iyad Rahwan's "Society-in-the-Loop" framework suggests, we need mechanisms to incorporate collective human oversight into AI systems [6]. This means developing institutions that can represent diverse stakeholders and translate their values into guidance for AI development and deployment.

Widening the aperture to a broader group of perspectives can unlock new avenues. After Arrow's impossibility theorem gained prominence, theories of deliberative democracy emerged that focused on the quality of the democratic process rather than just the aggregation of fixed preferences. This unlocked solutions at a different "layer of the stack": rather than attempting to reformulate the maths, we accepted the maths and reframed how it is experienced.

Could being open about how well people's preferences are (and aren't) met on different issues create healthier democracies than pretending we can perfectly merge everyone's preferences into one choice? What could a more balanced approach look like? It could well include:

  • Participatory design: Involving diverse stakeholders in defining what "aligned" means for different applications

  • Computational systems for simulating outcomes: Technologies that allow us to simulate the consequences of different goal systems before deployment

  • Institutional innovation: New governance structures that can represent the interests of diverse stakeholders, including those traditionally marginalised

  • Proportional investment: Funding for safety, governance, and institutional design that reflects the magnitude of potential risks

Towards collective wisdom

Even the most sophisticated AI safety techniques will fall short if they attempt to align AI with an inadequate conception of human values. Perhaps the most profound limitation of current approaches is simply that they assume from the start that we already know what we want.

But as philosopher L.A. Paul has argued, some experiences are "transformative" – they change our values in ways we cannot predict beforehand [7]. AI itself may be such a transformative technology, changing what we value as it changes what is possible.

It is my view that alignment will not be a one-time achievement but an ongoing process of collective deliberation. It requires mechanisms for society to continuously reflect on and refine its values as technology and capabilities evolve. It requires, in short, wisdom rather than mere intelligence.

And this brings us to the core theme of our next instalment: What does it mean for humanity to be wise, rather than merely intelligent in the age of AI? How might we live up to our name – Homo sapiens, "wise man" – as we develop technologies that may soon rival or exceed our intelligence?

References:

[1] Russell, S. (2019). Human Compatible: Artificial Intelligence and the Problem of Control. Viking.

[2] Raji, I. D., et al. (2020). Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing. In Conference on Fairness, Accountability, and Transparency.

[3] Kahneman, D., Sibony, O., & Sunstein, C. R. (2021). Noise: A Flaw in Human Judgment. Little, Brown Spark.

[4] Arrow, K. J. (1950). A Difficulty in the Concept of Social Welfare. Journal of Political Economy, 58(4), 328-346.

[5] Critch, A., & Russell, S. (2021). Multi-Principal Assistance Games. In Proceedings of the 35th AAAI Conference on Artificial Intelligence.

[6] Rahwan, I. (2018). Society-in-the-Loop: Programming the Algorithmic Social Contract. Ethics and Information Technology, 20, 5-14.

[7] Paul, L. A. (2014). Transformative Experience. Oxford University Press.

Tom Oliver
Head of Product
Tom is Head of Product at Faculty and leads our Decision Intelligence Platform, Faculty Frontier®. He has more than a decade of experience across management consulting and Faculty, during which he has helped some of the world's most complex organisations blend strategy and technology to transform their core operations. From his humanities background at Trinity College, Oxford, he has long-standing interests in political economy and how it is evolving in the 21st century; humanism and "what makes us human"; existential risk, especially as it relates to AI; and human-centric design.