Beyond Technical Safeguards: Human Behaviour is the Missing Piece in AI Safety

AI safety discussions predominantly focus on easy-to-conceptualise, highly salient risks such as algorithm bias, hallucinations and disinformation. While these are crucial concerns, focusing on them alone overlooks a fundamental lesson from other high-stakes fields like aviation and healthcare: sometimes the most dangerous risks hide in plain sight. Dr Moira Nicolson and Holly Marquez (Government Communications, Cabinet Office, UK Government) introduce a framework for anticipating and managing these risks, supporting a safer rollout of AI use in the public sector and beyond.

The blind spot in current AI safety approaches

Human oversight – keeping a “human in the loop” – is a popular approach to AI risk mitigation. However, research shows that even experienced professionals can struggle to effectively oversee AI systems. For instance, studies of judges reviewing algorithmic bail recommendations found that 90% of human overrides actually introduced more bias rather than reducing it. Just as many people with nut allergies miss warnings on food containing nuts, sometimes with fatal consequences, most people won’t pay attention to disclaimers on AI outputs.

Since risks arising from the well-intentioned operational use of AI have been comparatively neglected in AI safety discourse, they are also much more likely to fit under the category of risks known as “unknown unknowns”. This is a problem because you can’t mitigate risks you don’t know about.

Figure 1: Why ‘human-in-loop’ is not a fix-all solution

A concerned person sitting at a desk, looking stressed while interacting with AI tools, surrounded by dialogue boxes expressing worries about the reliability and security of the outputs.

Risks from well-intentioned use of AI

Imagine an employee, Marco, in 2026. His low-value, ‘easy’ tasks have been fully automated using AI tools, with the expectation that this frees him to spend 100% of his time on more complex tasks. Marco, like others in his team, finds it difficult to spend all of his time on cognitively demanding work. He doesn’t have the skills to manage this well, and ends up feeling mentally fatigued and stressed, reducing his productivity. If this were to happen across the organisation and the wider economy, productivity could decline rather than increase after AI roll-out. And the link between cognitive fatigue and AI-enabled task automation could go unrecognised, creating a hidden risk: organisations might continue to push for full automation of routine tasks.

We’re not saying that the attention-grabbing AI risks like biased algorithms or human extinction aren’t important or worth researching – far from it. But if you want to stop the bigger risks, you have to understand where they are likely to come from and stop them before they happen, rather than fixing or apologising for them afterwards. That means understanding the causal mechanisms: all the “mundane” decisions and actions people take day to day.

Take Sarah, who has access to an AI tool to help them sift applications for a new role in their team. To identify potential harms, we need to look at how Sarah uses the tool. Algorithm bias is a very salient risk of LLMs, with clear potential to lead to harmful macro-level outcomes, but it is the combined actions of the organisation and the person using the tool for this task that lead to this risk materialising. For example, if Sarah’s organisation normalises the use of an LLM to automate the sifting of job applications, Sarah may trust the output and use it without seeking a second assessor to quality-assure decisions on job applications, leaving Sarah as the only “human in the loop”. This might not be a problem if one person does it, but if everyone did, it could lead to unequal labour market outcomes.

Figure 2: AI tools might lead to harms through their interaction with people

Infographic illustrating the risks of AI in recruitment, highlighting the relationship between algorithm bias, company practices, and potential discrimination in hiring outcomes.

The path forward: A practical framework to identify and address risks of Generative AI

The Cabinet Office Behavioural Science Team has identified six distinct categories of ‘hidden’ behavioural and organisational risks that emerge from organisational AI implementation. These were identified by analysing random samples of prompts submitted to Assist, Government Communications’ own bespoke conversational AI tool, and screening them using approaches for surfacing the unintended consequences of policy interventions.

Figure 3: Six risks from human-AI interaction to look out for

A table detailing six categories of hidden risks associated with AI implementation in organisations, including Quality Assurance, Task-Tool Mismatch, Perceptions, Emotions and Signalling, Workflow and Organisational Challenges, Ethics, and Human Connection and Technological.

To help teams surface and mitigate these risks, Government Communications has created a set of prompt questions that can be used to identify potential issues before they become problems. The Mitigating Hidden AI Risks Framework focuses on the risks most likely to arise from organisational roll-outs of AI tools, and outlines some of the causal mechanisms by which the higher-order risks in MIT’s taxonomy would arise. In our assessment, risks from organisational AI roll-outs are more likely to arise from well-intentioned use. For example, a ‘Task-Tool Mismatch’ risk could arise from ineffective or defective organisational implementation, such as a senior leader requesting that a tool be used for a task it is not appropriate for, and staff not feeling able to challenge this.

The prompt questions can be used by teams deploying AI tools to help them elicit risks, agree on mitigations and monitor their effectiveness. In Government Communications, team members working on delivering our AI tool Assist are each allocated specific risks to monitor and report back on, so that risks can be managed as a team. Each risk is given a RAG (red/amber/green) rating and monitored accordingly. Examples are in Table 1.

Table 1: Prompts to identify and mitigate risks during AI roll-out

Category and risk: Emotional responses from staff rolling out the AI tool
Prompt question: Could staff perceive a threat to their job, the type of work they do, or their skills and expertise?
Possible risk to successful roll-out: Concern that the tool may replace their jobs could lead to staff being reluctant to adopt and use AI tools.
Mitigation steps: Take a human-centred approach; conduct user testing to identify potential negative reactions; apply a bottom-up development approach to the tool.

Category and risk: Risks to standards of public service
Prompt question: Could the tool lead to biased decisions and discriminatory outputs that affect public trust in Government?
Possible risk to successful roll-out: Concern that the tool affects or benefits some social groups more than others.
Mitigation steps: Emphasise transparency and accountability; report on AI tool effectiveness; compare AI-assisted decisions with human-only decisions to identify discrepancies.
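For teams that prefer to track this kind of register in code rather than a spreadsheet, the short Python sketch below shows one way the allocate-and-RAG-rate approach described above could be represented. It is a minimal illustration only: the category names, owners and example risks are hypothetical, not the actual register used for Assist.

# A minimal, illustrative risk register with RAG ratings.
# Categories, owners and example risks are hypothetical – this is not
# the register actually used by the Assist delivery team.
from dataclasses import dataclass
from enum import Enum

class RAG(Enum):
    RED = "red"
    AMBER = "amber"
    GREEN = "green"

@dataclass
class RiskEntry:
    category: str      # e.g. "Emotions and Signalling" (one of the six categories)
    description: str   # the specific risk being monitored
    owner: str         # team member allocated to monitor this risk
    rating: RAG        # current RAG rating
    mitigation: str    # agreed mitigation steps

def needs_review(register: list[RiskEntry]) -> list[RiskEntry]:
    """Return risks rated red or amber so the team can discuss them together."""
    return [r for r in register if r.rating is not RAG.GREEN]

register = [
    RiskEntry(
        category="Emotions and Signalling",
        description="Staff perceive the tool as a threat to their jobs",
        owner="A. Example",
        rating=RAG.AMBER,
        mitigation="User testing; bottom-up development approach",
    ),
    RiskEntry(
        category="Ethics",
        description="Outputs affect or benefit some social groups more than others",
        owner="B. Example",
        rating=RAG.GREEN,
        mitigation="Compare AI-assisted decisions with human-only decisions",
    ),
]

for risk in needs_review(register):
    print(f"[{risk.rating.value.upper()}] {risk.category}: {risk.description} (owner: {risk.owner})")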

Interdisciplinary expertise: A must-have for AI roll-out teams

AI tools can help enhance workplace productivity and deliver wider benefits, but only if they are embedded effectively into existing daily routines, workflows and team processes.

To truly harness AI’s potential while minimising risks, we need to look beyond people with machine learning expertise. Identifying and overcoming behavioural risks goes far beyond doing user research. It requires leaders and interdisciplinary AI delivery teams to work in partnership with people across all levels and specialisms in an organisation: leaders setting strategic direction, managers implementing new processes, employees adapting to new ways of working, and end-users engaging with AI-powered solutions.

Understanding and addressing behavioural risks isn’t just an add-on to AI safety – it’s a fundamental requirement for successful implementation. However, the current landscape of AI implementation often neglects the human element, treating the challenge as predominantly a technological or economic pursuit, rather than acknowledging its profound social and cultural implications. Although many of the large AI companies have in-house ethicists, philosophy is not enough on its own to mitigate hidden behavioural and organisational risks. This oversight risks undermining the very efficiency and impact that AI promises to deliver.

To find out more, the Cabinet Office has produced a guide that brings together the fields of user research, behavioural science, digital design and change management to demystify the process of safely scaling and embedding AI tools within organisations. The guide goes beyond theory – it is based on proven experience from successfully scaling AI in complex organisations through our work on Government Communications’ AI tool Assist, which now serves thousands of users across more than 200 government organisations.

Read here: The people factor: A human-centred approach to scaling AI tools.

Dr Moira Nicolson leads the Behavioural Science Team at the Cabinet Office and is an Honorary Research Fellow at the UCL Energy Institute. Watch Moira’s BPP seminar presentation on Using Behavioural Science to Address AI Safety Risks in Public Sector Roll Out.

Holly Marquez is a Senior Behavioural Scientist in the Cabinet Office Behavioural Science Team, prior to which she specialised in digital service design and adoption at HMRC.

Government Communications is the professional body for public service communicators working in government departments, agencies and arm’s length bodies. GCS Assist is the first general purpose AI tool approved for use across the UK Government. Developed in-house by Government Communications’ multidisciplinary Applied Data and Insight Team, the tool enables communications professionals to access a secure conversational AI tool for work tasks.