

Who Is Actually in the Loop?

Why Human-in-the-Loop is no longer precise enough for agentic AI. This paper explains why review, monitoring and accountability are distinct governance patterns.

Ryan McDonough · ≈ 18 min read · Version 1.0

For most of the history of machine decision-making, governance had a simple shape: a machine proposed, and a human disposed. The model produced an output; a person decided what to do with it. We called this Human-in-the-Loop, and it served us well. That model is now under strain, because the premise it rested on no longer holds: that the human is the one who acts.

Terms, used precisely

Responsibility: doing the work. Can be delegated, including to a system.
Accountability: being answerable for the outcome. In law, this may sit with the organisation or with particular office-holders. In governance, Human-Accountable-for-the-Loop (HAL) requires it to be traceable to a named accountable owner and not displaced onto software.
Liability: bearing legal or financial consequences. The economic burden of liability can often be allocated by contract, indemnity and insurance, but statutory, regulatory and third-party liability may remain where the law places it.

When HAL says "accountability can never be delegated; execution can", it means exactly this: responsibility for execution can move; accountability for the system cannot.

01 · The rise of Human-in-the-Loop

Human-in-the-Loop (HITL) did not begin with machine learning. It is an inheritance from control theory, aviation, and early automation, where a human operator remained the final authority over a machine that could otherwise act on its own. The principle was conservative and sound: where a machine might err in ways that matter, insert a human between its judgement and the world.

When predictive models entered high-stakes settings (credit, medicine, hiring), HITL was the natural control. The model scored; a human decided. Oversight and action lived in the same place: the person.

02 · Why the phrase became comforting

HITL worked because of a structural fact about the systems it governed: they did not act. They produced outputs (scores, classifications, drafts, recommendations) and then stopped. The output was inert until a human picked it up. That gap between output and action was where governance lived.

It also worked because the volumes were human-scaled. A loan officer could review the applications in front of them. A clinician could weigh a model's reading against their own. The number of decisions was bounded by human capacity, and so the review was real.

03 · Why it breaks down at scale

Two things break HITL: volume and action. When a system makes more decisions than a human can examine, "human-in-the-loop" becomes "human-rubber-stamping-the-loop". The control persists on paper while evaporating in practice. We have all seen the dashboard with an Approve all button.

A reviewer who cannot review is not a control. They are a liability with a job title.
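
The volume problem can be made concrete with simple arithmetic. The sketch below is illustrative only: the function name `review_coverage`, the assumed six hours of review time per reviewer per day, and the example figures are all assumptions, not numbers from this paper. It estimates what fraction of a system's daily decisions could receive a genuine review.

```python
def review_coverage(
    decisions_per_day: int,
    reviewers: int,
    minutes_per_review: float,
    review_minutes_per_reviewer: float = 360.0,  # assumption: 6 hours of review time a day
) -> float:
    """Return the fraction of decisions (0.0 to 1.0) that can be genuinely reviewed."""
    if minutes_per_review <= 0:
        raise ValueError("minutes_per_review must be positive")
    if decisions_per_day <= 0:
        return 1.0  # nothing to review, so nothing goes unreviewed
    # Total number of reviews the team can complete in one day.
    capacity = reviewers * review_minutes_per_reviewer / minutes_per_review
    return min(1.0, capacity / decisions_per_day)


# Hypothetical human-scaled caseload: 40 applications, 2 reviewers, 15 minutes each.
loan_desk = review_coverage(decisions_per_day=40, reviewers=2, minutes_per_review=15)

# Hypothetical agentic workload: 20,000 actions a day with the same two reviewers.
agent_fleet = review_coverage(decisions_per_day=20_000, reviewers=2, minutes_per_review=15)
```

With these hypothetical figures, two reviewers fully cover the human-scaled caseload but can examine well under one percent of the agent's daily output. Anything beyond that fraction is approved without review, whatever the process document says.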

The deeper break comes when systems begin to act. The moment a system can send the email, file the report, move the money, or delete the record, the gap between output and action closes. There is no longer a natural pause in which a human stands. To insert one artificially, by requiring approval of every action, is to throw away the very capability you deployed the system for.

04 · Review, monitoring and accountability

These are three distinct governance patterns, not stages of maturity. Human-in-the-Loop asks whether a human reviewed the decision. Human-on-the-Loop asks whether a human is monitoring the system. Human-Accountable-for-the-Loop asks who owns the system that made the decision.

The Loop model separates them. Review fits assistant-style AI. Monitoring fits exception-based automation. Accountability fits agentic workflows where individual review does not scale.

This distinction also sits more comfortably with how modern regulation describes oversight. Article 14 of the EU AI Act requires high-risk AI systems to be capable of effective human oversight: humans must be able to understand limitations, monitor operation, avoid automation bias, interpret outputs, disregard or reverse outputs, and interrupt the system where needed. Notably, the Act treats human oversight as a set of functional controls, not a single undifferentiated requirement. Loop is essentially an unpacking of the ambiguity hidden in the phrase "human oversight".

05 · Why agents create a new governance problem

Agentic systems are defined by exactly this capacity to act: they take actions in pursuit of goals, often chaining many steps, calling tools, and operating with minimal supervision. Once invoked or authorised, they can proceed across steps without returning to a human for each decision.

That shift changes what governance must cover. A faster, more accurate assistant remains an assistant. An actor needs a different kind of governance. The governing question is whether the action was authorised, bounded, recorded, and owned.
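
The four-part governing question (authorised, bounded, recorded, owned) can be sketched as a check that runs before an agent acts. This is a minimal illustration under stated assumptions, not a HAL reference implementation: the names `Mandate` and `ActionGate`, the single monetary limit, and the example actions are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Mandate:
    """Hypothetical grant of authority from a named owner to one agent."""
    owner: str                  # the named accountable human
    allowed_actions: frozenset  # the actions the agent is authorised to take
    max_amount: float           # assumed bound on any single action


@dataclass
class ActionGate:
    mandate: Mandate
    log: list = field(default_factory=list)

    def check(self, action: str, amount: float = 0.0) -> bool:
        authorised = action in self.mandate.allowed_actions
        bounded = amount <= self.mandate.max_amount
        owned = bool(self.mandate.owner.strip())
        allowed = authorised and bounded and owned
        # Recorded: every attempt is logged, whether or not it is allowed.
        self.log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "amount": amount,
            "owner": self.mandate.owner,
            "allowed": allowed,
        })
        return allowed


gate = ActionGate(Mandate("jane.doe", frozenset({"send_email", "refund"}), 500.0))
ok_refund = gate.check("refund", 120.0)      # within scope and within the limit
big_refund = gate.check("refund", 900.0)     # authorised action, but over the limit
deletion = gate.check("delete_record")       # outside the authorised scope
```

In this sketch the gate replaces per-decision human approval with a standing mandate: the agent proceeds without pausing when the action is inside its scope, and every attempt, including refused ones, leaves a record tied to the owner.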

06 · The Loop model

Loop defines three patterns for human governance of AI systems. Each fits a different class of workflow. Human-in-the-Loop remains appropriate where a human must review each output. Human-on-the-Loop fits monitoring and exception-based workflows. Human-Accountable-for-the-Loop fits agentic systems where action at scale makes individual review impractical.

The governing question changes with the pattern. Organisations should identify which applies before defaulting to "Human-in-the-Loop" as a catch-all compliance phrase.

The point of Loop is not to rank these patterns from weak to strong. A human-in-the-loop model may be exactly right for a legal research assistant. A human-on-the-loop model may be right for a monitoring system. HAL may be necessary where the system acts at scale and individual review is no longer meaningful. The mistake is not choosing HITL. The mistake is using HITL as a generic label when the actual control is review, monitoring, ownership, or some unstable mixture of all three.

07 · Why HAL matters

Here is the pivotal distinction. Execution can be delegated to a system. Accountability cannot. When a human employee acts within their authority, the organisation remains accountable for what they do. Nothing about substituting software for the employee changes that. The organisation, and a named person within it, remains answerable.

HAL fixes accountability to a person who owns the system. Software has no legal personality and no institutional accountability of its own. It cannot be sanctioned, dismissed, disciplined or struck off. So accountability flows back, through the system, to the human who owns it. That is the entire substance of Human-Accountable-for-the-Loop.

The law of agency is not directly about software, but it is useful because it shows how law has long dealt with delegated action: a principal is bound by the authorised acts of their agent, and an agent who exceeds their authority creates liability that does not simply vanish. Corporate law, fiduciary duty, and internal authority matrices all encode the same idea: someone is always accountable for the actions taken in an organisation's name.

One precision matters here, because lawyers will rightly insist on it. Software is not an agent in the legal sense. It has no legal personality, owes no fiduciary duty, and no agency relationship arises when you deploy it. In law, an AI system is a tool, and the consequences of a tool's operation land directly on the organisation that wields it. The parallel HAL draws is structural, not doctrinal: the mechanics of sound delegation (scoped authority, limits, escalation, records) transfer; the legal relationship does not. If anything, this strengthens the case for HAL: there is no legal agent on which accountability can safely be displaced.

Agentic AI puts these principles under pressure. A regulator faced with an erroneous automated filing will not accept "the model did it" as a defence. The organisation that deployed the system is accountable. HAL simply asks organisations to confront that reality before deployment, rather than discover it during an incident.

09 · Conclusion

The likely direction of travel is already visible. Organisations will run many agents at once, built in-house, embedded in vendor products, and increasingly calling one another. Governing this estate will resemble portfolio management more than decision-by-decision review: a registry of agents, each with an owner, an authority scope, a risk level, a HAL score, and a review date.
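
A registry of this kind might look like the following sketch. The field names mirror the list above, but everything else is an assumption made for illustration: the class names `AgentRecord` and `AgentRegistry`, the risk labels, and the treatment of the HAL score as a plain integer (this paper does not define its scale).

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AgentRecord:
    name: str
    owner: str              # the named accountable human
    authority_scope: tuple  # actions the agent is permitted to take
    risk_level: str         # hypothetical labels such as "low", "medium", "high"
    hal_score: int          # opaque here; the scoring scale is not defined in this paper
    review_date: date       # when the entry must next be reviewed


class AgentRegistry:
    def __init__(self) -> None:
        self._records: dict = {}

    def register(self, record: AgentRecord) -> None:
        # Refuse any agent that has no named owner.
        if not record.owner.strip():
            raise ValueError(f"agent {record.name!r} has no accountable owner")
        self._records[record.name] = record

    def owner_of(self, name: str) -> str:
        return self._records[name].owner

    def overdue(self, today: date) -> list:
        """Names of agents whose review date has passed, in alphabetical order."""
        return sorted(
            name for name, rec in self._records.items() if rec.review_date < today
        )


registry = AgentRegistry()
registry.register(AgentRecord(
    "invoice-filer", "jane.doe", ("file_report",), "high", 3, date(2025, 3, 31)))
registry.register(AgentRecord(
    "inbox-triage", "sam.lee", ("send_email",), "low", 4, date(2025, 12, 1)))
stale = registry.overdue(today=date(2025, 6, 1))
```

The one rule the sketch enforces is the one the paper insists on: an agent without a named owner cannot enter the registry at all.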

Governance itself will become continuous: instrumented, monitored, and reviewed, in the way operations became continuous with DevOps. And throughout, one line will not be allowed to blur: however deep the stack of agents, accountability terminates at a human. The future of AI governance will not be about pretending every decision can be reviewed. It will be about ensuring that, when systems act, someone remains accountable for the system that acted.

