Loop · Human-in-the-Loop

Making review genuine

Human-in-the-Loop is the right model when a human reviews individual outputs before action is taken. This guidance helps organisations check whether that review is real.

When Human-in-the-Loop fits

Human-in-the-Loop is appropriate when the AI generates outputs for a human to evaluate, and the human decides what to do with them. The key condition is that nothing happens until a person reviews and approves.

It works best at human-manageable volumes where reviewers have the expertise to evaluate what they are approving. It is the natural model for drafting, summarisation, legal research, and any assistant-style tool where judgement remains central.

Review health checklist

01

Is the volume manageable for genuine review?

Good practice

A reviewer can spend meaningful time on each item. Volume is monitored and escalated if it grows beyond capacity.

Failure mode

Items are approved in batches without individual examination. Volume has grown faster than review capacity.

Signal: Track average review time per item. If it falls below a realistic threshold (the minimum time a competent reviewer needs to assess one item), review is no longer genuine.
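
The signal above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the function name, the input format (one duration in seconds per reviewed item) and the 60-second threshold are all assumptions, and each organisation would need to set its own threshold per type of output.

```python
from statistics import mean

# Hypothetical threshold: the minimum seconds a competent reviewer
# needs to assess one item. Set this per output type; 60 is a placeholder.
MIN_SECONDS_PER_ITEM = 60.0


def review_time_alert(durations_seconds, min_seconds=MIN_SECONDS_PER_ITEM):
    """Return (average_seconds, alert) for a list of per-item review durations.

    alert is True when the average falls below the threshold.
    An empty list yields (None, False) because there is nothing to judge.
    """
    if not durations_seconds:
        return None, False
    average = mean(durations_seconds)
    return average, average < min_seconds
```

For example, review_time_alert([30, 40, 50]) reports an average of 40 seconds and raises the alert under the placeholder threshold. An average can hide batch approvals mixed with a few slow reviews, so the per-item distribution may be worth inspecting as well.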

02

Does the reviewer have the knowledge to evaluate the output?

Good practice

The reviewer understands the subject matter, can identify errors, and knows when to escalate.

Failure mode

The reviewer approves outputs they cannot independently evaluate. The AI has more domain knowledge than the person reviewing it.

Signal: Ask whether this reviewer could catch a plausible but wrong output. If not, review is not a control.

03

Is adequate time allocated for each review?

Good practice

Review is built into the workflow with realistic time per item. Reviewers are not expected to clear queues at speed.

Failure mode

Review is treated as a formality. Reviewers face pressure to approve quickly to keep the workflow moving.

Signal: If reviewers are pressed to cut the time they spend on each review, the human is not in the loop in any meaningful sense.

04

Is there a process when the reviewer disagrees?

Good practice

Reviewers can reject, escalate or amend outputs. Disagreements are recorded. Patterns of rejection trigger review of the AI system.

Failure mode

There is no clear path for rejection. Reviewers approve outputs they are uncertain about because there is no alternative.

Signal: If rejection rates are near zero across all reviewers, the process may be performative rather than substantive.
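
One way to watch for this signal is to compute, per reviewer, the share of decisions that were anything other than an approval. The sketch below is illustrative only: the outcome labels, the function names and the 1% floor are assumptions, and counting amendments and escalations alongside rejections is a design choice rather than something this guidance prescribes.

```python
from collections import Counter


def non_approval_rates(decisions):
    """Map each reviewer to the share of their decisions that were not approvals.

    decisions: iterable of (reviewer, outcome) pairs, where outcome is a
    hypothetical label such as "approve", "reject", "amend" or "escalate".
    """
    totals = Counter()
    non_approvals = Counter()
    for reviewer, outcome in decisions:
        totals[reviewer] += 1
        if outcome != "approve":
            non_approvals[reviewer] += 1
    return {reviewer: non_approvals[reviewer] / totals[reviewer] for reviewer in totals}


def looks_performative(decisions, floor=0.01):
    """True when every reviewer's non-approval rate is at or below the floor.

    The 1% floor is a placeholder. With no decisions at all, returns False.
    """
    rates = non_approval_rates(decisions)
    return bool(rates) and all(rate <= floor for rate in rates.values())
```

A True result is a prompt to investigate, not proof of a problem: a well-performing AI system can also produce few rejections, which is why the guidance says the process "may be" performative.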

05

Is the review decision recorded?

Good practice

Who reviewed what, when, and on what basis is logged. The record is retrievable for audit.

Failure mode

Approval leaves no trace. It is impossible to reconstruct who reviewed a specific output or what they considered.

Signal: Without a record, review accountability cannot be demonstrated after the fact.
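
A review record needs only a handful of fields to answer who, what, when and on what basis. The following is a minimal sketch under stated assumptions: the field names, the in-memory list standing in for a durable log, and the rule that an empty basis is refused are all hypothetical choices for illustration.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ReviewRecord:
    output_id: str    # which output was reviewed
    reviewer: str     # who reviewed it
    decision: str     # e.g. "approve", "reject", "amend", "escalate"
    basis: str        # what the reviewer checked or relied on
    reviewed_at: str  # when, as an ISO 8601 UTC timestamp


def record_review(log, output_id, reviewer, decision, basis):
    """Append one review record to log (a list standing in for durable storage)."""
    if not basis.strip():
        raise ValueError("a review record needs a stated basis")
    record = ReviewRecord(
        output_id, reviewer, decision, basis,
        datetime.now(timezone.utc).isoformat(),
    )
    log.append(json.dumps(asdict(record)))
    return record


def find_reviews(log, output_id):
    """Retrieve every recorded review of a given output, for audit."""
    return [entry for entry in map(json.loads, log) if entry["output_id"] == output_id]
```

In practice the log would be an append-only store with access controls. The point of the sketch is that an approval cannot be written without a reviewer, a timestamp and a stated basis.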

06

What happens when a reviewer is unavailable?

Good practice

The workflow pauses or routes to a named deputy. Actions are not taken without review when the process requires it.

Failure mode

Work queues up and is bulk-approved on the reviewer's return, or the system continues to act without any review.

Signal: If continuity depends on bypassing review, the control is not robust.
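
The good-practice rule for this item, pause or route to a named deputy, can be expressed as a small routing function. This is a sketch only: the function name, the arguments and the idea of an "available" set are assumptions for illustration. What matters is that no branch results in action without review.

```python
def route_item(primary, deputy, available):
    """Decide what happens to an item awaiting review.

    primary:   the usual reviewer
    deputy:    the named deputy, or None if none has been appointed
    available: set of reviewers currently able to review

    Returns ("review", reviewer) or ("pause", None). There is deliberately
    no branch that approves or acts without a reviewer.
    """
    if primary in available:
        return ("review", primary)
    if deputy is not None and deputy in available:
        return ("review", deputy)
    return ("pause", None)
```

For instance, route_item("alice", "bob", {"bob"}) sends the item to the deputy, while route_item("alice", None, set()) pauses the workflow rather than letting the item through.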

When to escalate to a different model

Human-in-the-Loop stops being adequate when the conditions for genuine review break down. These are the signals to consider a different governance model.

  • Volume has grown beyond reviewer capacity.
  • Average review time per item is below a realistic threshold.
  • Approve-all behaviour is observed across reviewers.
  • Reviewers cannot explain specific approval decisions.
  • The AI system is acting in areas reviewers cannot independently assess.
  • Actions are externally facing, irreversible, or create legal obligations.