Loop · Human-in-the-Loop
Making review genuine
Human-in-the-Loop is the right model when a human reviews individual outputs before action is taken. This guidance helps organisations check whether that review is real.
When Human-in-the-Loop fits
Human-in-the-Loop is appropriate when the AI generates outputs for a human to evaluate, and the human decides what to do with them. The key condition is that nothing happens until a person reviews and approves.
It works best at human-manageable volumes where reviewers have the expertise to evaluate what they are approving. It is the natural model for drafting, summarisation, legal research, and any assistant-style tool where judgement remains central.
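The key condition, that nothing happens until a person reviews and approves, can be sketched as a simple gate. This is a minimal illustration, not a prescribed design: the names `Draft`, `Decision`, `approve` and `act_on` are hypothetical and stand in for whatever the real workflow uses.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Decision(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Draft:
    """An AI-generated output awaiting human review (hypothetical structure)."""
    content: str
    decision: Decision = Decision.PENDING
    reviewer: Optional[str] = None


def approve(draft: Draft, reviewer: str) -> None:
    """Record an explicit approval by a named person."""
    draft.decision = Decision.APPROVED
    draft.reviewer = reviewer


def act_on(draft: Draft, action: Callable[[str], None]) -> bool:
    """Run the action only if a named reviewer has approved the draft."""
    if draft.decision is not Decision.APPROVED or not draft.reviewer:
        return False  # No approval on record, so nothing happens.
    action(draft.content)
    return True
```

In this sketch a pending or rejected draft can never reach the action, which is the property the model depends on.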
Review health checklist
Is the volume manageable for genuine review?
Good practice
A reviewer can spend meaningful time on each item. Volume is monitored and escalated if it grows beyond capacity.
Failure mode
Items are approved in batches without individual examination. Volume has grown faster than review capacity.
Signal: Track average review time per item. If it falls below the time a careful reviewer would realistically need for an item of that kind, review is no longer genuine.
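One way to monitor this signal is sketched below. It assumes review durations are already captured in seconds and that the organisation has set its own minimum; the function name `review_time_signal` and the parameter `min_seconds` are illustrative, and the threshold value is not specified by this guidance.

```python
from statistics import mean


def review_time_signal(durations_seconds, min_seconds):
    """Return (average_seconds, below_threshold) for a batch of reviews.

    durations_seconds: time spent on each reviewed item, in seconds.
    min_seconds: the organisation's own realistic minimum (an assumption here).
    """
    if not durations_seconds:
        return None, False  # No reviews recorded, so nothing to assess.
    average = mean(durations_seconds)
    return average, average < min_seconds
```

For example, `review_time_signal([10, 20, 30], 60)` reports an average of 20 seconds and flags it as below a 60-second minimum.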
Does the reviewer have the knowledge to evaluate the output?
Good practice
The reviewer understands the subject matter, can identify errors, and knows when to escalate.
Failure mode
The reviewer approves outputs they cannot independently evaluate. The AI has more domain knowledge than the person reviewing it.
Signal: Ask: could this reviewer catch a plausible but wrong output? If not, review is not a control.
Is adequate time allocated for each review?
Good practice
Review is built into the workflow with realistic time per item. Reviewers are not expected to clear queues at speed.
Failure mode
Review is treated as a formality. Reviewers face pressure to approve quickly to keep the workflow moving.
Signal: If reviewers are pushed to cut the time they spend on each item, the human is not in the loop in any meaningful sense.
Is there a process when the reviewer disagrees?
Good practice
Reviewers can reject, escalate or amend outputs. Disagreements are recorded. Patterns of rejection trigger review of the AI system.
Failure mode
There is no clear path for rejection. Reviewers approve outputs they are uncertain about because there is no alternative.
Signal: If rejection rates are near zero across all reviewers, the process may be performative rather than substantive.
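The rejection-rate signal could be computed along these lines. This is a sketch under assumptions: decisions are available as `(reviewer, outcome)` pairs, outcomes use the label `"rejected"`, and the `floor` of 1% is a placeholder rather than a recommended value.

```python
from collections import Counter


def rejection_rates(decisions):
    """Compute each reviewer's rejection rate.

    decisions: iterable of (reviewer, outcome) pairs, where outcome is a
    label such as "approved" or "rejected" (labels are assumptions).
    """
    totals, rejected = Counter(), Counter()
    for reviewer, outcome in decisions:
        totals[reviewer] += 1
        if outcome == "rejected":
            rejected[reviewer] += 1
    return {reviewer: rejected[reviewer] / totals[reviewer] for reviewer in totals}


def looks_performative(rates, floor=0.01):
    """True when every reviewer's rejection rate is below the floor."""
    return bool(rates) and all(rate < floor for rate in rates.values())
```

A near-zero rate for one reviewer may simply reflect good outputs; the check above only fires when every reviewer shows the same pattern, which matches the signal as stated.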
Is the review decision recorded?
Good practice
The log captures who reviewed what, when, and on what basis. The record is retrievable for audit.
Failure mode
Approval leaves no trace. It is impossible to reconstruct who reviewed a specific output or what they considered.
Signal: Without a record, review accountability cannot be demonstrated after the fact.
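A review record of this kind might look like the sketch below. The field names and the in-memory list used as a log are assumptions for illustration; a real system would write to durable, access-controlled storage.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ReviewRecord:
    """Who reviewed what, when, and on what basis (hypothetical fields)."""
    output_id: str
    reviewer: str
    decision: str
    basis: str
    reviewed_at: str


def record_review(log, output_id, reviewer, decision, basis):
    """Append one review decision to the log as a JSON line."""
    record = ReviewRecord(
        output_id=output_id,
        reviewer=reviewer,
        decision=decision,
        basis=basis,
        reviewed_at=datetime.now(timezone.utc).isoformat(),
    )
    log.append(json.dumps(asdict(record)))
    return record


def find_reviews(log, output_id):
    """Retrieve every recorded decision for a given output, for audit."""
    return [entry for entry in map(json.loads, log) if entry["output_id"] == output_id]
```

Storing the basis alongside the decision is what allows someone to reconstruct later what the reviewer considered, not just that they clicked approve.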
What happens when a reviewer is unavailable?
Good practice
The workflow pauses or routes to a named deputy. Actions are not taken without review when the process requires it.
Failure mode
Work queues up and is bulk-approved when the reviewer returns, or the system continues to act without any review.
Signal: If continuity depends on bypassing review, the control is not robust.
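The pause-or-deputy rule can be expressed as a small routing function. This is a sketch only: `route_review` and its arguments are hypothetical, and how availability is determined is left to the organisation.

```python
def route_review(primary, deputies, available):
    """Decide who reviews an item when the primary reviewer may be away.

    primary: the usual reviewer.
    deputies: named deputies, in order of preference.
    available: set of people currently able to review.
    Returns ("assign", reviewer) or ("pause", None). There is deliberately
    no branch that approves or acts without a reviewer.
    """
    for candidate in [primary, *deputies]:
        if candidate in available:
            return ("assign", candidate)
    return ("pause", None)
```

The design choice worth noting is the absence of a fallback that lets work proceed unreviewed: when nobody is available, the only outcome is a pause.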
When to escalate to a different model
Human-in-the-Loop stops being adequate when the conditions for genuine review break down. These are the signals to consider a different governance model.
- Volume has grown beyond reviewer capacity.
- Average review time per item is below a realistic threshold.
- Approve-all behaviour is observed across reviewers.
- Reviewers cannot explain specific approval decisions.
- The AI system is acting in areas reviewers cannot independently assess.
- Actions are externally facing, irreversible, or create legal obligations.
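The signals listed above could be gathered into a single periodic check, sketched here. Only the first two signals are computed from numbers; the rest are recorded as yes/no judgements made by people. All field names are hypothetical and no thresholds are implied by this guidance.

```python
from dataclasses import dataclass


@dataclass
class ReviewSnapshot:
    """A periodic summary of review health (hypothetical fields)."""
    items_per_day: int
    review_capacity_per_day: int
    avg_review_seconds: float
    min_realistic_seconds: float
    approve_all_observed: bool
    decisions_explainable: bool
    reviewers_can_assess: bool
    high_stakes_actions: bool  # external, irreversible, or legally binding


def escalation_signals(s: ReviewSnapshot) -> list:
    """Return the labels of every escalation signal that is present."""
    checks = [
        (s.items_per_day > s.review_capacity_per_day, "volume beyond capacity"),
        (s.avg_review_seconds < s.min_realistic_seconds, "review time too low"),
        (s.approve_all_observed, "approve-all behaviour"),
        (not s.decisions_explainable, "decisions not explainable"),
        (not s.reviewers_can_assess, "outside reviewer expertise"),
        (s.high_stakes_actions, "high-stakes actions"),
    ]
    return [label for triggered, label in checks if triggered]
```

An empty list means none of the signals is present; any non-empty result is a prompt to consider a different governance model, not an automatic decision.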