2026 · Engineering · About 13 min read · Novus Stream Solutions
Guardrails and human review: where we let the agent run and where we don't
The concrete line we draw between autonomous agent work and mandatory human checkpoints — which work an agent can run on its own, which work always requires a human gate, and the principle that decides which is which.
Overview
Working with AI agents productively requires answering a question that does not have an obvious default: where do you let the agent run on its own, and where do you require a human to look before anything proceeds? Get it wrong in one direction and you bottleneck everything behind human review, losing the speed that made agents worth using. Get it wrong in the other direction and you let an agent take an action whose consequences you cannot easily undo, which is how AI-assisted development produces its worst outcomes. The answer is not "review everything" or "trust everything" — it is a deliberate map of autonomous zones and hard checkpoints, drawn by a principle rather than by mood. This post lays out where we draw that line and why.
The principle that decides the line is reversibility combined with blast radius: how easily can this action be undone, and how much can it affect if it is wrong. Work that is easily reversible and contained can run autonomously, because the cost of a mistake is low and recovery is cheap. Work that is hard to reverse or wide in its effects requires a human gate, because the cost of a mistake is high and there may be no clean undo. Almost every specific guardrail falls out of applying that one principle, which is what keeps the system coherent rather than a pile of ad hoc rules.
Where the agent runs autonomously
The agent operates on its own in the zones where mistakes are cheap and recoverable. Reading and exploring a codebase to build understanding is fully autonomous: it changes nothing, so there is nothing to undo, and letting the agent range widely across the code to map a problem is pure upside. Drafting work that a human will review before it goes anywhere (a proposed plan, a first-pass implementation on a branch, a research summary) runs autonomously, because the draft is not the decision; it is an input to a decision the human still makes. Mechanical changes with strong automated verification behind them, where the build and the type system will catch a mistake immediately, can run with a lighter touch, because there is an automated reviewer underneath that never gets tired and never skims.
The common thread in the autonomous zones is that a mistake is contained and undoable. An agent exploring code cannot break anything. An agent drafting on a branch cannot affect production. An agent making a change that the compiler checks cannot ship a type error. In all of these, the cost of letting the agent run and being wrong is low — you discard a draft, you fix a caught error, you revise an exploration — so the speed of autonomy is worth more than the safety of a checkpoint. Putting a mandatory human gate in front of these would be pure friction, slowing the work without meaningfully reducing risk.
Where a human checkpoint is mandatory
The hard gates sit in front of anything hard to reverse or wide in blast radius. Shipping to production is the canonical one: promoting a change to where real users meet it is a decision a human makes, every time, because it is the point where a mistake stops being contained. Anything that touches data in a way that cannot be cleanly undone — a destructive or irreversible operation — requires a human to look first, because "undo" may not exist and the agent's confidence is not a substitute for a human confirming that the destruction is actually intended. Changes with wide blast radius — something that touches many parts of the system at once, or alters a shared foundation many things depend on — get a human gate even when each individual change looks routine, because the interaction effects are exactly what an agent executing piece by piece is least likely to foresee.
The judgment that something is the right approach, as opposed to a working implementation of some approach, is also a mandatory human checkpoint, and it is the subtlest one. An agent can produce code that runs and still has chosen an approach that is wrong for the system — inconsistent with existing patterns, solving the wrong layer of the problem, introducing a structure that will cause pain later. None of that necessarily shows up as a failure the build catches; it shows up as debt. So the evaluation of approach, not just correctness, is a human gate: the human has to affirmatively judge that this is how the thing should be done, not merely that it does the thing. That checkpoint is where the consistency of the whole system is protected.
The build is a guardrail, not a formality
One of the most valuable guardrails is not a human checkpoint at all — it is the automated verification that runs on everything. The build passing, the type system finding no errors, the absence of broken routes: these are tireless reviewers that check every change without fatigue, and treating them as a hard gate rather than a formality is what lets the human review concentrate on the things automation cannot judge. A build that must pass before anything ships catches the entire class of mechanical mistakes — type errors, missing fields, broken references — so the human reviewer never has to spend attention on them and can spend it instead on approach and correctness. Automated gates and human gates are complementary: the automation handles what is checkable mechanically, freeing the human for what requires judgment.
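As a minimal sketch of how the automated gate and the human gate can layer, the snippet below treats the build, the type check, and the route check as a hard precondition and the human decision as a second, separate condition. The names `AutomatedChecks` and `may_ship` are hypothetical, invented for illustration; they are not taken from any particular CI system or from our actual pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutomatedChecks:
    """Results of the mechanical checks that run on every change (hypothetical shape)."""
    build_passed: bool
    type_errors: int
    broken_routes: int

    def all_green(self) -> bool:
        # Every mechanical check must pass; there is no partial credit.
        return self.build_passed and self.type_errors == 0 and self.broken_routes == 0


def may_ship(checks: AutomatedChecks, human_approved: bool) -> bool:
    """Return True only if both layers of the gate agree."""
    # Automated gate first: a red build blocks shipping even if a human approved.
    if not checks.all_green():
        return False
    # Human gate second: a green build alone is never enough to ship.
    return human_approved
```

In this sketch neither layer can stand in for the other: approval does not override a failing build, and a passing build does not imply approval.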
This is also why investing in making the system catch its own mistakes is investing in the guardrails. The type-safe content model that turns a broken doc link into a compile error, the build that refuses to deploy a malformed page — these are guardrails that work at machine speed and never lapse, which makes them more reliable than any human discipline for the things they cover. The more the system can verify automatically, the more the scarce human review can focus where it actually matters, which is the judgment calls. Strong automated guardrails do not replace human review; they make human review sustainable by removing everything from it that did not need a human in the first place.
Reversibility as the organizing principle
The first axis that decides where a human gate belongs is reversibility: how easily can this action be undone if it turns out to be wrong. The logic is that the cost of a mistake is not just the mistake itself but the difficulty of recovering from it, and an action that is trivially reversible has a low total cost even if it goes wrong, because the recovery is cheap. An agent exploring a codebase, drafting on a branch, or making a change the build will catch can all be undone at near-zero cost, so the downside of letting the agent run and being wrong is bounded and small — which is precisely the condition under which autonomy is worth more than a checkpoint.
Irreversible actions invert this completely. When an action cannot be cleanly undone — a destructive data operation, a deployment that real users immediately depend on — the cost of a mistake includes the impossibility or expense of recovery, which makes even a small chance of error carry a large expected cost. The agent's confidence provides no protection here, because confidence is not the same as correctness, and there is no undo to fall back on if the confidence was misplaced. So reversibility alone sorts a great many actions: the easily-undone ones can run autonomously because mistakes are cheap, and the hard-to-undo ones require a human to look first because mistakes are permanent. Asking "if this is wrong, how hard is it to take back" is the first and often sufficient test for where a gate belongs.
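The expected-cost argument can be made concrete with a toy calculation. The function and every number below are illustrative assumptions, not measurements: the point is only that a large recovery cost can dominate even when the chance of error is smaller.

```python
def expected_cost(p_error: float, damage: float, recovery: float) -> float:
    """Expected cost of running an action unreviewed.

    p_error:  probability the action is wrong (assumed, illustrative)
    damage:   cost of the mistake itself, in arbitrary units
    recovery: cost of undoing it, in the same units
    """
    return p_error * (damage + recovery)


# A draft on a branch: wrong fairly often, but trivial to discard.
reversible = expected_cost(p_error=0.05, damage=1.0, recovery=0.5)

# A destructive data operation: wrong less often, but recovery is very expensive.
irreversible = expected_cost(p_error=0.01, damage=1.0, recovery=500.0)

# Despite the lower error rate, the irreversible action costs more in expectation.
print(reversible < irreversible)
```

Under these assumed numbers the irreversible action is far more expensive in expectation despite being five times less likely to go wrong, which is the inversion described above.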
Blast radius as the second axis
Reversibility alone is not a sufficient test, because some actions are individually reversible yet dangerous due to their reach, which is why blast radius is the second axis. Blast radius is how much a mistake can affect: how many parts of the system, how many users, and how much downstream behavior depend on the thing being changed. A change to a shared foundation that many things rely on has a large blast radius even if the change itself could technically be reverted, because while it is wrong, it is wrong everywhere at once, and the interactions it disturbs may be hard to fully anticipate or untangle. Wide-reaching changes get a human gate even when each individual edit looks routine, because the risk is in the aggregate effect rather than any single edit.
Combining the two axes gives a clean map. Actions that are cheap to reverse and small in blast radius run autonomously; actions that are costly to reverse or large in blast radius get a human gate. The two axes catch different dangers (reversibility catches the permanent mistake, blast radius catches the widespread one), and an action that is risky on either axis warrants a checkpoint. This is why the principle is reversibility combined with blast radius rather than either alone: a destructive operation is gated for irreversibility, a foundational change is gated for reach, and an action that is both contained and reversible is allowed to run because it is dangerous on neither axis. Almost every specific guardrail falls out of plotting actions on these two axes, which is what keeps the system of gates coherent rather than a pile of unrelated rules.
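The two-axis rule is small enough to write down directly. The sketch below is one hypothetical encoding, assuming each axis is reduced to two coarse levels; the enum and function names are made up for illustration, and a real classification would likely need finer gradations.

```python
from enum import Enum


class Reversibility(Enum):
    CHEAP = "cheap"      # easy to undo: discard a draft, revert a branch
    COSTLY = "costly"    # hard or impossible to undo: destructive data operation


class BlastRadius(Enum):
    CONTAINED = "contained"  # affects one isolated piece
    WIDE = "wide"            # touches a shared foundation or many parts at once


class Gate(Enum):
    AUTONOMOUS = "autonomous"
    HUMAN = "human"


def gate_for(reversibility: Reversibility, blast_radius: BlastRadius) -> Gate:
    """Risk on EITHER axis is enough to require a human checkpoint."""
    if reversibility is Reversibility.COSTLY or blast_radius is BlastRadius.WIDE:
        return Gate.HUMAN
    return Gate.AUTONOMOUS


# Drafting on a branch: cheap to undo and contained, so it runs on its own.
print(gate_for(Reversibility.CHEAP, BlastRadius.CONTAINED).value)
# Changing a shared foundation: revertible, but wide, so it is gated.
print(gate_for(Reversibility.CHEAP, BlastRadius.WIDE).value)
```

The `or` in `gate_for` is the design choice that matters: only one of the four quadrants is autonomous, and being safe on one axis never compensates for being risky on the other.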
Why automated gates make the human ones sustainable
A crucial part of the guardrail system is that not every gate is a human one — the automated checks that run on everything are tireless reviewers that handle a whole class of verification, and they are what keep the human gates sustainable. The build passing, the type system finding no errors, the absence of broken references: these verify mechanically-checkable correctness on every single change without fatigue, which means the human reviewer never has to spend their limited attention on the things automation already guarantees. Every category of mistake the build catches is a category the human can stop checking for, concentrating their scarce judgment on the things only a human can evaluate.
This complementarity is what makes the whole system scale. Human review is the expensive, limited resource; automated verification is cheap and unlimited. Pushing everything that can be checked mechanically onto the automated gates — and investing in making the system catch more of its own mistakes, as the type-safe content model does by turning broken references into compile errors — frees the human gates to focus entirely on approach and judgment, which is where human review is irreplaceable. Automated and human gates are not alternatives but layers: the automation forms a floor that catches mechanical errors tirelessly, and the human review sits above it handling the judgment calls. The more capable the automated floor, the more the precious human attention concentrates where it actually matters, which is what lets a small operation maintain quality without the review becoming an unsustainable burden.
The autonomous zones, examined
It is worth examining the autonomous zones more closely, because the freedom granted there is what produces most of the speed, and understanding why each is safe builds confidence in granting it. Reading and exploring code is fully autonomous because it changes nothing — an agent ranging across the codebase to map a problem cannot break anything, so there is pure upside in letting it investigate widely and no reason to gate it. Drafting work destined for human review runs autonomously because a draft is an input to a decision, not the decision itself — a proposed plan or a first-pass implementation on a branch affects nothing until a human acts on it, so the agent can produce it freely. Mechanical changes with strong automated verification behind them run with a light touch because the build is standing underneath to catch errors immediately.
The common property is containment: in every autonomous zone, a mistake stays contained and is undone cheaply. This is what makes the autonomy not just acceptable but correct — gating these would be pure friction, adding human checkpoints to actions whose mistakes are already cheap and reversible, which slows the work without meaningfully reducing risk. The art of the guardrail map is recognizing that the autonomous zones deserve genuine autonomy, not grudging supervision, because over-gating the safe work is as much a failure as under-gating the dangerous work. Granting real freedom where mistakes are cheap is what captures the speed of agents; the discipline is reserving the checkpoints for where they actually earn their friction, and trusting the autonomy where the math says trust is cheap.
The goal is precise trust, not more or less
The most common framing of working with AI agents, how much should you trust them, is the wrong question, and the guardrail map embodies a better one: not how much to trust, but where to trust. Trust framed as a single dial, turned up toward "let the agent do everything" or down toward "review everything," is crude in both directions: turned up it is reckless, turned down it discards the speed that made agents worth using. Drawing guardrails by zone replaces the dial with a map, trusting the agent fully in the zones where mistakes are cheap and reversible, and withholding trust precisely in the zones where they are expensive and permanent. Trust becomes a function of the situation rather than a global setting.
This precision is the actual craft of AI-assisted engineering. It is not about being more trusting or more cautious as a temperament; it is about discriminating accurately between the situations that warrant autonomy and the situations that warrant a checkpoint, and applying each in its right place. An engineer who trusts the agent everywhere and one who reviews everything are making the same mistake from opposite directions — treating trust as uniform when it should be situational. Drawing the guardrail map by reversibility and blast radius is what makes the trust precise: maximal where the math says mistakes are cheap, withheld where the math says they are costly. Getting that discrimination right is what captures the agent's speed on the bulk of the work while keeping firm human control over the narrow set of decisions where being wrong actually matters, which is the whole balance the discipline is trying to strike.
Why the line has to be drawn on purpose
The reason to make this map explicit rather than deciding case by case is that the case-by-case version drifts toward whichever pole is more comfortable in the moment. Under time pressure, ad hoc judgment drifts toward letting the agent run, because a checkpoint feels like friction when you are moving fast — and that drift is exactly how an irreversible mistake eventually gets made, on the day the checkpoint that would have caught it got skipped because it had felt unnecessary the last ten times. A pre-drawn line, decided by the reversibility-and-blast-radius principle when you are calm, holds under pressure precisely because it was not decided under pressure. The hard gates are hard because they were designated hard in advance, not because they felt important in the moment.
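One way to make a pre-drawn line hold under pressure is to encode it as a read-only table that is looked up rather than re-decided each time. The sketch below is hypothetical: the action names paraphrase the zones described in this post, and treating unlisted actions as gated is an assumption added here for illustration, not a rule stated elsewhere in the post.

```python
from types import MappingProxyType

# True means a human checkpoint is mandatory. Decided in advance, when calm.
# MappingProxyType gives a read-only view, so the table cannot be edited in place.
POLICY = MappingProxyType({
    "explore_code": False,                # changes nothing
    "draft_on_branch": False,             # an input to a decision, not the decision
    "verified_mechanical_change": False,  # the build catches mistakes immediately
    "ship_to_production": True,           # where a mistake stops being contained
    "destructive_data_operation": True,   # no clean undo
    "shared_foundation_change": True,     # wide blast radius
    "approve_approach": True,             # judgment the build cannot make
})


def requires_human(action: str) -> bool:
    """Look up the pre-drawn line; there is deliberately no override argument."""
    # Assumption: anything not classified in advance is gated by default.
    return POLICY.get(action, True)
```

Because `requires_human` takes no "skip this once" parameter, the only way to move the line is to change the table itself, which is a deliberate act rather than a judgment made while moving fast.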
The deeper point is that good AI-assisted engineering is not about how much you let the agent do or how much you review; it is about putting the autonomy and the checkpoints in the right places. Maximal autonomy is reckless and maximal review throws away the speed that made agents worth using; the value is in the discrimination between them. Drawing that line by a clear principle (run autonomously where mistakes are cheap and reversible, gate where they are expensive and permanent) is what lets you get the speed of agents on most of the work while keeping a human firmly in control of the small set of decisions where being wrong actually matters. That is the whole craft of it: not trusting agents more or less, but trusting them precisely where trust is cheap and withholding it precisely where it is not. The field-notes companion on AI workflow guardrails takes a complementary angle on the same discipline.