2026 · Engineering · About 13 min read · Novus Stream Solutions
The approver model: running a build pipeline where AI writes and you review
How a solo founder operates as the approver and admin of a build pipeline while AI agents do the execution — where that division of labor works, where it breaks, and the review discipline that keeps quality from sliding.
Overview
A solo founder building several apps with AI agents is not doing the same job as a solo founder writing all the code by hand. The role changes from author to approver: the agents do the execution (the research, the planning drafts, the code), and the human's job is to direct that execution and to be the final reviewer and decision-maker on everything that ships. This is a genuinely different operating model, with its own strengths and its own characteristic failure modes, and treating it like traditional solo development, or like managing a team, misses what it actually is. This post is about how the approver model works in practice, where the division of labor is a real multiplier, where it breaks down, and the review discipline that keeps it from quietly degrading into rubber-stamping.
The core idea is a separation between who produces work and who is accountable for it. The agents produce; the human approves. That separation is what lets one person operate at a scale that hand-coding could not reach — but it only works if the approval is real, which is the entire challenge. An approver who actually reviews is a force multiplier; an approver who waves things through is just a slower way to ship whatever the agent happened to generate. Most of the discipline in this model is about staying the first kind of approver.
What the human keeps, what the agent takes
The division that works is to give the agent the execution and keep the judgment. The agent takes the things that are labor-intensive but well-defined once the approach is decided: drafting an implementation from a plan, doing the mechanical parts of a refactor, producing a first pass at research across a codebase, generating the boilerplate that surrounds the interesting decisions. The human keeps the things that are judgment calls: what to build and why, which approach to take among several, whether a finished piece of work is actually correct and consistent with the system, and — critically — the decision to ship. The agent can propose any of these, and a good agent's proposals are genuinely useful, but the human owns the decision rather than delegating it.
This split maps onto the three-mode workflow cleanly. The human drives the expansion and owns the plan; the agent executes the coding against that plan; the human reviews the result. The agent is doing the most time-consuming work, which is where the leverage comes from, while the human is making the decisions that determine whether that work is the right work, which is where the quality comes from. The mistake in both directions is to misplace the line: hand the agent the judgment (and get fast execution of poorly chosen approaches) or keep the execution (and lose the entire point of working with agents). The line goes between deciding and doing.
Where the model works
The approver model is at its strongest on well-scoped work with a clear definition of correct. When the task is "implement this reviewed plan" or "apply this established pattern across these files" or "draft the first version of this research," the agent's speed compounds and the human's review is a tractable check against a known standard. A solo operator working this way can move through a volume of execution that would be impossible by hand, because the time-consuming part is offloaded and the human's scarce attention is concentrated on the decisions and the review rather than spread across every keystroke. This is the regime where one person genuinely can run several apps: the agents absorb the execution load, and the human stays in the loop where their judgment is what matters.
It also works well as a forcing function for clarity. Because the agent acts on what it understands, the model rewards precise direction and punishes vague direction — which pushes the human toward the discipline of stating clearly what they want, what done looks like, and what constraints apply. That clarity is good for the work regardless of who or what is executing it. The approver model, run honestly, makes the human a better director because muddy direction produces visibly wrong output that the human then has to review and reject, creating immediate feedback on the quality of their own instructions.
Where the model breaks
The model breaks in two characteristic ways, and both are about the approval becoming hollow. The first is review fatigue: reviewing a large volume of generated work is genuinely tiring, and a tired reviewer starts skimming, trusting, and eventually rubber-stamping. The moment the approval stops being a real evaluation, the model loses its quality guarantee entirely and becomes a fast pipeline for shipping unreviewed work — which is worse than hand-coding, because at least hand-coding forces the author to understand every line. The defense is to keep the review tractable, which is exactly what the planning pass does: reviewing code against an approved plan is far less fatiguing than reviewing code cold, so the workflow that makes the agent effective is also the workflow that keeps the human's review sustainable.
The second failure is the agent confidently doing the wrong thing on work that was not well-scoped, and the human approving it because it looks plausible. Generated work has a dangerous property: it is usually well-formed and confident-looking even when the underlying approach is wrong, which makes a flawed approach easy to wave through if you are reviewing for "does this look like reasonable code" rather than "is this the right approach, correctly executed." This is why the judgment cannot be delegated: the human has to actually hold the standard of correctness and consistency, not just check for surface plausibility. The approver model is only as good as the approver's willingness to reject confident, plausible-looking work that is subtly wrong — which is a discipline, not a default.
Why plausible-but-wrong is the central hazard
The single most dangerous property of agent-generated work is that it is usually well-formed and confident-looking even when the underlying approach is wrong, and this is the hazard the approver model exists to guard against. Human-written work tends to look as rough as its underlying thinking — a confused approach often produces visibly confused code, which signals to a reviewer that something is off. Agent output does not carry that signal reliably: it can implement a wrong approach in clean, confident, well-structured code that looks exactly like correct work. The surface quality is decoupled from the correctness of the approach, which removes the rough-looks-wrong heuristic reviewers unconsciously rely on.
This is why reviewing for surface plausibility is precisely the wrong discipline for agent output, even though it is the natural one. If you review by asking "does this look like reasonable code," confident-looking wrong work passes, because it does look reasonable — that is exactly the trap. The approver has to review by asking "is this the right approach, correctly executed against what we actually need," which requires engaging with the substance rather than the surface. The decoupling of polish from correctness in generated work means the reviewer must supply the correctness judgment that the polish no longer signals. Internalizing that clean, confident output is not evidence of a correct approach — that it is, if anything, a reason for more scrutiny rather than less — is the central skill of being a good approver.
Keeping reviews small enough to be real
A practical defense against hollow approval is keeping the scope of any single review small enough to actually evaluate, because review quality collapses past a certain size. A reviewer faced with a small, focused change can genuinely understand and judge all of it; a reviewer faced with a huge change reviews the first part carefully, the middle with diminishing attention, and the end with a skim, because sustained rigorous review does not scale to arbitrary size. So a large change reviewed all at once is, in practice, a change that gets mostly skimmed, which means the approval is real for a fraction of it and rubber-stamped for the rest. The fix is to break work into pieces each small enough to be genuinely checked.
This is one reason the agent's speed must not translate directly into giant changes. The temptation, when an agent can produce a vast amount of work quickly, is to review it in vast chunks to keep up — but that is exactly how the review becomes hollow. Pacing the work into reviewable units, even though the agent could produce more at once, keeps the human's approval meaningful by keeping each review within the size where rigor is sustainable. The throughput of the system is then bounded by what the human can genuinely review, not by what the agent can generate, which is the correct constraint: the value of the approver model comes entirely from the approval being real, so the work must be sized to keep it real rather than sized to maximize generation.
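As a rough illustration of pacing work into reviewable units, the sketch below greedily packs an agent's file changes into batches capped at a line budget. The `FileChange` type, the `batch_for_review` helper, the file paths, and the 200-line budget are all hypothetical choices made for this example; the right budget is whatever size you can actually review with full attention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileChange:
    path: str           # hypothetical file path
    changed_lines: int  # lines added or modified in this file


def batch_for_review(changes, max_lines=200):
    """Greedily pack file changes into batches of at most max_lines.

    A single file larger than the budget ends up in a batch of its own,
    which marks it as something to split further before review.
    The 200-line default is an assumption, not a recommendation.
    """
    batches = []
    current = []
    current_size = 0
    for change in changes:
        # Close the current batch if adding this file would exceed the budget.
        if current and current_size + change.changed_lines > max_lines:
            batches.append(current)
            current = []
            current_size = 0
        current.append(change)
        current_size += change.changed_lines
    if current:
        batches.append(current)
    return batches


changes = [
    FileChange("api/routes.py", 120),
    FileChange("api/models.py", 90),
    FileChange("ui/form.py", 60),
    FileChange("ui/styles.py", 30),
]
batches = batch_for_review(changes, max_lines=200)
for number, batch in enumerate(batches, start=1):
    size = sum(change.changed_lines for change in batch)
    print(f"review {number}: {size} lines, {[c.path for c in batch]}")
```

The point of the sketch is the direction of the constraint: the batch size is set by the reviewer's budget, and the agent's output is cut to fit it, not the other way around.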
How the model scales a solo operation
When the discipline holds, the approver model is what lets a single person operate at a scale that hand-coding never could, and understanding the mechanism clarifies both its power and its limits. The leverage comes from the human spending their time on the highest-value activity, judgment about what to build and whether it is right, while the agent absorbs the lower-value but time-consuming execution. A solo operator's scarcest resource is their own attention, and the model concentrates that attention on the decisions that determine quality while offloading the typing that determines nothing but takes most of the time. That reallocation is the entire source of the scale advantage.
But the scaling is bounded precisely by the thing that makes it work: the human's capacity for genuine review. Because the value depends on the approval being real, and real approval takes real attention, the operation can only scale to the volume of work the human can genuinely evaluate, not to the volume the agent can generate. This is a healthy limit rather than a frustrating one, because it keeps the quality bar tied to actual human judgment rather than letting throughput outrun oversight. The approver model does not let one person ship unlimited work; it lets one person ship as much work as they can genuinely stand behind, which is far more than they could have typed but far less than the agent could have generated unsupervised. Operating within that real limit, rather than pretending it does not exist, is what makes the scaled output trustworthy.
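The bound can be written down as simple arithmetic. The numbers below are invented purely for illustration (they are not measurements), and `shippable_units` is a hypothetical helper name.

```python
def shippable_units(agent_output, review_capacity):
    """Work that can ship with a genuine review behind every unit.

    Throughput is capped by whichever is smaller: what the agent
    generates or what the human can genuinely evaluate.
    """
    return min(agent_output, review_capacity)


# Made-up weekly figures, for illustration only.
hand_typed = 5          # units one person could write by hand
review_capacity = 20    # units the same person can genuinely review
agent_output = 100      # units the agent could generate unsupervised

shipped = shippable_units(agent_output, review_capacity)
print(shipped)
```

With these assumed figures the operation ships more than the hand-typed volume and less than the generated volume, because the review capacity is the binding term.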
Accountability cannot be delegated
A principle that anchors the whole approver model is that while execution can be delegated to an agent, accountability cannot — the human who ships the work is responsible for it regardless of who or what produced it. This is not a legalism but a working stance: the approver owns the output as if they had written it, which means they cannot hide behind "the agent did it" when something is wrong. That ownership is exactly what keeps the approval honest, because an approver who genuinely feels responsible for the work reviews it as work they are standing behind, not as work they are merely passing along. The accountability is what gives the review its weight.
This is the crucial difference between the approver model and simply trusting an agent's output. Trusting the output means treating the agent as accountable, which it cannot be — an agent does not bear the consequences of a bug reaching users, cannot be answerable for a wrong decision, and has no stake in the outcome. Only the human can hold that accountability, which is precisely why the human's approval is the thing that matters. The model works because a responsible human stands behind every shipped piece, having genuinely evaluated it; it fails the moment the human stops feeling that responsibility and starts treating approval as a formality. Keeping the sense of personal accountability for everything that ships, regardless of how it was produced, is the attitude that makes an approver an approver rather than a conduit, and it is the non-negotiable core beneath all the specific review disciplines.
The review discipline that keeps it honest
Keeping the approver model from degrading comes down to a few concrete disciplines. Review against a plan, not in a vacuum, so the question is the tractable "does this match what we decided" rather than the exhausting "is all of this right." Keep the scope of any single review small enough to actually evaluate — a giant change reviewed all at once is a change that gets skimmed, so the work is broken into pieces that can each be genuinely checked. Maintain hard checkpoints where human review is mandatory regardless of how routine the work looked, so that the approval is never quietly skipped on the work that seemed safe. And treat the build itself as a non-negotiable gate: the compiler and the build are tireless reviewers that catch a whole class of mistakes the human reviewer should not have to, which preserves human attention for the judgment calls only a human can make.
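One way to picture these disciplines working together is as a single ship gate that every change must pass. This is a minimal sketch under stated assumptions: the `Change` fields, the `gate_failures` function, and the 200-line limit are hypothetical, and `matches_plan` and `human_approved` stand in for judgments a person records, not anything computed automatically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    build_passed: bool    # outcome of the compiler/build step
    matches_plan: bool    # reviewer's judgment against the approved plan
    changed_lines: int    # size of the change under review
    human_approved: bool  # explicit sign-off, never inferred


def gate_failures(change, max_lines=200):
    """Return the reasons a change may not ship; an empty list means it may.

    The 200-line limit is an assumed review budget, not a rule from the post.
    """
    failures = []
    if not change.build_passed:
        failures.append("build failed")
    if change.changed_lines > max_lines:
        failures.append("too large to review in one pass")
    if not change.matches_plan:
        failures.append("does not match the approved plan")
    # Mandatory checkpoint: required no matter how routine the change looks.
    if not change.human_approved:
        failures.append("missing human approval")
    return failures


routine = Change(build_passed=True, matches_plan=True,
                 changed_lines=40, human_approved=False)
print(gate_failures(routine))
```

The design choice worth noting is that no combination of passing checks substitutes for the human sign-off: a small, green, on-plan change still fails the gate until someone has approved it.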
The throughline is that the approver model is a real operating model with real failure modes, not a magic trick that removes the need for engineering judgment. It relocates the judgment rather than eliminating it: the human does less typing and more deciding, which is a genuine multiplier when the deciding stays rigorous and a genuine hazard when it does not. Run with discipline, it is what lets a single person direct enough execution to maintain a portfolio of apps while keeping each one to a real quality standard. Run carelessly, it is an efficient way to ship things nobody actually evaluated. The difference is entirely in whether the approval is real, which is why the review discipline is the whole game. The companion post on guardrails covers exactly where the mandatory human checkpoints sit.