2026 · EngineeringAbout 12 min readNovus Stream Solutions

Our three-mode workflow: prompt expansion → planning → coding (and why we never skip the middle)

The actual process we use to ship software with AI agents: expand the request, plan the implementation, then code — with the planning pass treated as non-negotiable. What skipping the middle step costs, in concrete terms.

Pin it

Documentation

Contents

1.Overview
2.Mode one: prompt expansion
3.Mode two: planning (the step we never skip)
4.Mode three: coding
5.Code generation got easy; thinking stayed hard
6.The plan is the artifact you actually review
7.The modes map to the cost of change
8.When the modes can be compressed
9.Producing maintainable code, not just working code
10.What skipping the middle actually costs

Overview

Shipping software with AI agents is easy to do badly and not obvious to do well. The failure mode is seductive: you describe what you want, the agent writes code, the code runs, and you ship it — fast, satisfying, and a reliable way to accumulate a codebase nobody fully understands and that breaks in ways nobody anticipated. The process we use at Novus to avoid that is deliberately three modes, in order: expand the request into a precise understanding, plan the implementation before any code is written, and only then code. The middle step — planning — is the one most teams skip under time pressure, and it is the one we treat as non-negotiable. This post is about why that order matters and what skipping the planning pass actually costs.

The short version of the argument is that the quality of AI-assisted code is determined far more by the clarity of the understanding and the plan that precede it than by the code generation itself. Code generation is the part that has become easy; understanding the problem precisely and deciding on an approach are the parts that were always hard, and they did not stop being hard because an agent can type fast. The three-mode workflow exists to force the hard thinking to happen before the easy typing, which is the opposite of what happens when you let the agent jump straight to code.

Mode one: prompt expansion

The first mode is expanding a request into a precise statement of what is actually being asked. A real request is almost always underspecified — "add a batch export" carries a dozen unstated assumptions about formats, limits, error handling, and how it fits the existing system, and if those assumptions are not surfaced, the agent will fill them in with guesses that may not match what you meant. Prompt expansion is the work of turning a loose ask into a clear one: what exactly is the desired behavior, what are the edge cases, what are the constraints, what does done look like. It is the same discipline a good engineer applies to a vague ticket before writing anything, and it matters more with an agent, not less, because the agent will confidently act on whatever interpretation it lands on.

Doing this as an explicit mode rather than skipping straight to code prevents the most common and most expensive category of AI-assisted mistake: the agent builds exactly what it understood, which is not what you wanted, and you do not find out until you are reviewing the finished implementation. Catching the misunderstanding at the expansion stage costs a sentence of clarification; catching it after the code is written costs a rewrite. The expansion mode is cheap insurance against building the wrong thing well.

Mode two: planning (the step we never skip)

The second mode is planning the implementation before writing it, and it is the heart of the whole workflow. A plan is a decision about approach: which files change, what the shape of the solution is, what existing patterns it should follow, what it should reuse rather than reinvent, where the risks are. Producing the plan separately from the code forces the architectural thinking to happen at the level where it is cheap to change — you can revise a plan in seconds, whereas revising an implementation means rewriting code. The plan is also the artifact you review most carefully, because a flaw in the plan is a flaw that will be faithfully built into the code, and catching it in the plan is enormously cheaper than catching it in the implementation.

The reason this step is the one most often skipped is that it feels like a detour. The agent can start producing code immediately, and planning first feels like delaying the "real" work. That instinct is exactly backwards, and it is the single most expensive instinct in AI-assisted development. When you skip planning, the agent makes all the architectural decisions implicitly, in the act of coding, with no chance for you to evaluate them before they are baked in. You end up reviewing a finished implementation that has already committed to an approach, where your only options are to accept whatever structure it chose or to ask for a rewrite. The planning pass converts those implicit, hard-to-change decisions into explicit, easy-to-change ones, which is the difference between directing the work and merely receiving it.

Skipping planning bakes architectural decisions into code; planning first keeps them cheap to change — Skip planning and the agent decides the architecture implicitly, in code. Plan first and the decisions stay cheap to change.

Mode three: coding

The third mode is the one everyone thinks of as the work, and by the time you reach it, the hard parts are largely done. With a precise understanding and an approved plan in hand, code generation becomes execution of a decided approach rather than a series of unsupervised judgment calls. The agent is implementing a plan you reviewed, against an understanding you confirmed, which means the code is far more likely to be right the first time and far easier to review, because you are checking whether it matches the plan rather than reverse-engineering what approach it silently chose. The coding mode is where the agent's speed genuinely pays off — but it pays off precisely because the two modes before it constrained what the agent is fast at.

This ordering also makes the review at the coding stage tractable. Reviewing AI-generated code with no plan to check it against is exhausting and unreliable, because you are simultaneously trying to understand what it did, why it did it, and whether that was the right call — three questions at once, on code you did not write. Reviewing code against a plan you already approved collapses that to one question: does this implement the plan correctly? The earlier modes are what make the final review a focused check rather than an open-ended archaeology project.

Code generation got easy; thinking stayed hard

The whole rationale for the three-mode workflow rests on a shift worth naming explicitly: code generation has become easy, while understanding the problem and deciding on an approach have stayed exactly as hard as they always were. For most of software history, the act of writing code was itself a significant bottleneck — it took time and skill to translate an idea into working syntax — so the thinking and the typing were entangled and the typing was a real share of the effort. AI agents have collapsed the typing cost, which is genuinely transformative, but it has also created a trap: because the visibly hard part got easy, it is tempting to assume the whole thing got easy, when the parts that were always hard are untouched.

Recognizing this is what justifies front-loading the thinking. If code generation is the cheap part and understanding-and-deciding is the expensive part, then the worst possible move is to rush past the expensive part to get to the cheap one — yet that is exactly what skipping straight to code does. The three modes exist to put the effort where the difficulty actually is now: most of the care goes into expansion and planning, where the hard, irreducible thinking lives, and the coding mode is allowed to be fast because the hard decisions were already made. The workflow is a direct response to the new economics of building, which reward investing in the thinking precisely because the typing no longer charges for itself.

The plan is the artifact you actually review

A practical reason the planning mode is non-negotiable is that the plan, not the code, is the artifact where review is most effective, and producing it explicitly is what makes that review possible. A plan states the approach at a level a human can evaluate quickly: which files change, what the shape of the solution is, what it reuses, where the risks are. Reading and judging a plan takes minutes and surfaces approach-level problems — wrong layer, inconsistent with existing patterns, missing a case — while they are still cheap to fix. The same problems, embedded in a finished implementation, take far longer to spot and far longer to correct, because you are reverse-engineering the approach from the code rather than reading it stated plainly.

This makes the plan the highest-leverage review point in the whole workflow. A flaw caught in the plan costs a revised sentence; the identical flaw caught in the code costs a rewrite; the same flaw not caught at all costs a latent structural problem that surfaces later. By producing the plan as a distinct, reviewable artifact, the workflow concentrates the most consequential review where it is cheapest and most effective, rather than deferring all review to the implementation where it is most expensive. Skipping the planning mode does not just skip planning — it removes the artifact that makes meaningful review possible before code is committed to an approach, which is why its absence is felt as a degradation in everything downstream.

The modes map to the cost of change

The deepest logic of the three-mode order is that it moves decisions to the point where they are cheapest to change, which is a general principle of good engineering applied to the AI-assisted workflow. A decision made during expansion — what are we actually building — costs a clarifying sentence to revise. A decision made during planning — what approach will we take — costs a plan edit to revise. A decision made during coding — implicitly, by how the code was written — costs a code rewrite to revise. The cost of changing a decision rises sharply as you move from understanding to planning to implementation, so the workflow is structured to make the decisions in the order that keeps each one as cheap to revise as possible.

Skipping modes collapses decisions into the most expensive place to make them. When you jump straight to code, the what-are-we-building decision and the what-approach decision both get made implicitly during coding, at the highest cost-of-change, with no cheap checkpoint where they could have been caught and revised. The three modes are essentially a cost-of-change optimization: surface and settle each kind of decision at the stage where revising it is cheapest, so that by the time you reach the expensive-to-change implementation, the decisions that would be expensive to revise there have already been made and reviewed where they were cheap. Understanding the workflow as cost-of-change management, rather than as bureaucratic stages, is what makes its order feel inevitable rather than arbitrary.

When the modes can be compressed

Intellectual honesty requires noting that the three modes are not always three separate, heavyweight phases — for small, well-understood changes, they compress, and pretending every trivial edit needs a formal research sprint would be its own kind of waste. A one-line fix to a clear bug does not need an elaborate expansion or a written plan; the understanding and the approach are obvious, so the modes collapse into a quick mental pass and the edit. The workflow is a tool for managing the risk and cost that scale with the size and uncertainty of a change, not a ritual to be performed identically regardless of stakes.

The skill is in calibrating how much of each mode a given change warrants. A small, contained, well-understood change can compress the modes nearly to nothing; a large, uncertain, multi-file change warrants each mode as a substantial, explicit phase. The mistake is not compressing the modes for small work — that is correct — but failing to expand them for large work, which is where the cost of skipping planning is real. The rule of thumb is that the modes should be as explicit as the change is large and uncertain: trivial changes get a light touch, consequential changes get the full discipline. Knowing where a given task sits on that spectrum, and applying the appropriate weight of process, is part of using the workflow well rather than mechanically.

Producing maintainable code, not just working code

The deepest payoff of the workflow is that it produces code that is maintainable, not merely code that works, and the distinction is the whole point. Working code is easy to get from an agent — describe a task and you will usually receive something that runs. Maintainable code is harder: it has to be consistent with the patterns the rest of the system uses, structured so future changes are easy, and free of the hidden inconsistencies that make a codebase progressively harder to work in. The difference between the two is decided almost entirely in the understanding and planning that precede the code, because maintainability is a property of the approach, and the approach is what the planning pass exists to get right.

This is why "it runs" is a dangerously low bar for AI-assisted work. An agent jumping straight to code can clear that bar easily while producing something that solves the immediate task in isolation, inconsistent with everything around it — code that works today and makes every future change a little harder. The three-mode workflow aims past working at maintainable by forcing the approach decisions, where maintainability is determined, to be made and reviewed deliberately rather than fallen into during coding. The result is code that not only runs but fits, which is the only kind of code worth accumulating, because a codebase is a long-lived asset and every piece added to it either keeps it coherent or erodes it. Optimizing for maintainable over merely working is what keeps the codebase an asset rather than a growing liability.

What skipping the middle actually costs

The cost of skipping the planning pass is not usually a dramatic failure; it is a slow accumulation of structural debt that surfaces later as disproportionate pain. Code written without a reviewed plan tends to solve the immediate request in isolation, without regard for the patterns the rest of the system uses, which produces a codebase where every feature works but no two features work the same way. That inconsistency is invisible at first and then becomes the reason a small change touches more code than it should, the reason a bug class hides in several slightly different implementations, and the reason onboarding to your own codebase gets harder over time. The planning pass is where consistency with the existing system gets decided, and skipping it is how a codebase loses coherence one reasonable-looking feature at a time.

We learned the value of this concretely from the kinds of problems that required real rebuilds in the background remover at bgremover.novusstreamsolutions.com/how-it-works — situations where a class of bug had been built into several places because each was implemented independently without a unifying plan, and fixing it meant reconciling divergent implementations after the fact. The planning pass is the cheap, upfront version of the work that those rebuilds did expensively in hindsight: deciding once how something should be structured, so it is structured that way everywhere, instead of discovering after the fact that it was structured five different ways. That is why the middle mode is non-negotiable. It is not bureaucracy slowing down the coding; it is the step that determines whether the coding produces something coherent or something that will need to be rebuilt later. The companion posts cover the review model that runs across all three modes and how the planning pass scales into a full research sprint for large problems.

Frequently asked questions

Quick answers to common questions about this topic.

What is the prompt → plan → code workflow?

A three-stage process: expand a rough request into a clear spec, produce an explicit implementation plan, then write code against that plan. Each stage de-risks the next, so the coding step starts from agreed intent rather than guesswork.

Why not let an AI agent jump straight to code?

Skipping planning is where AI builds drift — the agent makes architectural choices implicitly and you discover them after the fact. A reviewed plan catches misunderstandings while they are cheap to fix.

Does this slow development down?

It adds a short planning step but usually nets faster, because it prevents large rewrites. The middle stage is the cheapest place to catch a wrong assumption.