2026 · Engineering · About 13 min read · Novus Stream Solutions

Running a multi-agent research sprint before touching code

Before a large refactor, we map the problem with parallel investigation agents rather than diving into code. Here is how we structure research as its own sprint, and why research-first beat patch-first on a big multi-file change.



Contents

1. Overview
2. Why research-first beats patch-first on large changes
3. Structuring the research as parallel investigation
4. The cost asymmetry that justifies research
5. Bounded questions beat open-ended surveys
6. Synthesizing parallel findings into one map
7. Research is the planning pass, scaled up
8. Knowing when to skip the sprint
9. Research as a reusable, durable asset
10. When the research changes the plan entirely

Overview

There is a strong temptation, faced with a large change, to start changing things. The problem is visible, the first few edits are obvious, and beginning feels like progress. On a small change that instinct is fine. On a large, multi-file refactor it is how you end up halfway through, discovering that the thing you assumed about the system three files ago is wrong, and now you have a pile of partial edits built on a misunderstanding. The alternative we use is to treat understanding the problem as its own phase — a research sprint — that completes before any implementation begins, and to run that research with multiple agents investigating in parallel. This post is about how that works and why research-first consistently beats patch-first on big changes.

The core move is to separate two activities that usually get tangled together: figuring out how the system actually works, and changing it. When those happen at once — investigating as you edit — you make decisions on partial information and discover constraints after you have already committed to an approach. Separating them means you finish learning the territory before you start moving through it, so that by the time you write code, you are executing against a complete map rather than exploring and building simultaneously.

Why research-first beats patch-first on large changes

Patch-first works by local reasoning: look at the thing that needs to change, change it, look at the next thing. For a contained change, local reasoning is sufficient and efficient. For a change that touches many parts of a system, local reasoning systematically misses the interactions — the place three modules over that depends on the behavior you are about to alter, the second implementation of the same pattern that also needs the fix, the assumption baked into a distant file that your change invalidates. You do not see these from where you are editing, so patch-first discovers them the hard way: as breakage, after the change, when the cost of having been wrong is highest. The bigger the change, the more of these hidden interactions there are, and the worse patch-first scales.

Research-first inverts the order so the interactions are found before any code is committed to them. By mapping the whole affected area first — every place the pattern appears, every dependency on the behavior in question, every assumption that the change will touch — you turn the hidden interactions into known facts before they can become surprises. The change you then plan accounts for the full picture rather than the local view, which means it is far more likely to be right the first time and far less likely to spawn a cascade of follow-up fixes for things the local view could not see. The research is not overhead on top of the real work; it is what makes the real work converge instead of thrash.
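Mechanically, "every dependency on the behavior in question" is a reachability question over the dependency graph, which is exactly the step local reasoning skips. A minimal sketch, assuming you already have a mapping from each module to the modules it imports (how you extract it is out of scope, and the module names are hypothetical): invert the edges, then walk breadth-first from the module you plan to change.

```python
from collections import deque


def affected_modules(imports: dict[str, set[str]], changed: str) -> set[str]:
    """Return every module that transitively depends on `changed`.

    `imports` maps each module to the set of modules it imports directly.
    """
    # Invert the graph so we can ask "who depends on this module?"
    dependents: dict[str, set[str]] = {}
    for module, deps in imports.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(module)

    # Walk outward from the changed module, breadth-first.
    seen: set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, set()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)

    seen.discard(changed)  # a cycle could lead back to the starting module
    return seen


# Hypothetical import graph for illustration.
imports = {
    "queue": set(),
    "jobs": {"queue"},
    "worker": {"queue"},
    "api": {"jobs"},
    "billing": {"api"},
}
print(sorted(affected_modules(imports, "queue")))  # → ['api', 'billing', 'jobs', 'worker']
```

The `billing` module never touches the queue directly, which is the kind of "three modules over" dependency that patch-first tends to find as breakage. Static imports are only one kind of dependency; shared state, configuration, and runtime dispatch each need their own investigation.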

Structuring the research as parallel investigation

A large problem usually has several distinct facets that can be investigated independently, and that independence is what makes parallel agents the right tool. Rather than one investigator working through the whole problem serially, the research is split into focused questions — how is this pattern currently implemented across the codebase, what depends on this behavior, how do the existing tests cover this area, what are the existing conventions this change should follow — and separate agents pursue each one at the same time. Each agent goes deep on its slice without being distracted by the others, and the results come back together to form the complete map. The parallelism is not just faster; it produces better coverage, because each investigation is focused rather than diluted across the whole problem at once.
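The fan-out itself is simple. The sketch below uses Python's `asyncio` to launch one investigation per question and wait for all of them. `investigate` is a hypothetical placeholder for whatever actually runs an agent, and the questions are illustrative.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Finding:
    question: str
    answer: str


async def investigate(question: str) -> Finding:
    """Hypothetical stand-in for one investigation agent.

    A real version would call whatever agent runtime you use; this one
    only simulates latency so the fan-out structure is runnable.
    """
    await asyncio.sleep(0.01)
    return Finding(question=question, answer=f"notes for: {question}")


async def research_sprint(questions: list[str]) -> list[Finding]:
    # Start every investigation at once; gather returns results in input order.
    return list(await asyncio.gather(*(investigate(q) for q in questions)))


questions = [
    "Where is the job lifecycle pattern implemented?",
    "What reads the job status field?",
    "Which tests cover job retries?",
    "What conventions do existing workers follow?",
]
findings = asyncio.run(research_sprint(questions))
for f in findings:
    print(f.question)
```

Because each question is independent, no agent waits on another; the only coordination point is the final gather, which is where synthesis begins.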

The discipline that makes this work is giving each agent a specific, bounded charge rather than a vague "look into this." A focused investigation question — find every place this pattern occurs, trace what reads this piece of state, identify the conventions in this area — produces a usable answer, while an open-ended one produces a wandering survey. The research sprint is structured as a set of these specific questions whose answers, combined, constitute understanding the problem completely. When the agents report back, the result is a synthesized picture of the territory that the subsequent plan and implementation are built on, with the confidence that the map is complete because the investigation deliberately covered every facet rather than the parts that happened to be near where someone started editing.

Figure: Several focused investigation agents, each answering one bounded question, synthesized into one map. Each agent gets a specific, bounded question, and their answers combine into a complete map before any code is written.

The cost asymmetry that justifies research

The case for front-loading research rests on a cost asymmetry: the cost of research is roughly fixed, while the cost of discovering the same facts later through breakage scales with the size of the change. Investigating a problem thoroughly takes a certain amount of effort regardless of when you do it; discovering a constraint or a hidden interaction after you have built on the wrong assumption costs a rewrite whose size grows with how much you built. On a small change there is not much to rewrite, so the asymmetry is mild and patch-first is fine. On a large change there is a great deal to rewrite, so discovering a wrong assumption late is enormously expensive, and the fixed cost of research up front becomes a bargain by comparison.

This asymmetry is why research-first is specifically a large-change discipline rather than a universal one. The research does not pay for itself on small, contained work, where local reasoning is sufficient and the cost of being wrong is small — there, the fixed cost of a research sprint would be overhead. It pays for itself precisely on large, multi-file changes, where the cost of building on a wrong understanding is high and the research that prevents it is cheap relative to the rework it avoids. Knowing that the justification is the cost asymmetry, and that the asymmetry scales with change size, is what lets you apply research-first where it earns its keep and skip it where it would be waste. Research is not virtuous in itself; it is a hedge whose value rises with the cost of the rework it prevents.
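To make the asymmetry concrete, here is a toy model with made-up numbers rather than measurements. Research is a fixed cost, and late rework is the probability of a wrong assumption times a per-file rework cost times the number of files built on it. The point is the shape, not the constants.

```python
def expected_late_rework(p_wrong: float, rework_per_file: float, files_touched: int) -> float:
    """Expected cost of discovering a wrong assumption after building on it.

    Toy model: the rework grows linearly with how much was built.
    """
    return p_wrong * rework_per_file * files_touched


def research_pays_off(research_cost: float, p_wrong: float,
                      rework_per_file: float, files_touched: int) -> bool:
    # Simplification: research is treated as a fixed cost that removes the
    # late-discovery risk entirely. Real research only reduces it.
    return expected_late_rework(p_wrong, rework_per_file, files_touched) > research_cost


def break_even_files(research_cost: float, p_wrong: float, rework_per_file: float) -> float:
    # Change size above which the fixed research cost beats the expected rework.
    return research_cost / (p_wrong * rework_per_file)


# Illustrative numbers only: 8 hours of research, a 50% chance some key
# assumption is wrong, 1 hour of rework per file built on it.
print(break_even_files(8, 0.5, 1.0))       # → 16.0
print(research_pays_off(8, 0.5, 1.0, 3))   # → False
print(research_pays_off(8, 0.5, 1.0, 40))  # → True
```

Below the break-even size the sprint is overhead; above it, the sprint is the cheaper option, and the gap widens linearly with the size of the change.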

Bounded questions beat open-ended surveys

The discipline that makes a research sprint productive rather than a wandering exploration is giving each investigation a specific, bounded charge rather than a vague directive. "Look into the queue system" produces a meandering survey that may or may not surface what matters; "find every place this lifecycle pattern is implemented" produces a usable, complete answer to a question that actually feeds the plan. A bounded question has a definite scope and a definite notion of being answered, which lets an investigation go deep and come back with something concrete, whereas an open-ended one has no natural completion and tends to produce breadth without the depth the plan needs.

This is especially important when multiple agents investigate in parallel, because parallelism only helps if each agent has a focused slice it can pursue independently. Splitting the research into bounded questions — how is this implemented across the codebase, what depends on this behavior, what do the tests cover, what conventions apply here — gives each agent a self-contained charge it can answer thoroughly without coordinating with the others, and the answers combine into the complete map. Vague charges, by contrast, produce overlapping, diluted surveys that neither cover the problem completely nor combine cleanly. The quality of a research sprint is largely determined by the quality of the questions it is decomposed into: specific and bounded produces a usable map, vague and open-ended produces a pile of partial impressions. Framing the right questions is the real skill of structuring research well.
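One way to enforce bounded questions is to make the charge itself a structured object that cannot be considered complete without a scope and a completion criterion. A minimal sketch, assuming a hypothetical `ResearchCharge` type of our own design; the question text and paths are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResearchCharge:
    """One investigation assignment for an agent (field names are hypothetical)."""
    question: str           # what to find out
    scope: tuple[str, ...]  # where to look: directories, modules, services
    done_when: str          # what a complete answer looks like

    def is_bounded(self) -> bool:
        # Usable only if it says what to find, where to look, and when to stop.
        return bool(self.question.strip() and self.scope and self.done_when.strip())


vague = ResearchCharge("Look into the queue system", scope=(), done_when="")
bounded = ResearchCharge(
    "Find every place the job lifecycle pattern is implemented",
    scope=("services/", "workers/"),
    done_when="a list of file:function locations, each with a one-line note",
)
print(vague.is_bounded(), bounded.is_bounded())  # → False True
```

Rejecting charges that fail `is_bounded` before dispatch is a cheap way to catch "look into this" directives while they are still easy to sharpen.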

Synthesizing parallel findings into one map

Parallel investigation produces several focused answers, and there is a distinct step of synthesizing them into a single coherent understanding that the plan is built on — the research is not done when the agents report back, but when their findings have been combined into one map. Each agent answers its bounded question, but the value is in how the answers fit together: the places the pattern occurs, plus what depends on the behavior, plus the conventions that apply, together constitute an understanding that no single investigation produced alone. Synthesis is where the parallel findings stop being separate facts and become a model of the problem complete enough to plan against with confidence.

This synthesis step is where the human judgment in the research phase concentrates, and it is worth treating as its own deliberate activity rather than assuming the findings assemble themselves. Reconciling the parallel investigations means noticing where they connect, where one agent's finding changes the significance of another's, and whether together they actually cover the problem or leave a gap that needs another investigation. The output of a good research sprint is not a stack of reports but a synthesized picture — a clear statement of how the system actually works in the affected area, complete enough that the plan built on it accounts for the full reality rather than a partial view. The parallelism produces the raw material fast; the synthesis turns it into the durable map that the planning and implementation then rely on, which is what makes the whole research-first approach pay off.
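Part of synthesis can be made mechanical. The sketch below assumes each agent reports a list of code locations (the question labels and paths are hypothetical). It merges the reports into a single map keyed by location, flags locations that several investigations touched, and lists questions that came back empty. It does not replace the judgment described above; it only surfaces where to apply it.

```python
from collections import defaultdict


def synthesize(questions: list[str],
               findings: dict[str, list[str]]) -> tuple[dict[str, set[str]], list[str]]:
    """Merge per-question findings into one map keyed by code location.

    `findings` maps each question to the locations its agent reported.
    Returns (location -> questions that touched it, unanswered questions).
    """
    by_location: dict[str, set[str]] = defaultdict(set)
    for question, locations in findings.items():
        for loc in locations:
            by_location[loc].add(question)

    # A question with nothing reported is a gap that may need another pass.
    gaps = [q for q in questions if not findings.get(q)]
    return dict(by_location), gaps


questions = ["pattern sites", "dependents", "test coverage"]
findings = {
    "pattern sites": ["workers/resize.py", "workers/crop.py"],
    "dependents": ["workers/crop.py", "api/jobs.py"],
    "test coverage": [],
}
merged, gaps = synthesize(questions, findings)

# Locations reported by more than one investigation are where findings interact.
hotspots = sorted(loc for loc, qs in merged.items() if len(qs) > 1)
print(hotspots)  # → ['workers/crop.py']
print(gaps)      # → ['test coverage']
```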

Research is the planning pass, scaled up

It helps to see the research sprint as the same activity as the planning pass in the three-mode workflow, scaled up for a problem too large to understand in a single quick pass. On a small change, understanding the problem and deciding the approach happen in a brief mental planning step; on a large change, that same understanding requires a dedicated, structured, possibly parallel investigation before the approach can be decided. The research sprint is not a different kind of work from planning; it is planning's understanding phase grown to match a problem too large to comprehend in a quick mental pass. The continuity matters because it means research-first is not an exotic technique but the familiar plan-before-you-code discipline applied at the scale where understanding itself becomes substantial work.

This framing also explains where the boundary between a quick plan and a full research sprint falls: it is exactly where understanding the problem stops being something you can hold in a single pass and becomes something that requires deliberate, structured investigation to assemble. Small problems get a quick planning step because their understanding is immediate; large problems get a research sprint because their understanding has to be built. The same principle — do not commit to an approach until you understand the problem — drives both, with the scale of the understanding effort matched to the scale of the problem. Recognizing the research sprint as the planning pass grown large keeps the whole workflow coherent: it is always plan before code, with the planning expanding into real research precisely when the problem is large enough to demand it.

Knowing when to skip the sprint

A research sprint is a tool with a cost, and using it indiscriminately would be as much a mistake as never using it, so part of the discipline is recognizing when a change does not warrant one. Small, contained changes where local reasoning is sufficient — a fix in one clear place, a feature that touches a single well-understood area — do not need a structured investigation, because there are no hidden interactions across a large surface for the research to uncover. For those, front-loading a research sprint would be pure overhead, spending fixed investigation cost on a problem whose understanding was already immediate. The sprint earns its keep on large, multi-file, uncertain changes, and applying it to small certain ones is a misallocation.

The signal that a change warrants a research sprint is that its scope is large enough, or its interactions uncertain enough, that local reasoning would systematically miss things — that the cost of building on a wrong understanding is high and the extent of the problem is not already clear. When you can see the whole change from where you stand, you do not need to map it first; when the change reaches into parts of the system you cannot see from the starting point, you do. Calibrating this correctly — full research sprint for the large and uncertain, quick mental pass for the small and clear — is what keeps the technique a sharp tool rather than a ritual. The goal is to match the investigation effort to the actual risk and scope of the change, neither under-investigating large work nor over-investigating small work.
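The calibration can be written down as a rough triage rule that mirrors the two signals above: scope and uncertainty. This is a sketch with hypothetical inputs and an arbitrary threshold; in practice the inputs are estimates made before the work starts.

```python
def needs_research_sprint(files_touched: int,
                          areas_touched: int,
                          interactions_known: bool,
                          max_files_for_quick_plan: int = 3) -> bool:
    """Illustrative triage rule; the threshold is a placeholder, not a recommendation."""
    # Large: the change reaches beyond what you can see from the starting point.
    large = files_touched > max_files_for_quick_plan or areas_touched > 1
    # Uncertain: you cannot yet say what depends on the behavior you will change.
    uncertain = not interactions_known
    return large or uncertain


print(needs_research_sprint(1, 1, interactions_known=True))   # → False
print(needs_research_sprint(12, 4, interactions_known=True))  # → True
print(needs_research_sprint(2, 1, interactions_known=False))  # → True
```

Either signal alone is enough to trigger a sprint; only a change that is both small and well understood gets the quick mental pass.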

Research as a reusable, durable asset

A benefit of treating research as its own structured phase is that the understanding it produces becomes a durable asset, not a fleeting state that evaporates once the change ships. The synthesized map of how a system actually works in some area — every place a pattern occurs, what depends on a behavior, the conventions in play — is valuable beyond the single change that prompted it, because the next change in the same area can build on it rather than rediscovering it. When research is done as a deliberate phase and its output captured, the investment in understanding pays dividends across multiple future changes, not just the one at hand.

This reusability strengthens the case for research-first beyond the immediate cost-benefit on a single change. Understanding a system is expensive to acquire and easy to lose if it lives only in the transient context of one task, but captured as a durable artifact it compounds: each research effort adds to a growing, reusable understanding of the codebase that makes subsequent work in that area faster and safer. The map produced for a large refactor is exactly the kind of artifact that anchors the decomposed units of that refactor and informs the next change nearby. Treating research output as a first-class, durable deliverable rather than a means to an end is what lets the understanding accumulate, which is yet another way the front-loaded investigation earns more than its immediate cost — it builds an asset that keeps paying off.
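Capturing the map can be as lightweight as writing the synthesized result to a file kept next to the code. A minimal sketch, assuming a hypothetical `ResearchMap` schema and a `research/` directory, with every value below illustrative. The date field makes it obvious when a map is old enough to re-verify before trusting it.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from datetime import date
from pathlib import Path


@dataclass
class ResearchMap:
    """A captured research result (hypothetical schema)."""
    area: str
    produced_on: str  # ISO date, so staleness is visible at a glance
    pattern_locations: list[str] = field(default_factory=list)
    dependents: list[str] = field(default_factory=list)
    conventions: list[str] = field(default_factory=list)


def save_map(research: ResearchMap, directory: Path) -> Path:
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{research.area}.json"
    path.write_text(json.dumps(asdict(research), indent=2))
    return path


def load_map(path: Path) -> ResearchMap:
    return ResearchMap(**json.loads(path.read_text()))


with tempfile.TemporaryDirectory() as tmp:
    original = ResearchMap(
        area="job-queue",
        produced_on=date.today().isoformat(),
        pattern_locations=["workers/resize.py", "workers/crop.py"],
        dependents=["api/jobs.py"],
        conventions=["example convention recorded during research"],
    )
    saved = save_map(original, Path(tmp) / "research")
    restored = load_map(saved)
    print(restored == original)  # → True
```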

When the research changes the plan entirely

The clearest sign that research-first is earning its keep is how often the research changes the plan you would have made without it. Going in, you have a hypothesis about what the change involves; the research routinely reveals that the real shape of the problem is different — that the bug you were going to patch in one place actually exists in several, that the feature you were going to add interacts with a system you had not considered, that the clean approach you imagined collides with a constraint you did not know about. Each of those discoveries, made during research, costs a revision to a plan. Made during implementation, the same discoveries cost a rewrite of code, or worse, ship as a half-correct change that handles the case you saw and misses the cases you did not. The research is where the plan gets to be wrong cheaply.
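A simple habit makes this visible: write down where you expect the change to land before the sprint, then compare it against what the research found. The sketch below does the comparison with set operations; the file paths are hypothetical.

```python
def plan_surprises(hypothesized: set[str], discovered: set[str]) -> dict[str, set[str]]:
    """Compare where you expected the change to land with where research found it."""
    return {
        "confirmed": hypothesized & discovered,
        "missed": discovered - hypothesized,     # would have shipped as a half-correct fix
        "ruled_out": hypothesized - discovered,  # planned edits the research made unnecessary
    }


# Hypothetical example: a bug assumed to live in one worker turns out to be shared.
hypothesized = {"workers/resize.py"}
discovered = {"workers/resize.py", "workers/crop.py", "workers/upscale.py"}
diff = plan_surprises(hypothesized, discovered)
print(sorted(diff["missed"]))  # → ['workers/crop.py', 'workers/upscale.py']
```

A non-empty `missed` set is the research earning its keep: each entry is a revision to the plan now instead of a rewrite of code later.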

This is the same logic as the planning pass in the three-mode workflow, scaled up: the bigger the change, the more it pays to do the understanding completely and separately before committing to an approach, because the cost of a wrong approach scales with the size of the change while the cost of research stays roughly fixed. On a large refactor — like the queue and worker overhaul behind bgremover.novusstreamsolutions.com/how-it-works — a research sprint that takes real time up front routinely saves far more than it costs by ensuring the implementation is built on an accurate model of the system rather than a hopeful one. Research-first beats patch-first not because investigating is virtuous but because, on large changes, the alternative is discovering the same facts later at much higher cost. The companion posts cover the context-window constraints that make front-loaded research even more valuable and the all-tools pattern the research so often uncovers.

Frequently asked questions

Quick answers to common questions about this topic.

What is a multi-agent research sprint?

Before writing code, several agents explore the codebase and problem in parallel — one mapping existing patterns, another checking related components, a third on testing — so implementation begins from a shared, evidence-based understanding.

Why research before coding with AI?

It prevents the most expensive mistakes: building something that already exists, or that conflicts with an existing pattern. Cheap parallel exploration up front beats discovering the conflict mid-implementation.

When is a research sprint worth it?

When scope is uncertain or the change touches multiple areas. For a small, isolated edit it is overkill; for a large refactor it is what keeps the work coherent.