2026 · Engineering · About 13 min read · Novus Stream Solutions

Managing the context window on a large refactor: what broke and how we fixed our sessions

The real constraint hit during a big multi-file overhaul of the background remover's queue and worker system: the working context could not hold the whole problem at once. The symptoms, and the workflow change that resolved them.

Contents

1. Overview
2. The symptom: the problem stops fitting
3. The fix: decompose into self-contained units
4. A workflow problem, not a tooling complaint
5. Self-contained units, defined precisely
6. The pagination analogy
7. Why this discipline applies to humans too
8. Artifacts are the memory the session lacks
9. The plan is the spine of a large change
10. Durable artifacts over live memory

Overview

A specific, practical constraint shows up on any large AI-assisted refactor, and it is worth being concrete about because the fix is a workflow change rather than a tooling wish. The constraint is that the working context (everything the session is actively holding about the problem at a given moment) has a limit, and a sufficiently large refactor does not fit inside it. The overhaul of the background remover's queue and worker system behind bgremover.novusstreamsolutions.com/how-it-works was exactly this kind of change. It touched several divergent queue implementations and the shared model-session lifecycle across multiple tools, and the full problem (every queue, every dependency, every place the pattern lived) was more than could be held coherently in working context at once. This post is about the symptoms of hitting that limit and the workflow change that resolved it. It is a lesson about how to structure work, not a complaint about model limits.

The honest framing matters here because this topic invites the wrong conclusion. The lesson is not "wait for bigger context windows"; bigger windows help but the underlying discipline applies at any size, because there is always a problem large enough to exceed whatever the current limit is. The durable lesson is about decomposing work and externalizing state so that no single step needs to hold the entire problem at once — which is good engineering practice independent of any tool, and which a large refactor forces you to learn whether you wanted to or not.

The symptom: the problem stops fitting

The symptom of exceeding working context is subtle and corrosive rather than a hard error. As the refactor grew, keeping the whole picture active — what every queue did, how they differed, what the unifying design was, what had already been changed and what had not — started to degrade. Decisions made early in the work were no longer fully present by the time later, related decisions had to be made, so consistency began to slip: a choice made for one queue might not be perfectly mirrored in the next, not because the choice was wrong but because the full context of the earlier choice was no longer in active view. The work did not fail loudly; it got harder to keep coherent, which on a refactor whose entire goal was consistency is precisely the failure that matters.

This is the multi-file refactor's version of a very general problem: the larger the thing you are trying to hold in your head — or in any bounded working memory — the more the edges fall out of view, and the more the work drifts from the parts you can no longer see. A human engineer hits the same wall on a big enough change, which is why the solution is a workflow that does not depend on holding everything at once, rather than a heroic effort to hold more.

The fix: decompose into self-contained units

The workflow change that resolved it was to restructure the refactor as a sequence of self-contained units of work, each small enough to be held completely in context on its own, with a stable definition of what it covers and what "done" means for it. Instead of one sprawling change that required the whole problem in view, the work became a series of bounded steps — bring this one queue to the canonical model, then the next, each step a complete, reviewable piece that did not require the others to be simultaneously active in working context to execute correctly. The unifying design lived in a durable artifact that each step referenced, rather than being something that had to be continuously held in memory across the entire refactor.

The key property of a good unit here is that it is genuinely self-contained: executing it correctly requires understanding that unit and the shared plan it implements, but not the live details of every other unit. That is what keeps each step inside the working-context limit no matter how large the overall refactor is — you are never asking a single step to hold the whole thing, only its own slice plus the stable map. The decomposition is doing the same job that pagination does for a document too large to load at once: you do not need the whole thing in memory simultaneously if you have a reliable structure for working through it piece by piece.
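
As a minimal sketch of that structure (not the project's actual code), the plan and its units can be modelled as plain data, with a helper that hands one step only its own slice plus the shared plan. The class names, the `context_for` helper, the file paths, and the placeholder plan text are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One bounded step: a single queue to bring to the canonical model."""
    name: str
    scope: tuple[str, ...]  # files this step may touch (hypothetical paths)
    done_when: str          # the stable definition of "done" for this step


@dataclass(frozen=True)
class Plan:
    """The durable, written design that every step references."""
    canonical_model: str
    units: tuple[Unit, ...]


def context_for(plan: Plan, unit: Unit) -> dict:
    """Return only what one step needs: its own slice plus the shared plan."""
    return {
        "plan": plan.canonical_model,
        "unit": unit.name,
        "scope": list(unit.scope),
        "done_when": unit.done_when,
    }


# Placeholder content: the real plan text and paths are not given in this post.
plan = Plan(
    canonical_model="(text of the written canonical queue model goes here)",
    units=(
        Unit("queue-a", ("queues/a.py",), "queue-a conforms to the canonical model"),
        Unit("queue-b", ("queues/b.py",), "queue-b conforms to the canonical model"),
    ),
)

# Each step gets the same plan but never another unit's files.
contexts = [context_for(plan, unit) for unit in plan.units]
```

The point of the helper is what it leaves out: the context for `queue-a` contains nothing about `queue-b`, so the amount a single step has to hold does not grow with the number of units.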

Figure: A refactor too large for working context, decomposed into self-contained units sharing one durable plan. No single step holds the whole refactor; each holds its own slice plus a durable, written plan that every step conforms to.

A workflow problem, not a tooling complaint

It is important to frame the context-window limit correctly, because the wrong framing leads to the wrong response. The wrong framing treats it as a tooling limitation to be waited out: the answer is simply larger context windows, and the problem disappears once they arrive. Larger windows do help, but they do not dissolve the underlying issue, because there is always a problem large enough to exceed whatever the current limit is. Whatever size of refactor overflows today's window, a proportionally larger one will overflow tomorrow's. The constraint is not a fixed obstacle that a bigger tool removes; it is a recurring reality that scales with the ambition of the work, which means the response has to be a durable discipline rather than a wait for better tooling.

The right framing is that this is a workflow problem with a workflow solution, and the solution is good engineering practice independent of any tool. The same constraint — a problem too large to hold entirely in working memory at once — has always applied to human engineers on big enough changes, which is why the disciplines that address it, decomposition and externalizing state, are old and well-established rather than novel reactions to AI tooling. The context-window limit on an AI-assisted refactor is just a particularly crisp instance of a universal truth: beyond a certain size, you cannot hold the whole thing in your head, and the work has to be structured so you do not need to. Treating it as a prompt to adopt sound structural discipline, rather than a tooling gripe to wait out, is what turns the constraint into a forcing function for better practice.

Self-contained units, defined precisely

The core of the fix is decomposing the work into self-contained units, and it is worth defining precisely what makes a unit self-contained, because the property is what makes the decomposition work. A self-contained unit is one that can be executed correctly with only an understanding of that unit and the shared plan it implements — not the live details of every other unit. Bringing one queue to the canonical model is self-contained in this sense: doing it correctly requires understanding that queue and the canonical model it must conform to, but not holding the in-progress state of every other queue's conversion simultaneously. The unit fits in working context because its correctness depends on a bounded local scope plus a stable shared reference, not on the entire problem being active at once.

This definition is what distinguishes a genuine decomposition from merely chopping the work into pieces that are still entangled. If executing one piece correctly requires the live context of several others, the pieces are not really self-contained and the decomposition has not solved the problem — you have smaller chunks that still collectively need the whole picture in view. The discipline is to find a decomposition where each unit's correctness is genuinely local given the shared plan, so that no step ever needs more than its own slice plus the stable map. Achieving that often depends on having a good shared plan to factor against, which is why the research and planning that produce the durable map are the precondition for a clean decomposition. Self-contained means each unit stands on its own scope plus the written plan, and nothing more, which is exactly the property that keeps every step inside the working-context limit regardless of the total size.
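
The test for entanglement described above can be made mechanical. The following is a rough sketch under stated assumptions: each unit declares what it may touch (`scope`) and what it must understand to be executed correctly (`needs`), and anything it needs outside its own scope and the shared plan marks it as not self-contained. The function name, the dictionary layout, and the file names are hypothetical.

```python
def entangled_units(
    units: dict[str, dict[str, set[str]]],
    shared: set[str],
) -> dict[str, set[str]]:
    """Map each unit to whatever it needs beyond its own scope and the shared plan."""
    problems = {}
    for name, unit in units.items():
        outside = unit["needs"] - unit["scope"] - shared
        if outside:
            problems[name] = outside
    return problems


# Hypothetical example: PLAN.md stands in for the written canonical model.
shared = {"PLAN.md"}
units = {
    "queue-a": {
        "scope": {"queues/a.py"},
        "needs": {"queues/a.py", "PLAN.md"},
    },
    "queue-b": {
        "scope": {"queues/b.py"},
        # Needs the live details of queue-a, so it is not self-contained.
        "needs": {"queues/b.py", "queues/a.py", "PLAN.md"},
    },
}

problems = entangled_units(units, shared)
```

Here `queue-b` is flagged because it depends on `queues/a.py`. The usual remedy is either to move the shared knowledge into the plan or to redraw the unit boundaries, not to load both queues into the same step.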

The pagination analogy

A useful way to understand the whole approach is by analogy to how software handles data too large to load at once: pagination. A system that cannot hold an enormous dataset in memory does not give up or demand more memory — it processes the data in pages, holding one page at a time while relying on a stable structure to work through the whole set reliably. The decomposition of a large refactor is the same move: you cannot hold the whole problem in working context, so you process it in self-contained units, holding one unit at a time while relying on a durable plan to work through the whole change coherently. The constraint and the solution are structurally identical, just applied to a refactor rather than a dataset.
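
For readers who want the dataset side of the analogy made concrete, here is a small illustrative sketch of pagination: a generator yields fixed-size pages, and a running total plays the role of the stable structure carried from page to page. The function names and the page size are arbitrary choices for the example.

```python
from typing import Iterable, Iterator


def pages(items: Iterable[int], page_size: int) -> Iterator[list[int]]:
    """Yield the items in pages of at most page_size, one page at a time."""
    page: list[int] = []
    for item in items:
        page.append(item)
        if len(page) == page_size:
            yield page
            page = []
    if page:  # final, possibly shorter, page
        yield page


def paged_total(items: Iterable[int], page_size: int) -> tuple[int, int]:
    """Sum all items while tracking the most items ever held at once."""
    running = 0
    largest_page = 0
    for page in pages(items, page_size):
        largest_page = max(largest_page, len(page))
        running += sum(page)
    return running, largest_page


result = paged_total(range(10), page_size=4)  # (45, 4)
```

The total covers all ten items, yet no more than four are ever held at once. The refactor equivalent is that the whole change gets done while no step holds more than one unit.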

The analogy is clarifying because it reframes the context limit from a frustrating barrier into a familiar engineering condition with a known answer. Software engineers do not lament that datasets exceed memory; they paginate, because processing bounded chunks against a stable structure is simply how you handle anything too big to hold at once. The same equanimity applies to a refactor too big for working context: decompose into units that fit, work through them against a durable plan, and the size of the overall change stops mattering because no single step ever holds more than a page of it. Seeing the context-window constraint as one more instance of that general problem, with the same pagination-style answer, is what turns it from a limit that blocks large work into a structure that makes large work tractable.

Why this discipline applies to humans too

A point worth making explicit is that none of this discipline is specific to AI agents — it is exactly how a human engineer should structure a large refactor, which is the strongest evidence that it is sound practice rather than a workaround. A human hits the same wall on a big enough change: the whole problem stops fitting in their head, decisions made early fade by the time later ones are made, and consistency slips precisely as it did in the AI-assisted case. The human answer is identical: decompose the change into pieces small enough to hold completely, and write down the cross-cutting decisions so they do not have to be remembered. The context window of a session and the working memory of a person are the same kind of bounded resource facing the same kind of overflow.

This universality is reassuring because it means the discipline is not a fragile accommodation to a particular tool's limits but a durable principle that will remain correct as tools change. Good engineers have always decomposed large work and externalized cross-cutting state into design documents and plans, precisely because no one can hold an arbitrarily large change in their head at once. The AI-assisted version of the problem makes the limit crisper and the discipline more obviously necessary, but the discipline itself is timeless. That it applies equally to humans and agents is strong evidence that the right response to the context limit was to adopt sound structural practice, not to wait for the limit to rise: the practice was always correct, and the limit merely made ignoring it untenable on large work.

Artifacts are the memory the session lacks

The role durable artifacts play is best understood as supplying the persistent memory that a bounded working context does not have. A working session can hold a lot, but not everything, and crucially it does not retain the early decisions perfectly once the work grows large — they fade as new context crowds in. A written artifact does not fade: a plan on disk, a description of the target pattern, notes on what is done and what remains, all stay exactly as written no matter how much work happens after. So the things that must remain consistent across the entire refactor are stored where they cannot be forgotten, and each unit of work reads them fresh rather than relying on them still being in active memory.

This is why externalizing state is not just tidiness but the actual mechanism that makes large work survive the context limit. The consistency across many units comes from every unit conforming to the same written specification, and a written specification is immune to the fading that live memory suffers as the work grows. When the cross-cutting decisions live in an artifact, they are equally available to the first unit and the fortieth, which is what keeps the fortieth unit consistent with the first even though the context in which the first was done is long gone. The artifact is, in effect, the long-term memory that the bounded session lacks — a durable record of the decisions that must hold throughout, consulted at each step rather than remembered. Treating the plan and its associated notes as the authoritative memory of the refactor, rather than trusting the session to retain them, is what makes consistency robust to the size of the work.
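
One way to picture "consulted at each step rather than remembered" is a small ledger file that every unit reads fresh before it starts and updates before it ends. This is only a sketch: the JSON layout, the function names, and the example decision text are assumptions, not the format used on the actual refactor.

```python
import json
from pathlib import Path


def load_ledger(path: Path) -> dict:
    """Read the ledger fresh from disk; start empty if it does not exist yet."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"decisions": [], "done": []}


def record(path: Path, unit: str, decision: str) -> None:
    """Persist one unit's outcome so later units do not rely on memory of it."""
    ledger = load_ledger(path)
    ledger["decisions"].append({"unit": unit, "decision": decision})
    if unit not in ledger["done"]:
        ledger["done"].append(unit)
    path.write_text(json.dumps(ledger, indent=2), encoding="utf-8")


def remaining(path: Path, all_units: list[str]) -> list[str]:
    """List the units not yet marked done, in their planned order."""
    done = set(load_ledger(path)["done"])
    return [unit for unit in all_units if unit not in done]
```

A session working on the fortieth unit would call `load_ledger` and see the first unit's decision exactly as it was written, regardless of how much context has turned over in between.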

The plan is the spine of a large change

When a refactor is decomposed into self-contained units that each reference a durable plan, the plan stops being a preliminary document and becomes the structural spine of the entire change — the thing that holds the independent units together into one coherent result. Each unit is executed somewhat independently, against its own local scope, but they all conform to the same plan, which is what makes their separate outputs add up to a unified, consistent change rather than a set of disconnected edits. The plan is the shared reference that every unit aligns to, so consistency across the whole refactor is a consequence of every unit conforming to the spine rather than of the units coordinating with each other directly.

This is why the quality of the plan matters so much on a large change: it is bearing the structural load that, on a small change, would be held in a single mind's view of the whole. A weak or incomplete plan produces units that each conform to a flawed spine, yielding a change that is consistent but consistently wrong, or units that diverge because the plan did not pin down what they needed to share. A strong plan produces units that each do their part correctly and together form exactly the intended result. The decomposition only works because the plan is good enough to align every unit, which is why the research and planning that produce the plan are not preliminary niceties but the foundation the whole decomposed execution rests on. The plan is the spine, and a large change is only as coherent as the spine it is built around.
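
If part of the plan can be expressed as an interface, conformance to the spine can be checked per unit without the units knowing about each other. The sketch below uses Python's `typing.Protocol` for that. The method names `enqueue` and `cancel` and the two queue classes are hypothetical stand-ins; the post does not specify what the canonical model requires.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class CanonicalQueue(Protocol):
    """Hypothetical slice of the written plan, expressed as an interface."""

    def enqueue(self, job: str) -> None: ...
    def cancel(self, job: str) -> None: ...


class QueueA:
    """Already converted: provides everything the plan names."""

    def __init__(self) -> None:
        self.jobs: list[str] = []

    def enqueue(self, job: str) -> None:
        self.jobs.append(job)

    def cancel(self, job: str) -> None:
        if job in self.jobs:
            self.jobs.remove(job)


class QueueB:
    """Not yet converted: has no cancel method."""

    def __init__(self) -> None:
        self.jobs: list[str] = []

    def enqueue(self, job: str) -> None:
        self.jobs.append(job)


def nonconforming(queues: list[object]) -> list[str]:
    """Name the queues that lack a method the plan requires.

    isinstance against a runtime-checkable Protocol only checks that the
    methods exist, not their signatures or behaviour.
    """
    return [type(q).__name__ for q in queues if not isinstance(q, CanonicalQueue)]


report = nonconforming([QueueA(), QueueB()])  # ["QueueB"]
```

Each queue is compared only with the plan, never with another queue, which mirrors the idea that consistency comes from conforming to the spine. A check like this catches divergence from the plan, but it cannot catch a plan that is itself wrong.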

Durable artifacts over live memory

The second half of the fix is to externalize the things that must stay consistent across the whole refactor into durable artifacts rather than relying on them being continuously present in working context. The unifying design — the canonical model every queue had to conform to — was written down as a reference that each unit of work consulted, so that the consistency across units came from every unit conforming to the same written specification rather than from the specification being held in active memory throughout. A plan on disk, a written description of the target pattern, notes on what had been done and what remained: these artifacts hold the cross-cutting state so that no single working session has to. When the consistency lives in an artifact, it does not degrade as the work grows, because the artifact does not fall out of context the way live memory does.

This generalizes into the real lesson, which is good practice on any large change regardless of who or what is doing it: decompose the work into units that fit, and externalize the cross-cutting decisions into durable artifacts that each unit references. A refactor structured that way is robust to the size of the problem, because its correctness depends on each bounded step conforming to a written plan rather than on anyone holding the entire problem in their head at once. The context-window limit is what made this discipline non-optional on the queue overhaul that now powers tools like bgremover.novusstreamsolutions.com/background-remover, but the discipline itself is just how large changes should be structured. Front-loaded research, covered in the companion post, is what produces the durable map these units are built against; the rebuild retrospective covers the refactor itself.

Frequently asked questions

Quick answers to common questions about this topic.

What goes wrong when an AI context window overflows on a big task?

The agent loses earlier decisions and starts contradicting itself — re-deriving facts, undoing prior edits, or drifting from the plan. The work becomes inconsistent because the model no longer "remembers" the start of the task.

How do you keep a large refactor coherent with AI?

Break it into scoped sessions with a written plan the agent re-reads, work file-by-file so each batch is self-contained, and checkpoint with type-checks. Smaller, well-defined units keep each step within reliable context.

Should you do a whole refactor in one session?

Usually not. A persistent plan plus bounded sessions beats one giant session, because it survives context limits and lets you verify progress incrementally.