Parallel Agents, Merge Debt, and the Myth of Overnight Progress¶

When the Screen Looks Like Progress¶
Jose Alekhinne / 2026-02-17
How Many Terminals Are Too Many?
You discover agents can run in parallel.
So you open ten...
...Then twenty.
The fans spin. Tokens burn. The screen looks like progress.
It is NOT progress.
There is a phase every builder goes through:
- The tooling gets fast enough.
- The model gets good enough.
- The temptation becomes irresistible: more agents, more output, faster delivery.
So you open terminals. You spawn agents. You watch tokens stream across multiple windows simultaneously, and it feels like multiplication.
It is not multiplication.
It is merge debt being manufactured in real time.
The ctx Manifesto says it plainly:
Activity is not impact. Code is not progress.
This post is about what happens when you take that seriously in the context of parallel agent workflows.
The Unit of Scale Is Not the Agent¶
The naive model says:
More agents -> more output -> faster delivery
The production model says:
Clean context boundaries -> less interference -> higher throughput
Parallelism only works when the cognitive surfaces do not overlap.
If two agents touch the same files, you did not create parallelism: You created a conflict generator.
They will:
- Revert each other's changes;
- Relint each other's formatting;
- Refactor the same function in different directions.
You watch with 🍿. Nothing ships.
This is the same insight from the worktrees post: partition by blast radius, not by priority.
Two tasks that touch the same files belong in the same track, no matter how important either of them is. The constraint is file overlap.
Everything else is scheduling.
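
To make the constraint concrete: a minimal sketch in Go that groups tasks into tracks by file overlap. The Task shape and the greedy grouping are illustrative, not part of ctx; a real version would also merge tracks when a later task bridges two of them.

```go
package main

import "fmt"

// Task is a hypothetical unit of work plus the files it expects to touch.
type Task struct {
	Name  string
	Files []string
}

// tracks greedily assigns each task to the first track whose files it
// overlaps; otherwise it starts a new track. Tasks in different tracks
// are safe to hand to parallel agents.
func tracks(tasks []Task) [][]Task {
	var out [][]Task
	var trackFiles []map[string]bool
next:
	for _, t := range tasks {
		for i, files := range trackFiles {
			for _, f := range t.Files {
				if files[f] {
					out[i] = append(out[i], t) // shared file: same track
					for _, g := range t.Files {
						files[g] = true
					}
					continue next
				}
			}
		}
		files := map[string]bool{}
		for _, f := range t.Files {
			files[f] = true
		}
		trackFiles = append(trackFiles, files)
		out = append(out, []Task{t})
	}
	return out
}

func main() {
	for i, track := range tracks([]Task{
		{"write-docs", []string{"docs/index.md"}},
		{"refactor-cli", []string{"internal/cli/root.go"}},
		{"fix-lint", []string{"internal/cli/root.go"}},
	}) {
		fmt.Printf("track %d:", i)
		for _, t := range track {
			fmt.Printf(" %s", t.Name)
		}
		fmt.Println()
		// track 0: write-docs
		// track 1: refactor-cli fix-lint
	}
}
```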
The "Five Agent" Rule¶
In practice there is a ceiling.
Around five or six concurrent agents:
- Token burn becomes noticeable;
- Supervision cost rises;
- Coordination noise increases;
- Returns flatten.
This is not a model limitation: This is a human merge bandwidth limitation.
You are the bottleneck, not the silicon.
The attention budget applies to you too:
Every additional agent is another stream of output you need to comprehend, verify, and integrate. Your attention density drops the same way the model's does when you overload its context window.
Five agents producing verified, mergeable change beats twenty agents producing merge conflicts you spend a day untangling.
Role Separation Beats File Locking¶
Real parallelism comes from task topology, not from tooling.
Good:
| Agent | Role | Touches |
|---|---|---|
| 1 | Documentation | docs/, hack/ |
| 2 | Security scan | Read-only audit |
| 3 | Implementation | internal/cli/ |
| 4 | Enhancement requests | Read-only, files issues |
Bad:
- Four agents editing the same implementation surface
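
One way to keep that topology honest is to declare each agent's writable scope up front and refuse to launch when two writable scopes collide. A minimal sketch; the field names and the prefix check are illustrative, not a ctx feature:

```go
package main

import (
	"fmt"
	"strings"
)

// AgentScope is a hypothetical declaration of what one agent may touch.
type AgentScope struct {
	Role     string
	Paths    []string // path prefixes the agent may write under
	ReadOnly bool     // read-only agents never conflict
}

var scopes = []AgentScope{
	{Role: "documentation", Paths: []string{"docs/", "hack/"}},
	{Role: "security-scan", ReadOnly: true},
	{Role: "implementation", Paths: []string{"internal/cli/"}},
	{Role: "enhancement-triage", ReadOnly: true},
}

func main() {
	conflicts := 0
	for i, a := range scopes {
		for _, b := range scopes[i+1:] {
			if a.ReadOnly || b.ReadOnly {
				continue // read-only scopes can always run in parallel
			}
			for _, pa := range a.Paths {
				for _, pb := range b.Paths {
					if strings.HasPrefix(pa, pb) || strings.HasPrefix(pb, pa) {
						conflicts++
						fmt.Printf("conflict: %s and %s both write under %s\n", a.Role, b.Role, pb)
					}
				}
			}
		}
	}
	if conflicts == 0 {
		fmt.Println("no overlapping writable scopes; safe to launch")
	}
}
```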
Context is the Boundary
- The goal is not to keep agents busy.
- The goal is to keep contexts isolated.
This is what the codebase audit got right:
- Eight agents, all read-only, each analyzing a different dimension.
- Zero file overlap.
- Zero merge conflicts.
- Eight reports that composed cleanly because no agent interfered with another.
When Terminals Stop Scaling¶
There is a moment when more windows stop helping.
That is the signal. Not to add orchestration. But to introduce worktrees.
Because now you are no longer parallelizing execution; you are parallelizing state.
State Scales, Windows Don't
- State isolation is the real scaling.
- Window multiplication is theater.
The worktrees post covers the mechanics:
- Sibling directories;
- Branch naming;
- The inevitable TASKS.md conflicts;
- The 3-4 worktree ceiling.
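
In command terms, the mechanics reduce to something like the sketch below, which shells out to git worktree. The repo name, the sibling-directory layout, and the agent/ branch prefix are illustrative choices, not a convention ctx imposes:

```go
package main

import (
	"fmt"
	"os/exec"
	"path/filepath"
)

// addWorktree creates a sibling worktree for one track on its own branch,
// e.g. ../ctx-docs on branch agent/docs. Run it from the repository root.
func addWorktree(repoName, track string) error {
	dir := filepath.Join("..", repoName+"-"+track) // sibling of the main checkout
	out, err := exec.Command("git", "worktree", "add", "-b", "agent/"+track, dir).CombinedOutput()
	if err != nil {
		return fmt.Errorf("worktree %s: %v\n%s", track, err, out)
	}
	return nil
}

func main() {
	// Three or four of these is the practical ceiling.
	for _, track := range []string{"docs", "cli", "audit"} {
		if err := addWorktree("ctx", track); err != nil {
			fmt.Println(err)
		}
	}
}
```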
The principle underneath is older than git:
Shared mutable state is the enemy of parallelism.
Always has been.
Always will be.
The Overnight Loop Illusion¶
Autonomous night runs are impressive.
You sleep. The machine produces thousands of lines.
In the morning:
- You read;
- You untangle;
- You reconstruct intent;
- You spend a day making it shippable.
In retrospect, nothing was accelerated.
The bottleneck moved from typing to comprehension.
The Comprehension Tax
If understanding the output costs more than producing it, the loop is a net loss.
Progress is not measured in generated code. Progress is measured in verified, mergeable change.
The ctx Manifesto calls this out directly:
Verified reality is the scoreboard.
The only truth that compounds is verified change in the real world.
An overnight run that produces 3,000 lines nobody reviewed is not 3,000 lines of progress: It is 3,000 lines of liability until someone verifies every one of them.
And that someone is (insert drumroll here) you:
The same bottleneck that was supposedly being bypassed.
Skills That Fight the Platform¶
Most marketplace skills are prompt decorations:
- They rephrase what the base model already knows;
- They increase token usage;
- They reduce clarity;
- They introduce behavioral drift.
We covered this in depth in Skills That Fight the Platform: judgment suppression, redundant guidance, guilt-tripping, phantom dependencies, and universal triggers. Five patterns that make agents worse, not better.
A real skill does one of these:
- Encodes workflow state;
- Enforces invariants;
- Reduces decision branching.
Everything else is packaging.
The anatomy post established the criteria: quality gates, negative triggers, examples over rules, skills as contracts.
If a skill doesn't meet those criteria...
- It is either a recipe (document it in hack/);
- Or noise (delete it);
- There is no third option.
Hooks Are Context That Executes¶
The most valuable skills are not prompts:
They are constraints embedded in the toolchain.
For example: the agent cannot push.
git push becomes:
Stop. A human reviews first.
A commit without verification becomes:
Did you run tests? Did you run linters? What exactly are you shipping?
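
A minimal sketch of such a hook, assuming the runtime passes the proposed shell command as an argument and treats a non-zero exit as a block. The VERIFIED flag is a hypothetical stand-in for whatever your verification step records:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// A hypothetical pre-command hook: it receives the shell command the agent
// wants to run as os.Args[1] and exits non-zero to block it.
func main() {
	if len(os.Args) < 2 {
		os.Exit(0)
	}
	cmd := os.Args[1]

	// Pushing is a human decision. Block it outright.
	if strings.Contains(cmd, "git push") {
		fmt.Fprintln(os.Stderr, "Stop. A human reviews and pushes.")
		os.Exit(1)
	}

	// Committing is allowed, but only after verification has run.
	if strings.Contains(cmd, "git commit") && os.Getenv("VERIFIED") != "1" {
		fmt.Fprintln(os.Stderr, "Run tests and linters first, then set VERIFIED=1.")
		os.Exit(1)
	}
}
```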
This is not safety theater; this is intent preservation.
The thing the ctx Manifesto calls "encoding intent into the environment."
The Eight Ways a Hook Can Talk catalogued the full spectrum: from silent enrichment to hard blocks.
The key insight was that hooks are not just safety rails: They are context that survives execution.
They are the difference between an agent that remembers the rules and one that enforces them.
Complexity Is a Tax¶
Every extra layer adds cognitive weight:
- Orchestration frameworks;
- Meta agents;
- Autonomous planning systems...
If a single terminal works, stay there.
If five isolated agents work, stop there.
Add structure only when a real bottleneck appears.
NOT when an influencer suggests one.
This is the same lesson from Not Everything Is a Skill:
The best automation decision is sometimes not to automate.
A recipe in a Markdown file costs nothing until you use it.
An orchestration framework costs attention on every run, whether it helps or not.
Literature Is Throughput¶
Clear writing is not aesthetic: It is compression.
Better articulation means:
- Fewer tokens;
- Fewer misinterpretations;
- Faster convergence.
The attention budget taught us that context is a finite resource with a quadratic cost.
Language determines how fast you spend context.
A well-written task description that takes 50 tokens outperforms a rambling one that takes 200: Not just because it is cheaper, but because it leaves more headroom for the model to actually think.
Literature is NOT Overrated
- Attention is a finite budget.
- Language determines how fast you spend it.
The Real Metric¶
The real metric is not:
- Lines generated;
- Agents running;
- Tasks completed while you sleep.
But:
Time from idea to verified, mergeable, production change.
Everything else is motion.
The entire blog series has been circling this point:
- The attention budget was about spending tokens wisely.
- The skills trilogy was about not wasting them on prompt decoration.
- The worktrees post was about multiplying throughput without multiplying interference.
- The discipline release was about what a release looks like when polish outweighs features: 3:1.
So far, every post has arrived (and made me converge) at the same answer:
The metric is verified change, not generated output.
ctx Was Never About Spawning More Minds¶
ctx is about:
- Isolating context;
- Preserving intent;
- Making progress composable.
Parallel agents are powerful. But only when you respect the boundaries that make parallelism real.
Otherwise, you are not scaling cognition; you are scaling interference.
The Manifesto's thesis holds:
Without ctx, intelligence resets. With ctx, creation compounds.
Compounding requires structure.
Structure requires boundaries.
Boundaries require the discipline to stop adding agents when five is enough.
Practical Summary¶
A production workflow tends to converge to this:
| Practice | Why |
|---|---|
| Stay in one terminal unless necessary | Minimize coordination overhead |
| Spawn a small number of agents with non-overlapping responsibilities | Conflict avoidance > parallelism |
| Isolate state with worktrees when surfaces grow | State isolation is real scaling |
| Encode verification into hooks | Intent that survives execution |
| Avoid marketplace prompt cargo cults | Skills are contracts, not decorations |
| Measure merge cost, not generation speed | The metric is verified change |
This is slower to watch. Faster to ship.
If you remember one thing from this post...
Progress is not what the machine produces while you sleep.
Progress is what survives contact with the main branch.
See also: Code Is Cheap. Judgment Is Not. -- the argument that production capacity was never the bottleneck, and why multiplying agents amplifies the need for human judgment rather than replacing it.