Written by Vox
The more rules I wrote for my agents, the worse they performed.
Sounds wrong. More rules should mean more accurate, right?
No.
Line 387
I wrote a rule in AGENTS.md: check the product docs before replying to any customer.
The agent ignored it. Three days straight, customers asked about product features, and it answered from memory. Got it wrong twice. A customer sent a screenshot calling it out. I thought it was a model problem. Switched to Opus. Same thing. Switched to Sonnet. Same thing. Switched to GPT-5.4. Same thing.
Half a day later, I found the real problem.
That rule was on line 387 of AGENTS.md. The file was 412 lines long. OpenClaw's bootstrapMaxChars defaults to 20,000 characters. My AGENTS.md plus SOUL.md plus TOOLS.md plus all skill descriptions exceeded that.
The rule was silently trimmed. The agent never saw it.
No error. No warning. No log. It did not know what it did not know.
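A minimal sketch of the failure mode. The concatenate-then-slice behavior and file names are assumptions based on what I observed, not OpenClaw's actual internals:

```typescript
// Hypothetical sketch of how a bootstrap character cap silently drops rules.
// bootstrapMaxChars matches the documented default; the rest is illustrative.
const bootstrapMaxChars = 20_000;

function buildContext(files: string[]): string {
  // Concatenate workspace files, then hard-trim to the cap. Anything past
  // the cap vanishes with no error, warning, or log entry.
  return files.join("\n").slice(0, bootstrapMaxChars);
}

// A rule sitting past the 20,000-character mark never reaches the model:
const padding = "x".repeat(20_500); // stand-in for everything before line 387
const rule = "RULE: check the product docs before replying to any customer";
const context = buildContext([padding, rule]);

console.log(context.includes(rule)); // false: the rule was trimmed away
```

Swapping models cannot fix this, because the trim happens before any model sees the prompt.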
The loop
After discovering this, I did something even dumber: I added more rules to emphasize "must check docs first."

More rules = larger workspace = more content trimmed = worse performance = I add more rules.

I was using rules to fix the problem of rules being trimmed. An infinite loop.
The leanest agent performs best
I have 5 agents. I scanned every workspace file for each one and counted the token overhead.
nexus: 3,063 tok (coordinator, longest AGENTS.md)
quill: 4,120 tok (content agent, SOUL.md + multiple skills)
forge: 3,442 tok (ops agent, detailed TOOLS.md)
scout: 2,508 tok (intel agent, one skill unused for 3 months)
guide: 1,180 tok (support agent, leanest of all)

Guide's workspace files total 1,180 tokens. It has the highest response accuracy.
Quill has 4,120 tokens. It regularly ignores formatting rules.
Not because of different models. They run the same model. The difference is that all of Guide's rules are visible. Some of Quill's get trimmed.
The smaller the workspace, the more stable the agent. This is not a coincidence.
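The audit above can be reproduced with a few lines. The ~4 characters-per-token ratio is a common heuristic, not an exact tokenizer, and file contents are passed in directly here so the sketch stays self-contained:

```typescript
// Rough per-agent overhead audit using the chars/4 heuristic.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function workspaceOverhead(files: Record<string, string>): number {
  // Sum the estimated token cost of every always-injected workspace file.
  return Object.values(files).reduce(
    (sum, body) => sum + estimateTokens(body),
    0
  );
}

// Illustrative file sizes chosen to match Guide's measured 1,180-token total:
const guide = { "AGENTS.md": "a".repeat(3200), "SOUL.md": "b".repeat(1520) };
console.log(workspaceOverhead(guide)); // 800 + 380 = 1180
```

For real numbers, run your files through the actual tokenizer of the model you use; the heuristic is only good enough to spot the heavy agents.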
The black box
The problem is you cannot see any of this.
OpenClaw does not tell you:
How many tokens your workspace files consume
Which content got trimmed
Which tokens are dead (taking up space but never used)
How close you are to the limit
Your agent gets worse and you assume the model regressed. It did not. Your instructions got trimmed. I learned this the hard way.
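A minimal preflight report covering those four blind spots. The 20,000-character bootstrapMaxChars default is the only OpenClaw-specific number; everything else is plain arithmetic on character counts you supply:

```typescript
// Preflight check: how much is used, how much was cut, how close to the cap.
function preflight(fileChars: Record<string, number>, maxChars = 20_000) {
  const total = Object.values(fileChars).reduce((a, b) => a + b, 0);
  return {
    totalChars: total,
    usedPct: Math.round((total / maxChars) * 1000) / 10, // % of the cap
    trimmedChars: Math.max(0, total - maxChars), // chars silently cut
    headroom: Math.max(0, maxChars - total), // chars left before the cliff
  };
}

// Illustrative counts for an oversized workspace:
console.log(preflight({ "AGENTS.md": 14_800, "SOUL.md": 4_200, "TOOLS.md": 3_500 }));
// { totalChars: 22500, usedPct: 112.5, trimmedChars: 2500, headroom: 0 }
```

Anything with trimmedChars above zero means some of your rules never reach the agent.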
Context Doctor
I built a tool to crack open the black box.
Context Doctor scans your OpenClaw workspace and gives you an x-ray:
Token Budget, total budget vs used vs remaining
Agent Health Cards, token overhead and status per agent
File-Level Breakdown, how much each file costs, which ones are the heaviest
Optimization Suggestions, what to delete, what to move to memory/
git clone https://github.com/Heyvhuang/openclaw-context-doctor.git
cd openclaw-context-doctor
pnpm install
pnpm dev

Open localhost:3000. Demo Snapshot works immediately, no configuration needed. To scan your own workspace, add one path to .env.local.
Do not want to clone? Try it live: openclaw-context-doctor.voxyz.space
The output looks like this:
Token Budget: 200,000
Used: 14,313 (7.1%)
Free: 185,687 (92.9%)
Top consumers:
quill/SOUL.md 1,840 tok
nexus/AGENTS.md 3,063 tok
scout/skills/bird.md 2,508 tok ⚠️ unused 3 months
forge/TOOLS.md 1,923 tok

That ⚠️ is the problem. Scout's bird skill description takes 2,508 tokens but has not been triggered in three months. Taking up space, pushing important rules out.
Four steps to slim down
What I did after seeing the data:
Step 1, Kill dead tokens. Find skill descriptions that take up space but are not being used. Not triggered in three months? Remove the description. The agent can find it through memory search when needed. (-2,508 tokens)
Step 2, Relocate. Move historical decisions, expired rules, and one-off notes from AGENTS.md to memory/ subfolders. They do not need to be injected every turn. (-1,200 tokens)
Step 3, Merge. Combine duplicate tool instructions in TOOLS.md into one entry. (-800 tokens)
Step 4, Scan again. Run Context Doctor to confirm. Target: keep each agent's workspace overhead under 5,000 tokens.
Total savings: 4,508 tokens. Sounds small? But those 4,508 tokens were exactly enough space to fit the rules that had been trimmed.
The agent's behavior changed the next day. Not because the model changed. Because it finally saw the complete instructions.
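Step 1, the biggest single win, can be sketched as a staleness filter. The lastTriggered timestamps are assumed to come from your own logs; OpenClaw does not expose them, which is exactly the black-box problem:

```typescript
// Flag skill descriptions whose last trigger is older than 90 days.
interface Skill {
  name: string;
  tokens: number;
  lastTriggered: Date | null; // null = never triggered
}

function deadSkills(skills: Skill[], now: Date, maxAgeDays = 90): Skill[] {
  const cutoff = now.getTime() - maxAgeDays * 24 * 60 * 60 * 1000;
  return skills.filter(
    (s) => s.lastTriggered === null || s.lastTriggered.getTime() < cutoff
  );
}

// Illustrative data mirroring the scan output above:
const now = new Date("2026-02-01");
const skills: Skill[] = [
  { name: "scout/skills/bird.md", tokens: 2508, lastTriggered: new Date("2025-10-20") },
  { name: "guide/skills/faq.md", tokens: 640, lastTriggered: new Date("2026-01-28") },
];

// bird.md is past the cutoff: drop its description, reclaim 2,508 tokens.
console.log(deadSkills(skills, now).map((s) => s.name));
```

The removed description is not lost; the agent can still pull it from memory search when it is actually needed.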
The formula
agent quality = rule quality × rule visibility

Write 100 rules but only 60 are visible to the agent. Your effective rules: 60.
Write 40 rules and all of them are visible. Your effective rules: 40.
But if those 40 are all core rules, they are 10x more useful than 100 rules padded with noise.
Less is more. But only if you know where "more" is.
The real job
Most people think agent engineering is prompt writing.
It is not.
Prompts are what you say. Context management is what the agent actually hears. You can write the best rules in the world. If the agent never sees them, they do not exist.
This is the biggest gap between demo and production. Not model quality. Not prompt skill. Your instructions get dropped before they arrive.
Nobody teaches this. The docs do not mention it. The tutorials skip it. You only discover it in production, after spending half a day debugging why your agent got dumber, only to find out your workspace was too fat.
Context Doctor does not make your agent smarter. It makes sure your agent can hear you.
Hear first. Then perform.
Open source
GitHub: github.com/Heyvhuang/openclaw-context-doctor
MIT license. Next.js + React. Clone it, pnpm dev, and it runs. Live demo: openclaw-context-doctor.voxyz.space
Your agent is not dumb. It just never saw the rules you wrote.
Context Doctor is part of the VoxYZ open-source toolkit. For optimized workspace templates, complete agent architectures, and the production configs behind these articles: voxyz.space/vault
More articles and field notes: voxyz.space/insights