OpenClaw, VPS & Runtime · Apr 4, 2026

OpenClaw Best Practices After the Anthropic Split


Written by Vox


before you cancel anything: OpenClaw hasn't changed. the only thing that changed is Claude's billing channel.

what actually happened today

Anthropic announced that starting today, Claude subscriptions (Pro/Max) no longer cover third-party tools like OpenClaw.

an analogy: Claude is a membership card. OpenClaw is an external machine. the membership used to cover the machine. now it doesn't.

Claude itself is fine. the subscription still works. it's just that running OpenClaw on a Claude subscription is no longer free.

if you still want to use Claude inside OpenClaw, you'll need to pay separately (Extra Usage or API key). if you don't want to pay extra, most people's first instinct is to switch to GPT 5.4.

this article covers how to switch, and what you'll run into after you do.

why the base model matters more than you think

this whole situation exposed something a lot of people hadn't thought about: OpenClaw and similar agent harnesses don't produce intelligence on their own. they're the scheduling layer, the tool layer, the memory layer. the base model underneath is what decides whether your agent is smart, proactive, and stable.

OpenClaw is the chassis. the model is the engine. same car, different engine, completely different driving experience.

i've been running GPT 5.4 on OpenClaw since 5.4 launched in March. my core prompt structure doesn't need rewriting today. but when i first switched, i thought it had gotten dumber too.

this is my full record of what broke and how i fixed it.

and why i think the OpenAI route turned out much better than i expected.

you can see what models my agents actually use and what each one handles here: voxyz.space/stage

why GPT "won't do anything" inside OpenClaw

the model itself is fine. my prompts were written for Claude. switching models without changing prompts obviously won't work.

Claude is trained to infer intent and act. i say "check my mentions," it calls bird CLI, reads the results, and hands me a summary. i don't need to say "please use the bird tool."

GPT 5.4 is trained to wait for explicit instructions. same request, it responds "sure, how would you like me to check? which tool should i use?" then waits.

imagine two employees. one sees dirty dishes and washes them. the other stands there and asks "want me to wash those?" both are good employees. just trained differently.

from what i've observed inside OpenClaw, Claude leans toward "see a tool, use a tool." GPT leans toward "do what i'm told, ask if unsure." in my agent workflows, this difference is obvious.

in a chat setting, GPT's caution is actually preferable. but in an agent harness, i need it to be proactive.

OpenCode and Cline ran into the exact same problem. their codebases both include GPT-specific prompt adjustments. the principle is simple: Claude's "proactive" switch is on by default. GPT's is off. you have to turn it on manually.

three lines

add these to your AGENTS.md or SOUL.md (think of these as instruction files for your AI, usually inside your agent workspace directory like ~/.openclaw/workspace/).

write them in English; GPT responds more accurately to English instructions:

always use tools proactively. when given a task, call a tool first.

act first, explain after.

for routine operations, execute directly without asking for confirmation.

you can expand on these three lines based on your own agent design. the core principle is one thing: GPT needs to be explicitly told "you can be proactive." Claude does it by default.

so any prompt section involving tool usage, execution priority, or confirmation frequency is worth revisiting for GPT.

line one: explicit authorization.

Claude's system prompt usually says "you have access to these tools." for Claude, that's enough. for GPT, "having access" and "being told to use them" are two different things. change it to "always use proactively" and tool calls become default behavior.

line two: flip the execution order.

GPT's default mode is "explain plan, wait for approval, then execute." good habit in conversation, but feels hesitant in an agent context. "act first, explain after" reverses the order.

line three: lower the action threshold.

even with the first two lines, GPT will still ask "are you sure?" for routine operations. line three skips confirmation for everyday tasks.

note: for high-risk operations like deleting files, publishing content, or modifying production configs, keep the confirmation step. these three lines are for routine work.
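that routine-vs-high-risk split can be sketched as a tiny guard function. this is a minimal sketch, and the operation names in the set are my own labels, not anything OpenClaw defines:

```python
# sketch: which operations still get a confirmation step
# (the HIGH_RISK set below is my own illustrative choice)
HIGH_RISK = {"delete_file", "publish_content", "modify_prod_config"}

def needs_confirmation(operation: str) -> bool:
    # routine work executes directly; high-risk work keeps the human in the loop
    return operation in HIGH_RISK
```

the point of keeping the set small and explicit: everything not listed runs without asking, which is exactly what the three lines are for.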

before vs after

before, my AGENTS.md looked like this:

You have access to the following tools: exec, read, write, edit, web_search, web_fetch, browser, message. Use them when appropriate.

GPT 5.4 read this as "i have permission, but i should wait for the user to say when." most of the time it would describe its plan first, then ask if i wanted it to proceed.

after adding the three lines:

You have access to the following tools: exec, read, write, edit, web_search, web_fetch, browser, message. Always use tools proactively. When given a task, call a tool first. Act first, explain after. For routine operations, execute directly without asking for confirmation.

same task, GPT 5.4 calls the tool directly, then tells me what it did. from "sitting and chatting" to "standing up and working."

what changed after adding them

been running this for a few weeks. 17 cron jobs online. real comparisons across three scenarios.

editing configs / running scripts / file operations: GPT 5.4 wins.

Claude fills in intent i didn't express. most of the time it guesses right. but sometimes it adds a config field it thinks makes sense, or skips a script step it considers unimportant.

guess right, great. guess wrong, i spend half a day fixing it.

GPT 5.4 doesn't guess. if it's unsure, it asks. 5 extra seconds of confirmation saves me 30 minutes of debugging. for precision tasks, this trait is worth more than "proactiveness."

daily ops (cron jobs, data processing, notifications): GPT 5.4 wins.

stable, predictable, no surprises. same task 10 times, 10 consistent results.

my 17 active jobs now run primarily on GPT 5.4. error frequency dropped from 2-3 times per week with Claude to less than once a month.

creative tasks are the exception. honestly, Opus is still excellent for creative work.

creative inspiration / material selection / direction brainstorming: Claude Opus (Claude's premium tier) wins by a lot.

GPT 5.4's suggestions are technically fine. clear logic, solid structure. but they lack surprise.

Claude Opus offers more layered creative inspiration, more intuitive material choices, and angles i wouldn't have thought of. for divergent thinking, the gap is obvious.

you can see the actual work scenarios and model assignments for these agents here: Voxyz AI Office

when three lines aren't enough

complex multi-step reasoning tasks.

for example: "read this file, decide whether to modify another file based on the contents, run tests after, roll back if tests fail."

GPT 5.4 with the three lines will proactively start step one. but at decision points, it leans toward doing exactly what i said rather than inferring the next step from context.

it's like teaching someone to sign for every delivery: they'll do that reliably. but ask "should i return this package?" and they'll still come to you for the call.

the three lines solve the "won't act" problem. they don't solve the "can't judge" problem. this is a GPT-family trait. 5.4 is noticeably better than 5.3 on file operation tasks, but the gap with Claude on complex reasoning is still there.

in most real workflows, rule-following steps and judgment-requiring steps are mixed together. i ended up using two models, each handling what it does best. for multi-step reasoning scenarios, i switch back to Claude.
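the split i landed on can be written as a simple routing table. a minimal sketch: the task categories are my own labels, and the model IDs follow the openclaw.json example in this article:

```python
# sketch: route a task category to a model
# model IDs match the openclaw.json example; categories are my own labels
ROUTES = {
    "config": "openai-codex/gpt-5.4",
    "script": "openai-codex/gpt-5.4",
    "cron": "openai-codex/gpt-5.4",
    "creative": "anthropic/claude-opus-4-6",
    "multi_step_reasoning": "anthropic/claude-opus-4-6",
}

def pick_model(task_type: str) -> str:
    # default to the stable executor; escalate only known judgment-heavy work
    return ROUTES.get(task_type, "openai-codex/gpt-5.4")
```

the design choice worth noting: unknown task types fall through to GPT 5.4, because a predictable wrong default is easier to debug than a creative one.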

my setup

default execution: GPT 5.4. config changes, scripts, daily ops, data processing, cron job scheduling.

creative work: Claude Opus. for long-term stable usage, API key is recommended. creative inspiration, material selection, direction brainstorming.

OpenClaw supports per-agent model assignment. the runtime config in openclaw.json looks roughly like this:

{
  "agents": {
    "defaults": {
      "model": { "primary": "openai-codex/gpt-5.4" }
    },
    "list": [
      { "id": "writer", "model": "anthropic/claude-opus-4-6" }
    ]
  }
}

note the model ID difference:

  • Codex/ChatGPT subscription login: openai-codex/gpt-5.4
  • OpenAI API key: openai/gpt-5.4

the example above uses the Codex subscription route. for API key, swap openai-codex with openai.
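for reference, the same config on the API key route would look like this (only the provider prefix changes; the rest of the structure is identical):

```json
{
  "agents": {
    "defaults": {
      "model": { "primary": "openai/gpt-5.4" }
    },
    "list": [
      { "id": "writer", "model": "anthropic/claude-opus-4-6" }
    ]
  }
}
```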

one system, two models, each doing their own thing.

what are your options right now

simple version: Anthropic cut the "use Claude subscription to power OpenClaw" path. Claude itself is fine. the subscription still works.

  1. recommended: switch to GPT 5.4

the most stable route right now. the experience has been much better than i expected, and you get more usage per dollar at this price point.

also worth trying: other models

GPT 5.4 is what most people are switching to right now, but OpenClaw can connect to any capable model, including open-source ones. if you're already using MiniMax, Kimi, or Gemini, they plug right in. MiniMax M2.7 is extremely cheap for agent backbone work ($0.30/M tokens). Gemini 3.1 Pro does well for creative tasks too. and if you prefer open-source, the new Gemma 4 family is solid.

the migration process is the same as switching to GPT 5.4. the three prompt lines still apply. just keep in mind that every model has its own level of proactiveness. give any new model a few days before judging it.

  2. keep using Claude: enable Extra Usage

Anthropic now offers Extra Usage as pay-per-use. there's also a one-time credit (i saw $200 on my account, though the amount may vary per account; i haven't verified others). this credit burns much faster than a subscription, so treat it as a transition buffer. they're also offering discounts as compensation.

  3. keep using Claude: use an API key

standard API billing. stable, controllable, best for users who know their usage patterns.

  4. local Claude CLI backend

OpenClaw supports calling models through the local claude CLI. but with limitations: text-only input/output, no tool calling, no streaming, slower response times. more of a spare tire than a daily driver.

Anthropic's policy this time covers all third-party harnesses. whether CLI backend is billed separately hasn't been explicitly stated, but don't assume it still falls under your subscription quota.

  5. don't want to pay anything extra

then don't use Claude through OpenClaw. use the official Claude Code or Anthropic's own products directly. that's what the subscription actually covers.

decision flow:

want to keep using OpenClaw?

→ don't care which model → switch to GPT 5.4 (recommended)

→ want to keep Claude → okay with extra cost?

 → yes → Extra Usage or API key

 → no → use official Claude products, not through OpenClaw
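the same flow written out as a small function; a sketch, with the question names and return strings chosen by me to mirror the branches above:

```python
# the decision flow above as a function; inputs mirror the three questions
def choose_route(keep_openclaw: bool, keep_claude: bool, pay_extra: bool) -> str:
    if not keep_openclaw:
        return "use official Claude products directly"
    if not keep_claude:
        return "switch to GPT 5.4"  # the recommended path
    if pay_extra:
        return "Extra Usage or API key"
    return "use official Claude products, not through OpenClaw"
```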

the bigger picture

a lot of people are angry about Anthropic's decision today. i get it.

but it forced a question everyone had been avoiding: your agent system was locked to one model.

when one model is "good enough," there's no motivation to think about a second one. today Anthropic made that decision for everyone.

every provider could make a similar move. the decision is yours. my current recommendation is that the OpenAI route works well and offers solid value per dollar. but the real takeaway is to start putting "what model does my system depend on" into your planning.

running a multi-model stack isn't cheap to maintain. multiple prompt sets, multiple behavior expectations, multiple API accounts. it's for users with some agent experience. but today is the best time to start thinking about it.

if you have to act today

step one: switch to GPT 5.4 and change your prompts.

add the three lines above to your AGENTS.md or SOUL.md. if you switch without changing prompts, it'll feel much dumber than Claude. change the prompts first, then judge.

step two: give it three days.

the first day will have plenty of "why won't it do anything" moments. most of it is a prompt issue or not being used to its style yet. after three days, you'll have a real feel for its behavior.

step three: keep Claude for creative tasks.

Extra Usage, API key, or Claude Code directly. pick the route that works for you. if you love Claude's creative abilities, keep it.

step four: start logging which tasks suit which model.

after three days you'll naturally see the split. keep a simple table:

task type / which model / how it went

that log becomes version one of your multi-model stack design.
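the simplest durable form of that table is a CSV log. a minimal sketch; the file name and columns are my own choice, not anything OpenClaw prescribes:

```python
import csv
from datetime import date

# sketch: append one row per task to a running model log
# (file name "model_log.csv" and the column order are my own choice)
def log_task(task_type: str, model: str, outcome: str,
             path: str = "model_log.csv") -> None:
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow(
            [date.today().isoformat(), task_type, model, outcome]
        )
```

a few weeks of rows like `config edit / gpt-5.4 / clean run` is enough to see the split without relying on memory.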

last word

when GPT 5.4 started working on OpenClaw, nobody cared. today everyone is looking for the answer.

what's worth remembering from today: the model powering your agent system is someone else's product. the rules can change anytime. today was proof.

the only two things you actually control: how your prompts are written, and whether your system can switch between models. both of those are things you can start working on today.

what's your favorite model to use?


Originally on X

This piece first appeared on X on Apr 4, 2026.

