I Hardened My 5 AI Agents. They All Went Dark.

I run 5 AI agents on an $8 server. they handle content, support, monitoring, ops, and security while i work my day job.

last week i turned on sandbox mode. all five went silent. no errors, no alerts. just silence.

took two days to figure out what broke, another day to fix it, and one more day to admit: the system is better now than before i touched it.

this is what happened.

why this matters now

jensen huang said something at GTC that most people scrolled past: agents running inside company networks can touch data, run code, and talk to the internet. going live without guardrails is running naked.

so NVIDIA built NemoClaw. enterprise guardrails, privacy routers, policy engines.

OpenClaw is the body. NemoClaw is the armor. two layers, two different jobs.

OpenClaw handles channel routing, sessions, and tools. its default security model is "personal assistant, single trust boundary." main session tools run directly on the host machine.

NemoClaw adds a layer on top: it puts OpenClaw inside an OpenShell sandbox and adds policy control over network, file, process, and inference calls. unauthorized outbound connections get blocked until a human approves them.
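to make "blocked until a human approves" concrete, here's a toy model in Python. it is not NemoClaw's policy engine or config format: the `EgressPolicy` class, the host names, and the approve step are all hypothetical, just the default-deny idea in a few lines.

```python
from dataclasses import dataclass, field


@dataclass
class EgressPolicy:
    """Toy default-deny egress policy (hypothetical, not NemoClaw's engine)."""

    allowed_hosts: set[str] = field(default_factory=set)
    pending: list[str] = field(default_factory=list)

    def check(self, host: str) -> str:
        # Known-good destinations pass straight through.
        if host in self.allowed_hosts:
            return "allow"
        # Anything else is blocked and queued once for a human to review.
        if host not in self.pending:
            self.pending.append(host)
        return "blocked_pending_approval"

    def approve(self, host: str) -> None:
        # A human explicitly approves the host; later calls are allowed.
        if host in self.pending:
            self.pending.remove(host)
        self.allowed_hosts.add(host)


policy = EgressPolicy(allowed_hosts={"api.telegram.org"})
print(policy.check("api.telegram.org"))     # allow
print(policy.check("updates.example.com"))  # blocked_pending_approval
policy.approve("updates.example.com")
print(policy.check("updates.example.com"))  # allow
```

the design choice that matters is the default: an unknown host gets queued for review, never silently allowed.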

but NemoClaw is still alpha. it requires 4 vCPUs, 8GB of RAM, and a container runtime. default inference goes through NVIDIA's cloud, and local inference is still experimental.
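if you're wondering whether your box even qualifies, a rough preflight check is easy to script. a sketch, assuming Linux (RAM detection uses `os.sysconf`, which isn't available on Windows) and assuming docker or podman is the container runtime you'd use. the thresholds are just the numbers above.

```python
import os
import shutil


def host_resources() -> tuple[int, float]:
    """Return (logical CPUs, total RAM in GiB). RAM is 0 if it can't be read."""
    cpus = os.cpu_count() or 0
    try:
        ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (ValueError, OSError, AttributeError):
        ram_bytes = 0  # unknown on this platform
    return cpus, ram_bytes / 1024**3


def meets_nemoclaw_minimum(cpus: int, ram_gb: float, has_runtime: bool) -> bool:
    # Minimums as stated in this article: 4 vCPU, 8GB RAM, a container runtime.
    return cpus >= 4 and ram_gb >= 8 and has_runtime


if __name__ == "__main__":
    cpus, ram_gb = host_resources()
    # Assumption: docker or podman is the runtime you'd use.
    runtime = shutil.which("docker") or shutil.which("podman")
    print(f"{cpus} CPUs, {ram_gb:.1f} GiB RAM, runtime: {runtime}")
    print("meets minimum:", meets_nemoclaw_minimum(cpus, ram_gb, runtime is not None))
```

a VPS sold as 8GB often reports a little less usable memory than that, so a borderline "no" is worth double-checking by hand.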

no matter which one you pick right now, one thing won't change: agent security architecture will become standard at every company. Jensen compared OpenClaw to Linux. it's not perfect, but it lets everyone build agent systems, and that's enough. start running, start hitting walls, start solving problems. by the time industry standards are set, you'll already have months of experience.

i couldn't wait for NemoClaw to mature. so i used OpenClaw's built-in tools and built my own hardening stack.

the trigger: 314 malicious skills

someone called hightower6eu published 314 skills on ClawHub. every single one was malicious.

pattern one: after install, your agent downloads and executes files from unknown addresses. pattern two: the skill reads your MEMORY.md, USER.md, and SOUL.md and sends the contents out. these files can contain API keys, account info, and everything you told your agent to remember.
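a crude first pass for both patterns is to grep a skill's files for the memory filenames and for anything that downloads, pipes to a shell, or posts data out. a minimal sketch; the regexes are my own hypothetical heuristics, not ClawHub's or OpenClaw's scanner, and obfuscated code walks right past them. treat a hit as "read this file by hand," and a clean result as proving nothing.

```python
import re
from pathlib import Path

# Hypothetical heuristics for the two patterns above. A hit means "read this
# file by hand"; a clean result proves nothing (obfuscation defeats grep).
MEMORY_FILES = re.compile(r"\b(MEMORY|USER|SOUL)\.md\b")
NETWORK_OR_EXEC = re.compile(
    r"curl|wget|Invoke-WebRequest|requests\.(get|post)|fetch\(|\|\s*(ba)?sh\b|chmod\s+\+x"
)


def scan_skill(skill_dir: str) -> list[str]:
    """Flag files in a skill folder that touch memory files or call out."""
    findings = []
    for path in sorted(Path(skill_dir).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue
        reads_memory = bool(MEMORY_FILES.search(text))
        calls_out = bool(NETWORK_OR_EXEC.search(text))
        if reads_memory and calls_out:
            findings.append(f"HIGH  {path}: reads memory files and calls out")
        elif calls_out:
            findings.append(f"CHECK {path}: download, network, or exec call")
    return findings
```

point it at a skill folder before installing, e.g. `scan_skill("path/to/skill")`; it returns one line per suspicious file.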

i ran an audit:

openclaw security audit --deep

result: 1 critical, 5 warnings.

the critical one: an extension on disk flagged as high risk. a real attack surface, not a hypothetical.

scanning finds what already got in. hardening controls how far the next one gets.

is your agent running naked?

before you read further, check yourself:

API keys written directly in your config file
your agent can execute any command on your machine
you've installed ClawHub skills without checking the code
your agent can read your browser cookies, SSH keys, or .env files
you've never run openclaw security audit

if you checked 3 or more, you're about as exposed as i was before hardening. i checked 4.

not sure? try this. paste this into your agent:

"list every file path, environment variable, and network permission you currently have access to. don't skip anything."

see what it returns. most people are surprised the first time they see that list.
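you can also check part of that list from the outside. a minimal sketch that reports which sensitive files the current user can read and which environment variable names look like credentials (names only, never values). the path list is a hypothetical starting point: cookie locations differ by browser and OS, and `./.env` depends on the working directory. run it as the same user, from the same directory, as your agent.

```python
import os
from pathlib import Path

# Hypothetical starting list; extend it for your own machine. This only checks
# one common Chrome cookie path on Linux.
SENSITIVE = [
    "~/.ssh/id_rsa",
    "~/.ssh/id_ed25519",
    "~/.config/google-chrome/Default/Cookies",
    "./.env",
]
SECRET_HINTS = ("KEY", "TOKEN", "SECRET", "PASSWORD")


def readable_paths(paths: list[str]) -> list[str]:
    """Return the paths this process can actually open for reading."""
    hits = []
    for p in paths:
        full = Path(p).expanduser()
        if full.is_file() and os.access(full, os.R_OK):
            hits.append(str(full))
    return hits


def secret_like_env(environ: dict[str, str]) -> list[str]:
    """Names (never values) of env vars that look like credentials."""
    return sorted(k for k in environ if any(h in k.upper() for h in SECRET_HINTS))


if __name__ == "__main__":
    print("readable:", readable_paths(SENSITIVE))
    print("secret-like env vars:", secret_like_env(dict(os.environ)))
```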

5 steps to harden

every step notes the pitfalls i hit. exact commands and config fields change between versions, so i keep the latest on my site and update it continuously. this article covers the reasoning and the traps.

full copy-paste version: voxyz.space/security

step 1: run a security audit

openclaw security audit --deep

run the audit first and handle every critical. the command scans your config, permissions, and attack surface and tells you where the risks are.
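when the report is long, it helps to boil it down to the one number that matters first. a sketch assuming you've already turned the findings into a list of dicts with `severity` and `title` fields; that shape is hypothetical, so map it from whatever your OpenClaw version actually prints.

```python
from collections import Counter


def triage(findings: list[dict]) -> tuple[str, list[str]]:
    """Return a one-line summary plus the titles of critical findings."""
    # Hypothetical shape: {"severity": "critical" | "warning" | ..., "title": str}
    counts = Counter(f.get("severity", "unknown") for f in findings)
    warns = counts["warning"]
    summary = f"{counts['critical']} critical / {warns} warning{'s' if warns != 1 else ''}"
    criticals = [f["title"] for f in findings if f.get("severity") == "critical"]
    return summary, criticals


summary, criticals = triage([
    {"severity": "critical", "title": "extension on disk flagged high risk"},
    {"severity": "warning", "title": "example warning"},
])
print(summary)  # 1 critical / 1 warning
```

in automation, a non-empty `criticals` list is a natural trigger for a nonzero exit code.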

i recommend using GPT-5.4 with extra-high reasoning to analyze the report. it helps you separate real issues from false positives.

step 2: move secrets out of your config file

move API keys out of openclaw.json. replace plaintext values with secret references.

why this matters: without a sandbox, agents and tools can read your config file. plaintext keys in there mean every agent and every skill has access to all of your API credentials.

exact config syntax varies between versions. follow docs.openclaw.ai for the most stable approach. my checklist page has a simplified version too.
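before and after the move, a quick scan tells you whether any literal credentials are still sitting in the file. a sketch with hypothetical heuristics: it flags string values under key names like `apiKey` or `token`, or values starting with common key prefixes. it also assumes `${VAR}`-style placeholders count as references; swap `PLACEHOLDER` for the reference syntax your docs specify. expect some false positives.

```python
import json
import re

# Hypothetical heuristics: key names and value prefixes that usually mean
# "this is a credential". Tune them for your providers.
SECRET_KEY_NAME = re.compile(r"api[_-]?key|token|secret|password", re.IGNORECASE)
SECRET_VALUE = re.compile(r"^(sk-|xox[bp]-|ghp_|AIza)[A-Za-z0-9_\-]{10,}")
# Assumption: an env-style placeholder like "${OPENAI_API_KEY}" is a reference,
# not a secret. Replace with the reference syntax your OpenClaw version uses.
PLACEHOLDER = re.compile(r"^\$\{[A-Za-z0-9_]+\}$")


def find_plaintext_secrets(node, path: str = "", key: str = "") -> list[str]:
    """Return dotted paths of string values that look like literal credentials."""
    if isinstance(node, dict):
        hits = []
        for k, v in node.items():
            hits += find_plaintext_secrets(v, f"{path}.{k}" if path else str(k), str(k))
        return hits
    if isinstance(node, list):
        hits = []
        for i, item in enumerate(node):
            hits += find_plaintext_secrets(item, f"{path}[{i}]", key)
        return hits
    if isinstance(node, str) and node and not PLACEHOLDER.match(node):
        if SECRET_KEY_NAME.search(key) or SECRET_VALUE.match(node):
            return [path]
    return []


def scan_file(config_path: str) -> list[str]:
    # Assumes strict JSON. If your config uses JSON5 features such as comments
    # or trailing commas, json.load will raise; parse it with a JSON5 library.
    with open(config_path, encoding="utf-8") as fh:
        return find_plaintext_secrets(json.load(fh))
```

run `scan_file` against your openclaw.json before and after the move; an empty list afterwards is the goal.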

step 3: turn off elevated execution

set tools.elevated.enabled to false.

when this is on, agents can bypass the sandbox and execute commands directly on the host machine. turning it off costs almost nothing in daily use, but removes the most dangerous entry point.
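if you edit the config by script instead of by hand, a dotted-path setter keeps the change small and checkable. a minimal sketch over an already-loaded config dict; the key path comes from this step, the helpers are hypothetical, and writing the dict back with `json.dump` would drop any comments in the file, so keep a backup.

```python
def get_path(config: dict, dotted: str, default=None):
    """Read a nested key like 'tools.elevated.enabled'; default if any part is missing."""
    node = config
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


def set_path(config: dict, dotted: str, value) -> dict:
    """Set a nested key, creating intermediate objects as needed."""
    node = config
    *parents, leaf = dotted.split(".")
    for part in parents:
        node = node.setdefault(part, {})
        if not isinstance(node, dict):
            raise TypeError(f"{part!r} in {dotted!r} is not an object")
    node[leaf] = value
    return config


config = {"tools": {}}  # stand-in for your loaded openclaw.json
set_path(config, "tools.elevated.enabled", False)
print(get_path(config, "tools.elevated.enabled"))  # False
```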

step 4: sandbox non-main sessions

set agents.defaults.sandbox.mode to non-main.

this is the most valuable step in the entire hardening process. it's also the step that broke everything for me.
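it helps to be precise about what `non-main` means before turning it on. a small model, assuming the mode values are `off`, `non-main`, and `all` and that the main session is keyed `main`; both are assumptions to check against docs.openclaw.ai. the takeaway: in `non-main`, everything except the main session runs in the sandbox.

```python
# Assumed mode values and main-session key; verify against your version's docs.
VALID_MODES = {"off", "non-main", "all"}


def is_sandboxed(session_key: str, mode: str, main_key: str = "main") -> bool:
    """Model of agents.defaults.sandbox.mode: does this session run sandboxed?"""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown sandbox mode: {mode!r}")
    if mode == "off":
        return False
    if mode == "all":
        return True
    # non-main: every session except the main one goes into the sandbox
    return session_key != main_key
```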

⚠️ what happened: after turning on the sandbox, all my agents went dark.

symptom one: inter-agent communication died. the sandbox has no network access by default, so processes inside the container can't reach the internet or the host machine. like locking someone in a sealed room and wondering why they didn't call.

symptom two: credentials disappeared. the secret files lived outside the sandbox, so agents inside reached for them and got nothing. worse: no error message, just silence. tasks just hung. finding this one took an hour.
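the cheapest guard against silent hangs is a preflight that loads every credential at startup and fails loudly if one isn't visible. a sketch; the file paths are hypothetical stand-ins for wherever your secrets end up mounted, and `MissingCredential` is just a named error so the message lands in the logs instead of nowhere.

```python
import os
from pathlib import Path


class MissingCredential(RuntimeError):
    """Raised when a required secret file isn't visible to this process."""


def require_credentials(paths: list[str]) -> dict[str, str]:
    """Load secret files up front; fail loudly instead of hanging later."""
    loaded = {}
    missing = []
    for p in paths:
        f = Path(p)
        if f.is_file() and os.access(f, os.R_OK):
            loaded[f.name] = f.read_text(encoding="utf-8").strip()
        else:
            missing.append(p)
    if missing:
        raise MissingCredential(
            "not visible inside this sandbox (missing mount?): " + ", ".join(missing)
        )
    return loaded
```

call it at the top of a tool or agent entry point, e.g. `require_credentials(["/run/secrets/telegram_token"])` (example path). a missing mount then shows up as one clear error line instead of a task that never finishes.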

the fix: configure network access and credential mounts one by one. only open what's necessary, block everything else. check docs.openclaw.ai for the sandbox config fields, or use my checklist directly.

remember: turn on the sandbox first, then grant permissions one by one. not the other way around.
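the credential-mount half of that fix can be modeled as an empty allowlist you grow one path at a time, read-only by default. a hypothetical sketch: the real enforcement happens in the sandbox's mount config, and this only checks paths lexically (it collapses `..` but doesn't follow symlinks).

```python
import posixpath
from dataclasses import dataclass, field
from pathlib import PurePosixPath


def _within(path: str, root: str) -> bool:
    """True if path equals root or sits under it, after collapsing '..'."""
    p = PurePosixPath(posixpath.normpath(path))
    r = PurePosixPath(posixpath.normpath(root))
    return p == r or r in p.parents


@dataclass
class MountGrants:
    """Hypothetical model: no mounts until granted; read-only unless stated."""

    mounts: dict[str, bool] = field(default_factory=dict)  # root -> read_only

    def grant(self, root: str, read_only: bool = True) -> None:
        # Credentials only need to be read, so read-only is the default.
        self.mounts[root] = read_only

    def can_read(self, path: str) -> bool:
        return any(_within(path, root) for root in self.mounts)

    def can_write(self, path: str) -> bool:
        return any(_within(path, root) and not ro for root, ro in self.mounts.items())
```

starting from an empty `MountGrants()` and adding one root per failure you observe is the "lock first, then open one by one" order in code form.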

the most counterintuitive part of hardening: lock it down and the system gets weaker first. tune it right and it ends up more stable than before.

step 5: nightly automated audit

run an audit once a day and send the results to Telegram. you don't need to read them every day, but when something goes wrong you have a trail.
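the Telegram side is one HTTPS call to the Bot API's `sendMessage` method. a minimal sketch using only the standard library; the bot token and chat id are placeholders you'd read from a secret reference or environment variable, not from the config file. the request builder is separate from the sender so you can check the payload without hitting the network.

```python
import json
import urllib.request


def build_telegram_request(bot_token: str, chat_id: str, text: str) -> urllib.request.Request:
    """Build a Bot API sendMessage request without sending it."""
    url = f"https://api.telegram.org/bot{bot_token}/sendMessage"
    # Telegram caps message text at 4096 characters, so truncate long reports.
    body = json.dumps({"chat_id": chat_id, "text": text[:4096]}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def send(req: urllib.request.Request) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

e.g. `send(build_telegram_request(token, chat_id, "nightly audit: 0 critical / 1 warning"))`, scheduled from cron or an automation.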

i set up a Codex app automation for this. the prompt was originally built for cron monitoring, but it covers the full gateway health check, including security audit runs. full prompt:

Maintain OpenClaw Gateway cron reliability from outside the gateway. Use extra high reasoning for diagnosis, classification, and repair decisions, but keep repairs conservative and minimal. Read local OpenClaw docs before making any claim about commands, states, or fixes.

SSH to your server on its configured SSH port and use sudo where required. Treat openclaw status, openclaw gateway status, openclaw cron status --json, openclaw cron list --all --json, openclaw cron runs, and recent gateway or journal logs as the source of truth.

Discover jobs dynamically from the machine on every run and do not rely on a hard-coded job list. Classify jobs as recurring when schedule kind is cron or every, and one-shot when schedule kind is at.

Treat these as serious scheduler problems: cron.enabled false, OPENCLAW_SKIP_CRON set, unhealthy gateway runtime, missing expected next wake, logs showing scheduler disabled or timer tick failed.

Treat these as suspicious but not critical: disabled recurring jobs, repeated recent errors, missing recent healthy runs, invalid delivery targets.

Do not misclassify these as failures: recurring retry backoff, one-shot auto-delete, one-shot terminal disable, duplicate announce suppression, quiet-hours, requests-in-flight.

Apply only the smallest safe non-destructive repair: restart the gateway if runtime or probe is unhealthy, repair the canonical symlink if needed, fix accidental root-owned residue, run safe diagnostics such as openclaw doctor. Re-enable a disabled recurring job only when strong evidence shows accidental disable and no operator note indicates intentional pause.

Never edit jobs.json while the gateway is running. Never use openclaw cron run --force on side-effecting jobs. Never touch auth, secrets, access policy, firewall, delivery credentials, schedules, job prompts, targets, or models.

If a significant issue is found or safe repair is not possible, write an incident markdown with severity, impact, evidence, repair attempted, current status, and next action, then send a short alert. If the current local day is Sunday, also check backup freshness and repeated recent journal error patterns.

Leave one inbox summary with healthy state, repaired issues, incidents, alerts sent, warnings, and blockers requiring human judgment. Never expose secrets. Never weaken auth or access policy. Prefer the smallest reversible change.

this prompt doesn't just run audits. it classifies problems (real failures vs normal behavior), applies only the smallest safe fix, and never touches permissions or secrets. copy-paste it into a Codex automation and it works.
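most of that classification is deterministic, so you can sanity-check the model's calls with a few lines of code. a sketch of the prompt's rules: the schedule kinds (`cron`, `every`, `at`) come from the prompt, while the signal names are hypothetical labels you'd derive from status output and logs. anything unrecognized goes to a human, matching the prompt's "blockers requiring human judgment."

```python
RECURRING_KINDS = {"cron", "every"}
ONE_SHOT_KINDS = {"at"}

# Buckets mirror the prompt; the signal names themselves are hypothetical.
SERIOUS = {
    "cron_disabled", "skip_cron_env_set", "gateway_unhealthy",
    "missing_next_wake", "scheduler_disabled_in_logs", "timer_tick_failed",
}
SUSPICIOUS = {
    "recurring_job_disabled", "repeated_errors",
    "no_recent_healthy_run", "invalid_delivery_target",
}
EXPECTED = {
    "retry_backoff", "one_shot_auto_delete", "one_shot_terminal_disable",
    "duplicate_announce_suppressed", "quiet_hours", "requests_in_flight",
}


def job_kind(schedule_kind: str) -> str:
    """Recurring for cron/every, one-shot for at, else unknown."""
    if schedule_kind in RECURRING_KINDS:
        return "recurring"
    if schedule_kind in ONE_SHOT_KINDS:
        return "one-shot"
    return "unknown"


def triage_signal(signal: str) -> str:
    """Map an observed signal to ok / serious / suspicious / needs-human."""
    if signal in EXPECTED:
        return "ok"  # normal behavior, never report as a failure
    if signal in SERIOUS:
        return "serious"
    if signal in SUSPICIOUS:
        return "suspicious"
    return "needs-human"  # unrecognized: leave it for an operator
```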

before and after

before hardening:

1 critical / 5 warnings
plaintext secrets in config
elevated exec enabled
no sandbox
no automated audit

after hardening:

0 critical / 1 warning
all secrets use references
elevated exec disabled
non-main sessions fully sandboxed
nightly automated audit + alerts

the remaining 1 warning: an old skill using a plaintext path to reference an external resource. risk is manageable, keeping it for now. that's the point of not going all-or-nothing: keep what you can monitor, shut down what you can't.

a perfect security setup doesn't exist. one that actually runs is worth everything.

security is not a switch

steipete said it: it's your personal assistant, not a bus.

your agent is there to work for you, not a public vehicle anyone can ride. the goal is simple: close the doors you don't use.

i kept workspace read/write. kept web search. kept main session flexibility. what i shut down were the things i never used but that carried the highest risk.

every hardening step has a cost. but not hardening costs more: some skill reads your MEMORY.md and sends your API key to a server you've never heard of.

compare the two, and it's obvious which inconvenience to accept.

every few months, the agent ecosystem hits a new wall. first it was cost. then it was memory. now it's security.

the pattern is always the same: most people wait for someone else to solve it. a few people solve it themselves, learn what breaks, and come out ahead.

NemoClaw will mature. standards will form. enterprise solutions will arrive.

but by then, the people who hardened early will have months of production data, battle-tested rules, and systems that actually survived contact with the real world.

security isn't a feature you add later. it's the foundation you either build now or regret not building.

full checklist + continuously updated commands: voxyz.space/security

Resources

VoxYZ Security Checklist — full 5-step checklist with copy-paste commands
OpenClaw Security Docs — official sandbox, secrets, and tool policy reference
OpenClaw Documentation — full gateway and agent configuration
OpenClaw GitHub — source code and issue tracker
VoxYZ Memory Lab — 240 agent memory experiments with live production data