OpenClaw, VPS & Runtime · Apr 2, 2026

I Stopped Collecting Agent Skills. Started Wiring Them Into Loops.


Written by Vox

I keep seeing people share AI skill collections. 20 skills, 50 skills, neatly categorized, ready to download.

I downloaded some too. Installed a few writing skills into my OpenClaw setup, spent a while tweaking them, adjusting prompts, changing parameters, reformatting outputs. After all that tinkering the results were just okay. Never opened them again.

Eventually I figured out why: installing a skill doesn't mean your agent will use it when you need it. It doesn't know when to run, where to store results, or whether to try a different approach next time.

You think installing a skill means your agent learned something. Really you just added another instruction sheet to a drawer.

The difference between an instruction sheet and a loop

I've been running agents for 6 months. I don't use many skills, but they're wired together:

Example: Scheduled scan finds content worth collecting → writing skill drafts a response → I review it, edit based on my own instincts, then approve → system records the diff between my draft and my edited version → diffs accumulate, get distilled into new rules, written back to the skill file → next time the scanner finds similar content, the draft quality is already better than last time.

That's a loop, not a template.

A template stops after one use. A loop gets more accurate with every turn.

I use 5 types of skills. For each one below, I'll explain what problem it solves, how it runs, and one real example.

1. Writing Skills - the drafts are starting to sound like me

Generic writing prompts produce clean, polished text that immediately reads as AI-generated.

My approach: write my own rules into the skill. Which words are banned, max sentence length, what tone sounds like me. The agent follows the rules, but I still edit every draft.

The important part is what happens after I edit.

I connected a nightly review process: every night, a script diffs all drafts against their final published versions. What I changed, deleted, added, all recorded. Once enough similar edits accumulate (usually 10-15 of the same type), the system calls a model to classify them, distills candidate rules, and writes them back to the skill file.
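The diff-and-tally step can be sketched in a few lines. This is a minimal illustration using Python's difflib, not my production script: file handling and the model call that classifies candidates are omitted, and the threshold and the (op, line) representation are assumptions.

```python
import difflib
from collections import Counter

def diff_edits(draft: str, final: str) -> list:
    """Record the lines I deleted or added while editing a draft."""
    edits = []
    for line in difflib.unified_diff(draft.splitlines(), final.splitlines(), lineterm=""):
        if line.startswith("-") and not line.startswith("---"):
            edits.append(("deleted", line[1:].strip()))
        elif line.startswith("+") and not line.startswith("+++"):
            edits.append(("added", line[1:].strip()))
    return edits

def recurring_edits(all_edits: list, threshold: int = 10) -> list:
    """Edits of the same type seen at least `threshold` times become rule candidates."""
    counts = Counter(all_edits)
    return [edit for edit, n in counts.items() if n >= threshold]
```

The candidates this produces are what get handed to a model for classification and, eventually, written back into the skill file.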

A real example: the system noticed I deleted "spent X weeks doing Y" phrasing over a dozen times in a row. It distilled that into a rule and added it to the skill's ban list. Since then, that phrasing shows up significantly less often in first drafts.

Over 6 months, the skill file evolved from v1.0 to v1.3. I didn't maintain it manually. The nightly review process drove the updates. First drafts now need far fewer edits than when I started.

2. Research Skills - give it a direction, get a source pack in 15 minutes

Before writing anything I need source material. I used to open six or seven tabs, search keywords, click through results, manually copy-paste into notes. An hour gone and the material still wasn't organized.

Now I give the agent a direction. It searches → sorts by engagement data → pulls full text into markdown → I pick from the archive.

Important: I don't ask the agent for search summaries. Summaries frequently drop key details, especially data points and specific examples. I have it pull back full original text and store it. The judgment call is still mine.

This morning's example: I wanted to see the hottest posts on a topic from the past week. The agent pulled 30 posts in 5 minutes, sorted by likes, stored full text. I spent 10 minutes scanning and picked 3 as source material. If you're curious what this system looks like in practice, check out voxyz.space/stage, the live status page for my agent system.

There are failures: paywalls, JS-rendered pages, API rate limits. The agent flags what it can't fetch and I check those manually. It's not perfect every time, but most of the source collection work is hands-off now.
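The storage-and-flagging side of this is simple enough to sketch. Fetching is where the failures happen, so this illustration takes already-fetched text and only handles the archive format and the manual-check list; the paths, metadata fields, and slug scheme are all assumptions, not my exact setup.

```python
import datetime
import pathlib
import re

def archive_post(title: str, url: str, full_text: str, outdir: str = "archive") -> pathlib.Path:
    """Store the full original text (not a summary) as markdown with source metadata."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")[:60]
    path = pathlib.Path(outdir) / f"{slug}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    stamp = datetime.date.today().isoformat()
    path.write_text(f"# {title}\n\nsource: {url}\nfetched: {stamp}\n\n{full_text}\n",
                    encoding="utf-8")
    return path

def flag_failures(results: dict) -> list:
    """URLs that came back empty (paywall, JS rendering, rate limit) go to a manual-check list."""
    return [url for url, text in results.items() if not text]
```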

3. Review Skills - get roasted by virtual readers before you publish

I don't publish anything without running it past virtual readers first.

This isn't grammar checking. It's using different prompts to simulate different reader types: a skeptic, a newcomer, a potential customer, a peer. Multiple personas run simultaneously, scoring each paragraph, telling me where readers would roll their eyes or close the page.

A real case: the skeptic gave a first draft 4 out of 10. Main issues: the opening two paragraphs were self-congratulatory, no specific numbers, opinion-first with no scene-setting.

Based on that feedback I cut two paragraphs of self-promotion, replaced a vague "great results" with a specific number, and changed the opening from an opinion to a scene. Third round score: 7.

Worth noting: LLM scoring has variance. The same article run twice might score 1-2 points differently. What matters isn't the absolute number but the direction: which paragraphs consistently score low, which ones actually improve after edits.
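The persona setup and the "direction over absolute score" rule can be sketched together. The persona texts and the cutoff below are illustrative placeholders, and the actual model call is omitted; what matters is one prompt per persona and an intersection across rounds so one noisy score doesn't count.

```python
PERSONAS = {
    "skeptic": "You distrust any claim without a number. Point out self-congratulation.",
    "newcomer": "You have no background here. Flag anything that assumes context you lack.",
}

def build_prompt(persona: str, paragraphs: list) -> str:
    """One prompt per persona; each scores every paragraph 1-10 with a reason."""
    numbered = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(paragraphs))
    return (f"Role: {PERSONAS[persona]}\n"
            f"Score each paragraph 1-10 and give one reason per score.\n\n{numbered}")

def consistently_low(rounds: list, cutoff: int = 5) -> list:
    """Paragraph indices scoring below cutoff in *every* round — the trend, not one run."""
    low = set(rounds[0])
    for scores in rounds:
        low &= {i for i, s in scores.items() if s < cutoff}
    return sorted(low)
```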

Those three changes from that session (cut self-promotion, add numbers, swap opinion for scene) became my default checklist for every article since. I didn't come up with that checklist myself. It was extracted from the scoring trend across multiple rounds.

4. Memory Skills - it doesn't ask "where were we?" every time it wakes up

The biggest frustration with AI isn't that it's not smart enough. It's that it doesn't remember anything.

You spend an hour discussing strategy, close the window, open it next time and it knows nothing.

I wired three memory layers into my agent:

Log layer: a daily work log. What happened, what was discovered, what data changed. One per day.

Long-term rules layer: only rules verified multiple times get written here. Not everything goes in.

Handoff layer: a state snapshot at the end of each session, read into context on the next startup. I wrote more about how this works here.

To be technically clear: this isn't native LLM memory. It's writing outputs to files and reading them into the context window on the next run. There's token cost and length limits, so the long-term rules layer gets periodically filtered. It doesn't accumulate infinitely.
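In code, that mechanism is small. A minimal sketch of the handoff layer, assuming JSON snapshots and a markdown rules file (my actual file names and budget differ):

```python
import json
import pathlib

def save_handoff(state: dict, path: str) -> None:
    """End-of-session snapshot: the 'memory' is just a file on disk."""
    p = pathlib.Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    p.write_text(json.dumps(state, indent=2), encoding="utf-8")

def load_context(handoff_path: str, rules_path: str, char_budget: int = 4000) -> str:
    """Read long-term rules plus the last snapshot into the next run's context.
    The budget is why the rules layer gets filtered — it can't grow forever."""
    parts = []
    for f in (rules_path, handoff_path):
        p = pathlib.Path(f)
        if p.exists():
            parts.append(p.read_text(encoding="utf-8"))
    return "\n\n".join(parts)[:char_budget]
```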

A few rules actually stored in my long-term layer right now:

  1. Asking the agent for search summaries loses key details. Store originals, judge yourself.
  2. Posts without links = engagement but zero website traffic (confirmed over two months of data).
  3. Article traffic spikes have roughly a 2-day half-life before returning to the daily baseline (tracked with Vercel Analytics).

These aren't write-once. Every night the system runs a review: reads the day's log, extracts moved / blocked / next priorities, flags anything worth adding to long-term rules. It runs twice: the second pass uses a different review angle (first pass looks at "what got done," second pass looks at "what was missed") to catch gaps.

By morning, the state I see is already reviewed. I don't need to dig through yesterday's records myself.

5. Ops Skills - while you're away, it's watching

An agent shouldn't only work when you're talking to it.

I run over a dozen scheduled tasks. The three I use most:

Heartbeat check: scans social media mentions and timeline every few hours. If nothing is worth reporting, it stays silent. Only 🔴 level (tagged by a major account, negative content) or 🟡 level (valuable reply opportunity) gets pushed to me. Most of the time the conclusion is: nothing to report.

Nightly review: the review process mentioned earlier, runs automatically every night.

Morning and evening briefings: compresses all agent status for the day into one message. One in the morning covering overnight, one in the evening covering today. 30 seconds to see the full picture.
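The heartbeat's silence-by-default filter is the part worth sketching. The account list and keyword set below are placeholders, and real classification would involve a model call; the point is that everything unflagged returns nothing and never reaches me.

```python
MAJOR_ACCOUNTS = {"bignews", "toolmaker"}        # placeholder: accounts treated as major
NEGATIVE_WORDS = {"broken", "scam", "refund"}    # placeholder: crude negativity check

def classify(mention: dict) -> str:
    """🔴 = major account or negative content, 🟡 = reply opportunity, '' = stay silent."""
    text = mention.get("text", "").lower()
    if mention.get("author") in MAJOR_ACCOUNTS or any(w in text for w in NEGATIVE_WORDS):
        return "🔴"
    if mention.get("is_question"):
        return "🟡"
    return ""

def heartbeat(mentions: list) -> list:
    """Only flagged mentions get pushed; most runs return an empty list."""
    return [(level, m) for m in mentions if (level := classify(m))]
```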

Individually, these are just cron jobs. Wired together, they form a loop:

Heartbeat finds content worth collecting → writing skill drafts → I edit based on my own instincts, approve and publish → system records the diff between draft and final → nightly review distills editing patterns → rules written back to skill file → next heartbeat finds similar content, draft quality is already better.

If any step fails (scraping timeout, model returns bad format, push notification fails), the chain breaks. Next cron trigger restarts from the top. Each step checks the previous step's status file before executing, so completed steps don't repeat. It's not perfect every time, but most of the time the chain runs through.
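The restart-without-repeating behavior comes down to status files. A minimal sketch, assuming one marker file per step (the naming and directory layout are illustrative):

```python
import pathlib

def run_chain(steps: list, status_dir: str = "status"):
    """Run (name, fn) steps in order; skip steps already marked done, stop at the
    first failure. A later cron trigger calls this again and resumes from there."""
    d = pathlib.Path(status_dir)
    d.mkdir(parents=True, exist_ok=True)
    for name, fn in steps:
        marker = d / f"{name}.done"
        if marker.exists():          # completed on a previous trigger — don't repeat
            continue
        try:
            fn()
        except Exception:
            return name              # chain breaks here; no marker written
        marker.write_text("ok", encoding="utf-8")
    return None                      # full chain ran through
```

In a real chain the markers would be dated so each day starts fresh, but the skip-if-done check is the whole trick.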

Start with this cron job

Two things are enough: scheduled triggers + persistent context.

Scheduled triggers are cron: you set a time and a task, it runs on schedule.

Persistent context means that each time the cron fires, the last run's output has already been written to a file, and this run reads it back into context. The LLM doesn't natively remember the last conversation; continuity comes from file reads and writes.

A minimal example (using OpenClaw here, other agent frameworks work similarly):

Schedule: 10 2 * * *
Task: read today's work log, extract moved / blocked / next priorities, write to review file
Session: persistent (last review output auto-loaded into this run's context)

This task runs once daily. The review file it produces gets read by the agent automatically the next day.
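The task body itself can be sketched without any agent framework. This assumes the work log uses "## moved / ## blocked / ## next" headings with bullet lines under each, which is an illustrative format, not necessarily yours:

```python
import pathlib
import re

SECTIONS = ("moved", "blocked", "next")   # assumption: the log uses these headings

def extract(log_text: str) -> dict:
    """Pull bullet lines out of the moved/blocked/next sections of a work log."""
    out = {s: [] for s in SECTIONS}
    current = None
    for line in log_text.splitlines():
        m = re.match(r"##\s*(\w+)", line)
        if m:
            heading = m.group(1).lower()
            current = heading if heading in SECTIONS else None
        elif current and line.strip().startswith("-"):
            out[current].append(line.strip("- ").strip())
    return out

def nightly_review(log_path: str, review_path: str) -> None:
    """Write the extracted sections to the review file the next run will read."""
    sections = extract(pathlib.Path(log_path).read_text(encoding="utf-8"))
    body = "\n".join(f"{s}: {', '.join(v) or 'nothing'}" for s, v in sections.items())
    pathlib.Path(review_path).write_text(body + "\n", encoding="utf-8")
```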

Add an 08:00 morning briefing task that reads the nightly review output and compresses it into one message pushed to you.

Two cron jobs chained: work log → nightly review → morning briefing → you see yesterday's full picture in 30 seconds.

This chain runs itself every day. You don't need to manually check yesterday's records.

If you only do one thing: add a scheduled trigger to whatever skill you use most. Even once a week. Have it produce a summary in a fixed format. Start with scheduling. Memory and feedback will follow.

The three rings of a loop

A skill file tells the agent what to do. But who tells it when to do it, where to store results, and whether to change approach next time?

Scheduling: timed triggers, no need to ask. Memory: results and lessons written to files, read into context next run. Feedback: compare this run's output against your edits, update the rules.

With these three, a skill keeps getting better after its first use.

Start with one cron job.


This piece first appeared on X on Apr 2, 2026.
