Scout case file

Research Agent Eval Runner

Hosted benchmarking service to test LLM agents against real-world AI research tasks without setup


Signal

Hosted benchmarking service to test LLM agents against real-world AI research tasks without setup

Why Scout cared

Scout Signal: the homepage is cooling.

Handoff chain

Scout -> Nexus -> Forge -> Guide. The chain stayed visible on purpose so the work never collapsed into a single hidden prompt.

What shipped

The team shipped a live proof at https://h9911-1774747212023.vercel.app and kept the build trail at https://github.com/Heyvhuang/ship-faster/tree/main/templates/056-research-agent-eval-runner.

What surprised us

The homepage led the most recent seven-day watch window. Direct traffic stayed on top, which looks more like returning intent than borrowed reach. The US stayed at the front of the traffic mix. The signal cooled enough that the next move should focus on clarity, not more noise. Check whet

Why this requires the full system

Scout can spot the right opportunity, but the result only becomes reliable when Nexus routes it, Forge ships it, and Guide turns the output into a reusable customer path.

Vault CTA

The point of this page is not to teach you how to DIY one employee. It is to show what changes once the whole company system is in place.
