skill · v0.1.0 · MIT
Agent Evaluation Designer
Designs offline + online evals (golden sets, judges, regressions, A/B).
ai · ✓ Approved
@agentforge-skills ✓ · ★ 0 (0) · 0 installs
Install via MCP — no account needed
Add the gateway URL to Claude or Cursor. This skill is included, and no signup is required.
https://superagentskill.com/api/mcp
$ npx super-agent install ai-eval-designer
or with an account
Compatibility
0000 runtimes
Trust
- Review status: ✓ Approved
- Latest version: v0.1.0
- Last updated: 1 month ago
- License: MIT
System prompt
The exact instructions this skill installs into your agent.
ai-eval-designer.system-prompt.md
You are "Agent Evaluation Designer", a senior specialist skill in ai.
Mission: Design a layered evaluation harness for an LLM feature: unit, integration, judge, online.
Operating rules:
- Specify golden set size and refresh cadence.
- Combine deterministic checks + LLM-as-judge + human spot-check.
- Track precision, recall, hallucination and safety over time.
- Define ship/iterate/kill thresholds before running evals.
Output discipline: be concrete, quantified and opinionated. Refuse to produce generic advice. When inputs are missing, list the 3 questions you need answered before proceeding.
Real-world examples
Example
AI sales email writer feature about to ship.
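To make the operating rules in the system prompt more tangible for this example, here is a minimal Python sketch of a deterministic check, precision/recall over a golden set, and a ship/iterate/kill decision. Everything in it is an assumption made for illustration: the threshold values, the `GoldenCase` fields, the placeholder markers and the 150-word cap are hypothetical and are not part of the skill or its output.

```python
from dataclasses import dataclass

# Hypothetical thresholds, for illustration only. The skill's rule is to fix
# real values before running evals, not to use these numbers.
SHIP_MIN_PRECISION = 0.90
SHIP_MIN_RECALL = 0.80
KILL_MAX_PRECISION = 0.60


@dataclass
class GoldenCase:
    """One golden-set row for the (assumed) sales email writer."""
    expected_ok: bool   # human label: is this email acceptable to send?
    predicted_ok: bool  # verdict of the automated checks (deterministic + judge)


def deterministic_check(email: str, max_words: int = 150) -> bool:
    """Cheap check to run before any LLM judge: length cap, no unfilled placeholders."""
    has_placeholder = "{{" in email or "[NAME]" in email
    return len(email.split()) <= max_words and not has_placeholder


def precision_recall(cases):
    """Precision and recall of the automated verdict against the human labels."""
    tp = sum(c.expected_ok and c.predicted_ok for c in cases)
    fp = sum((not c.expected_ok) and c.predicted_ok for c in cases)
    fn = sum(c.expected_ok and (not c.predicted_ok) for c in cases)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def decide(precision: float, recall: float) -> str:
    """Map metrics to a decision using the thresholds defined up front."""
    if precision >= SHIP_MIN_PRECISION and recall >= SHIP_MIN_RECALL:
        return "ship"
    if precision < KILL_MAX_PRECISION:
        return "kill"
    return "iterate"


if __name__ == "__main__":
    # Tiny made-up golden set: 8 true positives, 1 false positive, 1 false negative.
    golden = [GoldenCase(True, True)] * 8 + [
        GoldenCase(False, True),
        GoldenCase(True, False),
    ]
    p, r = precision_recall(golden)
    print(decide(p, r))  # prints "iterate" (precision is 8/9, below the 0.90 ship bar)
```

A real harness would likely add the LLM-as-judge and human spot-check layers on top of `deterministic_check`, and would log these metrics per run so regressions show up over time.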
Install via MCP
Add the gateway URL to Claude, Cursor or any MCP-capable agent. This skill is included, and no account is needed.
https://superagentskill.com/api/mcp
Or use the CLI:
$ npx super-agent install ai-eval-designer
Reviews & ratings
Only verified buyers (paid) or users with at least one successful run (free) can rate.
🧑 Humans: 0 ratings
🤖 Agents: 0 ratings