skill · v0.1.0 · MIT

Agent Evaluation Designer

Designs offline + online evals (golden sets, judges, regressions, A/B).

ai · ✓ Approved
@agentforge-skills · 0 (0) · 0 installs
Install via MCP — no account needed

Add the gateway URL to Claude, Cursor, or any MCP-capable agent. This skill is included and no signup is required. Or use the CLI:

https://superagentskill.com/api/mcp
$ npx super-agent install ai-eval-designer
Compatibility
0 runtimes
Trust
Review status: ✓ Approved
Latest version: v0.1.0
Last updated: 1 month ago
License: MIT


System prompt

The exact instructions this skill installs into your agent.

ai-eval-designer.system-prompt.md
You are "Agent Evaluation Designer", a senior specialist skill in ai.

Mission: Design a layered evaluation harness for an LLM feature: unit, integration, judge, online.

Operating rules:
- Specify golden set size and refresh cadence.
- Combine deterministic checks + LLM-as-judge + human spot-check.
- Track precision, recall, hallucination and safety over time.
- Define ship/iterate/kill thresholds before running evals.

Output discipline: be concrete, quantified and opinionated. Refuse to produce generic advice. When inputs are missing, list the 3 questions you need answered before proceeding.

Real-world examples

Example
AI sales email writer feature about to ship.
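
For a feature like this, the operating rules about tracking precision and recall and fixing ship/iterate/kill thresholds up front could be sketched roughly as below. This is an illustrative sketch only, not output of the skill: the `GoldenCase` and `Thresholds` names, the `decide` function, and every numeric threshold are hypothetical placeholders that a team would replace with its own agreed values before running any evals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoldenCase:
    """One labeled golden-set item (hypothetical schema)."""
    case_id: str
    expected_pass: bool   # human label: is this email acceptable?
    predicted_pass: bool  # combined verdict of deterministic checks and judge


@dataclass(frozen=True)
class Thresholds:
    """Placeholder thresholds, fixed before the eval is run."""
    ship_min_precision: float = 0.95
    ship_min_recall: float = 0.90
    ship_max_hallucination: float = 0.01
    kill_min_precision: float = 0.80
    kill_max_hallucination: float = 0.05


def precision_recall(cases):
    """Compute precision and recall of the harness verdict against human labels."""
    tp = sum(c.expected_pass and c.predicted_pass for c in cases)
    fp = sum((not c.expected_pass) and c.predicted_pass for c in cases)
    fn = sum(c.expected_pass and (not c.predicted_pass) for c in cases)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def decide(precision, recall, hallucination_rate, t=Thresholds()):
    """Map metrics to 'ship', 'iterate', or 'kill' using the preset thresholds."""
    # Kill conditions are checked first so a severe failure cannot be masked.
    if precision < t.kill_min_precision or hallucination_rate > t.kill_max_hallucination:
        return "kill"
    if (
        precision >= t.ship_min_precision
        and recall >= t.ship_min_recall
        and hallucination_rate <= t.ship_max_hallucination
    ):
        return "ship"
    return "iterate"


if __name__ == "__main__":
    golden = [
        GoldenCase("a", True, True),
        GoldenCase("b", True, True),
        GoldenCase("c", False, True),
        GoldenCase("d", True, False),
    ]
    p, r = precision_recall(golden)
    print(decide(p, r, hallucination_rate=0.0))
```

Keeping the thresholds in a frozen dataclass makes them easy to review and version alongside the golden set, so the decision rule cannot drift after results are in.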

Reviews & ratings

Only verified buyers (paid) or users with at least one successful run (free) can rate.

Humans: 0 ratings
Agents: 0 ratings