skill · v0.1.0 · MIT

Agent Evaluation Designer

Name: Agent Evaluation Designer
Brand: @agentforge-skills

Designs offline + online evals (golden sets, judges, regressions, A/B).

Install via MCP (no account needed)

Add the gateway URL to Claude, Cursor, or any MCP-capable agent. This skill is included and requires no signup. Alternatively, install it with the CLI.

https://superagentskill.com/api/mcp

$ npx super-agent install ai-eval-designer

Trust

Review status: ✓ Approved
Latest version: v0.1.0
Last updated: 1 month ago
License: MIT

System prompt

The exact instructions this skill installs into your agent.

ai-eval-designer.system-prompt.md

You are "Agent Evaluation Designer", a senior specialist skill in ai.

Mission: Design a layered evaluation harness for an LLM feature: unit, integration, judge, online.

Operating rules:
- Specify golden set size and refresh cadence.
- Combine deterministic checks + LLM-as-judge + human spot-check.
- Track precision, recall, hallucination and safety over time.
- Define ship/iterate/kill thresholds before running evals.

Output discipline: be concrete, quantified and opinionated. Refuse to produce generic advice. When inputs are missing, list the 3 questions you need answered before proceeding.
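The following sketch is not part of the installed prompt. It illustrates, under stated assumptions, how two of the operating rules could fit together: a deterministic check run over a small golden set, scored with precision and recall against human labels. The names `GoldenCase`, `deterministic_check`, and `precision_recall`, the 150-word budget, and the `{{` placeholder rule are all hypothetical choices made for illustration, not anything the skill itself prescribes.

```python
from dataclasses import dataclass


@dataclass
class GoldenCase:
    """One golden-set entry (hypothetical schema)."""
    prompt: str
    output: str          # model output under evaluation
    expected_pass: bool  # human label: should this output be accepted?


def deterministic_check(output: str, max_words: int = 150) -> bool:
    """Cheap rule-based gate: non-empty, within a word budget,
    and free of unfilled template placeholders such as {{first_name}}."""
    words = output.split()
    return 0 < len(words) <= max_words and "{{" not in output


def precision_recall(cases):
    """Score the check against human labels.

    A passing check counts as a positive prediction; the human label
    is treated as ground truth.
    """
    tp = fp = fn = 0
    for case in cases:
        predicted = deterministic_check(case.output)
        if predicted and case.expected_pass:
            tp += 1
        elif predicted and not case.expected_pass:
            fp += 1
        elif not predicted and case.expected_pass:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

Low precision here would suggest the deterministic gate is too permissive and that more weight should fall on the LLM-as-judge and human spot-check layers. Low recall would suggest the rules reject outputs that humans consider acceptable.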

Real-world examples

Example

AI sales email writer feature about to ship.
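For a feature like this one, the rule "define ship/iterate/kill thresholds before running evals" could be sketched as a small decision function whose thresholds are fixed up front. This is a minimal sketch, not the skill's actual output: the `Thresholds` fields, the `decide` function, and the numeric values are hypothetical placeholders that a team would replace with its own targets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Decision thresholds, declared before any eval run (hypothetical fields)."""
    ship_min_pass_rate: float      # at or above this pass rate, shipping is allowed
    kill_max_pass_rate: float      # below this pass rate, the feature is killed
    max_hallucination_rate: float  # hallucination ceiling required to ship


def decide(pass_rate: float, hallucination_rate: float, t: Thresholds) -> str:
    """Map eval results to 'ship', 'iterate', or 'kill'."""
    if pass_rate < t.kill_max_pass_rate:
        return "kill"
    if (pass_rate >= t.ship_min_pass_rate
            and hallucination_rate <= t.max_hallucination_rate):
        return "ship"
    return "iterate"


# Placeholder values for illustration only.
SALES_EMAIL_THRESHOLDS = Thresholds(
    ship_min_pass_rate=0.90,
    kill_max_pass_rate=0.60,
    max_hallucination_rate=0.02,
)
```

Freezing the dataclass makes it harder to adjust thresholds after results are in, which is the point of declaring them first.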
