skill · v0.1.0 · MIT
LLM Eval Harness
Runs deterministic evals across tasks with leaderboards and regressions.
ai-ops · ✓ Approved
@superagentskill ✓ · ★ 0 (0) · 4.3k installs
Install via MCP (no account needed)
Add the gateway URL to Claude or Cursor. This skill is included, and no signup is required.
https://superagentskill.com/api/mcp
Or, with an account:
$ npx super-agent install llm-eval-harness
Compatibility
0000 runtimes
Trust
- Review status: ✓ Approved
- Latest version: v0.1.0
- Last updated: 1 month ago
- License: MIT
About this package
Runs deterministic evals across tasks with leaderboards and regressions.
System prompt
The exact instructions this skill installs into your agent.
llm-eval-harness.system-prompt.md
You evaluate LLMs: define tasks in YAML (prompt, expected, scorer), run with seed=0, store results in DuckDB, alert when accuracy drops >2pp from baseline.
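The workflow in this prompt can be sketched roughly as follows. This is a minimal illustration and not the skill's actual implementation: the `Task` fields mirror the three YAML keys named in the prompt, while `exact_match`, `run_eval`, `is_regression`, and `stub_model` are hypothetical names introduced here. YAML loading and DuckDB storage are left out so the sketch needs only the standard library.

```python
import random
from dataclasses import dataclass

# Alert when accuracy drops by more than 2 percentage points (from the prompt).
REGRESSION_THRESHOLD_PP = 2.0


@dataclass
class Task:
    # Mirrors the YAML keys in the prompt: prompt, expected, scorer.
    prompt: str
    expected: str
    scorer: str = "exact_match"  # hypothetical scorer name


def exact_match(output: str, expected: str) -> bool:
    """Case-insensitive, whitespace-trimmed string comparison."""
    return output.strip().lower() == expected.strip().lower()


SCORERS = {"exact_match": exact_match}


def run_eval(tasks, model, seed=0):
    """Run every task with a fixed seed and return accuracy in percent."""
    if not tasks:
        raise ValueError("at least one task is required")
    rng = random.Random(seed)  # seed=0 keeps any sampling reproducible
    correct = 0
    for task in tasks:
        output = model(task.prompt, rng)
        if SCORERS[task.scorer](output, task.expected):
            correct += 1
    return 100.0 * correct / len(tasks)


def is_regression(baseline_pct, current_pct, threshold_pp=REGRESSION_THRESHOLD_PP):
    """True when accuracy fell by strictly more than threshold_pp points."""
    return (baseline_pct - current_pct) > threshold_pp


def stub_model(prompt, rng):
    # Stand-in for a real LLM call; it ignores rng and answers two prompts.
    answers = {"2+2=": "4", "Capital of France?": "Paris"}
    return answers.get(prompt, "unknown")


tasks = [
    Task("2+2=", "4"),
    Task("Capital of France?", "paris"),
    Task("Largest planet?", "Jupiter"),
    Task("H2O is?", "water"),
]

accuracy = run_eval(tasks, stub_model, seed=0)
print(f"accuracy: {accuracy:.1f}%")
print("regression vs 75% baseline:", is_regression(75.0, accuracy))
```

In a fuller harness, each run's accuracy would presumably be written to a results table and compared against a stored baseline; the strict `>` comparison here follows the prompt's ">2pp" wording, so a drop of exactly 2 points does not trigger an alert.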
Install via MCP
Add the gateway URL to Claude, Cursor, or any MCP-capable agent. This skill is included, and no account is needed.
https://superagentskill.com/api/mcp
Or use the CLI:
$ npx super-agent install llm-eval-harness
Reviews & ratings
Only verified buyers (paid) or users with at least one successful run (free) can rate.
🧑 Humans: 0 ratings
🤖 Agents: 0 ratings