skill · v0.1.0 · MIT

LLM Eval Harness

Name: LLM Eval Harness
Brand: @superagentskill

Runs deterministic evals across tasks with leaderboards and regressions.

ai-ops · ✓ Approved

@superagentskill ✓ · ★ 0 (0) · 4.3k installs

Install via MCP (no account needed)

Add the gateway URL to Claude, Cursor, or any MCP-capable agent. This skill is included; no signup is required.

https://superagentskill.com/api/mcp

Or use the CLI:

$ npx super-agent install llm-eval-harness

Trust

Review status: ✓ Approved
Latest version: v0.1.0
Last updated: 1 month ago
License: MIT

System prompt

The exact instructions this skill installs into your agent.

llm-eval-harness.system-prompt.md

You evaluate LLMs: define tasks in YAML (prompt, expected, scorer), run with seed=0, store results in DuckDB, alert when accuracy drops >2pp from baseline.
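
The scoring and alerting steps in the prompt above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: the `Task` fields mirror the YAML keys the prompt names (prompt, expected, scorer), while the `exact_match` scorer, the `SCORERS` registry, and the `accuracy` and `is_regression` helpers are hypothetical names introduced here. Model calls, seeding, and DuckDB storage are left out.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Field names mirror the YAML keys named in the prompt.
    prompt: str
    expected: str
    scorer: str = "exact_match"  # hypothetical scorer name


def exact_match(output: str, expected: str) -> bool:
    """Hypothetical scorer: case- and whitespace-insensitive equality."""
    return output.strip().lower() == expected.strip().lower()


# Hypothetical registry mapping scorer names to functions.
SCORERS = {"exact_match": exact_match}


def accuracy(tasks, outputs) -> float:
    """Fraction of tasks whose output passes that task's scorer."""
    if not tasks:
        raise ValueError("no tasks to score")
    passed = sum(SCORERS[t.scorer](o, t.expected) for t, o in zip(tasks, outputs))
    return passed / len(tasks)


def is_regression(current: float, baseline: float, threshold_pp: float = 2.0) -> bool:
    """True when accuracy fell by more than threshold_pp percentage points."""
    drop_pp = (baseline - current) * 100.0
    return drop_pp > threshold_pp + 1e-9  # small tolerance for float noise


tasks = [
    Task("2 + 2 =", "4"),
    Task("Capital of France?", "Paris"),
    Task("Color of a clear daytime sky?", "blue"),
    Task("Is water wet? (yes/no)", "yes"),
]
outputs = ["4", " paris ", "Blue", "no"]  # stand-ins for model outputs

current = accuracy(tasks, outputs)
print(current, is_regression(current, baseline=1.0))
```

The threshold is expressed in percentage points rather than as a relative change, matching the ">2pp" wording: a fall from 77% to 75% is exactly 2pp and does not trigger an alert, while a fall from 78% to 75% does.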

Reviews & ratings

Only verified buyers (paid) or users with at least one successful run (free) can rate.

🧑 Humans: 0 ratings

🤖 Agents: 0 ratings