Evaluation · live · 6 packages

Continuous evaluation, per package.

Every skill, playbook, soul, and guardrail in your stack is scored against its own benchmark suite in real time. When fitness falls below a package's warn threshold, an auto-tune is triggered; persistent drift triggers a hot-swap to a different version. Nothing leaves your gateway.
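
The per-package warn and swap thresholds and the "Drift streak 0/4" counter on each card suggest a simple decision rule. The sketch below is one possible reading of it, not AgentForge's actual implementation. The names `PackageState` and `evaluate` are hypothetical, and so is the assumption that a drift streak counts consecutive evaluations below the swap threshold, with a hot-swap once the streak reaches 4.

```python
from dataclasses import dataclass


@dataclass
class PackageState:
    """Hypothetical per-package evaluation state (not a real AgentForge type)."""
    name: str
    warn: float            # assumed: fitness below this triggers an auto-tune
    swap: float            # assumed: fitness below this counts toward the drift streak
    streak_limit: int = 4  # assumed from the "0/4" drift streak display
    drift_streak: int = 0


def evaluate(state: PackageState, fitness: float) -> str:
    """Return the action for one fitness reading: 'ok', 'auto-tune' or 'hot-swap'."""
    if fitness < state.swap:
        # Below the swap line: extend the drift streak.
        state.drift_streak += 1
        if state.drift_streak >= state.streak_limit:
            # Persistent drift: reset the streak and swap versions.
            state.drift_streak = 0
            return "hot-swap"
        return "auto-tune"
    # At or above the swap line: the streak is broken.
    state.drift_streak = 0
    if fitness < state.warn:
        return "auto-tune"
    return "ok"


if __name__ == "__main__":
    state = PackageState(name="demo-package", warn=85, swap=82)
    for reading in (90, 84, 81, 80, 79, 78):
        print(reading, evaluate(state, reading))
```

With warn 85 and swap 82, a reading of 84 would only trigger an auto-tune, while four consecutive readings below 82 would end in a hot-swap under these assumptions.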

Avg fitness

82.2/100

Healthy

Watching

Drift

Hot-swaps

skill · watching

cardiology-diagnostics@2.1.0

Fitness

82.2/100

Pass rate

94.0%

Evals 24h

300

Fitness thresholds: warn < 85 · hot-swap < 82

Drift streak

0/4

Last auto-tune

8m ago

temperature −0.15 (0.7 → 0.55)

Hot-swap history

⇄ 2.0.4 → 2.1.0 · Drift on adversarial set · 39m ago

playbook · drift

enterprise-sales-flow@1.4.2

Fitness

78.8/100

Pass rate

92.1%

Evals 24h

989

Fitness thresholds: warn < 85 · hot-swap < 82

Drift streak

0/4

Last auto-tune

No auto-tune yet — fitness has stayed above the warn line.

Hot-swap history

No swaps. The current version is meeting its thresholds.

soul · healthy

steve-jobs-soul@3.0.1

Fitness

81.8/100

Pass rate

95.9%

Evals 24h

809

Fitness thresholds: warn < 81 · hot-swap < 78

Drift streak

0/4

Last auto-tune

16m ago

max_steps +4 (8 → 12)

Hot-swap history

No swaps. The current version is meeting its thresholds.

guardrail · drift

medical-guardrails@0.9.0

Fitness

82.5/100

Pass rate

88.8%

Evals 24h

259

Fitness thresholds: warn < 91 · hot-swap < 88

Drift streak

0/4

Last auto-tune

No auto-tune yet — fitness has stayed above the warn line.

Hot-swap history

⇄ 0.8.4 → 0.9.0 · New patch verified +safety · 1h ago

skill · drift

growth-hacking-pro@1.6.0

Fitness

81.9/100

Pass rate

91.8%

Evals 24h

352

Fitness thresholds: warn < 85 · hot-swap < 82

Drift streak

0/4

Last auto-tune

15m ago

guardrail.strict ↑ strict (false → true)

Hot-swap history

No swaps. The current version is meeting its thresholds.

soul · healthy

mckinsey-consultant@2.3.0

Fitness

86.2/100

Pass rate

88.3%

Evals 24h

905

Fitness thresholds: warn < 81 · hot-swap < 78

Drift streak

0/4

Last auto-tune

No auto-tune yet — fitness has stayed above the warn line.

Hot-swap history

No swaps. The current version is meeting its thresholds.
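
The "Avg fitness 82.2/100" figure at the top of the page is consistent with an unweighted mean of the six per-package fitness scores listed above. The short check below illustrates that arithmetic; `average_fitness` is a hypothetical helper, and whether the dashboard actually weights packages equally is an assumption.

```python
# Fitness scores as listed on the six package cards.
FITNESS = {
    "cardiology-diagnostics@2.1.0": 82.2,
    "enterprise-sales-flow@1.4.2": 78.8,
    "steve-jobs-soul@3.0.1": 81.8,
    "medical-guardrails@0.9.0": 82.5,
    "growth-hacking-pro@1.6.0": 81.9,
    "mckinsey-consultant@2.3.0": 86.2,
}


def average_fitness(scores: dict[str, float]) -> float:
    """Unweighted mean of per-package fitness (assumed aggregation rule)."""
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    # 493.4 / 6 is roughly 82.23, which rounds to the displayed 82.2.
    print(f"{average_fitness(FITNESS):.1f}/100")
```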

Want this on your stack?

Connect via MCP and AgentForge starts the evaluation loop automatically.