Evaluation · live · 6 packages

Continuous evaluation, per package.

Every skill, playbook, soul and guardrail in your stack is scored against its own benchmark suite in real time. When fitness falls below a package's warn threshold, an auto-tune is triggered; persistent drift below its hot-swap threshold triggers a hot-swap to another version. Nothing leaves your gateway.
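The loop described above can be sketched roughly as follows. This is a minimal illustration, not AgentForge's actual implementation: the names `PackageState`, `classify` and `step` are hypothetical, and the rule "hot-swap after 4 consecutive drifting evaluations" is an assumption inferred from the "Drift streak 0/4" counters on each card.

```python
from dataclasses import dataclass


@dataclass
class PackageState:
    warn: float            # fitness below this line triggers an auto-tune
    swap: float            # fitness below this line counts as drift
    streak_limit: int = 4  # assumed: drifting evals in a row before a hot-swap
    drift_streak: int = 0


def classify(fitness: float, warn: float, swap: float) -> str:
    """Map a fitness score to a status label using the two thresholds."""
    if fitness < swap:
        return "drift"
    if fitness < warn:
        return "watching"
    return "healthy"


def step(state: PackageState, fitness: float) -> str:
    """Process one evaluation and return 'none', 'auto-tune' or 'hot-swap'."""
    status = classify(fitness, state.warn, state.swap)
    if status == "drift":
        state.drift_streak += 1
        if state.drift_streak >= state.streak_limit:
            state.drift_streak = 0  # a swap resets the streak (assumption)
            return "hot-swap"
        return "auto-tune"          # assumed: keep tuning while drifting
    state.drift_streak = 0          # any non-drift result breaks the streak
    return "auto-tune" if status == "watching" else "none"


if __name__ == "__main__":
    # Thresholds and fitness taken from the cardiology-diagnostics card.
    print(classify(82.2, warn=85, swap=82))
    state = PackageState(warn=85, swap=82)
    print([step(state, 80.0) for _ in range(4)])
```

Under these assumptions, `classify` reproduces the status label shown on each of the six cards from its fitness score and thresholds.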

Avg fitness: 82.2/100
Healthy: 2
Watching: 1
Drift: 3
Hot-swaps: 2
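As a cross-check, the summary figures can be recomputed from the six package cards. The sketch below assumes the average is a plain unweighted mean rounded to one decimal place; the `packages` list and the variable names are illustrative only.

```python
from collections import Counter

# (package, fitness, status) as shown on the per-package cards.
packages = [
    ("cardiology-diagnostics@2.1.0", 82.2, "watching"),
    ("enterprise-sales-flow@1.4.2", 78.8, "drift"),
    ("steve-jobs-soul@3.0.1", 81.8, "healthy"),
    ("medical-guardrails@0.9.0", 82.5, "drift"),
    ("growth-hacking-pro@1.6.0", 81.9, "drift"),
    ("mckinsey-consultant@2.3.0", 86.2, "healthy"),
]

# Unweighted mean of the fitness scores (assumed aggregation).
avg_fitness = round(sum(fitness for _, fitness, _ in packages) / len(packages), 1)

# Number of packages in each status bucket.
status_counts = Counter(status for _, _, status in packages)

if __name__ == "__main__":
    print(avg_fitness)
    print(dict(status_counts))
```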
skill · watching
cardiology-diagnostics@2.1.0
Fitness: 82.2/100
Pass rate: 94.0%
Evals (24h): 300
Fitness thresholds: warn < 85 · hot-swap < 82
Drift streak: 0/4
Last auto-tune: 8m ago (temperature −0.15, 0.7 → 0.55)
Hot-swap history: 1
  • 2.0.4 → 2.1.0
    Drift on adversarial set
    39m ago
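An auto-tune entry such as the temperature change above (−0.15, from 0.7 to 0.55) amounts to applying a numeric delta to one configuration key. The following is a minimal sketch of that idea; `apply_delta` is a hypothetical helper, not part of any AgentForge API, and the rounding to two decimals is an assumption made to avoid floating-point noise.

```python
def apply_delta(config: dict, key: str, delta: float) -> dict:
    """Return a copy of `config` with `delta` added to the numeric value at `key`."""
    updated = dict(config)  # leave the caller's config untouched
    updated[key] = round(updated[key] + delta, 2)
    return updated


if __name__ == "__main__":
    before = {"temperature": 0.7}
    after = apply_delta(before, "temperature", -0.15)
    print(before["temperature"], "->", after["temperature"])
```

Returning a copy rather than mutating in place keeps the previous configuration available, which is convenient if a tune has to be rolled back.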
playbook · drift
enterprise-sales-flow@1.4.2
Fitness: 78.8/100
Pass rate: 92.1%
Evals (24h): 989
Fitness thresholds: warn < 85 · hot-swap < 82
Drift streak: 0/4
Last auto-tune: none yet (fitness has stayed above the warn line)
Hot-swap history: 0
No swaps. The current version is meeting its thresholds.
soul · healthy
steve-jobs-soul@3.0.1
Fitness: 81.8/100
Pass rate: 95.9%
Evals (24h): 809
Fitness thresholds: warn < 81 · hot-swap < 78
Drift streak: 0/4
Last auto-tune: 16m ago (max_steps +4, 8 → 12)
Hot-swap history: 0
No swaps. The current version is meeting its thresholds.
guardrail · drift
medical-guardrails@0.9.0
Fitness: 82.5/100
Pass rate: 88.8%
Evals (24h): 259
Fitness thresholds: warn < 91 · hot-swap < 88
Drift streak: 0/4
Last auto-tune: none yet (fitness has stayed above the warn line)
Hot-swap history: 1
  • 0.8.4 → 0.9.0
    New patch verified +safety
    1h ago
skill · drift
growth-hacking-pro@1.6.0
Fitness: 81.9/100
Pass rate: 91.8%
Evals (24h): 352
Fitness thresholds: warn < 85 · hot-swap < 82
Drift streak: 0/4
Last auto-tune: 15m ago (guardrail.strict ↑ strict, false → true)
Hot-swap history: 0
No swaps. The current version is meeting its thresholds.
soul · healthy
mckinsey-consultant@2.3.0
Fitness: 86.2/100
Pass rate: 88.3%
Evals (24h): 905
Fitness thresholds: warn < 81 · hot-swap < 78
Drift streak: 0/4
Last auto-tune: none yet (fitness has stayed above the warn line)
Hot-swap history: 0
No swaps. The current version is meeting its thresholds.
Want this on your stack?
Connect via MCP and AgentForge starts the evaluation loop automatically.