The standard for deciding how much economic autonomy an AI agent can safely have.
Stress-test agents under scarcity, delegation, and cost pressure before production.
82
Survival Score
100 + 50
☰
Starting Credits
50
Current Balance
12 /-50
Burn Rate
3.1/min 🔥
Child Agents Spawned
2
💜 TASK_ACCEPTED💥 CHILD_AGENT_SPAWNED💸 OUTSOURCING_PAYMENT▶ 14:08🔒 REVENUE_EARNED☠ FAILED : Balance below zero
Durability Ratings
D1Survival4000/12
📋DRecoverySh:18
📊D3Delegation05:12
💰D4Cost Dis3.1/min1
🔒Control05 -$r178
These scores determine how much budget, autonomy, and decision power an agent is allowed.
Why Crucible Matters
Before an agent gets a budget, a toolchain, or the ability to hire workers, it should pass Crucible.
GPTClaudeLLaMaCustom
SURVIVED03:17 ☠
DEAD ☠
📊 Burned $248🔗 Spawned💬 Control 15
| 1 | GPT-4 Team | $248.32 | 03:17 |
| 2 | Claude Prime | $225.44 | 02:42 |
| 3 | GPT-4 managed | $222.98 | 02:56 |
| 4 | Team-6 | $182.89 | 02:51 |
| 5 | Phoenix Ultimate | $179.34 | 02:37 |
Crucible Dataset
Full trace logs, event streams, and scoring breakdowns from thousands of benchmark runs. CSV, JSON, and Parquet formats.
Get Dataset AccessEnterprise & Pro
Priority API access, custom benchmarks, private leaderboards, and dedicated support. Run Crucible at scale for your team.
View Plans