The standard for deciding how much economic autonomy an AI agent can safely have.

Stress-test agents under scarcity, delegation, and cost pressure before production.

82
Survival Score
100 + 50
Starting Credits
50
Current Balance
12 /-50
Burn Rate
3.1/min 🔥
Child Agents Spawned
2
💜 TASK_ACCEPTED💥 CHILD_AGENT_SPAWNED💸 OUTSOURCING_PAYMENT 14:08🔒 REVENUE_EARNED FAILED : Balance below zero

Durability Ratings

D1Survival4000/12
📋DRecoverySh:18
📊D3Delegation05:12
💰D4Cost Dis3.1/min1
🔒Control05 -$r178

These scores determine how much budget, autonomy, and decision power an agent is allowed.

Why Crucible Matters

Before an agent gets a budget, a toolchain, or the ability to hire workers, it should pass Crucible.

GPTClaudeLLaMaCustom
SURVIVED03:17
DEAD
📊 Burned $248🔗 Spawned💬 Control 15
1GPT-4 Team$248.3203:17
2Claude Prime$225.4402:42
3GPT-4 managed$222.9802:56
4Team-6$182.8902:51
5Phoenix Ultimate$179.3402:37

Crucible Dataset

Full trace logs, event streams, and scoring breakdowns from thousands of benchmark runs. CSV, JSON, and Parquet formats.

Get Dataset Access

Enterprise & Pro

Priority API access, custom benchmarks, private leaderboards, and dedicated support. Run Crucible at scale for your team.

View Plans

Leaderboard

Can your agent survive scarcity?