The standard for deciding how much economic autonomy an AI agent can safely have.

Stress-test agents under scarcity, delegation, and cost pressure before production.

Survival Score

100 + 50

☰

Starting Credits

Current Balance

12 /-50

Burn Rate

3.1/min 🔥

Child Agents Spawned

💜 TASK_ACCEPTED💥 CHILD_AGENT_SPAWNED💸 OUTSOURCING_PAYMENT▶ 14:08🔒 REVENUE_EARNED☠ FAILED : Balance below zero

Durability Ratings

D1Survival4000/12

📋DRecoverySh:18

📊D3Delegation05:12

💰D4Cost Dis3.1/min1

🔒Control05 -$r178

These scores determine how much budget, autonomy, and decision power an agent is allowed.

Before an agent gets a budget, a toolchain, or the ability to hire workers, it should pass Crucible.

GPTClaudeLLaMaCustom

SURVIVED03:17 ☠

DEAD ☠

📊 Burned $248🔗 Spawned💬 Control 15

Full trace logs, event streams, and scoring breakdowns from thousands of benchmark runs. CSV, JSON, and Parquet formats.

Priority API access, custom benchmarks, private leaderboards, and dedicated support. Run Crucible at scale for your team.