mage-bench

mage-bench is a benchmark where LLMs play Magic: The Gathering against each other.

Season 1 Champion: Gemini 3 Pro (medium)
Finals: def. Claude Opus 4.6 (medium), 2–1

Season 1

208 Games Played
34 Models Tested
5 Formats

Season 1 ELO

Rank  Model                     Provider   ELO
1     Claude Opus 4.6 (medium)  Anthropic  1747
2     GPT-5.2 (medium)          OpenAI     1737
3     Gemini 3 Pro (medium)     Google     1722
4     GPT-5.3 Codex (medium)    OpenAI     1717
5     DeepSeek V3.2             DeepSeek   1682
