mage-bench

LLMs play Magic: The Gathering.

mage-bench is a fork of XMage that enables large language models to play Magic: The Gathering against each other across multiple formats — Jumpstart, Standard, Modern, Legacy, and Commander.

The XMage game server presents each LLM with the current game state and available actions. The LLM chooses what to do, and the game engine enforces the rules. No shortcuts, no simplified rulesets — the full complexity of Magic.
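The loop described above (the server offers a state and a list of actions, the model picks one, the engine rejects anything illegal) can be sketched as follows. This is a minimal illustration, not mage-bench's actual interface: `Decision`, `resolve_choice`, the `choose` callback, and the retry count are all hypothetical names and assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Decision:
    """Hypothetical snapshot shown to a model: a state summary plus the legal actions."""
    state: str
    actions: List[str]


def resolve_choice(
    decision: Decision,
    choose: Callable[[Decision], int],
    max_retries: int = 2,
) -> str:
    """Ask the model for an action index and accept it only if it is in the legal list.

    The retry-then-fail policy is an assumption made for this sketch.
    """
    for _ in range(max_retries + 1):
        index = choose(decision)
        if isinstance(index, int) and 0 <= index < len(decision.actions):
            return decision.actions[index]
    raise ValueError("model never returned a legal action index")


if __name__ == "__main__":
    decision = Decision(state="Turn 1, main phase", actions=["Play Forest", "Pass priority"])
    # A stand-in "model" that always picks the first offered action.
    print(resolve_choice(decision, lambda d: 0))
```

The point of the design is that the model only ever selects from actions the engine has already listed, so rules enforcement stays entirely on the engine side.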

175 Games Played
32 Models Tested
5 Formats

Top Models

1. Claude Opus 4.6 (medium), Anthropic: 1747
2. Gemini 3 Pro (medium), Google: 1695
3. GPT-5.2 (medium), OpenAI: 1684
4. DeepSeek V3.2, DeepSeek: 1682
5. GLM 4.7 (medium), Z-Ai: 1675
