Math Arena
Structured mathematical reasoning benchmark. Evaluates step-by-step problem solving, proof construction, and mathematical abstraction on competition-level problems.
Interpretation
Math Arena is a benchmark evaluating mathematical reasoning and problem-solving capabilities. It ranks 20 models, from GPT-5.4 (99) at the top to Llama 4 Maverick (74.1) at the bottom. Scores from this benchmark contribute to the reasoning category on model pages and rankings.
Methodology: Problems are drawn from AIME, HMMT, and other competition mathematics. The benchmark tests structured mathematical reasoning and step-by-step proof construction.
Source: https://matharena.live