
Math Arena

Benchmark · reasoning

Structured mathematical reasoning benchmark. Evaluates step-by-step problem solving, proof construction, and mathematical abstraction on competition-level problems.

Interpretation

Math Arena is a benchmark that evaluates mathematical reasoning and problem-solving capabilities. It ranks 20 models, from GPT-5.4 (99) at the top to Llama 4 Maverick (74.1) at the bottom. Scores from this benchmark contribute to the reasoning score shown on model pages and rankings.
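
As a rough illustration of how a benchmark can feed into a category score, the sketch below averages a model's scores across the benchmarks in a category. The actual aggregation LLM Atlas uses is not specified on this page; the function name and the second benchmark are assumptions.

```python
# Hypothetical sketch: a category score as the unweighted mean of a
# model's scores on that category's benchmarks. NOT the actual LLM Atlas
# aggregation; weights, names, and the second benchmark are assumptions.
def category_score(benchmark_scores: dict[str, float]) -> float:
    """Unweighted mean of the model's scores on the contributing benchmarks."""
    return sum(benchmark_scores.values()) / len(benchmark_scores)

# Example: Math Arena alongside a hypothetical second reasoning benchmark.
print(category_score({"Math Arena": 99.0, "OtherReasoningBench": 91.0}))  # 95.0
```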

Methodology: Problems are drawn from AIME, HMMT, and other competition-level mathematics. The benchmark tests structured mathematical reasoning and step-by-step proof construction.
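
To make the evaluation concrete, here is a minimal sketch of exact-answer grading on AIME-style problems, where each reference answer is an integer. This is not the actual Math Arena harness, which is not described here; the answer-extraction heuristic, function names, and data format are all assumptions.

```python
# Hypothetical sketch of exact-answer grading on AIME-style problems,
# where each reference answer is an integer (AIME answers lie in 0-999).
# NOT the actual Math Arena harness; names and format are assumptions.
import re

def extract_final_answer(completion: str) -> int | None:
    """Take the last integer in a model completion as its final answer."""
    matches = re.findall(r"-?\d+", completion)
    return int(matches[-1]) if matches else None

def score_model(completions: list[str], references: list[int]) -> float:
    """Percentage of problems where the extracted answer matches the reference."""
    correct = sum(
        1
        for completion, reference in zip(completions, references)
        if extract_final_answer(completion) == reference
    )
    return 100 * correct / len(references)

# Example: two problems, one answered correctly.
print(score_model(["... so the answer is 204.", "I think it is 7"], [204, 42]))  # 50.0
```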

Source: https://matharena.live