Benchmark
AIME 2025
Reasoning
American Invitational Mathematics Examination. Competition-level math problems testing advanced mathematical reasoning and problem-solving.
Interpretation
AIME 2025 is a benchmark evaluating mathematical reasoning and problem-solving capabilities. It ranks 20 models, from GPT-5.4 Pro (95) to GPT-4o (26.7). Scores on this benchmark contribute to the reasoning category on model pages and rankings.
Methodology: 30 competition math problems from the 2025 AIME (15 each from AIME I and AIME II). The problems test advanced mathematical reasoning under competition conditions.
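The scores above are most naturally read as the percentage of the 30 problems a model answers correctly; since every AIME answer is an integer from 0 to 999, grading reduces to exact match. Below is a minimal sketch of such a scoring harness, assuming each model output is parsed down to a single integer. The function names and the last-number parsing rule are hypothetical, not the site's actual evaluation code.

```python
# Minimal sketch of AIME-style exact-match scoring (assumed methodology,
# not the leaderboard's actual harness). AIME answers are integers 0-999,
# so a prediction is correct only if it matches the gold answer exactly.
import re


def extract_answer(model_output: str) -> int | None:
    """Take the last 1-3 digit integer in the output as the answer.
    A real harness would enforce a stricter answer-format convention;
    this parsing rule is a hypothetical placeholder."""
    matches = re.findall(r"\b\d{1,3}\b", model_output)
    return int(matches[-1]) if matches else None


def score(predictions: list[str], gold: list[int]) -> float:
    """Return the percentage of problems answered exactly correctly."""
    correct = sum(extract_answer(p) == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)


# Example: 2 of 3 problems correct -> 66.7
print(round(score(["The answer is 204", "I get 73", "maybe 5?"],
                  [204, 73, 42]), 1))
```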
Source: https://artofproblemsolving.com/wiki/index.php/AIME_Problems_and_Solutions