LLM Atlas

Benchmark

AIME 2025

reasoning

American Invitational Mathematics Examination. Competition-level math problems testing advanced mathematical reasoning and problem-solving.

Interpretation

AIME 2025 is a reasoning benchmark evaluating advanced mathematical problem-solving. It ranks 20 models, from GPT-5.4 Pro (95) down to GPT-4o (26.7). This benchmark contributes to the reasoning score on model pages and rankings.

Methodology: 30 competition math problems from AIME 2024 and 2025. Tests advanced mathematical reasoning under competitive conditions.
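The scores above can be read as the percentage of the 30 problems answered correctly. As a minimal sketch (the function name and exact grading details are illustrative, not taken from the site), AIME answers are integers from 0 to 999, so grading reduces to exact-match comparison:

```python
def score_aime(predictions, answers):
    """Hypothetical AIME-style scorer: percent of exact integer matches.

    AIME answers are integers in [0, 999], so a prediction is either
    exactly right or wrong -- no partial credit.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must align")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return round(100.0 * correct / len(answers), 1)


# Example: 8 of 30 correct yields a score of 26.7.
preds = [42] * 8 + [0] * 22
golds = [42] * 8 + [7] * 22
print(score_aime(preds, golds))
```

Under this assumed scheme, a score of 26.7 corresponds to 8 of the 30 problems solved.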

Source: https://artofproblemsolving.com/wiki/index.php/AIME_Problems_and_Solutions