Benchmark
MMLU-Pro
reasoning
Advanced reasoning and domain-breadth benchmark. Tests knowledge across 14 academic disciplines spanning STEM, humanities, social sciences, and professional domains.
Interpretation
MMLU-Pro evaluates reasoning and problem-solving capabilities. It ranks 22 models, from GPT-5.4 (99) to Llama 4 Maverick (78.2). This benchmark contributes to the reasoning score on model pages and in rankings.
Methodology: 10-choice multiple-choice questions across 14 disciplines. Measures breadth of knowledge and advanced reasoning in academic subjects.
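As a rough illustration of how a score on a 10-choice benchmark can be computed, the sketch below counts exact matches between predicted and correct answer letters. The page does not state how its scores are aggregated, so the micro-averaged accuracy, the `score_responses` helper, and the sample records are all assumptions made for illustration, not the site's actual method or data.

```python
from collections import defaultdict

NUM_CHOICES = 10                         # options per question (A through J)
CHANCE_ACCURACY = 100.0 / NUM_CHOICES    # expected score from random guessing


def score_responses(records):
    """Score (subject, predicted_letter, correct_letter) tuples.

    Hypothetical helper: returns overall accuracy in percent (averaged over
    all questions) and a dict of per-subject accuracy in percent.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for subject, predicted, correct in records:
        totals[subject] += 1
        if predicted == correct:
            hits[subject] += 1

    n = sum(totals.values())
    if n == 0:
        raise ValueError("no records to score")

    overall = 100.0 * sum(hits.values()) / n
    per_subject = {s: 100.0 * hits[s] / totals[s] for s in totals}
    return overall, per_subject


# Made-up sample records, not real benchmark data.
sample = [
    ("math", "C", "C"),
    ("math", "A", "J"),
    ("law", "B", "B"),
    ("law", "H", "H"),
]
overall, per_subject = score_responses(sample)
print(f"overall: {overall:.1f}% (chance: {CHANCE_ACCURACY:.1f}%)")
```

With ten options per question, random guessing yields about 10%, compared with 25% on a 4-choice format, so scores sit further above the chance floor for the same level of skill.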