LLM Atlas

Benchmark: MMLU-Pro

Category: reasoning

Advanced reasoning and domain breadth benchmark. Tests knowledge across 14 disciplines spanning STEM, humanities, social sciences, and professional domains.

Interpretation

MMLU-Pro is a benchmark that evaluates reasoning and problem-solving capabilities. It ranks 22 models, from GPT-5.4 (99) to Llama 4 Maverick (78.2). This benchmark contributes to the reasoning score on model pages and rankings.

Methodology: 10-option multiple-choice questions across 14 disciplines. Measures breadth of knowledge and advanced reasoning in academic subjects.

Source: https://huggingface.co/spaces/TIGER-Lab/MMLU-Pro