Leaderboard: Structured Output Rankings
Updated weekly. Each leaderboard uses transparent weighted scoring, current model context, and supporting analysis to help teams interpret the results with confidence. Only full-profile entries appear in the rankings; broader catalog records remain available elsewhere on the site when only source-backed metadata is currently available.
Only models with sufficiently complete metadata and scoring coverage to be meaningfully ranked appear in this category. Scores combine benchmark evidence, product metadata, and cost/context signals where those fields are published. Tracked models without full scoring remain in the directory and on provider pages, but are excluded from analytical ranking claims.
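The weighted-scoring approach described above can be sketched in a few lines. The weights and signal names below are illustrative assumptions for clarity, not the leaderboard's published formula:

```python
# Hedged sketch of a weighted composite score. The weights and the
# signal names (benchmark, metadata, cost_context) are illustrative
# assumptions, not the leaderboard's actual methodology.
def composite_score(benchmark, metadata, cost_context,
                    weights=(0.6, 0.25, 0.15)):
    """Combine three 0-100 signals into one 0-100 composite score."""
    signals = (benchmark, metadata, cost_context)
    return round(sum(w * s for w, s in zip(weights, signals)))

print(composite_score(92, 85, 80))  # -> 88
```

In practice the benchmark signal would dominate, with metadata completeness and cost/context acting as smaller adjustments; any real weighting would need to be taken from the published methodology.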
| Rank | Model | Provider | Score | Context (tokens) |
|---|---|---|---|---|
| #1 | Claude 3.7 Sonnet | Anthropic | 90 | 200,000 |
| #2 | Claude Sonnet 4.5 | Anthropic | 90 | 1,000,000 |
| #3 | Claude Sonnet 4.6 | Anthropic | 90 | 1,000,000 |
| #4 | GPT-5.4 | OpenAI | 90 | 1,000,000 |
| #5 | Gemini 3.1 Pro | Google DeepMind | 89 | 1,048,576 |
| #6 | GPT-5.2 | OpenAI | 89 | 1,000,000 |
| #7 | Claude Opus 4.6 | Anthropic | 88 | 1,000,000 |
| #8 | Claude Sonnet 4 | Anthropic | 88 | 1,000,000 |
| #9 | Gemini 2.5 Pro | Google DeepMind | 88 | 1,048,576 |
| #10 | Gemini 3.1 Flash | Google DeepMind | 88 | 1,048,576 |
| #11 | GPT-4o | OpenAI | 88 | 128,000 |
| #12 | Gemini 3.0 Flash | Google DeepMind | 87 | 1,048,576 |
| #13 | Gemini 3.0 Pro | Google DeepMind | 87 | 1,048,576 |
| #14 | GPT-5.3-Codex | OpenAI | 87 | 1,000,000 |
| #15 | Command R+ 2026 | Cohere | 86 | 128,000 |
| #16 | Gemini 2.5 Flash | Google DeepMind | 86 | 1,048,576 |
| #17 | Gemini 2.5 Pro TTS | Google DeepMind | 86 | 1,048,576 |
| #18 | GPT-5 | OpenAI | 86 | 1,000,000 |
| #19 | GPT-5 mini | OpenAI | 86 | 1,000,000 |
| #20 | GPT-5.3 Instant | OpenAI | 86 | 1,000,000 |
| #21 | GPT-5.4 Pro | OpenAI | 86 | 1,000,000 |
| #22 | Claude Haiku 4.5 | Anthropic | 85 | 200,000 |
| #23 | Gemini 2.0 Flash | Google DeepMind | 85 | 1,048,576 |
| #24 | GPT-4.1 | OpenAI | 85 | 1,048,576 |
| #25 | GPT-5.2 Pro | OpenAI | 85 | 1,000,000 |
| #26 | Mistral Large 25 | Mistral AI | 85 | 128,000 |
| #27 | Claude Opus 4.1 | Anthropic | 84 | 200,000 |
| #28 | o4-mini | OpenAI | 84 | 200,000 |
| #29 | Claude Opus 4 | Anthropic | 83 | 200,000 |
| #30 | Gemini 2.5 Flash Live | Google DeepMind | 83 | 1,048,576 |
| #31 | Gemini 2.5 Flash Native Audio Preview | Google DeepMind | 83 | 1,048,576 |
| #32 | Gemini 2.5 Flash-Lite | Google DeepMind | 83 | 1,048,576 |
| #33 | Llama 4 Maverick | Meta | 83 | 1,048,576 |
| #34 | Gemini 3.1 Flash-Lite | Google DeepMind | 82 | 1,048,576 |
| #35 | GPT-4.1 mini | OpenAI | 82 | 1,048,576 |
| #36 | Sonar Reasoning Pro | Perplexity | 82 | 200,000 |
| #37 | Gemini 1.5 Pro | Google DeepMind | 81 | 2,097,152 |
| #38 | GPT-4o-mini | OpenAI | 81 | 128,000 |
| #39 | GPT-5 nano | OpenAI | 81 | 1,000,000 |
| #40 | o4-mini-deep-research | OpenAI | 81 | 200,000 |
| #41 | warpgrep-v2 | Morph | 81 | 128,000 |
| #42 | Claude Haiku 3.5 | Anthropic | 80 | 200,000 |
| #43 | Gemini 2.0 Flash-Lite | Google DeepMind | 80 | 1,048,576 |
| #44 | morph-v3-fast-apply | Morph | 80 | 128,000 |
| #45 | Nova Pro | Amazon Web Services | 80 | 300,000 |
| #46 | Sonar Deep Research | Perplexity | 80 | 200,000 |
| #47 | Gemini 1.5 Flash | Google DeepMind | 79 | 1,048,576 |
| #48 | Sonar | Perplexity | 79 | 128,000 |
| #49 | Sonar Pro | Perplexity | 79 | 200,000 |
| #50 | Claude Sonnet 3 | Anthropic | 78 | 200,000 |
| #51 | flash-compact | Morph | 77 | 200,000 |
| #52 | o1-mini | OpenAI | 77 | 128,000 |
| #53 | o3 | OpenAI | 77 | 200,000 |
| #54 | Claude Haiku 3 | Anthropic | 76 | 200,000 |
| #55 | Nova Lite | Amazon Web Services | 76 | 300,000 |
| #56 | o3-deep-research | OpenAI | 76 | 200,000 |
| #57 | Claude Haiku 3 | Anthropic | 75 | 200,000 |
| #58 | Claude Opus 3 | Anthropic | 75 | 200,000 |
| #59 | Gemini 1.5 Flash-8B | Google DeepMind | 75 | 1,048,576 |
| #60 | gpt-audio | OpenAI | 75 | 128,000 |
| #61 | gpt-realtime | OpenAI | 75 | 128,000 |
| #62 | Claude Haiku 3.5 | Anthropic | 74 | 200,000 |
| #63 | Claude Haiku 4.5 | Anthropic | 74 | 200,000 |
| #64 | Codestral | Mistral AI | 74 | 256,000 |
| #65 | Codestral Embed | Mistral AI | 74 | 32,768 |
| #66 | Doubao-Seed-1.6 | ByteDance / Doubao | 74 | 128,000 |
| #67 | Doubao-Seed-1.6-Flash | ByteDance / Doubao | 74 | 128,000 |
| #68 | Doubao-Seed-2.0-Code | ByteDance / Doubao | 74 | 128,000 |
| #69 | Doubao-Seed-Code | ByteDance / Doubao | 74 | 128,000 |
| #70 | gpt-oss-120b | OpenAI | 74 | 131,072 |
| #71 | image-01 | MiniMax | 74 | 8,192 |
| #72 | image-01-live | MiniMax | 74 | 8,192 |
| #73 | Llama 3.3 70B Instruct | Meta | 74 | 128,000 |
| #74 | MiniMax-M1 | MiniMax | 74 | 204,800 |
| #75 | MiniMax-Speech-02 | MiniMax | 74 | 8,192 |
| #76 | MiniMax-Text-01 | MiniMax | 74 | 204,800 |
| #77 | MiniMax-VL-01 | MiniMax | 74 | 204,800 |
| #78 | Mistral Embed | Mistral AI | 74 | 32,768 |
| #79 | Mistral Moderation | Mistral AI | 74 | 32,768 |
| #80 | music-2.0 | MiniMax | 74 | 8,192 |
| #81 | Voxtral Mini Transcribe | Mistral AI | 74 | 131,072 |
| #82 | CogVideoX | Z.AI | 73 | 128,000 |
| #83 | CogView 4 | Z.AI | 73 | 128,000 |
| #84 | Devstral Medium 1.0 | Mistral AI | 73 | 128,000 |
| #85 | ERNIE 3.5 128K | Baidu / ERNIE | 73 | 128,000 |
| #86 | ERNIE 4.0 Turbo 8K | Baidu / ERNIE | 73 | 128,000 |
| #87 | ERNIE 4.5 Turbo 32K | Baidu / ERNIE | 73 | 32,768 |
| #88 | ERNIE Functions 8K | Baidu / ERNIE | 73 | 128,000 |
| #89 | ERNIE Speed 128K | Baidu / ERNIE | 73 | 128,000 |
| #90 | GLM-4.5 | Z.AI | 73 | 128,000 |
| #91 | GLM-4.5V | Z.AI | 73 | 128,000 |
| #92 | GLM-4.6 | Z.AI | 73 | 128,000 |
| #93 | GLM-4.6V | Z.AI | 73 | 128,000 |
| #94 | GLM-4.7 | Z.AI | 73 | 128,000 |
| #95 | GLM-5 | Z.AI | 73 | 128,000 |
| #96 | GLM-Image | Z.AI | 73 | 128,000 |
| #97 | GLM-OCR | Z.AI | 73 | 128,000 |
| #98 | Hunyuan Code | Tencent / Hunyuan | 73 | 128,000 |
| #99 | Hunyuan Lite | Tencent / Hunyuan | 73 | 128,000 |
| #100 | Hunyuan Standard | Tencent / Hunyuan | 73 | 128,000 |
| #101 | Hunyuan T1 | Tencent / Hunyuan | 73 | 256,000 |
| #102 | Hunyuan T1 Vision | Tencent / Hunyuan | 73 | 128,000 |
| #103 | Hunyuan TurboS | Tencent / Hunyuan | 73 | 128,000 |
| #104 | Hunyuan TurboS LongText 128K | Tencent / Hunyuan | 73 | 128,000 |
| #105 | Kimi K2 | Moonshot AI / Kimi | 73 | 131,072 |
| #106 | Kimi K2 Thinking | Moonshot AI / Kimi | 73 | 256,000 |
| #107 | Kimi K2 Turbo Preview | Moonshot AI / Kimi | 73 | 256,000 |
| #108 | Kimi K2.5 | Moonshot AI / Kimi | 73 | 256,000 |
| #109 | Magistral Medium 1.2 | Mistral AI | 73 | 128,000 |
| #110 | MiniMax-M2 | MiniMax | 73 | 204,800 |
| #111 | MiniMax-M2.1 | MiniMax | 73 | 204,800 |
| #112 | MiniMax-M2.1-highspeed | MiniMax | 73 | 204,800 |
| #113 | MiniMax-M2.5 | MiniMax | 73 | 204,800 |
| #114 | MiniMax-M2.5-highspeed | MiniMax | 73 | 204,800 |
| #115 | Mistral Large 3 | Mistral AI | 73 | 128,000 |
| #116 | Mistral Medium 3.1 | Mistral AI | 73 | 128,000 |
| #117 | Mistral OCR 2505 | Mistral AI | 73 | 32,768 |
| #118 | Mistral Small 3.1 | Mistral AI | 73 | 128,000 |
| #119 | Mistral Small 3.2 Open | Mistral AI | 73 | 128,000 |
| #120 | Nova Micro | Amazon Web Services | 73 | 128,000 |
| #121 | o1 | OpenAI | 73 | 200,000 |
| #122 | Pixtral 12B | Mistral AI | 73 | 131,072 |
| #123 | Pixtral Large | Mistral AI | 73 | 131,072 |
| #124 | Vidu Q1 | Z.AI | 73 | 128,000 |
| #125 | Voxtral Mini Open | Mistral AI | 73 | 131,072 |
| #126 | Voxtral Small Open | Mistral AI | 73 | 131,072 |
| #127 | Step-3.5-Flash | StepFun | 72 | 131,072 |
| #128 | Claude Sonnet 3 | Anthropic | 71 | 200,000 |
| #129 | Claude Sonnet 4 | Anthropic | 71 | 1,000,000 |
| #130 | Command A | Cohere | 71 | 256,000 |
| #131 | Command A Reasoning | Cohere | 71 | 128,000 |
| #132 | Command A Translate | Cohere | 71 | 128,000 |
| #133 | Command A Vision | Cohere | 71 | 128,000 |
| #134 | Command R+ | Cohere | 71 | 128,000 |
| #135 | Command R7B | Cohere | 71 | 128,000 |
| #136 | DeepSeek-Coder-V2 | DeepSeek | 71 | 128,000 |
| #137 | DeepSeek-Math-V2 | DeepSeek | 71 | 128,000 |
| #138 | DeepSeek-OCR | DeepSeek | 71 | 16,384 |
| #139 | DeepSeek-OCR-2 | DeepSeek | 71 | 16,384 |
| #140 | DeepSeek-R1 | DeepSeek | 71 | 128,000 |
| #141 | DeepSeek-R1-Distill-Llama-70B | DeepSeek | 71 | 128,000 |
| #142 | DeepSeek-V2.5 | DeepSeek | 71 | 128,000 |
| #143 | DeepSeek-V3 | DeepSeek | 71 | 128,000 |
| #144 | DeepSeek-V3.1 | DeepSeek | 71 | 128,000 |
| #145 | DeepSeek-V3.1-Base | DeepSeek | 71 | 128,000 |
| #146 | DeepSeek-V3.2 | DeepSeek | 71 | 128,000 |
| #147 | DeepSeek-V3.2-Exp | DeepSeek | 71 | 128,000 |
| #148 | DeepSeek-VL2-Small | DeepSeek | 71 | 16,384 |
| #149 | Devstral 2 Open | Mistral AI | 71 | 128,000 |
| #150 | Devstral Small 2 | Mistral AI | 71 | 128,000 |
| #151 | Embed 4 | Cohere | 71 | 128,000 |
| #152 | Grok 3 | xAI | 71 | 131,072 |
| #153 | Grok 3 Mini | xAI | 71 | 131,072 |
| #154 | Grok 4 | xAI | 71 | 256,000 |
| #155 | Grok 4 Fast Reasoning | xAI | 71 | 131,072 |
| #156 | grok-image | xAI | 71 | 131,072 |
| #157 | Jamba 3B | AI21 Labs | 71 | 256,000 |
| #158 | Jamba Large | AI21 Labs | 71 | 256,000 |
| #159 | Jamba Large 1.6 | AI21 Labs | 71 | 256,000 |
| #160 | Jamba Mini | AI21 Labs | 71 | 256,000 |
| #161 | Jamba Mini 1.6 | AI21 Labs | 71 | 256,000 |
| #162 | Jamba Mini 1.7 | AI21 Labs | 71 | 256,000 |
| #163 | Janus-Pro-7B | DeepSeek | 71 | 16,384 |
| #164 | Magistral Small 1.2 Open | Mistral AI | 71 | 128,000 |
| #165 | Ministral 3 14B Open | Mistral AI | 71 | 128,000 |
| #166 | Ministral 3 3B Open | Mistral AI | 71 | 128,000 |
| #167 | Ministral 3 8B Open | Mistral AI | 71 | 128,000 |
| #168 | Mistral Large 3 Open | Mistral AI | 71 | 128,000 |
| #169 | Mistral Nemo 12B | Mistral AI | 71 | 128,000 |
| #170 | phi-1 | Microsoft | 71 | 4,096 |
| #171 | phi-1_5 | Microsoft | 71 | 4,096 |
| #172 | phi-2 | Microsoft | 71 | 4,096 |
| #173 | Phi-3-medium-4k-instruct | Microsoft | 71 | 4,096 |
| #174 | Phi-3-mini-4k-instruct | Microsoft | 71 | 4,096 |
| #175 | Phi-3-vision-128k-instruct | Microsoft | 71 | 128,000 |
| #176 | Phi-3.5-mini-instruct | Microsoft | 71 | 131,072 |
| #177 | Phi-3.5-MoE-instruct | Microsoft | 71 | 131,072 |
| #178 | Phi-3.5-vision-instruct | Microsoft | 71 | 131,072 |
| #179 | Phi-4 | Microsoft | 71 | 16,384 |
| #180 | Phi-4-mini-flash-reasoning | Microsoft | 71 | 131,072 |
| #181 | Phi-4-mini-instruct | Microsoft | 71 | 131,072 |
| #182 | Phi-4-multimodal-instruct | Microsoft | 71 | 131,072 |
| #183 | Phi-4-reasoning | Microsoft | 71 | 131,072 |
| #184 | Phi-4-reasoning-plus | Microsoft | 71 | 131,072 |
| #185 | Phi-4-reasoning-vision-15B | Microsoft | 71 | 131,072 |
| #186 | Phi-tiny-MoE-instruct | Microsoft | 71 | 4,096 |
| #187 | Qwen2.5-1.5B-Instruct | Alibaba Qwen | 71 | 131,072 |
| #188 | Qwen2.5-14B-Instruct | Alibaba Qwen | 71 | 131,072 |
| #189 | Qwen2.5-32B-Instruct | Alibaba Qwen | 71 | 131,072 |
| #190 | Qwen2.5-3B-Instruct | Alibaba Qwen | 71 | 131,072 |
| #191 | Qwen2.5-72B-Instruct | Alibaba Qwen | 71 | 131,072 |
| #192 | Qwen2.5-7B-Instruct | Alibaba Qwen | 71 | 131,072 |
| #193 | Qwen2.5-Max | Alibaba Qwen | 71 | 131,072 |
| #194 | Qwen2.5-VL-72B-Instruct | Alibaba Qwen | 71 | 131,072 |
| #195 | Qwen2.5-VL-7B-Instruct | Alibaba Qwen | 71 | 131,072 |
| #196 | Qwen3-Coder-Next | Alibaba Qwen | 71 | 131,072 |
| #197 | Qwen3.5-0.8B | Alibaba Qwen | 71 | 131,072 |
| #198 | Qwen3.5-122B-A10B | Alibaba Qwen | 71 | 131,072 |
| #199 | Qwen3.5-27B | Alibaba Qwen | 71 | 131,072 |
| #200 | Qwen3.5-2B | Alibaba Qwen | 71 | 131,072 |
| #201 | Qwen3.5-35B-A3B | Alibaba Qwen | 71 | 131,072 |
| #202 | Qwen3.5-397B-A17B | Alibaba Qwen | 71 | 131,072 |
| #203 | Qwen3.5-4B | Alibaba Qwen | 71 | 131,072 |
| #204 | Qwen3.5-9B | Alibaba Qwen | 71 | 131,072 |
| #205 | Llama 3.1 405B Instruct | Meta | 70 | 128,000 |
| #206 | Llama 3.1 70B Instruct | Meta | 70 | 128,000 |
| #207 | Llama 4 Scout | Meta | 70 | 10,485,760 |
| #208 | gpt-audio-mini | OpenAI | 69 | 128,000 |
| #209 | gpt-realtime-mini | OpenAI | 69 | 128,000 |
| #210 | gpt-oss-20b | OpenAI | 67 | 131,072 |
| #211 | Claude Opus 3 | Anthropic | 66 | 200,000 |
| #212 | Claude Opus 4 | Anthropic | 66 | 200,000 |
| #213 | LFM2-24B-A2B | Liquid AI | 66 | 32,768 |
| #214 | Llama 3.1 8B Instruct | Meta | 65 | 128,000 |
| #215 | Code Llama 70B Instruct | Meta | 64 | 8,192 |
| #216 | GPT Image 1 | OpenAI | 64 | 32,768 |
| #217 | Llama 3.2 90B Vision Instruct | Meta | 64 | 128,000 |
| #218 | Meta Llama 3 70B Instruct | Meta | 64 | 8,192 |
| #219 | chatgpt-image-latest | OpenAI | 62 | 32,768 |
| #220 | GPT-4o mini Transcribe | OpenAI | 62 | 128,000 |
| #221 | GPT-4o mini TTS | OpenAI | 62 | 128,000 |
| #222 | GPT-4o Transcribe | OpenAI | 62 | 128,000 |
| #223 | gpt-image-1-mini | OpenAI | 62 | 32,768 |
| #224 | MiMo-VL-7B | Xiaomi | 62 | 131,072 |
| #225 | Step3-VL-10B | StepFun | 62 | 131,072 |
| #226 | Code Llama 34B Instruct | Meta | 61 | 8,192 |
| #227 | Llama 3.2 1B Instruct | Meta | 61 | 128,000 |
| #228 | Llama 3.2 3B Instruct | Meta | 61 | 128,000 |
| #229 | Meta Llama 3 8B Instruct | Meta | 61 | 8,192 |
| #230 | LFM2-8B-A1B | Liquid AI | 60 | 32,768 |
| #231 | Llama 3.2 11B Vision Instruct | Meta | 60 | 128,000 |
| #232 | Llama Guard 4 12B | Meta | 60 | 131,072 |
| #233 | MiMo-Audio-7B | Xiaomi | 59 | 131,072 |
| #234 | Step-Audio-R1.1 | StepFun | 58 | 131,072 |
| #235 | Llama Guard 3 11B Vision | Meta | 56 | 131,072 |
| #236 | pplx-embed-v1-4b | Perplexity | 55 | 8,192 |
| #237 | pplx-embed-v1-0.6b | Perplexity | 54 | 8,192 |
| #238 | LFM2.5-1.2B-Instruct | Liquid AI | 53 | 131,072 |
| #239 | LFM2.5-1.2B-Thinking | Liquid AI | 53 | 131,072 |
| #240 | FLUX 1.1 Pro | Black Forest Labs | 51 | 512 |
| #241 | LFM2-2.6B | Liquid AI | 51 | 32,768 |
| #242 | FLUX 1 Pro | Black Forest Labs | 49 | 512 |
| #243 | FLUX 1.1 Pro Ultra | Black Forest Labs | 49 | 512 |
| #244 | Prompt Guard 86M | Meta | 49 | 512 |
| #245 | NextStep-1.1 | StepFun | 37 | 512 |
Why #1: Claude 3.7 Sonnet
A top-tier reasoning model with strong software engineering assistance and enterprise controls.
This model meets the full-profile threshold required by the current leaderboard methodology.
Why #2: Claude Sonnet 4.5
Anthropic's Sonnet 4.5 with 1M token context for fast frontier reasoning, coding, and long-context agent work.
This model meets the full-profile threshold required by the current leaderboard methodology.
Why #3: Claude Sonnet 4.6
Anthropic's current Sonnet tier for fast frontier reasoning, coding, and long-context agent work.
This model meets the full-profile threshold required by the current leaderboard methodology.