Leaderboard
Safety rankings
Updated weekly

Each leaderboard uses transparent weighted scoring, current model context, and supporting analysis to help teams interpret the results with confidence. Only full-profile entries appear in the rankings; broader catalog records remain available elsewhere on the site when only source-backed metadata is currently available.
Only models with sufficiently complete metadata and scoring coverage to be meaningfully ranked appear in this category.
Scores combine benchmark evidence, product metadata, and cost/context signals whenever those fields are published.
Tracked models without full scoring remain in the directory and on provider pages, but they are not used to support analytical ranking claims.
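The weighted-scoring idea can be sketched as a simple blend of sub-scores. The weights and field names below are illustrative assumptions for clarity, not the actual methodology behind this leaderboard.

```python
# Minimal sketch of a weighted composite score, assuming three 0-100
# sub-scores. The 0.6/0.25/0.15 split is a hypothetical example, not
# the weighting this site actually uses.

def composite_score(benchmark, metadata, cost_context,
                    weights=(0.6, 0.25, 0.15)):
    """Blend three 0-100 sub-scores into a single 0-100 composite."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    w_bench, w_meta, w_cost = weights
    return round(w_bench * benchmark + w_meta * metadata + w_cost * cost_context)

# Example: strong benchmarks, solid metadata, middling cost profile.
print(composite_score(95, 90, 80))  # → 92
```

A model missing one sub-score would simply be excluded from the ranked list, which mirrors the full-profile rule described above.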
| Rank | Model | Provider | Score | Context |
|---|---|---|---|---|
| #1 | Claude Opus 4.6 | Anthropic | 92 | 1,000,000 |
| #2 | Claude Sonnet 4.5 | Anthropic | 92 | 1,000,000 |
| #3 | Claude Sonnet 4.6 | Anthropic | 92 | 1,000,000 |
| #4 | Claude 3.7 Sonnet | Anthropic | 91 | 200,000 |
| #5 | Claude Opus 4.1 | Anthropic | 90 | 200,000 |
| #6 | Claude Sonnet 4 | Anthropic | 90 | 1,000,000 |
| #7 | Claude Opus 4 | Anthropic | 89 | 200,000 |
| #8 | Command R+ 2026 | Cohere | 89 | 128,000 |
| #9 | Gemini 2.5 Pro | Google DeepMind | 88 | 1,048,576 |
| #10 | GPT-4o | OpenAI | 88 | 128,000 |
| #11 | GPT-5.4 | OpenAI | 88 | 1,000,000 |
| #12 | Claude Haiku 4.5 | Anthropic | 87 | 200,000 |
| #13 | Gemini 3.1 Pro | Google DeepMind | 86 | 1,048,576 |
| #14 | GPT-5.2 | OpenAI | 86 | 1,000,000 |
| #15 | GPT-5.4 Pro | OpenAI | 86 | 1,000,000 |
| #16 | Gemini 3.0 Pro | Google DeepMind | 85 | 1,048,576 |
| #17 | Gemini 3.1 Flash | Google DeepMind | 85 | 1,048,576 |
| #18 | GPT-5 | OpenAI | 85 | 1,000,000 |
| #19 | GPT-5.2 Pro | OpenAI | 85 | 1,000,000 |
| #20 | GPT-5.3 Instant | OpenAI | 85 | 1,000,000 |
| #21 | Claude Opus 3 | Anthropic | 84 | 200,000 |
| #22 | Claude Sonnet 3 | Anthropic | 84 | 200,000 |
| #23 | Gemini 2.0 Flash | Google DeepMind | 84 | 1,048,576 |
| #24 | Gemini 2.5 Flash | Google DeepMind | 84 | 1,048,576 |
| #25 | Gemini 2.5 Pro TTS | Google DeepMind | 84 | 1,048,576 |
| #26 | Gemini 3.0 Flash | Google DeepMind | 84 | 1,048,576 |
| #27 | GPT-5 mini | OpenAI | 84 | 1,000,000 |
| #28 | GPT-5.3-Codex | OpenAI | 84 | 1,000,000 |
| #29 | Mistral Large 25 | Mistral AI | 84 | 128,000 |
| #30 | Sonar Deep Research | Perplexity | 84 | 200,000 |
| #31 | Claude Haiku 3.5 | Anthropic | 83 | 200,000 |
| #32 | Gemini 2.5 Flash Live | Google DeepMind | 83 | 1,048,576 |
| #33 | Gemini 2.5 Flash Native Audio Preview | Google DeepMind | 83 | 1,048,576 |
| #34 | Gemini 2.5 Flash-Lite | Google DeepMind | 83 | 1,048,576 |
| #35 | Gemini 3.1 Flash-Lite | Google DeepMind | 83 | 1,048,576 |
| #36 | GPT-4.1 | OpenAI | 83 | 1,048,576 |
| #37 | GPT-4o-mini | OpenAI | 83 | 128,000 |
| #38 | Llama Guard 4 12B | Meta | 83 | 131,072 |
| #39 | Nova Pro | Amazon Web Services | 83 | 300,000 |
| #40 | o4-mini | OpenAI | 83 | 200,000 |
| #41 | Sonar Reasoning Pro | Perplexity | 83 | 200,000 |
| #42 | GPT-4.1 mini | OpenAI | 82 | 1,048,576 |
| #43 | GPT-5 nano | OpenAI | 82 | 1,000,000 |
| #44 | Sonar Pro | Perplexity | 82 | 200,000 |
| #45 | Claude Haiku 3 | Anthropic | 81 | 200,000 |
| #46 | Gemini 1.5 Pro | Google DeepMind | 81 | 2,097,152 |
| #47 | Gemini 2.0 Flash-Lite | Google DeepMind | 81 | 1,048,576 |
| #48 | Llama 4 Maverick | Meta | 81 | 1,048,576 |
| #49 | Nova Lite | Amazon Web Services | 81 | 300,000 |
| #50 | o4-mini-deep-research | OpenAI | 81 | 200,000 |
| #51 | Sonar | Perplexity | 81 | 128,000 |
| #52 | Gemini 1.5 Flash | Google DeepMind | 80 | 1,048,576 |
| #53 | Llama Guard 3 11B Vision | Meta | 80 | 131,072 |
| #54 | o1-mini | OpenAI | 80 | 128,000 |
| #55 | o3 | OpenAI | 80 | 200,000 |
| #56 | o3-deep-research | OpenAI | 80 | 200,000 |
| #57 | morph-v3-fast-apply | Morph | 79 | 128,000 |
| #58 | Nova Micro | Amazon Web Services | 79 | 128,000 |
| #59 | warpgrep-v2 | Morph | 79 | 128,000 |
| #60 | Gemini 1.5 Flash-8B | Google DeepMind | 78 | 1,048,576 |
| #61 | gpt-audio | OpenAI | 78 | 128,000 |
| #62 | gpt-realtime | OpenAI | 78 | 128,000 |
| #63 | o1 | OpenAI | 78 | 200,000 |
| #64 | flash-compact | Morph | 77 | 200,000 |
| #65 | Claude Haiku 3 | Anthropic | 76 | 200,000 |
| #66 | Codestral | Mistral AI | 76 | 256,000 |
| #67 | GPT Image 1 | OpenAI | 76 | 32,768 |
| #68 | gpt-audio-mini | OpenAI | 76 | 128,000 |
| #69 | gpt-realtime-mini | OpenAI | 76 | 128,000 |
| #70 | Voxtral Mini Transcribe | Mistral AI | 76 | 131,072 |
| #71 | chatgpt-image-latest | OpenAI | 75 | 32,768 |
| #72 | Claude Haiku 3.5 | Anthropic | 75 | 200,000 |
| #73 | Claude Haiku 4.5 | Anthropic | 75 | 200,000 |
| #74 | Codestral Embed | Mistral AI | 75 | 32,768 |
| #75 | CogVideoX | Z.AI | 75 | 128,000 |
| #76 | CogView 4 | Z.AI | 75 | 128,000 |
| #77 | Devstral Medium 1.0 | Mistral AI | 75 | 128,000 |
| #78 | Doubao-Seed-1.6 | ByteDance / Doubao | 75 | 128,000 |
| #79 | Doubao-Seed-1.6-Flash | ByteDance / Doubao | 75 | 128,000 |
| #80 | Doubao-Seed-2.0-Code | ByteDance / Doubao | 75 | 128,000 |
| #81 | Doubao-Seed-Code | ByteDance / Doubao | 75 | 128,000 |
| #82 | ERNIE 3.5 128K | Baidu / ERNIE | 75 | 128,000 |
| #83 | ERNIE 4.0 Turbo 8K | Baidu / ERNIE | 75 | 128,000 |
| #84 | ERNIE Functions 8K | Baidu / ERNIE | 75 | 128,000 |
| #85 | ERNIE Speed 128K | Baidu / ERNIE | 75 | 128,000 |
| #86 | GLM-4.5 | Z.AI | 75 | 128,000 |
| #87 | GLM-4.5V | Z.AI | 75 | 128,000 |
| #88 | GLM-4.6 | Z.AI | 75 | 128,000 |
| #89 | GLM-4.6V | Z.AI | 75 | 128,000 |
| #90 | GLM-4.7 | Z.AI | 75 | 128,000 |
| #91 | GLM-5 | Z.AI | 75 | 128,000 |
| #92 | GLM-Image | Z.AI | 75 | 128,000 |
| #93 | GLM-OCR | Z.AI | 75 | 128,000 |
| #94 | gpt-image-1-mini | OpenAI | 75 | 32,768 |
| #95 | Hunyuan Code | Tencent / Hunyuan | 75 | 128,000 |
| #96 | Hunyuan Lite | Tencent / Hunyuan | 75 | 128,000 |
| #97 | Hunyuan Standard | Tencent / Hunyuan | 75 | 128,000 |
| #98 | Hunyuan T1 | Tencent / Hunyuan | 75 | 256,000 |
| #99 | Hunyuan T1 Vision | Tencent / Hunyuan | 75 | 128,000 |
| #100 | Hunyuan TurboS | Tencent / Hunyuan | 75 | 128,000 |
| #101 | Hunyuan TurboS LongText 128K | Tencent / Hunyuan | 75 | 128,000 |
| #102 | Kimi K2 | Moonshot AI / Kimi | 75 | 131,072 |
| #103 | Kimi K2 Thinking | Moonshot AI / Kimi | 75 | 256,000 |
| #104 | Kimi K2 Turbo Preview | Moonshot AI / Kimi | 75 | 256,000 |
| #105 | Kimi K2.5 | Moonshot AI / Kimi | 75 | 256,000 |
| #106 | Magistral Medium 1.2 | Mistral AI | 75 | 128,000 |
| #107 | MiniMax-M1 | MiniMax | 75 | 204,800 |
| #108 | MiniMax-M2 | MiniMax | 75 | 204,800 |
| #109 | MiniMax-M2.1 | MiniMax | 75 | 204,800 |
| #110 | MiniMax-M2.1-highspeed | MiniMax | 75 | 204,800 |
| #111 | MiniMax-M2.5 | MiniMax | 75 | 204,800 |
| #112 | MiniMax-M2.5-highspeed | MiniMax | 75 | 204,800 |
| #113 | MiniMax-Text-01 | MiniMax | 75 | 204,800 |
| #114 | MiniMax-VL-01 | MiniMax | 75 | 204,800 |
| #115 | Mistral Embed | Mistral AI | 75 | 32,768 |
| #116 | Mistral Large 3 | Mistral AI | 75 | 128,000 |
| #117 | Mistral Medium 3.1 | Mistral AI | 75 | 128,000 |
| #118 | Mistral Moderation | Mistral AI | 75 | 32,768 |
| #119 | Mistral Small 3.1 | Mistral AI | 75 | 128,000 |
| #120 | Mistral Small 3.2 Open | Mistral AI | 75 | 128,000 |
| #121 | Pixtral 12B | Mistral AI | 75 | 131,072 |
| #122 | Pixtral Large | Mistral AI | 75 | 131,072 |
| #123 | Vidu Q1 | Z.AI | 75 | 128,000 |
| #124 | Voxtral Mini Open | Mistral AI | 75 | 131,072 |
| #125 | Voxtral Small Open | Mistral AI | 75 | 131,072 |
| #126 | Claude Sonnet 3 | Anthropic | 74 | 200,000 |
| #127 | Claude Sonnet 4 | Anthropic | 74 | 1,000,000 |
| #128 | Command A | Cohere | 74 | 256,000 |
| #129 | Command A Reasoning | Cohere | 74 | 128,000 |
| #130 | Command A Translate | Cohere | 74 | 128,000 |
| #131 | Command A Vision | Cohere | 74 | 128,000 |
| #132 | Command R+ | Cohere | 74 | 128,000 |
| #133 | Command R7B | Cohere | 74 | 128,000 |
| #134 | DeepSeek-Coder-V2 | DeepSeek | 74 | 128,000 |
| #135 | DeepSeek-Math-V2 | DeepSeek | 74 | 128,000 |
| #136 | DeepSeek-R1 | DeepSeek | 74 | 128,000 |
| #137 | DeepSeek-R1-Distill-Llama-70B | DeepSeek | 74 | 128,000 |
| #138 | DeepSeek-V2.5 | DeepSeek | 74 | 128,000 |
| #139 | DeepSeek-V3 | DeepSeek | 74 | 128,000 |
| #140 | DeepSeek-V3.1 | DeepSeek | 74 | 128,000 |
| #141 | DeepSeek-V3.1-Base | DeepSeek | 74 | 128,000 |
| #142 | DeepSeek-V3.2 | DeepSeek | 74 | 128,000 |
| #143 | DeepSeek-V3.2-Exp | DeepSeek | 74 | 128,000 |
| #144 | Devstral 2 Open | Mistral AI | 74 | 128,000 |
| #145 | Devstral Small 2 | Mistral AI | 74 | 128,000 |
| #146 | Embed 4 | Cohere | 74 | 128,000 |
| #147 | ERNIE 4.5 Turbo 32K | Baidu / ERNIE | 74 | 32,768 |
| #148 | Grok 3 | xAI | 74 | 131,072 |
| #149 | Grok 3 Mini | xAI | 74 | 131,072 |
| #150 | Grok 4 | xAI | 74 | 256,000 |
| #151 | Grok 4 Fast Reasoning | xAI | 74 | 131,072 |
| #152 | grok-image | xAI | 74 | 131,072 |
| #153 | image-01 | MiniMax | 74 | 8,192 |
| #154 | image-01-live | MiniMax | 74 | 8,192 |
| #155 | Jamba 3B | AI21 Labs | 74 | 256,000 |
| #156 | Jamba Large | AI21 Labs | 74 | 256,000 |
| #157 | Jamba Large 1.6 | AI21 Labs | 74 | 256,000 |
| #158 | Jamba Mini | AI21 Labs | 74 | 256,000 |
| #159 | Jamba Mini 1.6 | AI21 Labs | 74 | 256,000 |
| #160 | Jamba Mini 1.7 | AI21 Labs | 74 | 256,000 |
| #161 | Magistral Small 1.2 Open | Mistral AI | 74 | 128,000 |
| #162 | MiniMax-Speech-02 | MiniMax | 74 | 8,192 |
| #163 | Ministral 3 14B Open | Mistral AI | 74 | 128,000 |
| #164 | Ministral 3 3B Open | Mistral AI | 74 | 128,000 |
| #165 | Ministral 3 8B Open | Mistral AI | 74 | 128,000 |
| #166 | Mistral Large 3 Open | Mistral AI | 74 | 128,000 |
| #167 | Mistral Nemo 12B | Mistral AI | 74 | 128,000 |
| #168 | Mistral OCR 2505 | Mistral AI | 74 | 32,768 |
| #169 | music-2.0 | MiniMax | 74 | 8,192 |
| #170 | Phi-3-vision-128k-instruct | Microsoft | 74 | 128,000 |
| #171 | Phi-3.5-mini-instruct | Microsoft | 74 | 131,072 |
| #172 | Phi-3.5-MoE-instruct | Microsoft | 74 | 131,072 |
| #173 | Phi-3.5-vision-instruct | Microsoft | 74 | 131,072 |
| #174 | Phi-4-mini-flash-reasoning | Microsoft | 74 | 131,072 |
| #175 | Phi-4-mini-instruct | Microsoft | 74 | 131,072 |
| #176 | Phi-4-multimodal-instruct | Microsoft | 74 | 131,072 |
| #177 | Phi-4-reasoning | Microsoft | 74 | 131,072 |
| #178 | Phi-4-reasoning-plus | Microsoft | 74 | 131,072 |
| #179 | Phi-4-reasoning-vision-15B | Microsoft | 74 | 131,072 |
| #180 | Qwen2.5-1.5B-Instruct | Alibaba Qwen | 74 | 131,072 |
| #181 | Qwen2.5-14B-Instruct | Alibaba Qwen | 74 | 131,072 |
| #182 | Qwen2.5-32B-Instruct | Alibaba Qwen | 74 | 131,072 |
| #183 | Qwen2.5-3B-Instruct | Alibaba Qwen | 74 | 131,072 |
| #184 | Qwen2.5-72B-Instruct | Alibaba Qwen | 74 | 131,072 |
| #185 | Qwen2.5-7B-Instruct | Alibaba Qwen | 74 | 131,072 |
| #186 | Qwen2.5-Max | Alibaba Qwen | 74 | 131,072 |
| #187 | Qwen2.5-VL-72B-Instruct | Alibaba Qwen | 74 | 131,072 |
| #188 | Qwen2.5-VL-7B-Instruct | Alibaba Qwen | 74 | 131,072 |
| #189 | Qwen3-Coder-Next | Alibaba Qwen | 74 | 131,072 |
| #190 | Qwen3.5-0.8B | Alibaba Qwen | 74 | 131,072 |
| #191 | Qwen3.5-122B-A10B | Alibaba Qwen | 74 | 131,072 |
| #192 | Qwen3.5-27B | Alibaba Qwen | 74 | 131,072 |
| #193 | Qwen3.5-2B | Alibaba Qwen | 74 | 131,072 |
| #194 | Qwen3.5-35B-A3B | Alibaba Qwen | 74 | 131,072 |
| #195 | Qwen3.5-397B-A17B | Alibaba Qwen | 74 | 131,072 |
| #196 | Qwen3.5-4B | Alibaba Qwen | 74 | 131,072 |
| #197 | Qwen3.5-9B | Alibaba Qwen | 74 | 131,072 |
| #198 | DeepSeek-OCR | DeepSeek | 73 | 16,384 |
| #199 | DeepSeek-OCR-2 | DeepSeek | 73 | 16,384 |
| #200 | DeepSeek-VL2-Small | DeepSeek | 73 | 16,384 |
| #201 | GPT-4o mini Transcribe | OpenAI | 73 | 128,000 |
| #202 | GPT-4o mini TTS | OpenAI | 73 | 128,000 |
| #203 | GPT-4o Transcribe | OpenAI | 73 | 128,000 |
| #204 | gpt-oss-120b | OpenAI | 73 | 131,072 |
| #205 | Janus-Pro-7B | DeepSeek | 73 | 16,384 |
| #206 | Phi-4 | Microsoft | 73 | 16,384 |
| #207 | Claude Opus 3 | Anthropic | 72 | 200,000 |
| #208 | Claude Opus 4 | Anthropic | 72 | 200,000 |
| #209 | Llama 3.3 70B Instruct | Meta | 72 | 128,000 |
| #210 | phi-1 | Microsoft | 72 | 4,096 |
| #211 | phi-1_5 | Microsoft | 72 | 4,096 |
| #212 | phi-2 | Microsoft | 72 | 4,096 |
| #213 | Phi-3-medium-4k-instruct | Microsoft | 72 | 4,096 |
| #214 | Phi-3-mini-4k-instruct | Microsoft | 72 | 4,096 |
| #215 | Phi-tiny-MoE-instruct | Microsoft | 72 | 4,096 |
| #216 | pplx-embed-v1-4b | Perplexity | 72 | 8,192 |
| #217 | Prompt Guard 86M | Meta | 72 | 512 |
| #218 | pplx-embed-v1-0.6b | Perplexity | 71 | 8,192 |
| #219 | Step-3.5-Flash | StepFun | 71 | 131,072 |
| #220 | gpt-oss-20b | OpenAI | 70 | 131,072 |
| #221 | Llama 3.1 405B Instruct | Meta | 70 | 128,000 |
| #222 | Llama 4 Scout | Meta | 70 | 10,485,760 |
| #223 | LFM2-24B-A2B | Liquid AI | 69 | 32,768 |
| #224 | Llama 3.1 70B Instruct | Meta | 69 | 128,000 |
| #225 | MiMo-VL-7B | Xiaomi | 68 | 131,072 |
| #226 | Step3-VL-10B | StepFun | 68 | 131,072 |
| #227 | Llama 3.1 8B Instruct | Meta | 67 | 128,000 |
| #228 | Llama 3.2 90B Vision Instruct | Meta | 67 | 128,000 |
| #229 | MiMo-Audio-7B | Xiaomi | 67 | 131,072 |
| #230 | LFM2-8B-A1B | Liquid AI | 66 | 32,768 |
| #231 | Llama 3.2 1B Instruct | Meta | 66 | 128,000 |
| #232 | Llama 3.2 3B Instruct | Meta | 66 | 128,000 |
| #233 | Step-Audio-R1.1 | StepFun | 66 | 131,072 |
| #234 | Llama 3.2 11B Vision Instruct | Meta | 65 | 128,000 |
| #235 | Code Llama 70B Instruct | Meta | 64 | 8,192 |
| #236 | Meta Llama 3 70B Instruct | Meta | 64 | 8,192 |
| #237 | FLUX 1.1 Pro | Black Forest Labs | 63 | 512 |
| #238 | LFM2.5-1.2B-Instruct | Liquid AI | 63 | 131,072 |
| #239 | LFM2.5-1.2B-Thinking | Liquid AI | 63 | 131,072 |
| #240 | Code Llama 34B Instruct | Meta | 62 | 8,192 |
| #241 | FLUX 1.1 Pro Ultra | Black Forest Labs | 62 | 512 |
| #242 | Meta Llama 3 8B Instruct | Meta | 62 | 8,192 |
| #243 | FLUX 1 Pro | Black Forest Labs | 61 | 512 |
| #244 | LFM2-2.6B | Liquid AI | 60 | 32,768 |
| #245 | NextStep-1.1 | StepFun | 54 | 512 |
Why #1: Claude Opus 4.6
Anthropic's most intelligent Claude model for complex agents, coding, and deep reasoning, with 1M token context and 128K output.
This model clears the current full-profile threshold under the leaderboard methodology.
Why #2: Claude Sonnet 4.5
Anthropic's Sonnet 4.5 with 1M token context for fast frontier reasoning, coding, and long-context agent work.
This model clears the current full-profile threshold under the leaderboard methodology.
Why #3: Claude Sonnet 4.6
Anthropic's current Sonnet tier for fast frontier reasoning, coding, and long-context agent work.
This model clears the current full-profile threshold under the leaderboard methodology.