LLM Atlas

Releases

A changelog-style feed of model launches and noteworthy updates, newest first.

2026-03-19

Nemotron-Cascade 2

NVIDIA's Nemotron-Cascade 2, a 30B-A3B MoE (30B total, 3B active), trained with cascade RL and multi-domain on-policy distillation. 74.8K downloads on HuggingFace.

2026-03-15

warpgrep-v2

Morph's WarpGrep v2 for ultra-fast agentic code search. #1 on SWE-Bench Pro with sub-6s searches. 15.8% cheaper and 22% faster than alternatives.

2026-03-11

Nemotron 3 Super 120B

NVIDIA's flagship 120B/12B-active LatentMoE model with 1M context, trained on 25T tokens. Strong on agentic workflows, reasoning, and long-context tasks. Requires 8x H100-80GB.

2026-03-07

flash-compact

Morph's Flash Compact for near-lossless context compaction at 33,000+ tok/sec. Reduces context 50-70% in under 2 seconds. +0.6% on SWE-Bench Pro.

2026-03-05

GPT-5.4

OpenAI's GPT-5.4, the most capable and efficient frontier model for professional work. First general-purpose model with native computer-use capabilities. Combines industry-leading coding from GPT-5.3-Codex with improved agentic workflows.

2026-03-05

GPT-5.4 Pro

OpenAI's GPT-5.4 Pro delivering maximum performance for complex tasks. Available for ChatGPT Pro and Enterprise plans.

2026-03-03

GPT-5.3 Instant

OpenAI's GPT-5.3 Instant optimized for low-latency everyday usability. Available to all ChatGPT users.

2026-03-03

Gemini 3.1 Flash-Lite

Google's Gemini 3.1 Flash-Lite, its most cost-effective model, built for high-volume workloads with best-in-class intelligence at scale.

2026-03-01

morph-v3-fast-apply

Morph's Fast Apply model merges AI-generated code edits at 10,500+ tok/s with 98% accuracy, orders of magnitude faster than alternatives like GPT-4.1 (55 tok/s) or Claude Sonnet (80 tok/s).
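
A back-of-the-envelope sketch of what those throughputs mean in practice. The speeds are the ones quoted in this entry; the 2,000-token edit size is a hypothetical illustration, not a figure from the feed.

```python
# Time to stream a single code edit at the throughputs quoted above.
# Speeds come from this feed entry; the edit size is hypothetical.

SPEEDS_TOK_PER_S = {
    "morph-v3-fast-apply": 10_500,
    "GPT-4.1": 55,
    "Claude Sonnet": 80,
}

EDIT_TOKENS = 2_000  # hypothetical size of one merged edit

for model, speed in SPEEDS_TOK_PER_S.items():
    print(f"{model}: {EDIT_TOKENS / speed:.2f}s")
```

At these rates the same edit streams in roughly 0.2 s versus 25–36 s, which is where the latency difference comes from.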

2026-02-24

Claude Sonnet 4.6

Anthropic's current Sonnet tier for fast frontier reasoning, coding, and long-context agent work.

2026-02-24

Claude Opus 4.6

Anthropic's most intelligent Claude model for complex agents, coding, and deep reasoning, with 1M token context and 128K output.

2026-02-24

LFM2-24B-A2B

Liquid AI's flagship 24B/2.3B-active MoE hybrid model. 112 tok/s on AMD CPU, 293 tok/s on H100. Fits in 32GB RAM. Trained on 17T tokens with 30 conv + 10 attn layers. Supports 9 languages.

2026-02-19

Gemini 3.1 Pro

Google's Gemini 3.1 Pro, designed for complex tasks where simple answers aren't enough, with enhanced reasoning and multimodal capabilities.

2026-02-19

Gemini 3.1 Flash

Google's Gemini 3.1 Flash for fast, cost-efficient multimodal inference with strong quality.

2026-02-16

NextStep-1.1

StepFun's 15B NextStep-1.1 for text-to-image generation. Supports image editing and modification based on text prompts.

2026-02-14

Step-Audio-R1.1

StepFun's 33B Step-Audio-R1.1 for audio-text-to-text generation and understanding.

2026-02-12

LFM2.5-1.2B-Instruct

Liquid AI's 1.2B LFM2.5-Instruct with 262K downloads on HuggingFace. Extended 128K context for edge reasoning and instruction following.

2026-02-12

LFM2.5-1.2B-Thinking

Liquid AI's 1.2B LFM2.5-Thinking with chain-of-thought reasoning for edge devices. 30K downloads on HuggingFace.

2026-02-05

GPT-5.3-Codex

OpenAI's GPT-5.3-Codex, expanding Codex across the full spectrum of professional computer work. Powers the Codex app on macOS and Windows.

2026-02-01

GPT-5.2

OpenAI's latest frontier model with 1M token context for advanced reasoning and tool use.

2026-02-01

GPT-5.2 Pro

OpenAI's premium GPT-5.2 Pro with maximum reasoning capability.

2026-02-01

Step-3.5-Flash

StepFun's 199B parameter Step-3.5-Flash for text generation and reasoning. 91.9K downloads on HuggingFace. Available in BF16, FP8, and GGUF quantizations.

2026-01-19

Step3-VL-10B

StepFun's 10B Step3-VL vision-language model with 211K downloads on HuggingFace. Supports FP8 quantization.

2026-01-15

Command R+ 2026

An enterprise retrieval specialist built for document-heavy support and knowledge applications.

2025-12-16

MiMo-V2-Flash

Xiaomi's MiMo-V2-Flash: 309B total/15B active MoE with hybrid sliding window attention, Multi-Token Prediction, and 256K context. Scores 94.1 on AIME 2025, 73.4 on SWE-Bench. Trained on 27T tokens with 6x KV-cache reduction.

2025-12-15

Nemotron 3 Nano 4B

NVIDIA's compact 4B Nemotron Nano for efficient local AI with hybrid Mamba-2 architecture. Runs on consumer GPUs.

2025-12-01

Mistral Large 25

An enterprise-ready all-rounder with especially strong European language performance.

2025-12-01

Gemini 3.0 Pro

Google's Gemini 3.0 Pro, quietly released in late 2025. Strong multimodal reasoning with 1M context. Foundation for the Gemini 3.1 series.

2025-12-01

Gemini 3.0 Flash

Google's Gemini 3.0 Flash for fast multimodal inference. Released alongside Gemini 3.0 Pro.

2025-12-01

Mistral Large 3

Mistral's state-of-the-art open-weight multimodal model with 128K context for general-purpose deployment.

2025-12-01

Mistral Large 3 Open

Mistral's state-of-the-art open-weight model with 128K context for self-hosting and customization.

2025-12-01

Ministral 3 14B Open

Mistral's efficient open-weight models with text and vision capabilities at 3B, 8B, and 14B sizes.

2025-12-01

Ministral 3 8B Open

Mistral's efficient open-weight models with text and vision capabilities at 3B, 8B, and 14B sizes.

2025-12-01

Ministral 3 3B Open

Mistral's efficient open-weight models with text and vision capabilities at 3B, 8B, and 14B sizes.

2025-11-28

LFM2-8B-A1B

Liquid AI's 8.3B/1.5B-active MoE hybrid model with 12T training tokens. 24 layers (18 conv + 6 attn). 47.9K downloads on HuggingFace.

2025-11-28

LFM2-2.6B

Liquid AI's 2.6B-parameter LFM2 for lightweight text generation and tool use on edge devices.

2025-10-21

Llama-Embed-Nemotron 8B

NVIDIA's Llama-Embed-Nemotron 8B ranked #1 on multilingual MTEB leaderboard with text and image retrieval support.

2025-10-01

Claude Haiku 4.5

Anthropic's Haiku 4.5 with 200K context, the fastest Claude model with near-frontier intelligence at low cost.

2025-09-29

Claude Sonnet 4.5

Anthropic's Sonnet 4.5 with 1M token context for fast frontier reasoning, coding, and long-context agent work.

2025-09-23

MiMo-Audio-7B

Xiaomi's MiMo-Audio 7B for any-to-any audio and text generation. Includes base and instruct variants.

2025-09-01

GPT-5

OpenAI's GPT-5 with 1M context for balanced reasoning and coding.

2025-09-01

GPT-5 mini

OpenAI's GPT-5 mini for cost-efficient applications.

2025-09-01

GPT-5 nano

OpenAI's smallest GPT-5 variant for ultra-fast, cost-sensitive applications.

2025-09-01

gpt-realtime-mini

OpenAI's mini audio models for cost-efficient voice interfaces.

2025-09-01

gpt-audio-mini

OpenAI's mini audio models for cost-efficient voice interfaces.

2025-09-01

Magistral Medium 1.2

Mistral's frontier-class multimodal reasoning model with extended thinking.

2025-08-24

FLUX 1 Kontext Dev

Black Forest Labs' FLUX 1 Kontext Dev for context-aware image editing with 95.6K downloads on HuggingFace.

2025-08-21

MiMo-VL-7B

Xiaomi's MiMo-VL 7B vision-language model with 136K downloads on HuggingFace for image understanding and multimodal reasoning.

2025-08-05

Claude Opus 4.1

Anthropic's Claude Opus 4.1 for maximum intelligence on complex reasoning, coding, and agentic tasks.

2025-08-01

Mistral Medium 3.1

Mistral's frontier-class multimodal model released August 2025.

2025-08-01

Codestral

Mistral's cutting-edge code completion model with 256K context and fill-in-the-middle support.

2025-07-01

Devstral Medium 1.0

Mistral's enterprise-grade model excelling at software engineering tasks.

2025-07-01

Voxtral Mini Open

Mistral's open-weight mini audio input model for on-prem transcription and audio understanding.

2025-07-01

Voxtral Small Open

Mistral's open-weight small audio input model for instruct use cases with audio understanding.

2025-07-01

Voxtral Mini Transcribe

Mistral's audio transcription model optimized for converting speech to text with high accuracy.

2025-06-10

Llama Nemotron Nano 4B

NVIDIA's compact 4B Llama Nemotron Nano for edge AI with high-accuracy reasoning. Runs on consumer GPUs.

2025-06-04

Llama Nemotron Super 49B

NVIDIA's Llama-based Nemotron Super 49B for high-accuracy reasoning, agentic tasks, and RAG workflows.

2025-06-01

o3-deep-research

OpenAI's deep research model for extended multi-step research tasks with tool use.

2025-06-01

o4-mini-deep-research

OpenAI's cost-efficient deep research model for extended tasks.

2025-06-01

gpt-realtime

OpenAI's realtime and audio models for low-latency voice interfaces.

2025-06-01

gpt-audio

OpenAI's realtime and audio models for low-latency voice interfaces.

2025-06-01

chatgpt-image-latest

OpenAI's lightweight image generation variants.

2025-06-01

gpt-image-1-mini

OpenAI's lightweight image generation variants.

2025-06-01

gpt-oss-120b

OpenAI's 120B open-weight model for frontier-style reasoning with self-hosted deployment.

2025-06-01

gpt-oss-20b

OpenAI's 20B open-weight model for efficient frontier reasoning on consumer hardware.

2025-06-01

Gemini 2.5 Flash-Lite

Google's lightweight Gemini 2.5 Flash variants for cost-efficient multimodal and real-time audio.

2025-06-01

Gemini 2.5 Flash Native Audio Preview

Google's lightweight Gemini 2.5 Flash variants for cost-efficient multimodal and real-time audio.

2025-06-01

Gemini 2.5 Flash Live

Google's lightweight Gemini 2.5 Flash variants for cost-efficient multimodal and real-time audio.

2025-06-01

Mistral Small 3.1

Mistral's efficient small models with 128K context, unifying instruct, reasoning, and coding.

2025-06-01

Mistral Small 3.2 Open

Mistral's efficient small models with 128K context, unifying instruct, reasoning, and coding.

2025-06-01

Magistral Small 1.2 Open

Mistral's open-weight reasoning and coding models for self-hosted deployment.

2025-06-01

Devstral 2 Open

Mistral's open-weight reasoning and coding models for self-hosted deployment.

2025-06-01

Devstral Small 2

Mistral's open-weight reasoning and coding models for self-hosted deployment.

2025-06-01

Mistral Nemo 12B

Mistral's open-weight reasoning and coding models for self-hosted deployment.

2025-06-01

Command R7B

Cohere's Command text family for enterprise chat, retrieval, translation, and agent workflows built around grounding.

2025-06-01

Command A Translate

Cohere's Command text family for enterprise chat, retrieval, translation, and agent workflows built around grounding.

2025-06-01

Command A Reasoning

Cohere's Command text family for enterprise chat, retrieval, translation, and agent workflows built around grounding.

2025-06-01

Command R+

Cohere's Command text family for enterprise chat, retrieval, translation, and agent workflows built around grounding.

2025-06-01

Grok 4

xAI's latest Grok model with 256K context for advanced reasoning and multimodal tasks.

2025-06-01

MiniMax-Speech-02

MiniMax's TTS model for high-fidelity speech synthesis with voice cloning capabilities.

2025-06-01

MiniMax-M2.5

MiniMax's latest M2.5 text model for coding agents, multimodal assistants, and high-speed inference.

2025-06-01

MiniMax-M2.5-highspeed

MiniMax's latest M2.5 text model for coding agents, multimodal assistants, and high-speed inference.

2025-06-01

Kimi K2

Moonshot AI's Kimi K2 with 128K-token (131,072) native context for research, code generation, and reasoning-heavy chat.

2025-06-01

Kimi K2.5

Moonshot AI's Kimi family variants with extended context and thinking modes for complex reasoning tasks.

2025-06-01

Kimi K2 Turbo Preview

Moonshot AI's Kimi family variants with extended context and thinking modes for complex reasoning tasks.

2025-06-01

Kimi K2 Thinking

Moonshot AI's Kimi family variants with extended context and thinking modes for complex reasoning tasks.

2025-05-14

Claude Sonnet 4

Anthropic's Sonnet 4 with 1M token context for balanced reasoning and coding performance.

2025-05-14

Claude Opus 4

Anthropic's Opus 4 with 200K context for complex reasoning and agentic tasks.

2025-05-14

Qwen3.5-397B-A17B

Alibaba's Qwen3 family with thinking/non-thinking mode switching, 100+ language support, and 131K context via YaRN.

2025-05-14

Qwen3.5-122B-A10B

Alibaba's Qwen3 family with thinking/non-thinking mode switching, 100+ language support, and 131K context via YaRN.

2025-05-14

Qwen3.5-35B-A3B

Alibaba's Qwen3 family with thinking/non-thinking mode switching, 100+ language support, and 131K context via YaRN.

2025-05-14

Qwen3.5-27B

Alibaba's Qwen3 family with thinking/non-thinking mode switching, 100+ language support, and 131K context via YaRN.

2025-05-14

Qwen3.5-9B

Alibaba's Qwen3 family with thinking/non-thinking mode switching, 100+ language support, and 131K context via YaRN.

2025-05-14

Qwen3.5-4B

Alibaba's Qwen3 family with thinking/non-thinking mode switching, 100+ language support, and 131K context via YaRN.

2025-05-14

Qwen3.5-2B

Alibaba's Qwen3 family with thinking/non-thinking mode switching, 100+ language support, and 131K context via YaRN.

2025-05-14

Qwen3.5-0.8B

Alibaba's Qwen3 family with thinking/non-thinking mode switching, 100+ language support, and 131K context via YaRN.

2025-05-14

Qwen3-Coder-Next

Alibaba's Qwen3 family with thinking/non-thinking mode switching, 100+ language support, and 131K context via YaRN.

2025-05-01

Codestral Embed

Mistral's embedding models for semantic search and code representation extraction.

2025-05-01

Mistral Embed

Mistral's embedding models for semantic search and code representation extraction.

2025-04-30

Phi-4-reasoning-vision-15B

Microsoft's Phi-4 reasoning vision model (15B) combining visual understanding with chain-of-thought reasoning.

2025-04-30

Phi-4-mini-instruct

Microsoft's Phi-4 reasoning variants with 128K context for compact, efficient reasoning on constrained infrastructure.

2025-04-30

Phi-4-reasoning

Microsoft's Phi-4 reasoning variants with 128K context for compact, efficient reasoning on constrained infrastructure.

2025-04-30

Phi-4-reasoning-plus

Microsoft's Phi-4 reasoning variants with 128K context for compact, efficient reasoning on constrained infrastructure.

2025-04-30

Phi-4-mini-flash-reasoning

Microsoft's Phi-4 reasoning variants with 128K context for compact, efficient reasoning on constrained infrastructure.

2025-04-16

o3

OpenAI's o3 reasoning model for complex multi-step problem solving with extended thinking.

2025-04-16

o4-mini

OpenAI's o4-mini for cost-efficient reasoning with strong coding and math capabilities.

2025-04-14

GPT-4.1

OpenAI's GPT-4.1 with 1M token context, improved coding and instruction following.

2025-04-14

GPT-4.1 mini

OpenAI's GPT-4.1 mini for fast, cost-efficient inference with 1M context.

2025-04-05

Llama 4 Maverick

Meta's 17Bx128E MoE open-weight model with 1M token context, pretrained on ~22T tokens. Strong multimodal and multilingual capabilities for teams that need control, private deployment, and customization.

2025-04-05

Llama 4 Scout

Meta's Llama 4 Scout (17Bx16E MoE, 109B total params) with an extraordinary 10M token context window.

2025-04-05

Llama Guard 4 12B

Meta's Llama Guard 4 (12B) safety model for filtering unsafe prompts.

2025-04-01

Grok 4 Fast Reasoning

xAI's Grok API family for fast-moving reasoning, conversational agents, and multimodal assistant workloads.

2025-04-01

Grok 3

xAI's Grok API family for fast-moving reasoning, conversational agents, and multimodal assistant workloads.

2025-04-01

Grok 3 Mini

xAI's Grok API family for fast-moving reasoning, conversational agents, and multimodal assistant workloads.

2025-04-01

grok-image

xAI's Grok API family for fast-moving reasoning, conversational agents, and multimodal assistant workloads.

2025-04-01

music-2.0

MiniMax's music generation model for AI-composed audio tracks.

2025-03-26

GPT Image 1

OpenAI's natively multimodal image generation model.

2025-03-25

Gemini 2.5 Pro

An expansive-context model built for complex research, multimodal understanding, and large document tasks, with 1M token native context.

2025-03-25

Gemini 2.5 Pro TTS

Google's Gemini 2.5 Pro with 1M context for long-context multimodal analysis and text-to-speech.

2025-03-25

Gemini 2.5 Flash

Google's Gemini 2.5 Flash with 1M context for fast, cost-efficient multimodal inference.

2025-03-13

Command A

Cohere's latest Command model with 256K context for enterprise chat, retrieval, and agent workflows.

2025-03-12

FLUX 2 Klein 9B KV

Black Forest Labs' FLUX.2 Klein 9B KV variant with key-value cache optimization for even faster inference.

2025-03-01

Mistral OCR 2505

Mistral's OCR model for document extraction, PDF parsing, and structured output from images.

2025-03-01

Command A Vision

Cohere's vision-capable Command and embedding models for multimodal enterprise assistants and retrieval systems.

2025-03-01

Embed 4

Cohere's vision-capable Command and embedding models for multimodal enterprise assistants and retrieval systems.

2025-03-01

DeepSeek-OCR

DeepSeek's specialized OCR models for document parsing, table extraction, and image-to-text conversion.

2025-03-01

DeepSeek-OCR-2

DeepSeek's specialized OCR models for document parsing, table extraction, and image-to-text conversion.

2025-03-01

image-01

MiniMax's image generation models for text-to-image creation, including a live animation variant.

2025-03-01

image-01-live

MiniMax's image generation models for text-to-image creation, including a live animation variant.

2025-03-01

GLM-5

Z.AI's GLM reasoning family for general chat, coding, and tool-enabled Chinese-first production workloads.

2025-03-01

GLM-4.7

Z.AI's GLM reasoning family for general chat, coding, and tool-enabled Chinese-first production workloads.

2025-03-01

GLM-4.6

Z.AI's GLM reasoning family for general chat, coding, and tool-enabled Chinese-first production workloads.

2025-03-01

GLM-4.5

Z.AI's GLM reasoning family for general chat, coding, and tool-enabled Chinese-first production workloads.

2025-03-01

GLM-4.6V

Z.AI's multimodal model line for document OCR, image generation, video generation, and visual reasoning tasks.

2025-03-01

GLM-4.5V

Z.AI's multimodal model line for document OCR, image generation, video generation, and visual reasoning tasks.

2025-03-01

GLM-OCR

Z.AI's multimodal model line for document OCR, image generation, video generation, and visual reasoning tasks.

2025-03-01

GLM-Image

Z.AI's multimodal model line for document OCR, image generation, video generation, and visual reasoning tasks.

2025-03-01

CogView 4

Z.AI's multimodal model line for document OCR, image generation, video generation, and visual reasoning tasks.

2025-03-01

CogVideoX

Z.AI's multimodal model line for document OCR, image generation, video generation, and visual reasoning tasks.

2025-03-01

Vidu Q1

Z.AI's multimodal model line for document OCR, image generation, video generation, and visual reasoning tasks.

2025-03-01

Doubao-Seed-1.6

Bytedance's Doubao coding and reasoning line for domestic developer workflows, visual coding, and agentic assistance.

2025-03-01

Doubao-Seed-1.6-Flash

Bytedance's Doubao coding and reasoning line for domestic developer workflows, visual coding, and agentic assistance.

2025-03-01

Doubao-Seed-Code

Bytedance's Doubao coding and reasoning line for domestic developer workflows, visual coding, and agentic assistance.

2025-03-01

Doubao-Seed-2.0-Code

Bytedance's Doubao coding and reasoning line for domestic developer workflows, visual coding, and agentic assistance.

2025-03-01

Hunyuan T1

Tencent's Hunyuan T1 flagship with 256K context for fast reasoning and long-document understanding.

2025-03-01

Hunyuan TurboS

Tencent's Hunyuan family spans fast-reasoning, long-context, coding, and vision-capable API deployments.

2025-03-01

Hunyuan TurboS LongText 128K

Tencent's Hunyuan family spans fast-reasoning, long-context, coding, and vision-capable API deployments.

2025-03-01

Hunyuan Standard

Tencent's Hunyuan family spans fast-reasoning, long-context, coding, and vision-capable API deployments.

2025-03-01

Hunyuan Lite

Tencent's Hunyuan family spans fast-reasoning, long-context, coding, and vision-capable API deployments.

2025-03-01

Hunyuan Code

Tencent's Hunyuan family spans fast-reasoning, long-context, coding, and vision-capable API deployments.

2025-03-01

Hunyuan T1 Vision

Tencent's Hunyuan family spans fast-reasoning, long-context, coding, and vision-capable API deployments.

2025-03-01

ERNIE 4.5 Turbo 32K

Baidu's ERNIE 4.5 Turbo with 32K context for mainstream enterprise chat and function-calling workloads.

2025-03-01

ERNIE 4.0 Turbo 8K

Baidu's ERNIE family covers enterprise chat, long-context, speed-optimized, and function-calling workloads.

2025-03-01

ERNIE 3.5 128K

Baidu's ERNIE family covers enterprise chat, long-context, speed-optimized, and function-calling workloads.

2025-03-01

ERNIE Speed 128K

Baidu's ERNIE family covers enterprise chat, long-context, speed-optimized, and function-calling workloads.

2025-03-01

ERNIE Functions 8K

Baidu's ERNIE family covers enterprise chat, long-context, speed-optimized, and function-calling workloads.

2025-02-25

Phi-4-multimodal-instruct

Microsoft's 5.6B Phi-4 multimodal model with vision, audio, and text input for lightweight assistant features.

2025-02-24

Claude 3.7 Sonnet

A top-tier reasoning model with strong software engineering assistance and enterprise controls.

2025-02-24

FLUX 2 Klein 4B

Black Forest Labs' 4B FLUX.2 Klein with Apache 2.0 license for fast, commercial-use image generation. 268K monthly downloads.

2025-02-05

Gemini 2.0 Flash

Google's Gemini 2.0 Flash with 1M context for fast multimodal inference.

2025-02-05

Gemini 2.0 Flash-Lite

Google's Gemini 2.0 Flash-Lite for ultra-fast, cost-efficient inference.

2025-02-04

FLUX 2 Dev

Black Forest Labs' 32B parameter FLUX.2 Dev, the state-of-the-art open text-to-image model with single and multi-reference editing. 987K monthly downloads on HuggingFace.

2025-02-01

MiniMax-VL-01

MiniMax's vision-language model with 200K context for multimodal understanding and image analysis.

2025-01-27

Janus-Pro-7B

DeepSeek's multimodal understanding and generation model (7B) for image-to-text and text-to-image tasks.

2025-01-26

Qwen2.5-VL-72B-Instruct

Alibaba's flagship 72B vision-language model with 131K context for document reading, image understanding, and multimodal reasoning.

2025-01-26

Qwen2.5-VL-7B-Instruct

Alibaba's efficient 7B vision-language model for lightweight multimodal assistants and document understanding.

2025-01-26

Phi-3-mini-4k-instruct

Microsoft's earlier Phi models with shorter context windows for edge and local deployment.

2025-01-26

Phi-3-medium-4k-instruct

Microsoft's earlier Phi models with shorter context windows for edge and local deployment.

2025-01-26

phi-2

Microsoft's earlier Phi models with shorter context windows for edge and local deployment.

2025-01-26

phi-1_5

Microsoft's earlier Phi models with shorter context windows for edge and local deployment.

2025-01-26

phi-1

Microsoft's earlier Phi models with shorter context windows for edge and local deployment.

2025-01-26

Phi-tiny-MoE-instruct

Microsoft's earlier Phi models with shorter context windows for edge and local deployment.

2025-01-21

Sonar Pro

Perplexity's Sonar Pro for enhanced web search with multi-step reasoning, Pro Search, and citation support. $3/$15 per MTok plus request fee.

2025-01-21

Sonar

Perplexity's fast, cost-efficient Sonar model for basic web search and Q&A. $1/$1 per MTok plus request fee.

2025-01-21

Sonar Reasoning Pro

Perplexity's Sonar Reasoning Pro for complex reasoning with real-time web search. $2/$8 per MTok plus request fee.

2025-01-21

Sonar Deep Research

Perplexity's Sonar Deep Research for automated multi-step research with citation generation. $2/$8 MTok + $2 citation + $3 reasoning + $5 per 1K queries.
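
The compound pricing above can be hard to parse, so here is a hedged sketch of how one request's bill might add up. The rates are the ones quoted in this entry; the token counts and query count in the example are hypothetical, and the exact set of billed components should be checked against Perplexity's current pricing.

```python
# Hedged cost estimate for one Sonar Deep Research request, using the
# rates quoted above. All token counts below are hypothetical examples.

RATES = {
    "input_per_mtok": 2.0,      # $2 / 1M input tokens
    "output_per_mtok": 8.0,     # $8 / 1M output tokens
    "citation_per_mtok": 2.0,   # $2 / 1M citation tokens
    "reasoning_per_mtok": 3.0,  # $3 / 1M reasoning tokens
    "per_1k_queries": 5.0,      # $5 / 1,000 search queries
}

def deep_research_cost(inp, out, citation, reasoning, queries):
    """Estimate one request's cost in dollars (raw token counts, not MTok)."""
    return (
        RATES["input_per_mtok"] * inp / 1e6
        + RATES["output_per_mtok"] * out / 1e6
        + RATES["citation_per_mtok"] * citation / 1e6
        + RATES["reasoning_per_mtok"] * reasoning / 1e6
        + RATES["per_1k_queries"] * queries / 1_000
    )

# A hypothetical heavy request: 10K in, 5K out, 2K citation tokens,
# 50K reasoning tokens, 20 searches.
print(f"${deep_research_cost(10_000, 5_000, 2_000, 50_000, 20):.4f}")
```

Note that on a request like this, reasoning tokens and search queries can dominate the bill rather than the visible input/output tokens.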

2025-01-21

pplx-embed-v1-4b

Perplexity's 4B embedding model with 2560 dimensions for high-quality semantic search and RAG. $0.03 per MTok.

2025-01-21

pplx-embed-v1-0.6b

Perplexity's lightweight 0.6B embedding model with 1024 dimensions for fast, cost-efficient embeddings. $0.004 per MTok.
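
A quick comparison of what the two embedding models' per-MTok prices mean at corpus scale. The prices are the ones quoted in these entries; the 500M-token corpus is a hypothetical workload.

```python
# Rough cost comparison for embedding a corpus with the two Perplexity
# embedding models, using the per-MTok prices quoted above.

PRICE_PER_MTOK = {
    "pplx-embed-v1-4b": 0.03,    # $/1M tokens, 2560-dim
    "pplx-embed-v1-0.6b": 0.004, # $/1M tokens, 1024-dim
}

def embed_cost(model: str, tokens: int) -> float:
    """Return the embedding cost in dollars for `tokens` input tokens."""
    return PRICE_PER_MTOK[model] * tokens / 1_000_000

# Embedding a hypothetical 500M-token corpus:
print(f"4b:   ${embed_cost('pplx-embed-v1-4b', 500_000_000):.2f}")   # $15.00
print(f"0.6b: ${embed_cost('pplx-embed-v1-0.6b', 500_000_000):.2f}") # $2.00
```

The 0.6b model is 7.5x cheaper per token, at the cost of 1024 dimensions instead of 2560 (which also means 2.5x less vector storage downstream).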

2025-01-20

DeepSeek-R1

DeepSeek's reasoning model (671B total, 37B activated) trained with large-scale RL, achieving o1-level performance on math, code, and reasoning tasks.

2025-01-20

DeepSeek-R1-Distill-Llama-70B

DeepSeek-R1 distilled into Llama 3.3 70B, achieving strong reasoning with dense model efficiency.

2025-01-20

DeepSeek-Coder-V2

DeepSeek's specialist models for code generation and mathematical reasoning.

2025-01-20

DeepSeek-Math-V2

DeepSeek's specialist models for code generation and mathematical reasoning.

2025-01-16

FLUX 2 Klein 9B

Black Forest Labs' 9B FLUX.2 Klein for sub-second image generation with 4 inference steps. Matches models 5x its size in quality. Requires ~29GB VRAM (RTX 4090+).

2025-01-01

MiniMax-Text-01

MiniMax's text generation models with 200K context for general-purpose language tasks.

2025-01-01

MiniMax-M1

MiniMax's text generation models with 200K context for general-purpose language tasks.

2025-01-01

MiniMax-M2.1

MiniMax's earlier M2.x models for general-purpose multimodal inference.

2025-01-01

MiniMax-M2.1-highspeed

MiniMax's earlier M2.x models for general-purpose multimodal inference.

2025-01-01

MiniMax-M2

MiniMax's earlier M2.x models for general-purpose multimodal inference.

2025-01-01

FLUX 1 Krea Dev

Black Forest Labs' FLUX 1 Krea Dev developed in partnership with Krea AI for enhanced creative image generation.

2024-12-26

DeepSeek-V3.2

DeepSeek's MoE flagship models (671B total, 37B activated params) with 128K context, strong on math, code, and reasoning benchmarks.

2024-12-26

DeepSeek-V3.2-Exp

DeepSeek's MoE flagship models (671B total, 37B activated params) with 128K context, strong on math, code, and reasoning benchmarks.

2024-12-26

DeepSeek-V3.1

DeepSeek's MoE flagship models (671B total, 37B activated params) with 128K context, strong on math, code, and reasoning benchmarks.

2024-12-26

DeepSeek-V3.1-Base

DeepSeek's MoE flagship models (671B total, 37B activated params) with 128K context, strong on math, code, and reasoning benchmarks.

2024-12-26

DeepSeek-V3

DeepSeek's MoE flagship models (671B total, 37B activated params) with 128K context, strong on math, code, and reasoning benchmarks.

2024-12-26

DeepSeek-V2.5

DeepSeek's MoE flagship models (671B total, 37B activated params) with 128K context, strong on math, code, and reasoning benchmarks.

2024-12-18

Granite 3.1 8B Instruct

IBM's Granite 3.1 8B model for enterprise text and code tasks with Apache 2.0 license.

2024-12-18

Granite 3.1 2B Instruct

IBM's Granite 3.1 2B model for lightweight enterprise deployment.

2024-12-17

o1

OpenAI's o1 reasoning model with advanced chain-of-thought for complex problem solving.

2024-12-12

DeepSeek-VL2-Small

DeepSeek's lightweight vision-language model for efficient multimodal understanding.

2024-12-12

Phi-4

Microsoft's 14B parameter Phi-4, a state-of-the-art small model trained on 9.8T tokens with strong reasoning on MMLU (84.8) and GPQA (56.1).

2024-12-12

Jamba Large 1.6

AI21's flagship Jamba model with 256K context for long-document enterprise workflows.

2024-12-12

Jamba Mini 1.6

AI21's efficient Jamba Mini models with 256K context for cost-sensitive long-context deployments.

2024-12-12

Jamba Mini 1.7

AI21's efficient Jamba Mini models with 256K context for cost-sensitive long-context deployments.

2024-12-12

Jamba Large

AI21's earlier Jamba models for long-context enterprise assistants with open deployment options.

2024-12-12

Jamba Mini

AI21's earlier Jamba models for long-context enterprise assistants with open deployment options.

2024-12-12

Jamba 3B

AI21's earlier Jamba models for long-context enterprise assistants with open deployment options.

2024-12-06

Llama 3.3 70B Instruct

Meta's latest Llama 3.x dense model, 70B parameters with 128K context.

2024-12-03

Nova Pro

Amazon's Nova Pro model on Bedrock with 300K context for enterprise text and vision tasks.

2024-12-03

Nova Lite

Amazon's Nova Lite model on Bedrock for fast, cost-efficient multimodal inference.

2024-12-03

Nova Micro

Amazon's Nova Micro model on Bedrock for ultra-fast, ultra-low-cost text inference.

2024-11-06

FLUX 1.1 Pro Ultra

Black Forest Labs' FLUX 1.1 Pro Ultra for ultra-high-resolution 4MP image generation via API.

2024-11-01

Pixtral Large

Mistral's first frontier-class multimodal model with 128K context for image understanding and document analysis.

2024-11-01

Mistral Moderation

Mistral's moderation model for detecting harmful text content and policy violations.

2024-11-01

FLUX 1 Fill Dev

Black Forest Labs' FLUX 1 Fill Dev for inpainting and outpainting with open weights.

2024-11-01

FLUX 1 Depth Dev

Black Forest Labs' FLUX 1 Depth Dev for depth-aware image generation and structural editing.

2024-11-01

FLUX 1 Canny Dev

Black Forest Labs' FLUX 1 Canny Dev for edge-aware image generation using Canny edge detection.

2024-11-01

FLUX 1 Redux Dev

Black Forest Labs' FLUX 1 Redux Dev for image variation and style transfer with open weights.

2024-10-29

Stable Diffusion 3.5 Medium

Stability AI's efficient medium-sized image generation model.

2024-10-22

Claude Haiku 3.5

Anthropic's Haiku 3.5 with 200K context for fast, cost-efficient inference.

2024-10-22

Stable Diffusion 3.5 Large

Stability AI's flagship image generation model with high-quality text-to-image capabilities.

2024-10-21

Granite 3.0 8B Instruct

IBM's Granite 3.0 8B model for enterprise text and code generation.

2024-10-02

FLUX 1.1 Pro

Black Forest Labs' flagship FLUX 1.1 Pro for high-quality text-to-image generation via API.

2024-10-01

GPT-4o mini TTS

OpenAI's GPT-4o-based TTS and transcription models.

2024-10-01

GPT-4o Transcribe

OpenAI's GPT-4o-based TTS and transcription models.

2024-10-01

GPT-4o mini Transcribe

OpenAI's GPT-4o-based TTS and transcription models.

2024-09-25

Llama 3.2 3B Instruct

Meta's lightweight Llama 3.2 models for edge and mobile deployment with 128K context.

2024-09-25

Llama 3.2 1B Instruct

Meta's lightweight Llama 3.2 models for edge and mobile deployment with 128K context.

2024-09-25

Llama 3.2 90B Vision Instruct

Meta's Llama 3.2 90B vision model with 128K context for image understanding.

2024-09-25

Llama 3.2 11B Vision Instruct

Meta's Llama 3.2 11B vision model for efficient multimodal understanding.

2024-09-25

Llama Guard 3 11B Vision

Meta's Llama Guard 3 vision safety model for multimodal content moderation.

2024-09-24

Gemini 1.5 Pro

Google's Gemini 1.5 Pro with 2M context for long document and media analysis.

2024-09-24

Gemini 1.5 Flash

Google's Gemini 1.5 Flash with 1M context for fast multimodal inference.

2024-09-24

Gemini 1.5 Flash-8B

Google's lightweight Gemini 1.5 Flash-8B for ultra-fast, cost-efficient inference.

2024-09-19

Qwen2.5-Max

Alibaba's Qwen2.5 generalist models with 131K context for multilingual apps, coding assistants, and self-hosted deployment.

2024-09-19

Qwen2.5-72B-Instruct

Alibaba's Qwen2.5 generalist models with 131K context for multilingual apps, coding assistants, and self-hosted deployment.

2024-09-19

Qwen2.5-32B-Instruct

Alibaba's Qwen2.5 generalist models with 131K context for multilingual apps, coding assistants, and self-hosted deployment.

2024-09-19

Qwen2.5-14B-Instruct

Alibaba's Qwen2.5 generalist models with 131K context for multilingual apps, coding assistants, and self-hosted deployment.

2024-09-19

Qwen2.5-7B-Instruct

Alibaba's Qwen2.5 generalist models with 131K context for multilingual apps, coding assistants, and self-hosted deployment.

2024-09-19

Qwen2.5-3B-Instruct

Alibaba's Qwen2.5 generalist models with 131K context for multilingual apps, coding assistants, and self-hosted deployment.

2024-09-19

Qwen2.5-1.5B-Instruct

Alibaba's Qwen2.5 generalist models with 131K context for multilingual apps, coding assistants, and self-hosted deployment.

2024-09-17

NV-Embed v2

NVIDIA's state-of-the-art text embedding model ranked #1 on MTEB leaderboard for retrieval and semantic similarity tasks.

2024-09-12

o1-mini

OpenAI's o1-mini for fast, cost-efficient reasoning on coding and STEM tasks.

2024-09-01

Pixtral 12B

Mistral's 12B open-weight vision model for private document analysis and on-prem image understanding.

2024-08-20

Phi-3.5-vision-instruct

Microsoft's Phi-3.5 vision model with 128K context for image understanding and multimodal chat.

2024-08-20

Phi-3.5-mini-instruct

Microsoft's Phi-3.5 models with 128K context, including a MoE variant for improved efficiency.

2024-08-20

Phi-3.5-MoE-instruct

Microsoft's Phi-3.5 models with 128K context, including a MoE variant for improved efficiency.

2024-08-01

FLUX 1 Pro

Black Forest Labs' original FLUX 1 Pro for high-quality text-to-image via API.

2024-08-01

FLUX 1 Dev

Black Forest Labs' 12B parameter FLUX 1 Dev for open-weight text-to-image generation with guidance distillation.

2024-08-01

FLUX 1 Schnell

Black Forest Labs' 12B FLUX 1 Schnell for ultra-fast image generation in 1-4 steps with Apache 2.0 license.

2024-07-23

Llama 3.1 405B Instruct

Meta's Llama 3.1 405B flagship with 128K context.

2024-07-23

Llama 3.1 70B Instruct

Meta's Llama 3.1 70B with 128K context for instruction-following.

2024-07-23

Llama 3.1 8B Instruct

Meta's Llama 3.1 8B with 128K context for efficient deployment.

2024-07-23

Prompt Guard 86M

Meta's lightweight 86M Prompt Guard for detecting jailbreak and prompt injection attempts.

2024-07-18

GPT-4o-mini

OpenAI's GPT-4o-mini for fast, cost-efficient inference with strong general capability.

2024-05-21

Phi-3-vision-128k-instruct

Microsoft's Phi-3 vision model with 128K context for lightweight image understanding on edge devices.

2024-05-13

GPT-4o

A broadly capable multimodal model optimized for production chat, agentic workflows, and voice experiences.

2024-04-18

Meta Llama 3 70B Instruct

Meta's original Llama 3 and Code Llama 70B models with 8K context.

2024-04-18

Code Llama 70B Instruct

Meta's original Llama 3 and Code Llama 70B models with 8K context.

2024-04-18

Meta Llama 3 8B Instruct

Meta's smaller Llama 3 and Code Llama models with 8K context.

2024-04-18

Code Llama 34B Instruct

Meta's smaller Llama 3 and Code Llama models with 8K context.

2024-03-07

Claude Haiku 3

Anthropic's deprecated Haiku 3 model, retiring April 2026.

2024-03-04

Claude Sonnet 3

Anthropic's legacy Sonnet 3 model with 200K context.

2024-02-29

Claude Opus 3

Anthropic's legacy Opus 3 model with 200K context.
