Releases
Release feed
Newest first. A changelog-style feed of model launches and noteworthy updates.
2026-03-19
Nemotron-Cascade 2
NVIDIA's 32B (30B-A3B MoE) Nemotron-Cascade 2 trained with cascade RL and multi-domain on-policy distillation. 74.8K downloads on HuggingFace.
2026-03-15
warpgrep-v2
Morph's WarpGrep v2 for ultra-fast agentic code search. #1 on SWE-Bench Pro with sub-6s searches. 15.8% cheaper and 22% faster than alternatives.
2026-03-11
Nemotron 3 Super 120B
NVIDIA's flagship 120B/12B-active LatentMoE model with 1M context, trained on 25T tokens. Strong on agentic workflows, reasoning, and long-context tasks. Requires 8x H100-80GB.
2026-03-07
flash-compact
Morph's Flash Compact for near-lossless context compaction at 33,000+ tok/sec. Reduces context 50-70% in under 2 seconds. +0.6% on SWE-Bench Pro.
2026-03-05
GPT-5.4
OpenAI's GPT-5.4, the most capable and efficient frontier model for professional work. First general-purpose model with native computer-use capabilities. Combines industry-leading coding from GPT-5.3-Codex with improved agentic workflows.
2026-03-05
GPT-5.4 Pro
OpenAI's GPT-5.4 Pro delivering maximum performance for complex tasks. Available for ChatGPT Pro and Enterprise plans.
2026-03-03
GPT-5.3 Instant
OpenAI's GPT-5.3 Instant optimized for low-latency everyday usability. Available to all ChatGPT users.
2026-03-03
Gemini 3.1 Flash-Lite
Google's Gemini 3.1 Flash-Lite, its most cost-effective model, built for high-volume workloads with best-in-class intelligence at scale.
2026-03-01
morph-v3-fast-apply
Morph's Fast Apply model merges AI-generated code edits at 10,500+ tokens/sec with 98% accuracy, orders of magnitude faster than alternatives like GPT-4.1 (55 tok/s) or Claude Sonnet (80 tok/s).
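Those throughput figures are easier to feel with a quick back-of-envelope calculation; this is a hedged sketch, and the 4,000-token edit size is a hypothetical workload, not a published benchmark:

```python
# Wall-clock time to merge a 4,000-token edit at the quoted throughputs
# (tokens per second). The edit size is hypothetical.
def merge_seconds(tokens: int, tok_per_sec: float) -> float:
    return tokens / tok_per_sec

for name, speed in [("morph-v3-fast-apply", 10_500),
                    ("GPT-4.1", 55),
                    ("Claude Sonnet", 80)]:
    print(f"{name}: {merge_seconds(4_000, speed):.1f}s")
```

At these rates the fast-apply model finishes in well under a second, while the general-purpose models take roughly a minute.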
2026-02-24
Claude Sonnet 4.6
Anthropic's current Sonnet tier for fast frontier reasoning, coding, and long-context agent work.
2026-02-24
Claude Opus 4.6
Anthropic's most intelligent Claude model for complex agents, coding, and deep reasoning, with 1M token context and 128K output.
2026-02-24
LFM2-24B-A2B
Liquid AI's flagship 24B/2.3B-active MoE hybrid model. 112 tok/s on AMD CPU, 293 tok/s on H100. Fits in 32GB RAM. Trained on 17T tokens with 30 conv + 10 attn layers. Supports 9 languages.
2026-02-19
Gemini 3.1 Pro
Google's Gemini 3.1 Pro, designed for complex tasks where simple answers aren't enough, with enhanced reasoning and multimodal capabilities.
2026-02-19
Gemini 3.1 Flash
Google's Gemini 3.1 Flash for fast, cost-efficient multimodal inference with strong quality.
2026-02-16
NextStep-1.1
StepFun's 15B NextStep-1.1 for text-to-image generation. Supports image editing guided by text prompts.
2026-02-14
Step-Audio-R1.1
StepFun's 33B Step-Audio-R1.1 for audio-text-to-text generation and understanding.
2026-02-12
LFM2.5-1.2B-Instruct
Liquid AI's 1.2B LFM2.5-Instruct with 262K downloads on HuggingFace. Extended 128K context for edge reasoning and instruction following.
2026-02-12
LFM2.5-1.2B-Thinking
Liquid AI's 1.2B LFM2.5-Thinking with chain-of-thought reasoning for edge devices. 30K downloads on HuggingFace.
2026-02-05
GPT-5.3-Codex
OpenAI's GPT-5.3-Codex, extending Codex across the full spectrum of professional computer work. Powers the Codex app on macOS and Windows.
2026-02-01
GPT-5.2
OpenAI's latest frontier model with 1M token context for advanced reasoning and tool use.
2026-02-01
GPT-5.2 Pro
OpenAI's premium GPT-5.2 Pro with maximum reasoning capability.
2026-02-01
Step-3.5-Flash
StepFun's 199B parameter Step-3.5-Flash for text generation and reasoning. 91.9K downloads on HuggingFace. Available in BF16, FP8, and GGUF quantizations.
2026-01-19
Step3-VL-10B
StepFun's 10B Step3-VL vision-language model with 211K downloads on HuggingFace. Supports FP8 quantization.
2026-01-15
Command R+ 2026
An enterprise retrieval specialist built for document-heavy support and knowledge applications.
2025-12-16
MiMo-V2-Flash
Xiaomi's MiMo-V2-Flash: 309B total/15B active MoE with hybrid sliding window attention, Multi-Token Prediction, and 256K context. Scores 94.1 on AIME 2025, 73.4 on SWE-Bench. Trained on 27T tokens with 6x KV-cache reduction.
2025-12-15
Nemotron 3 Nano 4B
NVIDIA's compact 4B Nemotron Nano for efficient local AI with hybrid Mamba-2 architecture. Runs on consumer GPUs.
2025-12-01
Mistral Large 25
An enterprise-ready all-rounder with especially strong European language performance.
2025-12-01
Gemini 3.0 Pro
Google's Gemini 3.0 Pro, quietly released in late 2025. Strong multimodal reasoning with 1M context. Foundation for the Gemini 3.1 series.
2025-12-01
Gemini 3.0 Flash
Google's Gemini 3.0 Flash for fast multimodal inference. Released alongside Gemini 3.0 Pro.
2025-12-01
Mistral Large 3
Mistral's state-of-the-art open-weight multimodal model with 128K context for general-purpose deployment.
2025-12-01
Mistral Large 3 Open
Mistral's state-of-the-art open-weight model with 128K context for self-hosting and customization.
2025-12-01
Ministral 3 14B Open
Mistral's efficient open-weight models with text and vision capabilities at 3B, 8B, and 14B sizes.
2025-12-01
Ministral 3 8B Open
Mistral's efficient open-weight models with text and vision capabilities at 3B, 8B, and 14B sizes.
2025-12-01
Ministral 3 3B Open
Mistral's efficient open-weight models with text and vision capabilities at 3B, 8B, and 14B sizes.
2025-11-28
LFM2-8B-A1B
Liquid AI's 8.3B/1.5B-active MoE hybrid model with 12T training tokens. 24 layers (18 conv + 6 attn). 47.9K downloads on HuggingFace.
2025-11-28
LFM2-2.6B
Liquid AI's 2.6B-parameter LFM2 for lightweight text generation and tool use on edge devices.
2025-10-21
Llama-Embed-Nemotron 8B
NVIDIA's Llama-Embed-Nemotron 8B ranked #1 on multilingual MTEB leaderboard with text and image retrieval support.
2025-10-01
Claude Haiku 4.5
Anthropic's Haiku 4.5 with 200K context, the fastest Claude model with near-frontier intelligence at low cost.
2025-09-29
Claude Sonnet 4.5
Anthropic's Sonnet 4.5 with 1M token context for fast frontier reasoning, coding, and long-context agent work.
2025-09-23
MiMo-Audio-7B
Xiaomi's MiMo-Audio 7B for any-to-any audio and text generation. Includes base and instruct variants.
2025-09-01
GPT-5
OpenAI's GPT-5 with 1M context for balanced reasoning and coding.
2025-09-01
GPT-5 mini
OpenAI's GPT-5 mini for cost-efficient applications.
2025-09-01
GPT-5 nano
OpenAI's smallest GPT-5 variant for ultra-fast, cost-sensitive applications.
2025-09-01
gpt-realtime-mini
OpenAI's mini audio models for cost-efficient voice interfaces.
2025-09-01
gpt-audio-mini
OpenAI's mini audio models for cost-efficient voice interfaces.
2025-09-01
Magistral Medium 1.2
Mistral's frontier-class multimodal reasoning model with extended thinking.
2025-08-24
FLUX 1 Kontext Dev
Black Forest Labs' FLUX 1 Kontext Dev for context-aware image editing with 95.6K downloads on HuggingFace.
2025-08-21
MiMo-VL-7B
Xiaomi's MiMo-VL 7B vision-language model with 136K downloads on HuggingFace for image understanding and multimodal reasoning.
2025-08-05
Claude Opus 4.1
Anthropic's Claude Opus 4.1 for maximum intelligence on complex reasoning, coding, and agentic tasks.
2025-08-01
Mistral Medium 3.1
Mistral's frontier-class multimodal model released August 2025.
2025-08-01
Codestral
Mistral's cutting-edge code completion model with 256K context fill-in-the-middle support.
2025-07-01
Devstral Medium 1.0
Mistral's enterprise-grade model excelling at software engineering tasks.
2025-07-01
Voxtral Mini Open
Mistral's open-weight mini audio input model for on-prem transcription and audio understanding.
2025-07-01
Voxtral Small Open
Mistral's open-weight small audio input model for instruct use cases with audio understanding.
2025-07-01
Voxtral Mini Transcribe
Mistral's audio transcription model optimized for converting speech to text with high accuracy.
2025-06-10
Llama Nemotron Nano 4B
NVIDIA's compact 4B Llama Nemotron Nano for edge AI with high-accuracy reasoning. Runs on consumer GPUs.
2025-06-04
Llama Nemotron Super 49B
NVIDIA's Llama-based Nemotron Super 49B for high-accuracy reasoning, agentic tasks, and RAG workflows.
2025-06-01
o3-deep-research
OpenAI's deep research model for extended multi-step research tasks with tool use.
2025-06-01
o4-mini-deep-research
OpenAI's cost-efficient deep research model for extended tasks.
2025-06-01
gpt-realtime
OpenAI's realtime and audio models for low-latency voice interfaces.
2025-06-01
gpt-audio
OpenAI's realtime and audio models for low-latency voice interfaces.
2025-06-01
chatgpt-image-latest
OpenAI's lightweight image generation variants.
2025-06-01
gpt-image-1-mini
OpenAI's lightweight image generation variants.
2025-06-01
gpt-oss-120b
OpenAI's 120B open-weight model for frontier-style reasoning with self-hosted deployment.
2025-06-01
gpt-oss-20b
OpenAI's 20B open-weight model for efficient frontier reasoning on consumer hardware.
2025-06-01
Gemini 2.5 Flash-Lite
Google's lightweight Gemini 2.5 Flash variants for cost-efficient multimodal and real-time audio.
2025-06-01
Gemini 2.5 Flash Native Audio Preview
Google's lightweight Gemini 2.5 Flash variants for cost-efficient multimodal and real-time audio.
2025-06-01
Gemini 2.5 Flash Live
Google's lightweight Gemini 2.5 Flash variants for cost-efficient multimodal and real-time audio.
2025-06-01
Mistral Small 3.1
Mistral's efficient small models with 128K context, unifying instruct, reasoning, and coding.
2025-06-01
Mistral Small 3.2 Open
Mistral's efficient small models with 128K context, unifying instruct, reasoning, and coding.
2025-06-01
Magistral Small 1.2 Open
Mistral's open-weight reasoning and coding models for self-hosted deployment.
2025-06-01
Devstral 2 Open
Mistral's open-weight reasoning and coding models for self-hosted deployment.
2025-06-01
Devstral Small 2
Mistral's open-weight reasoning and coding models for self-hosted deployment.
2025-06-01
Mistral Nemo 12B
Mistral's open-weight reasoning and coding models for self-hosted deployment.
2025-06-01
Command R7B
Cohere's Command text family for enterprise chat, retrieval, translation, and agent workflows built around grounding.
2025-06-01
Command A Translate
Cohere's Command text family for enterprise chat, retrieval, translation, and agent workflows built around grounding.
2025-06-01
Command A Reasoning
Cohere's Command text family for enterprise chat, retrieval, translation, and agent workflows built around grounding.
2025-06-01
Command R+
Cohere's Command text family for enterprise chat, retrieval, translation, and agent workflows built around grounding.
2025-06-01
Grok 4
xAI's latest Grok model with 256K context for advanced reasoning and multimodal tasks.
2025-06-01
MiniMax-Speech-02
MiniMax's TTS model for high-fidelity speech synthesis with voice cloning capabilities.
2025-06-01
MiniMax-M2.5
MiniMax's latest M2.5 text model for coding agents, multimodal assistants, and high-speed inference.
2025-06-01
MiniMax-M2.5-highspeed
MiniMax's latest M2.5 text model for coding agents, multimodal assistants, and high-speed inference.
2025-06-01
Kimi K2
Moonshot AI's Kimi K2 with 128K (131,072-token) context for research, code generation, and reasoning-heavy chat.
2025-06-01
Kimi K2.5
Moonshot AI's Kimi family variants with extended context and thinking modes for complex reasoning tasks.
2025-06-01
Kimi K2 Turbo Preview
Moonshot AI's Kimi family variants with extended context and thinking modes for complex reasoning tasks.
2025-06-01
Kimi K2 Thinking
Moonshot AI's Kimi family variants with extended context and thinking modes for complex reasoning tasks.
2025-05-14
Claude Sonnet 4
Anthropic's Sonnet 4 with 1M token context for balanced reasoning and coding performance.
2025-05-14
Claude Opus 4
Anthropic's Opus 4 with 200K context for complex reasoning and agentic tasks.
2025-05-14
Qwen3.5-397B-A17B
Alibaba's Qwen3 family with thinking/non-thinking mode switching, 100+ language support, and 131K context via YaRN.
2025-05-14
Qwen3.5-122B-A10B
Alibaba's Qwen3 family with thinking/non-thinking mode switching, 100+ language support, and 131K context via YaRN.
2025-05-14
Qwen3.5-35B-A3B
Alibaba's Qwen3 family with thinking/non-thinking mode switching, 100+ language support, and 131K context via YaRN.
2025-05-14
Qwen3.5-27B
Alibaba's Qwen3 family with thinking/non-thinking mode switching, 100+ language support, and 131K context via YaRN.
2025-05-14
Qwen3.5-9B
Alibaba's Qwen3 family with thinking/non-thinking mode switching, 100+ language support, and 131K context via YaRN.
2025-05-14
Qwen3.5-4B
Alibaba's Qwen3 family with thinking/non-thinking mode switching, 100+ language support, and 131K context via YaRN.
2025-05-14
Qwen3.5-2B
Alibaba's Qwen3 family with thinking/non-thinking mode switching, 100+ language support, and 131K context via YaRN.
2025-05-14
Qwen3.5-0.8B
Alibaba's Qwen3 family with thinking/non-thinking mode switching, 100+ language support, and 131K context via YaRN.
2025-05-14
Qwen3-Coder-Next
Alibaba's Qwen3 family with thinking/non-thinking mode switching, 100+ language support, and 131K context via YaRN.
2025-05-01
Codestral Embed
Mistral's embedding models for semantic search and code representation extraction.
2025-05-01
Mistral Embed
Mistral's embedding models for semantic search and code representation extraction.
2025-04-30
Phi-4-reasoning-vision-15B
Microsoft's Phi-4 reasoning vision model (15B) combining visual understanding with chain-of-thought reasoning.
2025-04-30
Phi-4-mini-instruct
Microsoft's Phi-4 reasoning variants with 128K context for compact, efficient reasoning on constrained infrastructure.
2025-04-30
Phi-4-reasoning
Microsoft's Phi-4 reasoning variants with 128K context for compact, efficient reasoning on constrained infrastructure.
2025-04-30
Phi-4-reasoning-plus
Microsoft's Phi-4 reasoning variants with 128K context for compact, efficient reasoning on constrained infrastructure.
2025-04-30
Phi-4-mini-flash-reasoning
Microsoft's Phi-4 reasoning variants with 128K context for compact, efficient reasoning on constrained infrastructure.
2025-04-16
o3
OpenAI's o3 reasoning model for complex multi-step problem solving with extended thinking.
2025-04-16
o4-mini
OpenAI's o4-mini for cost-efficient reasoning with strong coding and math capabilities.
2025-04-14
GPT-4.1
OpenAI's GPT-4.1 with 1M token context, improved coding and instruction following.
2025-04-14
GPT-4.1 mini
OpenAI's GPT-4.1 mini for fast, cost-efficient inference with 1M context.
2025-04-05
Llama 4 Maverick
Meta's 17Bx128E MoE open-weight model with 1M token context, pretrained on ~22T tokens. Strong multimodal and multilingual capabilities for teams that need control, private deployment, and customization.
2025-04-05
Llama 4 Scout
Meta's Llama 4 Scout (17Bx16E MoE, 109B total params) with an extraordinary 10M token context window.
2025-04-05
Llama Guard 4 12B
Meta's Llama Guard 4 (12B) safety model for filtering unsafe prompts.
2025-04-01
Grok 4 Fast Reasoning
xAI's Grok API family for fast-moving reasoning, conversational agents, and multimodal assistant workloads.
2025-04-01
Grok 3
xAI's Grok API family for fast-moving reasoning, conversational agents, and multimodal assistant workloads.
2025-04-01
Grok 3 Mini
xAI's Grok API family for fast-moving reasoning, conversational agents, and multimodal assistant workloads.
2025-04-01
grok-image
xAI's Grok API family for fast-moving reasoning, conversational agents, and multimodal assistant workloads.
2025-04-01
music-2.0
MiniMax's music generation model for AI-composed audio tracks.
2025-03-26
GPT Image 1
OpenAI's natively multimodal image generation model.
2025-03-25
Gemini 2.5 Pro
An expansive-context model built for complex research, multimodal understanding, and large document tasks, with 1M token native context.
2025-03-25
Gemini 2.5 Pro TTS
Google's Gemini 2.5 Pro with 1M context for long-context multimodal analysis and text-to-speech.
2025-03-25
Gemini 2.5 Flash
Google's Gemini 2.5 Flash with 1M context for fast, cost-efficient multimodal inference.
2025-03-13
Command A
Cohere's latest Command model with 256K context for enterprise chat, retrieval, and agent workflows.
2025-03-12
FLUX 2 Klein 9B KV
Black Forest Labs' FLUX.2 Klein 9B KV variant with key-value cache optimization for even faster inference.
2025-03-01
Mistral OCR 2505
Mistral's OCR model for document extraction, PDF parsing, and structured output from images.
2025-03-01
Command A Vision
Cohere's vision-capable Command and embedding models for multimodal enterprise assistants and retrieval systems.
2025-03-01
Embed 4
Cohere's vision-capable Command and embedding models for multimodal enterprise assistants and retrieval systems.
2025-03-01
DeepSeek-OCR
DeepSeek's specialized OCR models for document parsing, table extraction, and image-to-text conversion.
2025-03-01
DeepSeek-OCR-2
DeepSeek's specialized OCR models for document parsing, table extraction, and image-to-text conversion.
2025-03-01
image-01
MiniMax's image generation models for text-to-image creation, including a live animation variant.
2025-03-01
image-01-live
MiniMax's image generation models for text-to-image creation, including a live animation variant.
2025-03-01
GLM-5
Z.AI's GLM reasoning family for general chat, coding, and tool-enabled Chinese-first production workloads.
2025-03-01
GLM-4.7
Z.AI's GLM reasoning family for general chat, coding, and tool-enabled Chinese-first production workloads.
2025-03-01
GLM-4.6
Z.AI's GLM reasoning family for general chat, coding, and tool-enabled Chinese-first production workloads.
2025-03-01
GLM-4.5
Z.AI's GLM reasoning family for general chat, coding, and tool-enabled Chinese-first production workloads.
2025-03-01
GLM-4.6V
Z.AI's multimodal model line for document OCR, image generation, video generation, and visual reasoning tasks.
2025-03-01
GLM-4.5V
Z.AI's multimodal model line for document OCR, image generation, video generation, and visual reasoning tasks.
2025-03-01
GLM-OCR
Z.AI's multimodal model line for document OCR, image generation, video generation, and visual reasoning tasks.
2025-03-01
GLM-Image
Z.AI's multimodal model line for document OCR, image generation, video generation, and visual reasoning tasks.
2025-03-01
CogView 4
Z.AI's multimodal model line for document OCR, image generation, video generation, and visual reasoning tasks.
2025-03-01
CogVideoX
Z.AI's multimodal model line for document OCR, image generation, video generation, and visual reasoning tasks.
2025-03-01
Vidu Q1
Z.AI's multimodal model line for document OCR, image generation, video generation, and visual reasoning tasks.
2025-03-01
Doubao-Seed-1.6
Bytedance's Doubao coding and reasoning line for domestic developer workflows, visual coding, and agentic assistance.
2025-03-01
Doubao-Seed-1.6-Flash
Bytedance's Doubao coding and reasoning line for domestic developer workflows, visual coding, and agentic assistance.
2025-03-01
Doubao-Seed-Code
Bytedance's Doubao coding and reasoning line for domestic developer workflows, visual coding, and agentic assistance.
2025-03-01
Doubao-Seed-2.0-Code
Bytedance's Doubao coding and reasoning line for domestic developer workflows, visual coding, and agentic assistance.
2025-03-01
Hunyuan T1
Tencent's Hunyuan T1 flagship with 256K context for fast reasoning and long-document understanding.
2025-03-01
Hunyuan TurboS
Tencent's Hunyuan family spans fast-reasoning, long-context, coding, and vision-capable API deployments.
2025-03-01
Hunyuan TurboS LongText 128K
Tencent's Hunyuan family spans fast-reasoning, long-context, coding, and vision-capable API deployments.
2025-03-01
Hunyuan Standard
Tencent's Hunyuan family spans fast-reasoning, long-context, coding, and vision-capable API deployments.
2025-03-01
Hunyuan Lite
Tencent's Hunyuan family spans fast-reasoning, long-context, coding, and vision-capable API deployments.
2025-03-01
Hunyuan Code
Tencent's Hunyuan family spans fast-reasoning, long-context, coding, and vision-capable API deployments.
2025-03-01
Hunyuan T1 Vision
Tencent's Hunyuan family spans fast-reasoning, long-context, coding, and vision-capable API deployments.
2025-03-01
ERNIE 4.5 Turbo 32K
Baidu's ERNIE 4.5 Turbo with 32K context for mainstream enterprise chat and function-calling workloads.
2025-03-01
ERNIE 4.0 Turbo 8K
Baidu's ERNIE family covers enterprise chat, long-context, speed-optimized, and function-calling workloads.
2025-03-01
ERNIE 3.5 128K
Baidu's ERNIE family covers enterprise chat, long-context, speed-optimized, and function-calling workloads.
2025-03-01
ERNIE Speed 128K
Baidu's ERNIE family covers enterprise chat, long-context, speed-optimized, and function-calling workloads.
2025-03-01
ERNIE Functions 8K
Baidu's ERNIE family covers enterprise chat, long-context, speed-optimized, and function-calling workloads.
2025-02-25
Phi-4-multimodal-instruct
Microsoft's 5.6B Phi-4 multimodal model with vision, audio, and text input for lightweight assistant features.
2025-02-24
Claude 3.7 Sonnet
A top-tier reasoning model with strong software engineering assistance and enterprise controls.
2025-02-24
FLUX 2 Klein 4B
Black Forest Labs' 4B FLUX.2 Klein with Apache 2.0 license for fast, commercial-use image generation. 268K monthly downloads.
2025-02-05
Gemini 2.0 Flash
Google's Gemini 2.0 Flash with 1M context for fast multimodal inference.
2025-02-05
Gemini 2.0 Flash-Lite
Google's Gemini 2.0 Flash-Lite for ultra-fast, cost-efficient inference.
2025-02-04
FLUX 2 Dev
Black Forest Labs' 32B parameter FLUX.2 Dev, the state-of-the-art open text-to-image model with single and multi-reference editing. 987K monthly downloads on HuggingFace.
2025-02-01
MiniMax-VL-01
MiniMax's vision-language model with 200K context for multimodal understanding and image analysis.
2025-01-27
Janus-Pro-7B
DeepSeek's multimodal understanding and generation model (7B) for image-to-text and text-to-image tasks.
2025-01-26
Qwen2.5-VL-72B-Instruct
Alibaba's flagship 72B vision-language model with 131K context for document reading, image understanding, and multimodal reasoning.
2025-01-26
Qwen2.5-VL-7B-Instruct
Alibaba's efficient 7B vision-language model for lightweight multimodal assistants and document understanding.
2025-01-26
Phi-3-mini-4k-instruct
Microsoft's earlier Phi models with shorter context windows for edge and local deployment.
2025-01-26
Phi-3-medium-4k-instruct
Microsoft's earlier Phi models with shorter context windows for edge and local deployment.
2025-01-26
phi-2
Microsoft's earlier Phi models with shorter context windows for edge and local deployment.
2025-01-26
phi-1_5
Microsoft's earlier Phi models with shorter context windows for edge and local deployment.
2025-01-26
phi-1
Microsoft's earlier Phi models with shorter context windows for edge and local deployment.
2025-01-26
Phi-tiny-MoE-instruct
Microsoft's earlier Phi models with shorter context windows for edge and local deployment.
2025-01-21
Sonar Pro
Perplexity's Sonar Pro for enhanced web search with multi-step reasoning, Pro Search, and citation support. $3/$15 per MTok plus request fee.
2025-01-21
Sonar
Perplexity's fast, cost-efficient Sonar model for basic web search and Q&A. $1/$1 per MTok plus request fee.
2025-01-21
Sonar Reasoning Pro
Perplexity's Sonar Reasoning Pro for complex reasoning with real-time web search. $2/$8 per MTok plus request fee.
2025-01-21
Sonar Deep Research
Perplexity's Sonar Deep Research for automated multi-step research with citation generation. $2/$8 per MTok in/out, plus $2/MTok for citation tokens, $3/MTok for reasoning tokens, and $5 per 1K search queries.
2025-01-21
pplx-embed-v1-4b
Perplexity's 4B embedding model with 2560 dimensions for high-quality semantic search and RAG. $0.03 per MTok.
2025-01-21
pplx-embed-v1-0.6b
Perplexity's lightweight 0.6B embedding model with 1024 dimensions for fast, cost-efficient embeddings. $0.004 per MTok.
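As an illustration of how the Sonar rate cards above compose, here is a hedged cost sketch for a single Sonar Deep Research request; the token counts and query count are hypothetical, and the sketch assumes straightforward per-MTok billing on every token category:

```python
MTOK = 1_000_000

def deep_research_cost(in_tok, out_tok, citation_tok, reasoning_tok, queries):
    """Cost in dollars at $2/$8 per MTok in/out, $2/MTok citation tokens,
    $3/MTok reasoning tokens, and $5 per 1K search queries."""
    token_cost = (in_tok * 2 + out_tok * 8
                  + citation_tok * 2 + reasoning_tok * 3) / MTOK
    return token_cost + queries * 5 / 1_000

# Hypothetical request: 50K in, 20K out, 5K citations, 100K reasoning, 30 queries.
print(f"${deep_research_cost(50_000, 20_000, 5_000, 100_000, 30):.2f}")
```

Note how the per-query fee ($0.15 here) can rival the token cost for search-heavy requests.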
2025-01-20
DeepSeek-R1
DeepSeek's reasoning model (671B total, 37B activated) trained with large-scale RL, achieving o1-level performance on math, code, and reasoning tasks.
2025-01-20
DeepSeek-R1-Distill-Llama-70B
DeepSeek-R1 distilled into Llama 3.3 70B, achieving strong reasoning with dense model efficiency.
2025-01-20
DeepSeek-Coder-V2
DeepSeek's specialist models for code generation and mathematical reasoning.
2025-01-20
DeepSeek-Math-V2
DeepSeek's specialist models for code generation and mathematical reasoning.
2025-01-16
FLUX 2 Klein 9B
Black Forest Labs' 9B FLUX.2 Klein for sub-second image generation with 4 inference steps. Matches models 5x its size in quality. Requires ~29GB VRAM (RTX 4090+).
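The ~29GB figure is plausible from first principles; a hedged back-of-envelope estimate, assuming bf16 weights (2 bytes per parameter), with the remainder going to activations, text encoders, and latents:

```python
# Weights-only VRAM for a 9B-parameter model in bf16 (2 bytes per param).
# The gap up to ~29GB would be activations, text encoders, and latents
# (the exact split is not published here).
params = 9_000_000_000
weights_gib = params * 2 / 1024**3
print(f"weights alone: {weights_gib:.1f} GiB")
```

Weights alone come to roughly 17 GiB, so a 24GB card is marginal and the RTX 4090+ guidance fits.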
2025-01-01
MiniMax-Text-01
MiniMax's text generation models with 200K context for general-purpose language tasks.
2025-01-01
MiniMax-M1
MiniMax's text generation models with 200K context for general-purpose language tasks.
2025-01-01
MiniMax-M2.1
MiniMax's earlier M2.x models for general-purpose multimodal inference.
2025-01-01
MiniMax-M2.1-highspeed
MiniMax's earlier M2.x models for general-purpose multimodal inference.
2025-01-01
MiniMax-M2
MiniMax's earlier M2.x models for general-purpose multimodal inference.
2025-01-01
FLUX 1 Krea Dev
Black Forest Labs' FLUX 1 Krea Dev developed in partnership with Krea AI for enhanced creative image generation.
2024-12-26
DeepSeek-V3.2
DeepSeek's MoE flagship models (671B total, 37B activated params) with 128K context, strong on math, code, and reasoning benchmarks.
2024-12-26
DeepSeek-V3.2-Exp
DeepSeek's MoE flagship models (671B total, 37B activated params) with 128K context, strong on math, code, and reasoning benchmarks.
2024-12-26
DeepSeek-V3.1
DeepSeek's MoE flagship models (671B total, 37B activated params) with 128K context, strong on math, code, and reasoning benchmarks.
2024-12-26
DeepSeek-V3.1-Base
DeepSeek's MoE flagship models (671B total, 37B activated params) with 128K context, strong on math, code, and reasoning benchmarks.
2024-12-26
DeepSeek-V3
DeepSeek's MoE flagship models (671B total, 37B activated params) with 128K context, strong on math, code, and reasoning benchmarks.
2024-12-26
DeepSeek-V2.5
DeepSeek's MoE flagship models (671B total, 37B activated params) with 128K context, strong on math, code, and reasoning benchmarks.
2024-12-18
Granite 3.1 8B Instruct
IBM's Granite 3.1 8B model for enterprise text and code tasks with Apache 2.0 license.
2024-12-18
Granite 3.1 2B Instruct
IBM's Granite 3.1 2B model for lightweight enterprise deployment.
2024-12-17
o1
OpenAI's o1 reasoning model with advanced chain-of-thought for complex problem solving.
2024-12-12
DeepSeek-VL2-Small
DeepSeek's lightweight vision-language model for efficient multimodal understanding.
2024-12-12
Phi-4
Microsoft's 14B parameter Phi-4, a state-of-the-art small model trained on 9.8T tokens with strong reasoning on MMLU (84.8) and GPQA (56.1).
2024-12-12
Jamba Large 1.6
AI21's flagship Jamba model with 256K context for long-document enterprise workflows.
2024-12-12
Jamba Mini 1.6
AI21's efficient Jamba Mini models with 256K context for cost-sensitive long-context deployments.
2024-12-12
Jamba Mini 1.7
AI21's efficient Jamba Mini models with 256K context for cost-sensitive long-context deployments.
2024-12-12
Jamba Large
AI21's earlier Jamba models for long-context enterprise assistants with open deployment options.
2024-12-12
Jamba Mini
AI21's earlier Jamba models for long-context enterprise assistants with open deployment options.
2024-12-12
Jamba 3B
AI21's earlier Jamba models for long-context enterprise assistants with open deployment options.
2024-12-06
Llama 3.3 70B Instruct
Meta's latest Llama 3.x dense model, 70B parameters with 128K context.
2024-12-03
Nova Pro
Amazon's Nova Pro model on Bedrock with 300K context for enterprise text and vision tasks.
2024-12-03
Nova Lite
Amazon's Nova Lite model on Bedrock for fast, cost-efficient multimodal inference.
2024-12-03
Nova Micro
Amazon's Nova Micro model on Bedrock for ultra-fast, ultra-low-cost text inference.
2024-11-06
FLUX 1.1 Pro Ultra
Black Forest Labs' FLUX 1.1 Pro Ultra for ultra-high-resolution 4MP image generation via API.
2024-11-01
Pixtral Large
Mistral's first frontier-class multimodal model with 128K context for image understanding and document analysis.
2024-11-01
Mistral Moderation
Mistral's moderation model for detecting harmful text content and policy violations.
2024-11-01
FLUX 1 Fill Dev
Black Forest Labs' FLUX 1 Fill Dev for inpainting and outpainting with open weights.
2024-11-01
FLUX 1 Depth Dev
Black Forest Labs' FLUX 1 Depth Dev for depth-aware image generation and structural editing.
2024-11-01
FLUX 1 Canny Dev
Black Forest Labs' FLUX 1 Canny Dev for edge-aware image generation using Canny edge detection.
2024-11-01
FLUX 1 Redux Dev
Black Forest Labs' FLUX 1 Redux Dev for image variation and style transfer with open weights.
2024-10-29
Stable Diffusion 3.5 Medium
Stability AI's efficient medium-sized image generation model.
2024-10-22
Claude Haiku 3.5
Anthropic's Haiku 3.5 with 200K context for fast, cost-efficient inference.
2024-10-22
Stable Diffusion 3.5 Large
Stability AI's flagship image generation model with high-quality text-to-image capabilities.
2024-10-21
Granite 3.0 8B Instruct
IBM's Granite 3.0 8B model for enterprise text and code generation.
2024-10-02
FLUX 1.1 Pro
Black Forest Labs' flagship FLUX 1.1 Pro for high-quality text-to-image generation via API.
2024-10-01
GPT-4o mini TTS
OpenAI's GPT-4o based TTS and transcription models.
2024-10-01
GPT-4o Transcribe
OpenAI's GPT-4o based TTS and transcription models.
2024-10-01
GPT-4o mini Transcribe
OpenAI's GPT-4o based TTS and transcription models.
2024-09-25
Llama 3.2 3B Instruct
Meta's lightweight Llama 3.2 models for edge and mobile deployment with 128K context.
2024-09-25
Llama 3.2 1B Instruct
Meta's lightweight Llama 3.2 models for edge and mobile deployment with 128K context.
2024-09-25
Llama 3.2 90B Vision Instruct
Meta's Llama 3.2 90B vision model with 128K context for image understanding.
2024-09-25
Llama 3.2 11B Vision Instruct
Meta's Llama 3.2 11B vision model for efficient multimodal understanding.
2024-09-25
Llama Guard 3 11B Vision
Meta's Llama Guard 3 vision safety model for multimodal content moderation.
2024-09-24
Gemini 1.5 Pro
Google's Gemini 1.5 Pro with 2M context for long document and media analysis.
2024-09-24
Gemini 1.5 Flash
Google's Gemini 1.5 Flash with 1M context for fast multimodal inference.
2024-09-24
Gemini 1.5 Flash-8B
Google's lightweight Gemini 1.5 Flash-8B for ultra-fast, cost-efficient inference.
2024-09-19
Qwen2.5-Max
Alibaba's Qwen2.5 generalist models with 131K context for multilingual apps, coding assistants, and self-hosted deployment.
2024-09-19
Qwen2.5-72B-Instruct
Alibaba's Qwen2.5 generalist models with 131K context for multilingual apps, coding assistants, and self-hosted deployment.
2024-09-19
Qwen2.5-32B-Instruct
Alibaba's Qwen2.5 generalist models with 131K context for multilingual apps, coding assistants, and self-hosted deployment.
2024-09-19
Qwen2.5-14B-Instruct
Alibaba's Qwen2.5 generalist models with 131K context for multilingual apps, coding assistants, and self-hosted deployment.
2024-09-19
Qwen2.5-7B-Instruct
Alibaba's Qwen2.5 generalist models with 131K context for multilingual apps, coding assistants, and self-hosted deployment.
2024-09-19
Qwen2.5-3B-Instruct
Alibaba's Qwen2.5 generalist models with 131K context for multilingual apps, coding assistants, and self-hosted deployment.
2024-09-19
Qwen2.5-1.5B-Instruct
Alibaba's Qwen2.5 generalist models with 131K context for multilingual apps, coding assistants, and self-hosted deployment.
2024-09-17
NV-Embed v2
NVIDIA's state-of-the-art text embedding model ranked #1 on MTEB leaderboard for retrieval and semantic similarity tasks.
2024-09-12
o1-mini
OpenAI's o1-mini for fast, cost-efficient reasoning on coding and STEM tasks.
2024-09-01
Pixtral 12B
Mistral's 12B open-weight vision model for private document analysis and on-prem image understanding.
2024-08-20
Phi-3.5-vision-instruct
Microsoft's Phi-3.5 vision model with 128K context for image understanding and multimodal chat.
2024-08-20
Phi-3.5-mini-instruct
Microsoft's Phi-3.5 models with 128K context, including a MoE variant for improved efficiency.
2024-08-20
Phi-3.5-MoE-instruct
Microsoft's Phi-3.5 models with 128K context, including a MoE variant for improved efficiency.
2024-08-01
FLUX 1 Pro
Black Forest Labs' original FLUX 1 Pro for high-quality text-to-image via API.
2024-08-01
FLUX 1 Dev
Black Forest Labs' 12B parameter FLUX 1 Dev for open-weight text-to-image generation with guidance distillation.
2024-08-01
FLUX 1 Schnell
Black Forest Labs' 12B FLUX 1 Schnell for ultra-fast image generation in 1-4 steps with Apache 2.0 license.
2024-07-23
Llama 3.1 405B Instruct
Meta's Llama 3.1 405B flagship with 128K context.
2024-07-23
Llama 3.1 70B Instruct
Meta's Llama 3.1 70B with 128K context for instruction-following.
2024-07-23
Llama 3.1 8B Instruct
Meta's Llama 3.1 8B with 128K context for efficient deployment.
2024-07-23
Prompt Guard 86M
Meta's lightweight 86M Prompt Guard for detecting jailbreak and prompt injection attempts.
2024-07-18
GPT-4o-mini
OpenAI's GPT-4o-mini for fast, cost-efficient inference with strong general capability.
2024-05-21
Phi-3-vision-128k-instruct
Microsoft's Phi-3 vision model with 128K context for lightweight image understanding on edge devices.
2024-05-13
GPT-4o
A broadly capable multimodal model optimized for production chat, agentic workflows, and voice experiences.
2024-04-18
Meta Llama 3 70B Instruct
Meta's original Llama 3 and Code Llama 70B models with 8K context.
2024-04-18
Code Llama 70B Instruct
Meta's original Llama 3 and Code Llama 70B models with 8K context.
2024-04-18
Meta Llama 3 8B Instruct
Meta's smaller Llama 3 and Code Llama models with 8K context.
2024-04-18
Code Llama 34B Instruct
Meta's smaller Llama 3 and Code Llama models with 8K context.
2024-03-07
Claude Haiku 3
Anthropic's deprecated Haiku 3 model, retiring April 2026.
2024-03-04
Claude Sonnet 3
Anthropic's legacy Sonnet 3 model with 200K context.
2024-02-29
Claude Opus 3
Anthropic's legacy Opus 3 model with 200K context.