Releases
Release feed
Newest first. A changelog-style feed of model launches and noteworthy updates.
2026-03-19
Nemotron-Cascade 2
NVIDIA's 32B (30B-A3B MoE) Nemotron-Cascade 2 trained with cascade RL and multi-domain on-policy distillation. 74.8K downloads on HuggingFace.
2026-03-15
warpgrep-v2
Morph's WarpGrep v2 for ultra-fast agentic code search. #1 on SWE-Bench Pro with sub-6s searches. 15.8% cheaper and 22% faster than alternatives.
2026-03-11
Nemotron 3 Super 120B
NVIDIA's flagship 120B/12B-active LatentMoE model with 1M context, trained on 25T tokens. Strong on agentic workflows, reasoning, and long-context tasks. Requires 8x H100-80GB.
2026-03-07
flash-compact
Morph's Flash Compact for near-lossless context compaction at 33,000+ tok/sec. Reduces context 50-70% in under 2 seconds. +0.6% on SWE-Bench Pro.
2026-03-05
GPT-5.4
OpenAI's GPT-5.4, the most capable and efficient frontier model for professional work. First general-purpose model with native computer-use capabilities. Combines industry-leading coding from GPT-5.3-Codex with improved agentic workflows.
2026-03-05
GPT-5.4 Pro
OpenAI's GPT-5.4 Pro delivering maximum performance for complex tasks. Available for ChatGPT Pro and Enterprise plans.
2026-03-03
GPT-5.3 Instant
OpenAI's GPT-5.3 Instant optimized for low-latency everyday usability. Available to all ChatGPT users.
2026-03-03
Gemini 3.1 Flash-Lite
Google's Gemini 3.1 Flash-Lite, its most cost-effective model, built for high-volume workloads with best-in-class intelligence at scale.
2026-03-01
morph-v3-fast-apply
Morph's Fast Apply model merges AI-generated code edits at 10,500+ tokens/sec with 98% accuracy, orders of magnitude faster than alternatives like GPT-4.1 (55 tok/s) or Claude Sonnet (80 tok/s).
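Those throughput figures are easier to feel with a quick back-of-envelope calculation; this is a hedged sketch, and the 4,000-token edit size is a hypothetical workload, not a published benchmark:

```python
# Wall-clock time to merge a 4,000-token edit at the quoted throughputs
# (tokens per second). The edit size is hypothetical.
def merge_seconds(tokens: int, tok_per_sec: float) -> float:
    return tokens / tok_per_sec

for name, speed in [("morph-v3-fast-apply", 10_500),
                    ("GPT-4.1", 55),
                    ("Claude Sonnet", 80)]:
    print(f"{name}: {merge_seconds(4_000, speed):.1f}s")
```

At these rates the fast-apply model finishes in well under a second, while the general-purpose models take roughly a minute.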
2026-02-24
Claude Sonnet 4.6
Anthropic's current Sonnet tier for fast frontier reasoning, coding, and long-context agent work.
2026-02-24
Claude Opus 4.6
Anthropic's most intelligent Claude model for complex agents, coding, and deep reasoning, with 1M token context and 128K output.
2026-02-24
LFM2-24B-A2B
Liquid AI's flagship 24B/2.3B-active MoE hybrid model. 112 tok/s on AMD CPU, 293 tok/s on H100. Fits in 32GB RAM. Trained on 17T tokens with 30 conv + 10 attn layers. Supports 9 languages.
2026-02-19
Gemini 3.1 Pro
Google's Gemini 3.1 Pro, designed for complex tasks where simple answers aren't enough, with enhanced reasoning and multimodal capabilities.
2026-02-19
Gemini 3.1 Flash
Google's Gemini 3.1 Flash for fast, cost-efficient multimodal inference with strong quality.
2026-02-16
NextStep-1.1
StepFun's 15B NextStep-1.1 for text-to-image generation. Supports image editing guided by text prompts.
2026-02-14
Step-Audio-R1.1
StepFun's 33B Step-Audio-R1.1 for audio-text-to-text generation and understanding.
2026-02-12
LFM2.5-1.2B-Instruct
Liquid AI's 1.2B LFM2.5-Instruct with 262K downloads on HuggingFace. Extended 128K context for edge reasoning and instruction following.
2026-02-12
LFM2.5-1.2B-Thinking
Liquid AI's 1.2B LFM2.5-Thinking with chain-of-thought reasoning for edge devices. 30K downloads on HuggingFace.
2026-02-05
GPT-5.3-Codex
OpenAI's GPT-5.3-Codex, extending Codex across the full spectrum of professional computer work. Powers the Codex app on macOS and Windows.
2026-02-01
GPT-5.2
OpenAI's latest frontier model with 1M token context for advanced reasoning and tool use.
2026-02-01
GPT-5.2 Pro
OpenAI's premium GPT-5.2 Pro with maximum reasoning capability.
2026-02-01
Step-3.5-Flash
StepFun's 199B parameter Step-3.5-Flash for text generation and reasoning. 91.9K downloads on HuggingFace. Available in BF16, FP8, and GGUF quantizations.
2026-01-19
Step3-VL-10B
StepFun's 10B Step3-VL vision-language model with 211K downloads on HuggingFace. Supports FP8 quantization.
2026-01-15
Command R+ 2026
An enterprise retrieval specialist built for document-heavy support and knowledge applications.
2025-12-16
MiMo-V2-Flash
Xiaomi's MiMo-V2-Flash: 309B total/15B active MoE with hybrid sliding window attention, Multi-Token Prediction, and 256K context. Scores 94.1 on AIME 2025, 73.4 on SWE-Bench. Trained on 27T tokens with 6x KV-cache reduction.
2025-12-15
Nemotron 3 Nano 4B
NVIDIA's compact 4B Nemotron Nano for efficient local AI with hybrid Mamba-2 architecture. Runs on consumer GPUs.
2025-12-01
Mistral Large 25
An enterprise-ready all-rounder with especially strong European language performance.
2025-12-01
Gemini 3.0 Pro
Google's Gemini 3.0 Pro, quietly released in late 2025. Strong multimodal reasoning with 1M context. Foundation for the Gemini 3.1 series.
2025-12-01
Gemini 3.0 Flash
Google's Gemini 3.0 Flash for fast multimodal inference. Released alongside Gemini 3.0 Pro.
2025-12-01
Mistral Large 3
Mistral's state-of-the-art open-weight multimodal model with 128K context for general-purpose deployment.
2025-12-01
Mistral Large 3 Open
Mistral's state-of-the-art open-weight model with 128K context for self-hosting and customization.
2025-12-01
Ministral 3 14B Open
Mistral's efficient open-weight models with text and vision capabilities at 3B, 8B, and 14B sizes.
2025-12-01
Ministral 3 8B Open
Mistral's efficient open-weight models with text and vision capabilities at 3B, 8B, and 14B sizes.
2025-12-01
Ministral 3 3B Open
Mistral's efficient open-weight models with text and vision capabilities at 3B, 8B, and 14B sizes.
2025-11-28
LFM2-8B-A1B
Liquid AI's 8.3B/1.5B-active MoE hybrid model with 12T training tokens. 24 layers (18 conv + 6 attn). 47.9K downloads on HuggingFace.
2025-11-28
LFM2-2.6B
Liquid AI's 2.6B-parameter LFM2 for lightweight text generation and tool use on edge devices.
2025-10-21
Llama-Embed-Nemotron 8B
NVIDIA's Llama-Embed-Nemotron 8B ranked #1 on multilingual MTEB leaderboard with text and image retrieval support.
2025-10-01
Claude Haiku 4.5
Anthropic's Haiku 4.5 with 200K context, the fastest Claude model with near-frontier intelligence at low cost.
2025-09-29
Claude Sonnet 4.5
Anthropic's Sonnet 4.5 with 1M token context for fast frontier reasoning, coding, and long-context agent work.
2025-09-23
MiMo-Audio-7B
Xiaomi's MiMo-Audio 7B for any-to-any audio and text generation. Includes base and instruct variants.
2025-09-01
GPT-5
OpenAI's GPT-5 with 1M context for balanced reasoning and coding.
2025-09-01
GPT-5 mini
OpenAI's GPT-5 mini for cost-efficient applications.
2025-09-01
GPT-5 nano
OpenAI's smallest GPT-5 variant for ultra-fast, cost-sensitive applications.
2025-09-01
gpt-realtime-mini
OpenAI's mini audio models for cost-efficient voice interfaces.
2025-09-01
gpt-audio-mini
OpenAI's mini audio models for cost-efficient voice interfaces.
2025-09-01
Magistral Medium 1.2
Mistral's frontier-class multimodal reasoning model with extended thinking.
2025-08-24
FLUX 1 Kontext Dev
Black Forest Labs' FLUX 1 Kontext Dev for context-aware image editing with 95.6K downloads on HuggingFace.
2025-08-21
MiMo-VL-7B
Xiaomi's MiMo-VL 7B vision-language model with 136K downloads on HuggingFace for image understanding and multimodal reasoning.
2025-08-05
Claude Opus 4.1
Anthropic's Claude Opus 4.1 for maximum intelligence on complex reasoning, coding, and agentic tasks.
2025-08-01
Mistral Medium 3.1
Mistral's frontier-class multimodal model released August 2025.
2025-08-01
Codestral
Mistral's cutting-edge code completion model with 256K context fill-in-the-middle support.
2025-07-01
Devstral Medium 1.0
Mistral's enterprise-grade model excelling at software engineering tasks.
2025-07-01
Voxtral Mini Open
Mistral's open-weight mini audio input model for on-prem transcription and audio understanding.
2025-07-01
Voxtral Small Open
Mistral's open-weight small audio input model for instruct use cases with audio understanding.
2025-07-01
Voxtral Mini Transcribe
Mistral's audio transcription model optimized for converting speech to text with high accuracy.
2025-06-10
Llama Nemotron Nano 4B
NVIDIA's compact 4B Llama Nemotron Nano for edge AI with high-accuracy reasoning. Runs on consumer GPUs.
2025-06-04
Llama Nemotron Super 49B
NVIDIA's Llama-based Nemotron Super 49B for high-accuracy reasoning, agentic tasks, and RAG workflows.
2025-06-01
o3-deep-research
OpenAI's deep research model for extended multi-step research tasks with tool use.
2025-06-01
o4-mini-deep-research
OpenAI's cost-efficient deep research model for extended tasks.
2025-06-01
gpt-realtime
OpenAI's realtime and audio models for low-latency voice interfaces.
2025-06-01
gpt-audio
OpenAI's realtime and audio models for low-latency voice interfaces.
2025-06-01
chatgpt-image-latest
OpenAI's lightweight image generation variants.
2025-06-01
gpt-image-1-mini
OpenAI's lightweight image generation variants.
2025-06-01
gpt-oss-120b
OpenAI's 120B open-weight model for frontier-style reasoning with self-hosted deployment.
2025-06-01
gpt-oss-20b
OpenAI's 20B open-weight model for efficient frontier reasoning on consumer hardware.
2025-06-01
Gemini 2.5 Flash-Lite
Google's lightweight Gemini 2.5 Flash variants for cost-efficient multimodal and real-time audio.
2025-06-01
Gemini 2.5 Flash Native Audio Preview
Google's lightweight Gemini 2.5 Flash variants for cost-efficient multimodal and real-time audio.
2025-06-01
Gemini 2.5 Flash Live
Google's lightweight Gemini 2.5 Flash variants for cost-efficient multimodal and real-time audio.
2025-06-01
Mistral Small 3.1
Mistral's efficient small models with 128K context, unifying instruct, reasoning, and coding.
2025-06-01
Mistral Small 3.2 Open
Mistral's efficient small models with 128K context, unifying instruct, reasoning, and coding.
2025-06-01
Magistral Small 1.2 Open
Mistral's open-weight reasoning and coding models for self-hosted deployment.
2025-06-01
Devstral 2 Open
Mistral's open-weight reasoning and coding models for self-hosted deployment.
2025-06-01
Devstral Small 2
Mistral's open-weight reasoning and coding models for self-hosted deployment.
2025-06-01
Mistral Nemo 12B
Mistral's open-weight reasoning and coding models for self-hosted deployment.
2025-06-01
Command R7B
Cohere's Command text family for enterprise chat, retrieval, translation, and agent workflows built around grounding.
2025-06-01
Command A Translate
Cohere's Command text family for enterprise chat, retrieval, translation, and agent workflows built around grounding.
2025-06-01
Command A Reasoning
Cohere's Command text family for enterprise chat, retrieval, translation, and agent workflows built around grounding.
2025-06-01
Command R+
Cohere's Command text family for enterprise chat, retrieval, translation, and agent workflows built around grounding.
2025-06-01
Grok 4
xAI's latest Grok model with 256K context for advanced reasoning and multimodal tasks.
2025-06-01
MiniMax-Speech-02
MiniMax's TTS model for high-fidelity speech synthesis with voice cloning capabilities.
2025-06-01
MiniMax-M2.5
MiniMax's latest M2.5 text model for coding agents, multimodal assistants, and high-speed inference.
2025-06-01
MiniMax-M2.5-highspeed
MiniMax's latest M2.5 text model for coding agents, multimodal assistants, and high-speed inference.
2025-06-01
Kimi K2
Moonshot AI's Kimi K2 with 128K (131,072-token) context for research, code generation, and reasoning-heavy chat.
2025-06-01
Kimi K2.5
Moonshot AI's Kimi family variants with extended context and thinking modes for complex reasoning tasks.
2025-06-01
Kimi K2 Turbo Preview
Moonshot AI's Kimi family variants with extended context and thinking modes for complex reasoning tasks.
2025-06-01
Kimi K2 Thinking
Moonshot AI's Kimi family variants with extended context and thinking modes for complex reasoning tasks.
2025-05-14
Claude Sonnet 4
Anthropic's Sonnet 4 with 1M token context for balanced reasoning and coding performance.
2025-05-14
Claude Opus 4
Anthropic's Opus 4 with 200K context for complex reasoning and agentic tasks.
2025-05-14
Qwen3.5-397B-A17B
Alibaba's Qwen3 family with thinking/non-thinking mode switching, 100+ language support, and 131K context via YaRN.
2025-05-14
Qwen3.5-122B-A10B
Alibaba's Qwen3 family with thinking/non-thinking mode switching, 100+ language support, and 131K context via YaRN.
2025-05-14
Qwen3.5-35B-A3B
Alibaba's Qwen3 family with thinking/non-thinking mode switching, 100+ language support, and 131K context via YaRN.
2025-05-14
Qwen3.5-27B
Alibaba's Qwen3 family with thinking/non-thinking mode switching, 100+ language support, and 131K context via YaRN.
2025-05-14
Qwen3.5-9B
Alibaba's Qwen3 family with thinking/non-thinking mode switching, 100+ language support, and 131K context via YaRN.
2025-05-14
Qwen3.5-4B
Alibaba's Qwen3 family with thinking/non-thinking mode switching, 100+ language support, and 131K context via YaRN.
2025-05-14
Qwen3.5-2B
Alibaba's Qwen3 family with thinking/non-thinking mode switching, 100+ language support, and 131K context via YaRN.
2025-05-14
Qwen3.5-0.8B
Alibaba's Qwen3 family with thinking/non-thinking mode switching, 100+ language support, and 131K context via YaRN.
2025-05-14
Qwen3-Coder-Next
Alibaba's Qwen3 family with thinking/non-thinking mode switching, 100+ language support, and 131K context via YaRN.
2025-05-01
Codestral Embed
Mistral's embedding models for semantic search and code representation extraction.
2025-05-01
Mistral Embed
Mistral's embedding models for semantic search and code representation extraction.
2025-04-30
Phi-4-reasoning-vision-15B
Microsoft's Phi-4 reasoning vision model (15B) combining visual understanding with chain-of-thought reasoning.
2025-04-30
Phi-4-mini-instruct
Microsoft's Phi-4 reasoning variants with 128K context for compact, efficient reasoning on constrained infrastructure.
2025-04-30
Phi-4-reasoning
Microsoft's Phi-4 reasoning variants with 128K context for compact, efficient reasoning on constrained infrastructure.
2025-04-30
Phi-4-reasoning-plus
Microsoft's Phi-4 reasoning variants with 128K context for compact, efficient reasoning on constrained infrastructure.
2025-04-30
Phi-4-mini-flash-reasoning
Microsoft's Phi-4 reasoning variants with 128K context for compact, efficient reasoning on constrained infrastructure.
2025-04-16
o3
OpenAI's o3 reasoning model for complex multi-step problem solving with extended thinking.
2025-04-16
o4-mini
OpenAI's o4-mini for cost-efficient reasoning with strong coding and math capabilities.
2025-04-14
GPT-4.1
OpenAI's GPT-4.1 with 1M token context, improved coding and instruction following.
2025-04-14
GPT-4.1 mini
OpenAI's GPT-4.1 mini for fast, cost-efficient inference with 1M context.
2025-04-05
Llama 4 Maverick
Meta's 17Bx128E MoE open-weight model with 1M token context, pretrained on ~22T tokens. Strong multimodal and multilingual capabilities for teams that need control, private deployment, and customization.
2025-04-05
Llama 4 Scout
Meta's Llama 4 Scout (17Bx16E MoE, 109B total params) with an extraordinary 10M token context window.
2025-04-05
Llama Guard 4 12B
Meta's Llama Guard 4 (12B) safety model for filtering unsafe prompts.
2025-04-01
Grok 4 Fast Reasoning
xAI's Grok API family for fast-moving reasoning, conversational agents, and multimodal assistant workloads.
2025-04-01
Grok 3
xAI's Grok API family for fast-moving reasoning, conversational agents, and multimodal assistant workloads.
2025-04-01
Grok 3 Mini
xAI's Grok API family for fast-moving reasoning, conversational agents, and multimodal assistant workloads.
2025-04-01
grok-image
xAI's Grok API family for fast-moving reasoning, conversational agents, and multimodal assistant workloads.
2025-04-01
music-2.0
MiniMax's music generation model for AI-composed audio tracks.
2025-03-26
GPT Image 1
OpenAI's natively multimodal image generation model.
2025-03-25
Gemini 2.5 Pro
An expansive-context model built for complex research, multimodal understanding, and large document tasks, with 1M token native context.
2025-03-25
Gemini 2.5 Pro TTS
Google's Gemini 2.5 Pro with 1M context for long-context multimodal analysis and text-to-speech.
2025-03-25
Gemini 2.5 Flash
Google's Gemini 2.5 Flash with 1M context for fast, cost-efficient multimodal inference.
2025-03-13
Command A
Cohere's latest Command model with 256K context for enterprise chat, retrieval, and agent workflows.
2025-03-12
FLUX 2 Klein 9B KV
Black Forest Labs' FLUX.2 Klein 9B KV variant with key-value cache optimization for even faster inference.
2025-03-01
Mistral OCR 2505
Mistral's OCR model for document extraction, PDF parsing, and structured output from images.
2025-03-01
Command A Vision
Cohere's vision-capable Command and embedding models for multimodal enterprise assistants and retrieval systems.
2025-03-01
Embed 4
Cohere's vision-capable Command and embedding models for multimodal enterprise assistants and retrieval systems.
2025-03-01
DeepSeek-OCR
DeepSeek's specialized OCR models for document parsing, table extraction, and image-to-text conversion.
2025-03-01
DeepSeek-OCR-2
DeepSeek's specialized OCR models for document parsing, table extraction, and image-to-text conversion.
2025-03-01
image-01
MiniMax's image generation models for text-to-image creation, including a live animation variant.
2025-03-01
image-01-live
MiniMax's image generation models for text-to-image creation, including a live animation variant.
2025-03-01
GLM-5
Z.AI's GLM reasoning family for general chat, coding, and tool-enabled Chinese-first production workloads.
2025-03-01
GLM-4.7
Z.AI's GLM reasoning family for general chat, coding, and tool-enabled Chinese-first production workloads.
2025-03-01
GLM-4.6
Z.AI's GLM reasoning family for general chat, coding, and tool-enabled Chinese-first production workloads.
2025-03-01
GLM-4.5
Z.AI's GLM reasoning family for general chat, coding, and tool-enabled Chinese-first production workloads.
2025-03-01
GLM-4.6V
Z.AI's multimodal model line for document OCR, image generation, video generation, and visual reasoning tasks.
2025-03-01
GLM-4.5V
Z.AI's multimodal model line for document OCR, image generation, video generation, and visual reasoning tasks.
2025-03-01
GLM-OCR
Z.AI's multimodal model line for document OCR, image generation, video generation, and visual reasoning tasks.
2025-03-01
GLM-Image
Z.AI's multimodal model line for document OCR, image generation, video generation, and visual reasoning tasks.
2025-03-01
CogView 4
Z.AI's multimodal model line for document OCR, image generation, video generation, and visual reasoning tasks.
2025-03-01
CogVideoX
Z.AI's multimodal model line for document OCR, image generation, video generation, and visual reasoning tasks.
2025-03-01
Vidu Q1
Z.AI's multimodal model line for document OCR, image generation, video generation, and visual reasoning tasks.
2025-03-01
Doubao-Seed-1.6
Bytedance's Doubao coding and reasoning line for domestic developer workflows, visual coding, and agentic assistance.
2025-03-01
Doubao-Seed-1.6-Flash
Bytedance's Doubao coding and reasoning line for domestic developer workflows, visual coding, and agentic assistance.
2025-03-01
Doubao-Seed-Code
Bytedance's Doubao coding and reasoning line for domestic developer workflows, visual coding, and agentic assistance.
2025-03-01
Doubao-Seed-2.0-Code
Bytedance's Doubao coding and reasoning line for domestic developer workflows, visual coding, and agentic assistance.
2025-03-01
Hunyuan T1
Tencent's Hunyuan T1 flagship with 256K context for fast reasoning and long-document understanding.
2025-03-01
Hunyuan TurboS
Tencent's Hunyuan family spans fast-reasoning, long-context, coding, and vision-capable API deployments.
2025-03-01
Hunyuan TurboS LongText 128K
Tencent's Hunyuan family spans fast-reasoning, long-context, coding, and vision-capable API deployments.
2025-03-01
Hunyuan Standard
Tencent's Hunyuan family spans fast-reasoning, long-context, coding, and vision-capable API deployments.
2025-03-01
Hunyuan Lite
Tencent's Hunyuan family spans fast-reasoning, long-context, coding, and vision-capable API deployments.
2025-03-01
Hunyuan Code
Tencent's Hunyuan family spans fast-reasoning, long-context, coding, and vision-capable API deployments.
2025-03-01
Hunyuan T1 Vision
Tencent's Hunyuan family spans fast-reasoning, long-context, coding, and vision-capable API deployments.
2025-03-01
ERNIE 4.5 Turbo 32K
Baidu's ERNIE 4.5 Turbo with 32K context for mainstream enterprise chat and function-calling workloads.
2025-03-01
ERNIE 4.0 Turbo 8K
Baidu's ERNIE family covers enterprise chat, long-context, speed-optimized, and function-calling workloads.
2025-03-01
ERNIE 3.5 128K
Baidu's ERNIE family covers enterprise chat, long-context, speed-optimized, and function-calling workloads.
2025-03-01
ERNIE Speed 128K
Baidu's ERNIE family covers enterprise chat, long-context, speed-optimized, and function-calling workloads.
2025-03-01
ERNIE Functions 8K
Baidu's ERNIE family covers enterprise chat, long-context, speed-optimized, and function-calling workloads.
2025-02-25
Phi-4-multimodal-instruct
Microsoft's 5.6B Phi-4 multimodal model with vision, audio, and text input for lightweight assistant features.
2025-02-24
Claude 3.7 Sonnet
A top-tier reasoning model with strong software engineering assistance and enterprise controls.
2025-02-24
FLUX 2 Klein 4B
Black Forest Labs' 4B FLUX.2 Klein with Apache 2.0 license for fast, commercial-use image generation. 268K monthly downloads.
2025-02-05
Gemini 2.0 Flash
Google's Gemini 2.0 Flash with 1M context for fast multimodal inference.
2025-02-05
Gemini 2.0 Flash-Lite
Google's Gemini 2.0 Flash-Lite for ultra-fast, cost-efficient inference.
2025-02-04
FLUX 2 Dev
Black Forest Labs' 32B parameter FLUX.2 Dev, the state-of-the-art open text-to-image model with single and multi-reference editing. 987K monthly downloads on HuggingFace.
2025-02-01
MiniMax-VL-01
MiniMax's vision-language model with 200K context for multimodal understanding and image analysis.
2025-01-27
Janus-Pro-7B
DeepSeek's multimodal understanding and generation model (7B) for image-to-text and text-to-image tasks.
2025-01-26
Qwen2.5-VL-72B-Instruct
Alibaba's flagship 72B vision-language model with 131K context for document reading, image understanding, and multimodal reasoning.
2025-01-26
Qwen2.5-VL-7B-Instruct
Alibaba's efficient 7B vision-language model for lightweight multimodal assistants and document understanding.
2025-01-26
Phi-3-mini-4k-instruct
Microsoft's earlier Phi models with shorter context windows for edge and local deployment.
2025-01-26
Phi-3-medium-4k-instruct
Microsoft's earlier Phi models with shorter context windows for edge and local deployment.
2025-01-26
phi-2
Microsoft's earlier Phi models with shorter context windows for edge and local deployment.
2025-01-26
phi-1_5
Microsoft's earlier Phi models with shorter context windows for edge and local deployment.
2025-01-26
phi-1
Microsoft's earlier Phi models with shorter context windows for edge and local deployment.
2025-01-26
Phi-tiny-MoE-instruct
Microsoft's earlier Phi models with shorter context windows for edge and local deployment.
2025-01-21
Sonar Pro
Perplexity's Sonar Pro for enhanced web search with multi-step reasoning, Pro Search, and citation support. $3/$15 per MTok plus request fee.
2025-01-21
Sonar
Perplexity's fast, cost-efficient Sonar model for basic web search and Q&A. $1/$1 per MTok plus request fee.
2025-01-21
Sonar Reasoning Pro
Perplexity's Sonar Reasoning Pro for complex reasoning with real-time web search. $2/$8 per MTok plus request fee.
2025-01-21
Sonar Deep Research
Perplexity's Sonar Deep Research for automated multi-step research with citation generation. $2/$8 per MTok in/out, plus $2/MTok for citation tokens, $3/MTok for reasoning tokens, and $5 per 1K search queries.
2025-01-21
pplx-embed-v1-4b
Perplexity's 4B embedding model with 2560 dimensions for high-quality semantic search and RAG. $0.03 per MTok.
2025-01-21
pplx-embed-v1-0.6b
Perplexity's lightweight 0.6B embedding model with 1024 dimensions for fast, cost-efficient embeddings. $0.004 per MTok.
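As an illustration of how the Sonar rate cards above compose, here is a hedged cost sketch for a single Sonar Deep Research request; the token counts and query count are hypothetical, and the sketch assumes straightforward per-MTok billing on every token category:

```python
MTOK = 1_000_000

def deep_research_cost(in_tok, out_tok, citation_tok, reasoning_tok, queries):
    """Cost in dollars at $2/$8 per MTok in/out, $2/MTok citation tokens,
    $3/MTok reasoning tokens, and $5 per 1K search queries."""
    token_cost = (in_tok * 2 + out_tok * 8
                  + citation_tok * 2 + reasoning_tok * 3) / MTOK
    return token_cost + queries * 5 / 1_000

# Hypothetical request: 50K in, 20K out, 5K citations, 100K reasoning, 30 queries.
print(f"${deep_research_cost(50_000, 20_000, 5_000, 100_000, 30):.2f}")
```

Note how the per-query fee ($0.15 here) can rival the token cost for search-heavy requests.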
2025-01-20
DeepSeek-R1
DeepSeek's reasoning model (671B total, 37B activated) trained with large-scale RL, achieving o1-level performance on math, code, and reasoning tasks.
2025-01-20
DeepSeek-R1-Distill-Llama-70B
DeepSeek-R1 distilled into Llama 3.3 70B, achieving strong reasoning with dense model efficiency.
2025-01-20
DeepSeek-Coder-V2
DeepSeek's specialist models for code generation and mathematical reasoning.
2025-01-20
DeepSeek-Math-V2
DeepSeek's specialist models for code generation and mathematical reasoning.
2025-01-16
FLUX 2 Klein 9B
Black Forest Labs' 9B FLUX.2 Klein for sub-second image generation with 4 inference steps. Matches models 5x its size in quality. Requires ~29GB VRAM (RTX 4090+).
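The ~29GB figure is plausible from first principles; a hedged back-of-envelope estimate, assuming bf16 weights (2 bytes per parameter), with the remainder going to activations, text encoders, and latents:

```python
# Weights-only VRAM for a 9B-parameter model in bf16 (2 bytes per param).
# The gap up to ~29GB would be activations, text encoders, and latents
# (the exact split is not published here).
params = 9_000_000_000
weights_gib = params * 2 / 1024**3
print(f"weights alone: {weights_gib:.1f} GiB")
```

Weights alone come to roughly 17 GiB, so a 24GB card is marginal and the RTX 4090+ guidance fits.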
2025-01-01
MiniMax-Text-01
MiniMax's text generation models with 200K context for general-purpose language tasks.
2025-01-01
MiniMax-M1
MiniMax's text generation models with 200K context for general-purpose language tasks.
2025-01-01
MiniMax-M2.1
MiniMax's earlier M2.x models for general-purpose multimodal inference.
2025-01-01
MiniMax-M2.1-highspeed
MiniMax's earlier M2.x models for general-purpose multimodal inference.
2025-01-01
MiniMax-M2
MiniMax's earlier M2.x models for general-purpose multimodal inference.
2025-01-01
FLUX 1 Krea Dev
Black Forest Labs' FLUX 1 Krea Dev developed in partnership with Krea AI for enhanced creative image generation.
2024-12-26
DeepSeek-V3.2
DeepSeek's MoE flagship models (671B total, 37B activated params) with 128K context, strong on math, code, and reasoning benchmarks.
2024-12-26
DeepSeek-V3.2-Exp
DeepSeek's MoE flagship models (671B total, 37B activated params) with 128K context, strong on math, code, and reasoning benchmarks.
2024-12-26
DeepSeek-V3.1
DeepSeek's MoE flagship models (671B total, 37B activated params) with 128K context, strong on math, code, and reasoning benchmarks.
2024-12-26
DeepSeek-V3.1-Base
DeepSeek's MoE flagship models (671B total, 37B activated params) with 128K context, strong on math, code, and reasoning benchmarks.
2024-12-26
DeepSeek-V3
DeepSeek's MoE flagship models (671B total, 37B activated params) with 128K context, strong on math, code, and reasoning benchmarks.
2024-12-26
DeepSeek-V2.5
DeepSeek's MoE flagship models (671B total, 37B activated params) with 128K context, strong on math, code, and reasoning benchmarks.
2024-12-18
Granite 3.1 8B Instruct
IBM's Granite 3.1 8B model for enterprise text and code tasks with Apache 2.0 license.
2024-12-18
Granite 3.1 2B Instruct
IBM's Granite 3.1 2B model for lightweight enterprise deployment.
2024-12-17
o1
OpenAI's o1 reasoning model with advanced chain-of-thought for complex problem solving.
2024-12-12
DeepSeek-VL2-Small
DeepSeek's lightweight vision-language model for efficient multimodal understanding.
2024-12-12
Phi-4
Microsoft's 14B parameter Phi-4, a state-of-the-art small model trained on 9.8T tokens with strong reasoning on MMLU (84.8) and GPQA (56.1).
2024-12-12
Jamba Large 1.6
AI21's flagship Jamba model with 256K context for long-document enterprise workflows.
2024-12-12
Jamba Mini 1.6
AI21's efficient Jamba Mini models with 256K context for cost-sensitive long-context deployments.
2024-12-12
Jamba Mini 1.7
AI21's efficient Jamba Mini models with 256K context for cost-sensitive long-context deployments.
2024-12-12
Jamba Large
AI21's earlier Jamba models for long-context enterprise assistants with open deployment options.
2024-12-12
Jamba Mini
AI21's earlier Jamba models for long-context enterprise assistants with open deployment options.
2024-12-12
Jamba 3B
AI21's earlier Jamba models for long-context enterprise assistants with open deployment options.
2024-12-06
Llama 3.3 70B Instruct
Meta's latest Llama 3.x dense model, 70B parameters with 128K context.
2024-12-03
Nova Pro
Amazon's Nova Pro model on Bedrock with 300K context for enterprise text and vision tasks.
2024-12-03
Nova Lite
Amazon's Nova Lite model on Bedrock for fast, cost-efficient multimodal inference.
2024-12-03
Nova Micro
Amazon's Nova Micro model on Bedrock for ultra-fast, ultra-low-cost text inference.
2024-11-06
FLUX 1.1 Pro Ultra
Black Forest Labs' FLUX 1.1 Pro Ultra for ultra-high-resolution 4MP image generation via API.
2024-11-01
Pixtral Large
Mistral's first frontier-class multimodal model with 128K context for image understanding and document analysis.
2024-11-01
Mistral Moderation
Mistral's moderation model for detecting harmful text content and policy violations.
2024-11-01
FLUX 1 Fill Dev
Black Forest Labs' FLUX 1 Fill Dev for inpainting and outpainting with open weights.
2024-11-01
FLUX 1 Depth Dev
Black Forest Labs' FLUX 1 Depth Dev for depth-aware image generation and structural editing.
2024-11-01
FLUX 1 Canny Dev
Black Forest Labs' FLUX 1 Canny Dev for edge-aware image generation using Canny edge detection.
2024-11-01
FLUX 1 Redux Dev
Black Forest Labs' FLUX 1 Redux Dev for image variation and style transfer with open weights.
2024-10-29
Stable Diffusion 3.5 Medium
Stability AI's efficient medium-sized image generation model.
2024-10-22
Claude Haiku 3.5
Anthropic's Haiku 3.5 with 200K context for fast, cost-efficient inference.
2024-10-22
Stable Diffusion 3.5 Large
Stability AI's flagship image generation model with high-quality text-to-image capabilities.
2024-10-21
Granite 3.0 8B Instruct
IBM's Granite 3.0 8B model for enterprise text and code generation.
2024-10-02
FLUX 1.1 Pro
Black Forest Labs' flagship FLUX 1.1 Pro for high-quality text-to-image generation via API.
2024-10-01
GPT-4o mini TTS
OpenAI's GPT-4o based TTS and transcription models.
2024-10-01
GPT-4o Transcribe
OpenAI's GPT-4o based TTS and transcription models.
2024-10-01
GPT-4o mini Transcribe
OpenAI's GPT-4o based TTS and transcription models.
2024-09-25
Llama 3.2 3B Instruct
Meta's lightweight Llama 3.2 models for edge and mobile deployment with 128K context.
2024-09-25
Llama 3.2 1B Instruct
Meta's lightweight Llama 3.2 models for edge and mobile deployment with 128K context.
2024-09-25
Llama 3.2 90B Vision Instruct
Meta's Llama 3.2 90B vision model with 128K context for image understanding.
2024-09-25
Llama 3.2 11B Vision Instruct
Meta's Llama 3.2 11B vision model for efficient multimodal understanding.
2024-09-25
Llama Guard 3 11B Vision
Meta's Llama Guard 3 vision safety model for multimodal content moderation.
2024-09-24
Gemini 1.5 Pro
Google's Gemini 1.5 Pro with 2M context for long document and media analysis.
2024-09-24
Gemini 1.5 Flash
Google's Gemini 1.5 Flash with 1M context for fast multimodal inference.
2024-09-24
Gemini 1.5 Flash-8B
Google's lightweight Gemini 1.5 Flash-8B for ultra-fast, cost-efficient inference.
2024-09-19
Qwen2.5-Max
Alibaba's Qwen2.5 generalist models with 131K context for multilingual apps, coding assistants, and self-hosted deployment.
2024-09-19
Qwen2.5-72B-Instruct
Alibaba's Qwen2.5 generalist models with 131K context for multilingual apps, coding assistants, and self-hosted deployment.
2024-09-19
Qwen2.5-32B-Instruct
Alibaba's Qwen2.5 generalist models with 131K context for multilingual apps, coding assistants, and self-hosted deployment.
2024-09-19
Qwen2.5-14B-Instruct
Alibaba's Qwen2.5 generalist models with 131K context for multilingual apps, coding assistants, and self-hosted deployment.
2024-09-19
Qwen2.5-7B-Instruct
Alibaba's Qwen2.5 generalist models with 131K context for multilingual apps, coding assistants, and self-hosted deployment.
2024-09-19
Qwen2.5-3B-Instruct
Alibaba's Qwen2.5 generalist models with 131K context for multilingual apps, coding assistants, and self-hosted deployment.
2024-09-19
Qwen2.5-1.5B-Instruct
Alibaba's Qwen2.5 generalist models with 131K context for multilingual apps, coding assistants, and self-hosted deployment.
2024-09-17
NV-Embed v2
NVIDIA's state-of-the-art text embedding model ranked #1 on MTEB leaderboard for retrieval and semantic similarity tasks.
2024-09-12
o1-mini
OpenAI's o1-mini for fast, cost-efficient reasoning on coding and STEM tasks.
2024-09-01
Pixtral 12B
Mistral's 12B open-weight vision model for private document analysis and on-prem image understanding.
2024-08-20
Phi-3.5-vision-instruct
Microsoft's Phi-3.5 vision model with 128K context for image understanding and multimodal chat.
2024-08-20
Phi-3.5-mini-instruct
Microsoft's Phi-3.5 models with 128K context, including a MoE variant for improved efficiency.
2024-08-20
Phi-3.5-MoE-instruct
Microsoft's Phi-3.5 models with 128K context, including a MoE variant for improved efficiency.
2024-08-01
FLUX 1 Pro
Black Forest Labs' original FLUX 1 Pro for high-quality text-to-image via API.
2024-08-01
FLUX 1 Dev
Black Forest Labs' 12B parameter FLUX 1 Dev for open-weight text-to-image generation with guidance distillation.
2024-08-01
FLUX 1 Schnell
Black Forest Labs' 12B FLUX 1 Schnell for ultra-fast image generation in 1-4 steps with Apache 2.0 license.
2024-07-23
Llama 3.1 405B Instruct
Meta's Llama 3.1 405B flagship with 128K context.
2024-07-23
Llama 3.1 70B Instruct
Meta's Llama 3.1 70B with 128K context for instruction-following.
2024-07-23
Llama 3.1 8B Instruct
Meta's Llama 3.1 8B with 128K context for efficient deployment.
2024-07-23
Prompt Guard 86M
Meta's lightweight 86M Prompt Guard for detecting jailbreak and prompt injection attempts.
2024-07-18
GPT-4o-mini
OpenAI's GPT-4o-mini for fast, cost-efficient inference with strong general capability.
2024-05-21
Phi-3-vision-128k-instruct
Microsoft's Phi-3 vision model with 128K context for lightweight image understanding on edge devices.
2024-05-13
GPT-4o
A broadly capable multimodal model optimized for production chat, agentic workflows, and voice experiences.
2024-04-18
Meta Llama 3 70B Instruct
Meta's original Llama 3 and Code Llama 70B models with 8K context.
2024-04-18
Code Llama 70B Instruct
Meta's original Llama 3 and Code Llama 70B models with 8K context.
2024-04-18
Meta Llama 3 8B Instruct
Meta's smaller Llama 3 and Code Llama models with 8K context.
2024-04-18
Code Llama 34B Instruct
Meta's smaller Llama 3 and Code Llama models with 8K context.
2024-03-07
Claude Haiku 3
Anthropic's deprecated Haiku 3 model, retiring April 2026.
2024-03-04
Claude Sonnet 3
Anthropic's legacy Sonnet 3 model with 200K context.
2024-02-29
Claude Opus 3
Anthropic's legacy Opus 3 model with 200K context.