Use case
Cheapest LLM API
Long-read guide: Find the best-value models for cost-sensitive products, high-throughput batch processing, and large-scale inference workloads.
Why this guide works
- Price per token is the starting point, not the full story
- Match model capability to task complexity to avoid overpaying
- Consider total cost including prompt engineering and retries
Shortlist
These models offer the best value per token while maintaining usable quality for production workloads.
Nova Micro (Amazon Web Services, Nova family)
Amazon's Nova Micro model on Bedrock for ultra-fast, ultra-low-cost text inference.
- Context: 128,000 tokens
- Input: $0.00/1K tok
- Output: $0.0001/1K tok
Nova Lite (Amazon Web Services, Nova family)
Amazon's Nova Lite model on Bedrock for fast, cost-efficient multimodal inference.
- Context: 300,000 tokens
- Input: $0.0001/1K tok
- Output: $0.0002/1K tok
Claude Haiku 4.5 (Anthropic, Claude family)
Anthropic's Haiku 4.5 with 200K context, the fastest Claude model with near-frontier intelligence at low cost.
- Context: 200,000 tokens
- Input: $0.001/1K tok
- Output: $0.005/1K tok
Gemini 2.5 Flash (Google DeepMind, Gemini family)
Google's Gemini 2.5 Flash with 1M context for fast, cost-efficient multimodal inference.
- Context: 1,048,576 tokens
- Input: $0.0002/1K tok
- Output: $0.0006/1K tok
Decision table
Choose based on your minimum quality threshold and volume requirements.
| Need | Why it fits | Model |
|---|---|---|
| Ultra-low-cost text tasks | Best when you need the absolute cheapest option for simple text processing and classification. | Nova Micro (Amazon Web Services) |
| Budget multimodal | Best when you need vision and text at the lowest possible cost for high-volume processing. | Nova Lite (Amazon Web Services) |
| Quality-sensitive budget | Best when you need near-frontier quality at 1/15th the cost of flagship models. | Claude Haiku 4.5 (Anthropic) |
| Balanced cost-quality | Best when you need strong multimodal quality with very low per-token pricing. | Gemini 2.5 Flash (Google DeepMind) |
Evaluation framework
True cost depends on more than the listed price per token.
Calculate cost per task
Multiply input and output token counts by their respective prices. Include retries and prompt overhead.
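The arithmetic above can be sketched as a small helper. The token counts, prices, and retry rate in the example are illustrative placeholders, not quotes from any provider:

```python
def cost_per_task(input_tokens, output_tokens, price_in_per_1k, price_out_per_1k,
                  prompt_overhead_tokens=0, expected_retries=0.0):
    """Estimate the dollar cost of one task, including prompt overhead and retries.

    expected_retries is the average number of extra attempts per task
    (e.g. 0.05 means 1 task in 20 is retried once).
    """
    tokens_in = input_tokens + prompt_overhead_tokens
    single_call = (tokens_in / 1000) * price_in_per_1k \
                + (output_tokens / 1000) * price_out_per_1k
    return single_call * (1 + expected_retries)

# Illustrative: 2K-token input, 500-token output at Haiku-class prices,
# with a 300-token system prompt and a 5% retry rate.
print(round(cost_per_task(2000, 500, 0.001, 0.005,
                          prompt_overhead_tokens=300,
                          expected_retries=0.05), 6))  # → 0.00504
```

Multiplying the result by your monthly task volume gives a spend estimate you can compare across models.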
Check quality floor
Cheaper models tend to fail more often. Factor in error rates and human review costs.
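One way to fold the quality floor into the comparison is to add the expected review cost to the API cost. The failure rates and review cost below are made-up assumptions for illustration:

```python
def effective_cost_per_task(api_cost, failure_rate, review_cost):
    """API cost plus the expected cost of human review for failed outputs.

    failure_rate: fraction of outputs needing human correction (0..1).
    review_cost: dollar cost of one human review.
    """
    return api_cost + failure_rate * review_cost

# Illustrative: a $0.0005 call that fails 8% of the time can cost more overall
# than a $0.005 call that fails 1%, once $0.50 human reviews are included.
cheap = effective_cost_per_task(0.0005, 0.08, 0.50)   # 0.0405
better = effective_cost_per_task(0.005, 0.01, 0.50)   # 0.01
print(cheap > better)  # → True
```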
Evaluate context efficiency
Models with larger effective context may need fewer API calls, reducing total cost.
Compare batch vs. real-time
Some providers offer batch discounts. Choose based on your latency requirements.
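A minimal batch-versus-real-time comparison, assuming a hypothetical 50% batch discount (a rate some providers offer; check your provider's actual batch pricing):

```python
def monthly_cost(tasks_per_month, cost_per_task, batch_discount=0.0):
    """Monthly spend, optionally applying a provider's batch-processing discount."""
    return tasks_per_month * cost_per_task * (1 - batch_discount)

# Illustrative: 1M tasks/month at $0.0008 per task.
realtime = monthly_cost(1_000_000, 0.0008)
batch = monthly_cost(1_000_000, 0.0008, batch_discount=0.5)
print(round(realtime), round(batch))  # → 800 400
```

If your workload tolerates hours of latency, the batch path can halve the bill at the same quality.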
Common scenarios
Cost optimization depends on your specific workload pattern.
High-volume classification
Use the cheapest model that meets your accuracy threshold. Nova Micro or Claude Haiku 4.5 works well here.
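The selection rule can be sketched like this; the accuracy figures and blended per-task costs are placeholders you would replace with your own eval results:

```python
# Hypothetical eval results: (model, measured accuracy, blended $/task).
candidates = [
    ("Nova Micro", 0.91, 0.00008),
    ("Nova Lite", 0.93, 0.00020),
    ("Claude Haiku 4.5", 0.97, 0.00300),
    ("Gemini 2.5 Flash", 0.95, 0.00050),
]

def cheapest_meeting(candidates, accuracy_threshold):
    """Cheapest model whose measured accuracy clears the threshold, else None."""
    eligible = [c for c in candidates if c[1] >= accuracy_threshold]
    return min(eligible, key=lambda c: c[2], default=None)

print(cheapest_meeting(candidates, 0.94)[0])  # → Gemini 2.5 Flash
```

Re-run the selection whenever prices or your eval results change; the cheapest eligible model shifts as thresholds move.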
Batch document processing
Use a model with good quality-to-cost ratio. Gemini 2.5 Flash offers excellent value for multimodal.
Real-time chat at scale
Use a fast, low-cost model with sufficient quality. Haiku 4.5 is a strong choice.
Methodology
This guide prioritizes total cost of ownership over raw price per token.
We calculate cost per task, not just price per token.
We factor in error rates, retries, and human review costs.
We test with real workloads to measure effective quality at each price point.
Next step
Find the most cost-effective model for your workload
Compare pricing across providers and calculate true cost per task for your specific use case.