StepFun

Step3-VL-10B

Overall 67

Open weight

Step3-VL-10B is StepFun's 10B-parameter vision-language model, with 211K downloads on Hugging Face. It supports FP8 quantization.

Last verified: 2026-03-29
Confidence: High
Sources: 3

text, vision, open-source

Input price: Not applicable
Output price: Not applicable
Context window: 131,072 tokens
Max output: 16,384 tokens
Release date: 2026-01-19
Access: open-weight, self-hosted
License: StepFun Open Model License
Last verified: 2026-03-29

Capability profile

[Radar chart of the model's practical strengths; the same information is summarized in text below.]

Benchmark summary

Step3-VL-10B is StepFun's most downloaded model for vision-language understanding.

No benchmark series is attached to this model yet. Source links and product metadata are available below.

Strengths

• 211K downloads
• Vision-language
• FP8 quantization support

Trade-offs

• 10B parameter limit
• Chinese-first

Crawlable benchmark analysis

Step3-VL-10B is positioned as an open-weight vision-language model. Its published scores give buyers evaluating it a rough indication of practical fit.

Published scores highlight reasoning 60/100, coding 52/100, enterprise readiness 58/100, vision 72/100, speed 82/100, and safety 66/100.
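As a quick sanity check on the figures above, the following sketch ranks the six published scores and computes their unweighted mean. The variable names are illustrative, and the unweighted mean is an assumption made for comparison only; this record does not state how the Overall 67 figure is derived.

```python
# Scores copied from the paragraph above (each out of 100).
scores = {
    "reasoning": 60,
    "coding": 52,
    "enterprise readiness": 58,
    "vision": 72,
    "speed": 82,
    "safety": 66,
}

# Rank dimensions from strongest to weakest.
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Unweighted mean: an assumption for comparison, not the catalog's formula.
mean_score = sum(scores.values()) / len(scores)

for name, value in ranked:
    print(f"{name:<22}{value:>3}/100")
print(f"unweighted mean: {mean_score:.1f}")
```

The unweighted mean works out to 65.0, slightly below the listed Overall 67, so the overall figure presumably reflects some weighting or additional inputs that this record does not describe.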

Pricing is not applicable for this self-hosted, open-weight entry. With a context window of 131,072 tokens, it supports large-document analysis and retrieval workflows.
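To make the large-document claim concrete, here is a minimal sketch of a prompt budget check using the 131,072-token context window and 16,384-token max output listed in this record. It assumes the context window covers prompt and generated tokens together, which is common but not confirmed here; the function names are hypothetical, and token counts would come from the model's own tokenizer.

```python
CONTEXT_WINDOW = 131_072  # tokens, from this record
MAX_OUTPUT = 16_384       # tokens, from this record


def max_prompt_tokens(reserved_output: int = MAX_OUTPUT) -> int:
    """Largest prompt that still leaves room for `reserved_output` tokens.

    Assumes prompt and output share one context window (an assumption).
    """
    if not 0 <= reserved_output <= MAX_OUTPUT:
        raise ValueError("reserved_output must be between 0 and MAX_OUTPUT")
    return CONTEXT_WINDOW - reserved_output


def fits(prompt_tokens: int, reserved_output: int = MAX_OUTPUT) -> bool:
    """Return True if a prompt of `prompt_tokens` fits within the budget."""
    return prompt_tokens <= max_prompt_tokens(reserved_output)


print(max_prompt_tokens())   # budget when reserving the full max output
print(fits(120_000))         # too large if the full output is reserved
print(fits(120_000, 4_096))  # fits when less output is reserved
```

Reserving the full 16,384 output tokens leaves 114,688 tokens for the prompt; reserving less output frees more room for documents or retrieved passages.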

Benchmark coverage is still limited for this entry, so this section focuses on published metadata and deployment fit.

Sources

Provider and distribution links used to verify this model record.

Last verified: 2026-03-29

• Step3-VL-10B repository (official-website)
• StepFun website (official-website)
• StepFun HuggingFace (cloud-platform)

Related models

OpenAI

GPT-5.4

OpenAI

OpenAI's GPT-5.4, the most capable and efficient frontier model for professional work. First general-purpose model with native computer-use capabilities. Combines industry-leading coding from GPT-5.3-Codex with improved agentic workflows.

Score 93, 3 sources

text, reasoning, tool-use, vision, api, hosted

Context: 1,000,000
Input: $0.005/1K tok
Output: $0.02/1K tok
Coverage: Full profile

Anthropic

Claude Sonnet 4.6

Claude 4.6

Anthropic's current Sonnet tier for fast frontier reasoning, coding, and long-context agent work.

Score 92, 3 sources

text, vision, reasoning, code, tool-use, api, hosted

Context: 1,000,000
Input: $0.003/1K tok
Output: $0.02/1K tok
Coverage: Full profile

Anthropic

Claude Opus 4.6

Claude 1M

Anthropic's most intelligent Claude model for complex agents, coding, and deep reasoning, with 1M token context and 128K output.

Score 91, 3 sources

text, vision, reasoning, api, hosted

Context: 1,000,000
Input: $0.005/1K tok
Output: $0.03/1K tok
Coverage: Full profile
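For readers weighing self-hosting against the hosted models listed above, the sketch below turns the per-1K-token prices into a per-request cost. The prices are copied from the related-model cards; the workload size (100,000 input tokens, 2,000 output tokens) and the function name are hypothetical, chosen only for illustration.

```python
from decimal import Decimal

# (input price, output price) in USD per 1K tokens, from the cards above.
PRICES = {
    "GPT-5.4": (Decimal("0.005"), Decimal("0.02")),
    "Claude Sonnet 4.6": (Decimal("0.003"), Decimal("0.02")),
    "Claude Opus 4.6": (Decimal("0.005"), Decimal("0.03")),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Cost in USD of one request at the listed per-1K-token prices."""
    price_in, price_out = PRICES[model]
    return (Decimal(input_tokens) / 1000) * price_in + (
        Decimal(output_tokens) / 1000
    ) * price_out


# Hypothetical workload: a long document in, a short answer out.
for name in PRICES:
    print(f"{name}: ${request_cost(name, 100_000, 2_000):.2f}")
```

Step3-VL-10B has no per-token price because it is self-hosted, so a like-for-like comparison would also need hardware and operating costs, which this record does not cover.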