# Models & Pricing

This page documents the public model catalog and pricing for the inference API. Use `GET /v1/models` to retrieve the current model metadata directly from the API.
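As a client-side sketch, the catalog returned by `GET /v1/models` can be filtered by capability. The response shape used below (an OpenAI-style `{"data": [...]}` list with an `id` and a `capabilities` array per model) is an assumption for illustration, not the documented schema:

```python
# Hypothetical sketch: filter a /v1/models-style response client-side.
# The {"data": [{"id", "context_length", "capabilities"}]} shape is an
# assumption, not the documented wire format.

def models_with_capability(catalog: dict, capability: str) -> list[str]:
    """Return IDs of models whose metadata advertises `capability`."""
    return [
        m["id"]
        for m in catalog.get("data", [])
        if capability in m.get("capabilities", [])
    ]

# Sample payload mirroring two rows of the catalog table below.
sample = {
    "data": [
        {
            "id": "deepseek/deepseek-v3.2",
            "context_length": 163_840,
            "capabilities": ["json_mode", "tools", "reasoning", "structured_outputs"],
        },
        {
            "id": "minimax/minimax-m2.1",
            "context_length": 192_000,
            "capabilities": [],
        },
    ]
}

print(models_with_capability(sample, "reasoning"))
# → ['deepseek/deepseek-v3.2']
```

Filtering locally like this avoids hard-coding model IDs that may rotate out of the catalog.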

## Current catalog

The public catalog currently includes 20 models. Pricing below is shown in dollars per 1 million tokens.

| Model | Modalities | Context | Max output | Input $/1M | Output $/1M | Capabilities |
| --- | --- | --- | --- | --- | --- | --- |
| `deepseek/deepseek-v3.2` | Text | 163,840 | 65,536 | $0.27 | $0.42 | JSON mode, Tools, Reasoning, Structured outputs |
| `z-ai/glm-4.7` | Text | 202,752 | 65,535 | $0.45 | $2.10 | JSON mode, Structured outputs, Tools, Reasoning |
| `openai/gpt-oss-120b` | Text | 131,072 | 65,536 | $0.10 | $0.75 | JSON mode, Structured outputs, Tools, Reasoning |
| `deepseek/deepseek-r1-0528` | Text | 163,840 | 65,536 | $0.80 | $2.40 | JSON mode, Tools, Structured outputs, Reasoning |
| `deepseek/deepseek-chat-v3-0324` | Text | 163,840 | 65,536 | $0.27 | $1.12 | JSON mode, Structured outputs, Tools |
| `qwen/qwen3-coder-next` | Text | 262,144 | 65,536 | $0.15 | $0.80 | JSON mode, Structured outputs, Tools |
| `qwen/qwen3.5-397b-a17b` | Text | 262,144 | 65,536 | $0.60 | $3.60 | Tools, Structured outputs, Reasoning |
| `deepseek/deepseek-v3.2-speciale` | Text | 163,840 | 65,536 | $0.40 | $1.20 | JSON mode, Reasoning, Structured outputs |
| `moonshotai/kimi-k2-0905` | Text | 262,144 | 262,144 | $0.60 | $2.50 | JSON mode, Structured outputs, Tools |
| `minimax/minimax-m2.5` | Text | 196,608 | 65,536 | $0.30 | $1.20 | Tools, Structured outputs, Reasoning |
| `minimax/minimax-m2.1` | Text | 192,000 | 65,536 | $0.30 | $1.20 | Basic text |
| `deepseek/deepseek-chat` | Text | 128,000 | 65,536 | $0.32 | $0.89 | Basic text |
| `deepseek/deepseek-r1` | Text | 128,000 | 65,536 | $0.70 | $2.50 | Reasoning, Tools |
| `deepseek/deepseek-chat-v3.1` | Text | 128,000 | 65,536 | $0.27 | $1.00 | JSON mode, Tools, Structured outputs |
| `qwen/qwen3-235b-a22b-thinking-2507` | Text | 262,144 | 65,536 | $0.28 | $2.30 | Reasoning, Tools, Structured outputs |
| `qwen/qwen3-vl-235b-a22b-thinking` | Text + image | 262,144 | 65,536 | $0.45 | $3.50 | Reasoning, Tools, Structured outputs |
| `moonshotai/kimi-k2.5` | Text | 262,144 | 65,536 | $0.57 | $2.85 | Reasoning, Tools, Structured outputs |
| `z-ai/glm-5` | Text | 262,144 | 65,536 | $1.00 | $3.20 | Reasoning, Tools, Structured outputs |
| `inclusionai/llada2.1-mini` | Text | 16,384 | 16,384 | $0.00 | $0.00 | Basic text |
| `xiaomi/mimo-v2-flash` | Text | 262,144 | 32,000 | $0.10 | $0.30 | Basic text |
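The per-1M rates in the table convert to per-request cost with simple arithmetic. A minimal sketch (model choice and token counts below are illustrative, not from the source):

```python
# Estimate the dollar cost of one request from per-1M-token prices,
# as listed in the catalog table above.

def request_cost(input_tokens: int, output_tokens: int,
                 input_per_1m: float, output_per_1m: float) -> float:
    """Cost in dollars for one request at the given $/1M-token rates."""
    return (input_tokens * input_per_1m
            + output_tokens * output_per_1m) / 1_000_000

# Example: deepseek/deepseek-v3.2 at $0.27 input / $0.42 output per 1M,
# for a request with 4,000 input tokens and 1,000 output tokens.
cost = request_cost(4_000, 1_000, 0.27, 0.42)
print(f"${cost:.6f}")
# → $0.001500
```

Note that output tokens are typically several times more expensive than input tokens, so long completions dominate the bill.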


## Sampling parameters

These parameters can be passed alongside `model` and `messages` in chat completion requests. Exact support varies by model according to the metadata returned by the API.

| Parameter | Type | Description |
| --- | --- | --- |
| `temperature` | number | Sampling temperature. Higher values increase randomness; lower values are more deterministic. |
| `top_p` | number | Nucleus sampling threshold. Limits candidate tokens to the top probability mass. |
| `top_k` | integer | Limits sampling to the k most likely next tokens. Supported by most, but not all, reasoning and chat models. |
| `min_p` | number | Minimum probability threshold relative to the most likely token. Available on a subset of models. |
| `frequency_penalty` | number | Penalizes tokens based on how often they have already appeared in the output. |
| `presence_penalty` | number | Penalizes tokens that have appeared at all so far, encouraging topic changes. |
| `repetition_penalty` | number | Alternative repetition penalty applied multiplicatively. Values above 1 discourage repetition. |
| `max_tokens` | integer | Maximum number of output tokens to generate. Supported on models that expose an explicit output-cap override. |
| `stop` | string \| string[] | One or more stop sequences. The matching sequence is not included in the output. |
| `seed` | integer | Best-effort deterministic sampling for the same input and parameter set. |
| `response_format` | object | Use `{"type": "json_object"}` for JSON mode or `{"type": "json_schema", "json_schema": {...}}` for structured outputs on models that support them. |
| `tool_choice` | string \| object | Controls tool invocation. Use `"auto"`, `"none"`, or force a specific tool call by name. |
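To show how these parameters combine, here is a sketch of a chat completion request body. Field names follow the table above; the specific values, the example model, and the assumption of an OpenAI-style JSON body are illustrative only (the transport and endpoint path are not shown):

```python
import json

# Hypothetical request body combining the sampling parameters documented
# above. Values are illustrative; check per-model metadata for support.
payload = {
    "model": "deepseek/deepseek-v3.2",
    "messages": [
        {"role": "user", "content": "List three prime numbers as JSON."}
    ],
    "temperature": 0.2,                          # low randomness
    "top_p": 0.9,                                # nucleus sampling threshold
    "max_tokens": 256,                           # output cap
    "stop": ["\n\n"],                            # stop sequence, excluded from output
    "seed": 42,                                  # best-effort determinism
    "response_format": {"type": "json_object"},  # JSON mode
}

print(json.dumps(payload, indent=2))
```

Combining `seed` with a low `temperature` gives the most reproducible sampling the API can offer, though determinism remains best-effort.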