# Models & Pricing

This page documents the public model catalog and pricing for the inference API. Use `GET /v1/models` to retrieve the current model metadata directly from the API.
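As a client-side sketch, the catalog returned by `GET /v1/models` can be filtered by capability. The response shape used below (an OpenAI-style `{"data": [...]}` list with an `id` and a `capabilities` array per model) is an assumption for illustration, not the documented schema:

```python
# Hypothetical sketch: filter a /v1/models-style response client-side.
# The {"data": [{"id", "context_length", "capabilities"}]} shape is an
# assumption, not the documented wire format.

def models_with_capability(catalog: dict, capability: str) -> list[str]:
    """Return IDs of models whose metadata advertises `capability`."""
    return [
        m["id"]
        for m in catalog.get("data", [])
        if capability in m.get("capabilities", [])
    ]

# Sample payload mirroring two rows of the catalog table below.
sample = {
    "data": [
        {
            "id": "deepseek/deepseek-v3.2",
            "context_length": 163_840,
            "capabilities": ["json_mode", "tools", "reasoning", "structured_outputs"],
        },
        {
            "id": "minimax/minimax-m2.1",
            "context_length": 192_000,
            "capabilities": [],
        },
    ]
}

print(models_with_capability(sample, "reasoning"))
# → ['deepseek/deepseek-v3.2']
```

Filtering locally like this avoids hard-coding model IDs that may rotate out of the catalog.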

## Current catalog

The public catalog currently includes 20 models. Pricing below is shown in dollars per 1 million tokens.

| Model | Modalities | Context | Max output | Input $/1M | Output $/1M | Capabilities |
| --- | --- | --- | --- | --- | --- | --- |
| `deepseek/deepseek-v3.2` | Text | 163,840 | 65,536 | $0.27 | $0.42 | JSON mode, Tools, Reasoning, Structured outputs |
| `z-ai/glm-4.7` | Text | 202,752 | 65,535 | $0.45 | $2.10 | JSON mode, Structured outputs, Tools, Reasoning |
| `openai/gpt-oss-120b` | Text | 131,072 | 65,536 | $0.10 | $0.75 | JSON mode, Structured outputs, Tools, Reasoning |
| `deepseek/deepseek-r1-0528` | Text | 163,840 | 65,536 | $0.80 | $2.40 | JSON mode, Tools, Structured outputs, Reasoning |
| `deepseek/deepseek-chat-v3-0324` | Text | 163,840 | 65,536 | $0.27 | $1.12 | JSON mode, Structured outputs, Tools |
| `qwen/qwen3-coder-next` | Text | 262,144 | 65,536 | $0.15 | $0.80 | JSON mode, Structured outputs, Tools |
| `qwen/qwen3.5-397b-a17b` | Text | 262,144 | 65,536 | $0.60 | $3.60 | Tools, Structured outputs, Reasoning |
| `deepseek/deepseek-v3.2-speciale` | Text | 163,840 | 65,536 | $0.40 | $1.20 | JSON mode, Reasoning, Structured outputs |
| `moonshotai/kimi-k2-0905` | Text | 262,144 | 262,144 | $0.60 | $2.50 | JSON mode, Structured outputs, Tools |
| `minimax/minimax-m2.5` | Text | 196,608 | 65,536 | $0.30 | $1.20 | Tools, Structured outputs, Reasoning |
| `minimax/minimax-m2.1` | Text | 192,000 | 65,536 | $0.30 | $1.20 | Basic text |
| `deepseek/deepseek-chat` | Text | 128,000 | 65,536 | $0.32 | $0.89 | Basic text |
| `deepseek/deepseek-r1` | Text | 128,000 | 65,536 | $0.70 | $2.50 | Reasoning, Tools |
| `deepseek/deepseek-chat-v3.1` | Text | 128,000 | 65,536 | $0.27 | $1.00 | JSON mode, Tools, Structured outputs |
| `qwen/qwen3-235b-a22b-thinking-2507` | Text | 262,144 | 65,536 | $0.28 | $2.30 | Reasoning, Tools, Structured outputs |
| `qwen/qwen3-vl-235b-a22b-thinking` | Text + image | 262,144 | 65,536 | $0.45 | $3.50 | Reasoning, Tools, Structured outputs |
| `moonshotai/kimi-k2.5` | Text | 262,144 | 65,536 | $0.57 | $2.85 | Reasoning, Tools, Structured outputs |
| `z-ai/glm-5` | Text | 262,144 | 65,536 | $1.00 | $3.20 | Reasoning, Tools, Structured outputs |
| `inclusionai/llada2.1-mini` | Text | 16,384 | 16,384 | $0.00 | $0.00 | Basic text |
| `xiaomi/mimo-v2-flash` | Text | 262,144 | 32,000 | $0.10 | $0.30 | Basic text |
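The per-1M rates in the table convert to per-request cost with simple arithmetic. A minimal sketch (model choice and token counts below are illustrative, not from the source):

```python
# Estimate the dollar cost of one request from per-1M-token prices,
# as listed in the catalog table above.

def request_cost(input_tokens: int, output_tokens: int,
                 input_per_1m: float, output_per_1m: float) -> float:
    """Cost in dollars for one request at the given $/1M-token rates."""
    return (input_tokens * input_per_1m
            + output_tokens * output_per_1m) / 1_000_000

# Example: deepseek/deepseek-v3.2 at $0.27 input / $0.42 output per 1M,
# for a request with 4,000 input tokens and 1,000 output tokens.
cost = request_cost(4_000, 1_000, 0.27, 0.42)
print(f"${cost:.6f}")
# → $0.001500
```

Note that output tokens are typically several times more expensive than input tokens, so long completions dominate the bill.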


## Sampling parameters

These parameters can be passed alongside `model` and `messages` in chat completion requests. Exact support varies by model according to the metadata returned by the API.

| Parameter | Type | Description |
| --- | --- | --- |
| `temperature` | number | Sampling temperature. Higher values increase randomness; lower values are more deterministic. |
| `top_p` | number | Nucleus sampling threshold. Limits candidate tokens to the top probability mass. |
| `top_k` | integer | Limits sampling to the k most likely next tokens. Supported by most, but not all, reasoning and chat models. |
| `min_p` | number | Minimum probability threshold relative to the most likely token. Available on a subset of models. |
| `frequency_penalty` | number | Penalizes tokens based on how often they have already appeared in the output. |
| `presence_penalty` | number | Penalizes tokens that have appeared at all so far, encouraging topic changes. |
| `repetition_penalty` | number | Alternative repetition penalty applied multiplicatively. Values above 1 discourage repetition. |
| `max_tokens` | integer | Maximum number of output tokens to generate. Supported on models that expose an explicit output-cap override. |
| `stop` | string \| string[] | One or more stop sequences. The matching sequence is not included in the output. |
| `seed` | integer | Best-effort deterministic sampling for the same input and parameter set. |
| `response_format` | object | Use `{"type": "json_object"}` for JSON mode or `{"type": "json_schema", "json_schema": {...}}` for structured outputs on models that support them. |
| `tool_choice` | string \| object | Controls tool invocation. Use `"auto"`, `"none"`, or force a specific tool call by name. |
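To show how these parameters combine, here is a sketch of a chat completion request body. Field names follow the table above; the specific values, the example model, and the assumption of an OpenAI-style JSON body are illustrative only (the transport and endpoint path are not shown):

```python
import json

# Hypothetical request body combining the sampling parameters documented
# above. Values are illustrative; check per-model metadata for support.
payload = {
    "model": "deepseek/deepseek-v3.2",
    "messages": [
        {"role": "user", "content": "List three prime numbers as JSON."}
    ],
    "temperature": 0.2,                          # low randomness
    "top_p": 0.9,                                # nucleus sampling threshold
    "max_tokens": 256,                           # output cap
    "stop": ["\n\n"],                            # stop sequence, excluded from output
    "seed": 42,                                  # best-effort determinism
    "response_format": {"type": "json_object"},  # JSON mode
}

print(json.dumps(payload, indent=2))
```

Combining `seed` with a low `temperature` gives the most reproducible sampling the API can offer, though determinism remains best-effort.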