Models & Pricing
This page documents the public model catalog and pricing for the inference API. Use `GET /v1/models` to retrieve the current model metadata directly from the API.
# Current catalog
The public catalog currently includes 20 models. Pricing below is shown in dollars per 1 million tokens.
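Since all prices are quoted per 1 million tokens, the cost of a single request is a straightforward weighted sum. A minimal sketch, using the illustrative rates of one catalog row ($0.27 input, $0.42 output per 1M tokens):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_per_1m: float, output_per_1m: float) -> float:
    """Return the dollar cost of one request from per-1M-token rates."""
    return (input_tokens * input_per_1m + output_tokens * output_per_1m) / 1_000_000

# A request with 12,000 prompt tokens and 800 completion tokens:
cost = estimate_cost(12_000, 800, 0.27, 0.42)
print(f"${cost:.6f}")  # $0.003576
```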
| Model | Modalities | Context (tokens) | Max output (tokens) | Input $/1M | Output $/1M | Capabilities |
|---|---|---|---|---|---|---|
| deepseek/deepseek-v3.2 | Text | 163,840 | 65,536 | $0.27 | $0.42 | Text, JSON mode, Tools, Reasoning, Structured outputs |
| z-ai/glm-4.7 | Text | 202,752 | 65,535 | $0.45 | $2.10 | Text, JSON mode, Structured outputs, Tools, Reasoning |
| openai/gpt-oss-120b | Text | 131,072 | 65,536 | $0.10 | $0.75 | Text, JSON mode, Structured outputs, Tools, Reasoning |
| deepseek/deepseek-r1-0528 | Text | 163,840 | 65,536 | $0.80 | $2.40 | Text, JSON mode, Tools, Structured outputs, Reasoning |
| deepseek/deepseek-chat-v3-0324 | Text | 163,840 | 65,536 | $0.27 | $1.12 | Text, JSON mode, Structured outputs, Tools |
| qwen/qwen3-coder-next | Text | 262,144 | 65,536 | $0.15 | $0.80 | Text, JSON mode, Structured outputs, Tools |
| qwen/qwen3.5-397b-a17b | Text | 262,144 | 65,536 | $0.60 | $3.60 | Text, Tools, Structured outputs, Reasoning |
| deepseek/deepseek-v3.2-speciale | Text | 163,840 | 65,536 | $0.40 | $1.20 | Text, JSON mode, Reasoning, Structured outputs |
| moonshotai/kimi-k2-0905 | Text | 262,144 | 262,144 | $0.60 | $2.50 | Text, JSON mode, Structured outputs, Tools |
| minimax/minimax-m2.5 | Text | 196,608 | 65,536 | $0.30 | $1.20 | Text, Tools, Structured outputs, Reasoning |
| minimax/minimax-m2.1 | Text | 192,000 | 65,536 | $0.30 | $1.20 | Text, Basic text |
| deepseek/deepseek-chat | Text | 128,000 | 65,536 | $0.32 | $0.89 | Text, Basic text |
| deepseek/deepseek-r1 | Text | 128,000 | 65,536 | $0.70 | $2.50 | Text, Reasoning, Tools |
| deepseek/deepseek-chat-v3.1 | Text | 128,000 | 65,536 | $0.27 | $1.00 | Text, JSON mode, Tools, Structured outputs |
| qwen/qwen3-235b-a22b-thinking-2507 | Text | 262,144 | 65,536 | $0.28 | $2.30 | Text, Reasoning, Tools, Structured outputs |
| qwen/qwen3-vl-235b-a22b-thinking | Text + image | 262,144 | 65,536 | $0.45 | $3.50 | Text + image, Reasoning, Tools, Structured outputs |
| moonshotai/kimi-k2.5 | Text | 262,144 | 65,536 | $0.57 | $2.85 | Text, Reasoning, Tools, Structured outputs |
| z-ai/glm-5 | Text | 262,144 | 65,536 | $1.00 | $3.20 | Text, Reasoning, Tools, Structured outputs |
| inclusionai/llada2.1-mini | Text | 16,384 | 16,384 | $0.00 | $0.00 | Text, Basic text |
| xiaomi/mimo-v2-flash | Text | 262,144 | 32,000 | $0.10 | $0.30 | Text, Basic text |
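To fetch this catalog programmatically, call `GET /v1/models`. A minimal sketch; the base URL is a placeholder, and the response shape assumed here (an OpenAI-style `{"data": [{"id": ...}]}` array) is an assumption, not documented above:

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder; replace with your endpoint

def fetch_models(base_url: str = BASE_URL) -> dict:
    """GET /v1/models and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/v1/models") as resp:
        return json.load(resp)

def model_ids(payload: dict) -> list[str]:
    """Extract model IDs, assuming an OpenAI-style {"data": [...]} shape."""
    return [m["id"] for m in payload.get("data", [])]

# Offline demo with a truncated sample payload:
sample = {"data": [{"id": "deepseek/deepseek-v3.2"}, {"id": "z-ai/glm-4.7"}]}
print(model_ids(sample))
```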
# Sampling parameters
These parameters can be passed alongside `model` and `messages` in chat completion requests. Exact support varies by model according to the metadata returned by the API.
| Parameter | Type | Description |
|---|---|---|
| temperature | number | Sampling temperature. Higher values increase randomness; lower values are more deterministic. |
| top_p | number | Nucleus sampling threshold. Limits candidate tokens to the top probability mass. |
| top_k | integer | Limits sampling to the k most likely next tokens. Supported by most, but not all, reasoning and chat models. |
| min_p | number | Minimum probability threshold relative to the most likely token. Available on a subset of models. |
| frequency_penalty | number | Penalizes tokens based on how often they have already appeared in the output. |
| presence_penalty | number | Penalizes tokens that have appeared at all so far, encouraging topic changes. |
| repetition_penalty | number | Alternative repetition penalty applied multiplicatively. Values above 1 discourage repetition. |
| max_tokens | integer | Maximum number of output tokens to generate. Supported on models that expose an explicit output cap override. |
| stop | string \| string[] | One or more stop sequences. The matching sequence is not included in the output. |
| seed | integer | Best-effort deterministic sampling for the same input and parameter set. |
| response_format | object | Use { "type": "json_object" } for JSON mode or { "type": "json_schema", "json_schema": {...} } for structured outputs on models that support them. |
| tool_choice | string \| object | Controls tool invocation. Use "auto", "none", or force a specific tool call by name. |
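The parameters above can be combined in a single request body. A minimal sketch, pairing sampling parameters with a stop sequence and JSON mode; the model ID comes from the catalog above, and whether each parameter is honored depends on the model's metadata:

```python
import json

# Chat completion request body combining sampling parameters,
# a stop sequence, a best-effort seed, and JSON mode.
request_body = {
    "model": "deepseek/deepseek-v3.2",
    "messages": [{"role": "user", "content": "List three primes as JSON."}],
    "temperature": 0.2,          # low temperature -> more deterministic output
    "top_p": 0.9,                # nucleus sampling threshold
    "max_tokens": 256,           # output cap for this request
    "stop": ["\n\n"],            # stop sequence, excluded from the output
    "seed": 42,                  # best-effort reproducibility
    "response_format": {"type": "json_object"},  # JSON mode
}
print(json.dumps(request_body, indent=2))
```

For structured outputs, replace `response_format` with `{"type": "json_schema", "json_schema": {...}}` on models that list the Structured outputs capability.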