Inference API

API Reference

Rungate's API is fully compatible with OpenAI's client libraries, making it easy to use open-source models in your existing applications.

#Access methods

Rungate supports three ways to access the inference API: account-scoped API keys, x402 pay-per-request, and MPP pay-per-request. Environments can advertise one or both payment protocols when a request arrives without standard authentication.

| Method | Family | Credential | Billing model | Best fit |
|---|---|---|---|---|
| API keys | Account-based | Authorization: Bearer <RUNGATE_API_KEY> | Account credits | Server apps, stable account auth, account-based billing |
| x402 | Wallet-based | payment-signature | Wallet-backed pay per request | Clients that already support x402 challenge/response |
| MPP | Wallet-based | Authorization: Payment ... | Wallet-backed pay per request | MPP-capable clients and wallets |
Start with the access methods overview if you need help choosing between API keys, x402, and MPP. If you are using OpenClaw and want automatic model routing, see the LLM router guide.

#Configuration

To start using Rungate with OpenAI's client libraries, pass your Rungate API key and change the SDK base URL to https://api.rungate.ai/v1. For raw HTTP requests, use https://api.rungate.ai with /v1/... paths. You can find your API key in your account settings.

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_RUNGATE_API_KEY",
    base_url="https://api.rungate.ai/v1",
)
Store your API key in an environment variable (RUNGATE_API_KEY) rather than hardcoding it.
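With the key in an environment variable, the configuration above becomes:

```python
import os

from openai import OpenAI

# Read the key from the environment rather than hardcoding it.
client = OpenAI(
    api_key=os.environ["RUNGATE_API_KEY"],  # raises KeyError if unset
    base_url="https://api.rungate.ai/v1",
)
```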

#Chat completions

Once your client is configured, you can query any of our open-source models. For example, here's a chat completion with DeepSeek V3.2.

response = client.chat.completions.create(
    model="deepseek/deepseek-v3.2",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
)

print(response.choices[0].message.content)
Output
The capital of France is Paris.

See the Models & Pricing page for all available model IDs.

#Streaming

You can stream responses back using OpenAI's streaming interface. Pass stream=True to receive server-sent events as the model generates tokens.

stream = client.chat.completions.create(
    model="qwen/qwen3-coder-next",
    messages=[{"role": "user", "content": "Count to 5"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices:  # the final usage-only chunk has an empty choices list
        content = chunk.choices[0].delta.content or ""
        print(content, end="")
Raw server-sent events
data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{"content":"1"},"index":0}]}
data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{"content":", 2"},"index":0}]}
data: {"id":"...","choices":[],"usage":{"prompt_tokens":12,"completion_tokens":8,"total_tokens":20}}
data: [DONE]

The final chunk before [DONE] includes a usage field with token counts.

#Multimodal input

Models that support vision accept image inputs alongside text. Pass a content array instead of a plain string.

response = client.chat.completions.create(
    model="qwen/qwen3-vl-235b-a22b-thinking",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://example.com/photo.jpg"
                    },
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)

Supported formats: image/jpeg, image/png, image/webp, image/gif. Base64 data URIs and public URLs are both accepted. See Models & Pricing for which models support vision.

#Endpoints

| Method | Path | Description |
|---|---|---|
| POST | /v1/chat/completions | Create a chat completion |
| GET | /v1/models | List available models |

These paths are also available without the /v1 prefix (e.g. /chat/completions, /models).

#Usage tracking

Every response includes a usage object with token counts. For streaming responses it appears in the final chunk before [DONE].

{
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 8,
    "total_tokens": 20
  }
}
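If you are logging raw JSON responses rather than SDK objects, the counts can be summarized from the dictionary directly (the helper name is ours):

```python
def summarize_usage(usage):
    """Format a usage dict from the response JSON into a one-line summary."""
    return (
        f"{usage['prompt_tokens']} prompt + "
        f"{usage['completion_tokens']} completion = "
        f"{usage['total_tokens']} total tokens"
    )
```

For the example payload above, this yields "12 prompt + 8 completion = 20 total tokens". With SDK objects, the same fields are attributes: response.usage.total_tokens.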

#Rate limiting

Throughput limits vary per model — see the Models & Pricing page for details.

#Error codes

Errors follow the standard OpenAI error format:

{
  "error": {
    "code": 401,
    "message": "Invalid API key"
  }
}
| Code | Meaning |
|---|---|
| 400 | Bad request: invalid or missing parameters |
| 401 | Unauthorized: missing or invalid API key |
| 402 | Payment required: satisfy an x402 or MPP challenge, or use an API key |
| 404 | Model not found |
| 429 | Rate limited: request exceeds concurrency limits |
| 500 | Internal server error |
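Of these, 429 and 500 are usually transient and worth retrying with backoff; the 4xx client errors are not. A sketch of a retry wrapper (the helper name is ours; it assumes the raised exception carries a status_code attribute, as the OpenAI Python SDK's APIStatusError does):

```python
import time


def with_retries(call, max_attempts=3, base_delay=1.0):
    """Call a zero-argument callable, retrying 429/500 with exponential backoff.

    Any other error, or exhaustion of max_attempts, re-raises the exception.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            status = getattr(exc, "status_code", None)
            if status not in (429, 500) or attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
```

Usage: with_retries(lambda: client.chat.completions.create(...)).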