Inference API

API Reference

Rungate's API is fully compatible with OpenAI's client libraries, making it easy to use open-source models in your existing applications.

#Access methods

Rungate supports three ways to access the inference API: account-scoped API keys, x402 pay-per-request, and MPP pay-per-request. Environments can advertise one or both payment protocols when a request arrives without standard authentication.

| Method | Family | Credential | Billing model | Best fit |
|---|---|---|---|---|
| API keys | Account-based | Authorization: Bearer <RUNGATE_API_KEY> | Account credits | Server apps, stable account auth, account-based billing |
| x402 | Wallet-based | payment-signature | Wallet-backed pay per request | Clients that already support x402 challenge/response |
| MPP | Wallet-based | Authorization: Payment ... | Wallet-backed pay per request | MPP-capable clients and wallets |
Start with the access methods overview if you need help choosing between API keys, x402, and MPP. If you are using OpenClaw and want automatic model routing, see the LLM router guide.

#Configuration

To start using Rungate with OpenAI's client libraries, pass your Rungate API key and change the SDK base URL to https://api.rungate.ai/v1. For raw HTTP requests, use https://api.rungate.ai with /v1/... paths. You can find your API key in your account settings.

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_RUNGATE_API_KEY",
    base_url="https://api.rungate.ai/v1",
)
Store your API key in an environment variable (RUNGATE_API_KEY) rather than hardcoding it.
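With the key in an environment variable, the configuration above becomes:

```python
import os

from openai import OpenAI

# Read the key from the environment rather than hardcoding it.
client = OpenAI(
    api_key=os.environ["RUNGATE_API_KEY"],  # raises KeyError if unset
    base_url="https://api.rungate.ai/v1",
)
```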

#Chat completions

Once your client is configured, you can query any of our open-source models. For example, here's a chat completion with DeepSeek V3.2.

response = client.chat.completions.create(
    model="deepseek/deepseek-v3.2",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
)

print(response.choices[0].message.content)
Output
The capital of France is Paris.

See the Models & Pricing page for all available model IDs.

#Streaming

You can stream responses back using OpenAI's streaming interface. Pass stream=True to receive server-sent events as the model generates tokens.

stream = client.chat.completions.create(
    model="qwen/qwen3-coder-next",
    messages=[{"role": "user", "content": "Count to 5"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices:  # the final usage-only chunk has an empty choices list
        content = chunk.choices[0].delta.content or ""
        print(content, end="")
Raw server-sent events
data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{"content":"1"},"index":0}]}
data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{"content":", 2"},"index":0}]}
data: {"id":"...","choices":[],"usage":{"prompt_tokens":12,"completion_tokens":8,"total_tokens":20}}
data: [DONE]

The final chunk before [DONE] includes a usage field with token counts.

#Multimodal input

Models that support vision accept image inputs alongside text. Pass a content array instead of a plain string.

response = client.chat.completions.create(
    model="qwen/qwen3-vl-235b-a22b-thinking",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://example.com/photo.jpg"
                    },
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)

Supported formats: image/jpeg, image/png, image/webp, image/gif. Base64 data URIs and public URLs are both accepted. See Models & Pricing for which models support vision.

#Endpoints

| Method | Path | Description |
|---|---|---|
| POST | /v1/chat/completions | Create a chat completion |
| GET | /v1/models | List available models |

These paths are also available without the /v1 prefix (e.g. /chat/completions, /models).

#Usage tracking

Every response includes a usage object with token counts. For streaming responses it appears in the final chunk before [DONE].

{
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 8,
    "total_tokens": 20
  }
}
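If you are logging raw JSON responses rather than SDK objects, the counts can be summarized from the dictionary directly (the helper name is ours):

```python
def summarize_usage(usage):
    """Format a usage dict from the response JSON into a one-line summary."""
    return (
        f"{usage['prompt_tokens']} prompt + "
        f"{usage['completion_tokens']} completion = "
        f"{usage['total_tokens']} total tokens"
    )
```

For the example payload above, this yields "12 prompt + 8 completion = 20 total tokens". With SDK objects, the same fields are attributes: response.usage.total_tokens.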

#Rate limiting

Throughput limits vary per model — see the Models & Pricing page for details.

#Error codes

Errors follow the standard OpenAI error format:

{
  "error": {
    "code": 401,
    "message": "Invalid API key"
  }
}
| Code | Meaning |
|---|---|
| 400 | Bad request: invalid or missing parameters |
| 401 | Unauthorized: missing or invalid API key |
| 402 | Payment required: satisfy an x402 or MPP challenge, or use an API key |
| 404 | Model not found |
| 429 | Rate limited: request exceeds concurrency limits |
| 500 | Internal server error |
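Of these, 429 and 500 are usually transient and worth retrying with backoff; the 4xx client errors are not. A sketch of a retry wrapper (the helper name is ours; it assumes the raised exception carries a status_code attribute, as the OpenAI Python SDK's APIStatusError does):

```python
import time


def with_retries(call, max_attempts=3, base_delay=1.0):
    """Call a zero-argument callable, retrying 429/500 with exponential backoff.

    Any other error, or exhaustion of max_attempts, re-raises the exception.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            status = getattr(exc, "status_code", None)
            if status not in (429, 500) or attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
```

Usage: with_retries(lambda: client.chat.completions.create(...)).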