API Documentation
Heima x WildMeta LLM Gateway provides an OpenAI-compatible API for accessing LLM services. Authenticate with your API Key; each request consumes credits from your account. To get started:
1. Get your API Key from the admin
2. Use the key in the Authorization header
3. Send requests to the /v1/chat/completions endpoint
4. Monitor your credits on the Dashboard
Authentication
All API requests require a valid API Key passed via the Authorization header using the Bearer scheme.
Authorization: Bearer sk-prize-xxxxxxxxxxxx
Your API Key is prefixed with sk-prize-. Keep it confidential and never expose it in client-side code or public repositories.
Your API Key must meet these conditions to work:
- Key status is active
- Account has credits > 0
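A quick way to confirm that a key satisfies both conditions is to send a minimal request. A sketch using the Python requests library, assuming the gateway runs at http://localhost:8000 (a placeholder for your deployment address):

import requests

# Minimal authenticated request; a 401 response indicates an invalid or
# disabled key, or exhausted credits (see Error Codes below).
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",  # placeholder address
    headers={
        "Authorization": "Bearer sk-prize-xxxxxxxxxxxx",
        "Content-Type": "application/json",
    },
    json={
        "model": "deepseek-chat",
        "stream": False,
        "messages": [{"role": "user", "content": "ping"}],
    },
)
print(resp.status_code)
print(resp.json())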
Base URL
All API requests are sent to the gateway server. Replace the base URL with your actual deployment address.
http://YOUR_SERVER_HOST:PORT
The gateway is OpenAI-compatible, so you can use it as a drop-in replacement by setting the base_url in any OpenAI SDK client.
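As an alternative to passing base_url explicitly (shown in the SDK examples below), recent versions of the official OpenAI SDKs also read the OPENAI_API_KEY and OPENAI_BASE_URL environment variables; a sketch in Python (the values are placeholders):

import os
from openai import OpenAI

# Normally you would export these in your shell or deployment config;
# they are set here only to keep the example self-contained.
os.environ["OPENAI_API_KEY"] = "sk-prize-xxxxxxxxxxxx"
os.environ["OPENAI_BASE_URL"] = "http://YOUR_SERVER_HOST:PORT/v1"

client = OpenAI()  # picks up both variables from the environment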
Chat Completions
Create a chat completion by sending a POST request with your messages.
POST /v1/chat/completions
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID to use for the completion |
| messages | array | Yes | Array of message objects with role and content |
| stream | boolean | No | Whether to stream the response. Default: true |
| temperature | number | No | Sampling temperature (0-2) |
| max_tokens | integer | No | Maximum number of tokens to generate |
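For reference, a request body combining these parameters might look like the following (the temperature and max_tokens values are illustrative):

{
  "model": "deepseek-chat",
  "stream": false,
  "temperature": 0.7,
  "max_tokens": 512,
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ]
}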
Response
{
"id": "chatcmpl-xxxx",
"object": "chat.completion",
"created": 1700000000,
"model": "deepseek-chat",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! How can I help you?"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 10,
"completion_tokens": 8,
"total_tokens": 18
}
}
Streaming
By default, responses are streamed using Server-Sent Events (SSE). Set "stream": false in the request body to receive a complete JSON response instead.
data: {"id":"chatcmpl-xx","choices":[{"delta":{"content":"Hello"},"index":0}]}
data: {"id":"chatcmpl-xx","choices":[{"delta":{"content":"!"},"index":0}]}
data: [DONE]
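If you are not using an SDK, the stream can be consumed by reading the SSE lines directly. A sketch with the Python requests library (the server address is a placeholder):

import json
import requests

resp = requests.post(
    "http://YOUR_SERVER:PORT/v1/chat/completions",  # placeholder address
    headers={"Authorization": "Bearer sk-prize-xxxxxxxxxxxx"},
    json={
        "model": "deepseek-chat",
        "stream": True,
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    stream=True,  # keep the HTTP connection open and read incrementally
)
for raw in resp.iter_lines():
    if not raw:
        continue  # SSE events are separated by blank lines
    line = raw.decode("utf-8")
    if not line.startswith("data: "):
        continue
    payload = line[len("data: "):]
    if payload == "[DONE]":
        break  # end-of-stream marker
    chunk = json.loads(payload)
    delta = chunk["choices"][0].get("delta", {})
    print(delta.get("content", ""), end="", flush=True)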
Supported Models
The gateway proxies requests to the backend LLM provider. Available models depend on the backend configuration. Common models include:
| Model | Context Window | Description |
|---|---|---|
| deepseek-chat | 64K tokens | DeepSeek V3 chat model |
All models share the same credit consumption rate. The actual available models are determined by the backend provider.
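If your deployment also forwards the standard OpenAI GET /v1/models endpoint (not covered by this document, so treat this as an assumption to verify), you can query the live model list instead of hard-coding names:

from openai import OpenAI

client = OpenAI(
    api_key="sk-prize-xxxxxxxxxxxx",
    base_url="http://YOUR_SERVER:PORT/v1",
)

# Works only if the gateway proxies GET /v1/models; otherwise rely on the
# table above or ask your administrator which models are configured.
for model in client.models.list():
    print(model.id)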
Credit Consumption Rules
Each API request consumes credits based on the number of tokens processed. The system tracks both input tokens (your prompt) and output tokens (the model's response) separately.
- Credits are deducted after each successful request
- If your credits reach 0, further requests will be rejected
- Failed requests (backend errors) do not consume credits
- Streaming and non-streaming requests consume credits the same way
- Output tokens cost 5× as much as input tokens (see the formula below)
Credit Calculation
Credits consumed per request are calculated using this formula:
credits = (input_tokens × 0.2) + (output_tokens × 1.0)
Examples
| Scenario | Input Tokens | Output Tokens | Credits Used |
|---|---|---|---|
| Short Q&A | 50 | 100 | 110 (50×0.2 + 100×1.0) |
| Long conversation | 2,000 | 500 | 900 (2000×0.2 + 500×1.0) |
| Document summary | 5,000 | 300 | 1,300 (5000×0.2 + 300×1.0) |
| Code generation | 200 | 1,000 | 1,040 (200×0.2 + 1000×1.0) |
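The formula is easy to reproduce; a small Python helper that matches the figures in the table above:

def credits_used(input_tokens: int, output_tokens: int) -> float:
    # credits = (input_tokens × 0.2) + (output_tokens × 1.0)
    return input_tokens * 0.2 + output_tokens * 1.0

assert credits_used(50, 100) == 110       # short Q&A
assert credits_used(2_000, 500) == 900    # long conversation
assert credits_used(5_000, 300) == 1_300  # document summary
assert credits_used(200, 1_000) == 1_040  # code generation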
To save credits, keep prompts concise and use max_tokens to cap output length. Conversation history counts as input tokens, so trimming the context you resend reduces credit usage.
Balance Management
You can check your current credit balance on the Dashboard after logging in with your API Key.
What you can see
- Credits Balance — Current remaining credits
- Usage History — Detailed log of each request with token counts and credits consumed
- Account Status — Whether your key is active
To recharge credits, contact your administrator. Credits are added by the admin and reflected immediately in your balance.
cURL
# Non-streaming request
curl http://YOUR_SERVER:PORT/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-prize-xxxxxxxxxxxx" \
-d '{
"model": "deepseek-chat",
"stream": false,
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello!"}
]
}'
# Streaming request (-N disables curl's output buffering)
curl http://YOUR_SERVER:PORT/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-prize-xxxxxxxxxxxx" \
-N \
-d '{
"model": "deepseek-chat",
"stream": true,
"messages": [
{"role": "user", "content": "Tell me a joke"}
]
}'
Python (OpenAI SDK)
from openai import OpenAI
client = OpenAI(
api_key="sk-prize-xxxxxxxxxxxx",
base_url="http://YOUR_SERVER:PORT/v1"
)
# Non-streaming
response = client.chat.completions.create(
model="deepseek-chat",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello!"}
],
stream=False
)
print(response.choices[0].message.content)
# Streaming
stream = client.chat.completions.create(
model="deepseek-chat",
messages=[
{"role": "user", "content": "Tell me a story"}
],
stream=True
)
for chunk in stream:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="")
Node.js (OpenAI SDK)
import OpenAI from 'openai';
const client = new OpenAI({
apiKey: 'sk-prize-xxxxxxxxxxxx',
baseURL: 'http://YOUR_SERVER:PORT/v1',
});
// Non-streaming
const response = await client.chat.completions.create({
model: 'deepseek-chat',
messages: [
{ role: 'system', content: 'You are a helpful assistant.' },
{ role: 'user', content: 'Hello!' },
],
stream: false,
});
console.log(response.choices[0].message.content);
// Streaming
const stream = await client.chat.completions.create({
model: 'deepseek-chat',
messages: [{ role: 'user', content: 'Tell me a story' }],
stream: true,
});
for await (const chunk of stream) {
process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
Error Codes
| HTTP Status | Error Type | Description | Solution |
|---|---|---|---|
| 401 | invalid_api_key | API Key is missing, invalid, or disabled | Check your API Key and ensure it is active |
| 401 | insufficient_credits | Account credits have been exhausted | Contact your admin to recharge credits |
| 4xx/5xx | backend_error | The backend LLM provider returned an error | Check your request parameters and try again |
| 502 | proxy_error | Gateway could not connect to the backend | Wait and retry; contact your admin if the problem persists |
Error Response Format
{
"error": {
"message": "Insufficient credits",
"type": "insufficient_credits",
"code": "credits_exhausted"
}
}
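When calling the gateway through the OpenAI Python SDK, these HTTP errors surface as the SDK's standard exception types; a sketch (the exact error body depends on the gateway, so treat the mapping as indicative):

import openai
from openai import OpenAI

client = OpenAI(
    api_key="sk-prize-xxxxxxxxxxxx",
    base_url="http://YOUR_SERVER:PORT/v1",
)

try:
    response = client.chat.completions.create(
        model="deepseek-chat",
        messages=[{"role": "user", "content": "Hello!"}],
        stream=False,
    )
    print(response.choices[0].message.content)
except openai.AuthenticationError as err:
    # HTTP 401: invalid_api_key or insufficient_credits (see the table above)
    print("Authentication or credit problem:", err)
except openai.APIStatusError as err:
    # Any other non-2xx status, e.g. backend_error or proxy_error
    print(f"Gateway returned HTTP {err.status_code}: {err.message}")
except openai.APIConnectionError as err:
    # The gateway itself could not be reached
    print("Connection problem:", err)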