API Documentation

Heima x WildMeta LLM Gateway provides an OpenAI-compatible API for accessing LLM services. Authenticate each request with your API Key; every request consumes credits from your account.

Quick Start

1. Get your API Key from the admin
2. Use the key in the Authorization header
3. Send requests to the /v1/chat/completions endpoint
4. Monitor your credits on the Dashboard

Authentication

All API requests require a valid API Key passed via the Authorization header using the Bearer scheme.

HTTP Header
Authorization: Bearer sk-prize-xxxxxxxxxxxx

Your API Key is prefixed with sk-prize-. Keep it confidential and never expose it in client-side code or public repositories.

Important

Your API Key must meet these conditions to work:

  • Key status is active
  • Account has credits > 0
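
For reference, here is a minimal sketch of setting the header in Python with the requests library (the host and key are placeholders; substitute your own values):

Python (requests)
import requests

BASE_URL = "http://YOUR_SERVER_HOST:PORT"  # your deployment address (see Base URL below)
API_KEY = "sk-prize-xxxxxxxxxxxx"          # your sk-prize- key

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
# Pass headers=headers with every call, e.g. requests.post(url, headers=headers, json=body)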

Base URL

All API requests are sent to the gateway server. Replace the base URL with your actual deployment address.

Base URL
http://YOUR_SERVER_HOST:PORT

The gateway is OpenAI-compatible, so you can use it as a drop-in replacement by setting the base_url in any OpenAI SDK client.

Chat Completions

Create a chat completion by sending a POST request with your messages.

POST /v1/chat/completions

Request Body

Parameter    Type     Required  Description
model        string   Yes       Model ID to use for the completion
messages     array    Yes       Array of message objects with role and content
stream       boolean  No        Whether to stream the response. Default: true
temperature  number   No        Sampling temperature (0-2)
max_tokens   integer  No        Maximum number of tokens to generate
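
A minimal request body using only the required parameters:

JSON
{
  "model": "deepseek-chat",
  "messages": [
    {"role": "user", "content": "Hello!"}
  ]
}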

Response

JSON Response (non-streaming)
{
  "id": "chatcmpl-xxxx",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "deepseek-chat",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 8,
    "total_tokens": 18
  }
}
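
If you call the endpoint directly rather than through an SDK, the two fields you will usually read are the message content and the usage block (which drives credit accounting). A minimal sketch, assuming resp is a successful response obtained via Python's requests:

Python (requests)
data = resp.json()  # the JSON document shown above

answer = data["choices"][0]["message"]["content"]
total_tokens = data["usage"]["total_tokens"]  # input + output tokens for this request
print(f"{answer} ({total_tokens} tokens)")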

Streaming

By default, responses are streamed using Server-Sent Events (SSE). Set "stream": false in the request body to receive a complete JSON response instead.

SSE Stream Format
data: {"id":"chatcmpl-xx","choices":[{"delta":{"content":"Hello"},"index":0}]}

data: {"id":"chatcmpl-xx","choices":[{"delta":{"content":"!"},"index":0}]}

data: [DONE]
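
If you are not using an SDK, you can consume the stream directly. A minimal sketch with Python's requests, assuming the format above and reusing the BASE_URL and API_KEY placeholders from the Authentication section:

Python (requests)
import json
import requests

resp = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "deepseek-chat",
        "stream": True,
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    stream=True,  # keep the connection open and read incrementally
)
for line in resp.iter_lines():
    if not line:
        continue  # SSE events are separated by blank lines
    payload = line.decode("utf-8").removeprefix("data: ")
    if payload == "[DONE]":
        break  # end-of-stream sentinel
    delta = json.loads(payload)["choices"][0]["delta"]
    print(delta.get("content", ""), end="")  # first chunk may carry only a role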

Supported Models

The gateway proxies requests to the backend LLM provider. Available models depend on the backend configuration. Common models include:

Model          Context Window  Description
deepseek-chat  64K tokens      DeepSeek-V3 chat model
Note

All models share the same credit consumption rate. The actual available models are determined by the backend provider.

Credit Consumption Rules

Each API request consumes credits based on the number of tokens processed. The system tracks both input tokens (your prompt) and output tokens (the model's response) separately.

  • Input tokens: 0.2 credits / token. Applied to your prompt, system messages, and conversation history.
  • Output tokens: 1.0 credits / token. Applied to the model's generated response.
Key Rules
  • Credits are deducted after each successful request
  • If your credits reach 0, further requests will be rejected
  • Failed requests (backend errors) do not consume credits
  • Streaming and non-streaming requests consume credits the same way
  • Output tokens cost 5x as much as input tokens (1.0 vs 0.2 credits per token)

Credit Calculation

Credits consumed per request are calculated using this formula:

credits = (input_tokens × 0.2) + (output_tokens × 1.0)
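
The arithmetic is easy to reproduce. A small sketch using the published rates:

Python
def credits_used(input_tokens: int, output_tokens: int) -> float:
    # 0.2 credits per input token, 1.0 per output token
    return input_tokens * 0.2 + output_tokens * 1.0

print(credits_used(50, 100))    # 110.0 (the Short Q&A row below)
print(credits_used(5000, 300))  # 1300.0 (the Document summary row below)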

Examples

Scenario           Input Tokens  Output Tokens  Credits Used
Short Q&A          50            100            110   (50×0.2 + 100×1.0)
Long conversation  2,000         500            900   (2,000×0.2 + 500×1.0)
Document summary   5,000         300            1,300 (5,000×0.2 + 300×1.0)
Code generation    200           1,000          1,040 (200×0.2 + 1,000×1.0)
Tip

To save credits, keep your prompts concise and use max_tokens to cap output length. Conversation history counts as input tokens, so trimming the context you resend on each turn also reduces credit usage.
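
For example, with an OpenAI SDK client configured as in the Python section below, capping output is a single parameter (the 100-token limit here is arbitrary):

Python
# At most 100 output tokens, so at most 100 credits of output cost
response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Summarize this in one sentence."}],
    max_tokens=100,
)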

Balance Management

You can check your current credit balance on the Dashboard after logging in with your API Key.

What you can see

  • Credits Balance — Current remaining credits
  • Usage History — Detailed log of each request with token counts and credits consumed
  • Account Status — Whether your key is active

To recharge credits, contact your administrator. Credits are added by the admin and reflected immediately in your balance.

cURL

Non-streaming request
curl http://YOUR_SERVER:PORT/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-prize-xxxxxxxxxxxx" \
  -d '{
    "model": "deepseek-chat",
    "stream": false,
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ]
  }'
Streaming request
curl http://YOUR_SERVER:PORT/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-prize-xxxxxxxxxxxx" \
  -N \
  -d '{
    "model": "deepseek-chat",
    "stream": true,
    "messages": [
      {"role": "user", "content": "Tell me a joke"}
    ]
  }'

Python (OpenAI SDK)

pip install openai
from openai import OpenAI

client = OpenAI(
    api_key="sk-prize-xxxxxxxxxxxx",
    base_url="http://YOUR_SERVER:PORT/v1"
)

# Non-streaming
response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ],
    stream=False
)
print(response.choices[0].message.content)

# Streaming
stream = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "user", "content": "Tell me a story"}
    ],
    stream=True
)
for chunk in stream:
    # Guard against chunks with no choices (e.g., a final usage-only chunk)
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Node.js (OpenAI SDK)

npm install openai
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-prize-xxxxxxxxxxxx',
  baseURL: 'http://YOUR_SERVER:PORT/v1',
});

// Non-streaming
const response = await client.chat.completions.create({
  model: 'deepseek-chat',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Hello!' },
  ],
  stream: false,
});
console.log(response.choices[0].message.content);

// Streaming
const stream = await client.chat.completions.create({
  model: 'deepseek-chat',
  messages: [{ role: 'user', content: 'Tell me a story' }],
  stream: true,
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}

Error Codes

HTTP Status  Error Type            Description                                 Solution
401          invalid_api_key       API Key is missing, invalid, or disabled    Check your API Key and ensure it's active
401          insufficient_credits  Account credits have been exhausted         Contact admin to recharge credits
4xx/5xx      backend_error         The backend LLM provider returned an error  Check your request parameters and try again
502          proxy_error           Gateway could not connect to the backend    Wait and retry; contact admin if persistent

Error Response Format

JSON
{
  "error": {
    "message": "Insufficient credits",
    "type": "insufficient_credits",
    "code": "credits_exhausted"
  }
}
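
A sketch of handling these errors with Python's requests, reusing the BASE_URL and API_KEY placeholders from earlier and assuming the error format above (note that a 502 from an intermediate proxy may not carry a JSON body):

Python (requests)
import requests

resp = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "deepseek-chat",
        "stream": False,
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
if resp.ok:
    print(resp.json()["choices"][0]["message"]["content"])
else:
    try:
        err = resp.json().get("error", {})
    except ValueError:
        err = {}  # non-JSON body, e.g. a raw proxy error page
    if err.get("type") == "insufficient_credits":
        print("Out of credits; contact your administrator for a recharge.")
    else:
        print(f"HTTP {resp.status_code} {err.get('type')}: {err.get('message')}")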