API Documentation
Heima x WildMeta LLM Gateway provides an OpenAI-compatible API for accessing LLM services. Authenticate with your API Key; each request consumes credits from your account. To get started:
1. Get your API Key from the admin
2. Use the key in the Authorization header
3. Send requests to the /v1/chat/completions endpoint
4. Monitor your credits on the Dashboard
Authentication
All API requests require a valid API Key passed via the Authorization header using the Bearer scheme.
Authorization: Bearer sk-prize-xxxxxxxxxxxx
Your API Key is prefixed with sk-prize-. Keep it confidential and never expose it in client-side code or public repositories.
Your API Key must meet these conditions to work:
- Key status is active
- Account has credits > 0
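A quick way to confirm that a key satisfies both conditions is to send a minimal request. A sketch using the Python requests library, assuming the gateway runs at http://localhost:8000 (a placeholder for your deployment address):

import requests

# Minimal authenticated request; a 401 response indicates an invalid or
# disabled key, or exhausted credits (see Error Codes below).
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",  # placeholder address
    headers={
        "Authorization": "Bearer sk-prize-xxxxxxxxxxxx",
        "Content-Type": "application/json",
    },
    json={
        "model": "deepseek-chat",
        "stream": False,
        "messages": [{"role": "user", "content": "ping"}],
    },
)
print(resp.status_code)
print(resp.json())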
Base URL
All API requests are sent to the gateway server. Replace the base URL with your actual deployment address.
http://YOUR_SERVER_HOST:PORT
The gateway is OpenAI-compatible, so you can use it as a drop-in replacement by setting the base_url in any OpenAI SDK client.
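As an alternative to passing base_url explicitly (shown in the SDK examples below), recent versions of the official OpenAI SDKs also read the OPENAI_API_KEY and OPENAI_BASE_URL environment variables; a sketch in Python (the values are placeholders):

import os
from openai import OpenAI

# Normally you would export these in your shell or deployment config;
# they are set here only to keep the example self-contained.
os.environ["OPENAI_API_KEY"] = "sk-prize-xxxxxxxxxxxx"
os.environ["OPENAI_BASE_URL"] = "http://YOUR_SERVER_HOST:PORT/v1"

client = OpenAI()  # picks up both variables from the environment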
Chat Completions
Create a chat completion by sending a POST request with your messages.
POST /v1/chat/completions
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID to use for the completion |
| messages | array | Yes | Array of message objects with role and content |
| stream | boolean | No | Whether to stream the response. Default: true |
| temperature | number | No | Sampling temperature (0-2) |
| max_tokens | integer | No | Maximum number of tokens to generate |
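For reference, a request body combining these parameters might look like the following (the temperature and max_tokens values are illustrative):

{
  "model": "deepseek-chat",
  "stream": false,
  "temperature": 0.7,
  "max_tokens": 512,
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ]
}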
Response
{
"id": "chatcmpl-xxxx",
"object": "chat.completion",
"created": 1700000000,
"model": "deepseek-chat",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! How can I help you?"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 10,
"completion_tokens": 8,
"total_tokens": 18
}
}
Streaming
By default, responses are streamed using Server-Sent Events (SSE). Set "stream": false in the request body to receive a complete JSON response instead.
data: {"id":"chatcmpl-xx","choices":[{"delta":{"content":"Hello"},"index":0}]}
data: {"id":"chatcmpl-xx","choices":[{"delta":{"content":"!"},"index":0}]}
data: [DONE]
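If you are not using an SDK, the stream can be consumed by reading the SSE lines directly. A sketch with the Python requests library (the server address is a placeholder):

import json
import requests

resp = requests.post(
    "http://YOUR_SERVER:PORT/v1/chat/completions",  # placeholder address
    headers={"Authorization": "Bearer sk-prize-xxxxxxxxxxxx"},
    json={
        "model": "deepseek-chat",
        "stream": True,
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    stream=True,  # keep the HTTP connection open and read incrementally
)
for raw in resp.iter_lines():
    if not raw:
        continue  # SSE events are separated by blank lines
    line = raw.decode("utf-8")
    if not line.startswith("data: "):
        continue
    payload = line[len("data: "):]
    if payload == "[DONE]":
        break  # end-of-stream marker
    chunk = json.loads(payload)
    delta = chunk["choices"][0].get("delta", {})
    print(delta.get("content", ""), end="", flush=True)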
Supported Models
The gateway proxies requests to the backend LLM provider. Available models depend on the backend configuration. Common models include:
| Model | Context Window | Description |
|---|---|---|
| deepseek-chat | 64K tokens | DeepSeek V3 chat model |
All models share the same credit consumption rate. The actual available models are determined by the backend provider.
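If your deployment also forwards the standard OpenAI GET /v1/models endpoint (not covered by this document, so treat this as an assumption to verify), you can query the live model list instead of hard-coding names:

from openai import OpenAI

client = OpenAI(
    api_key="sk-prize-xxxxxxxxxxxx",
    base_url="http://YOUR_SERVER:PORT/v1",
)

# Works only if the gateway proxies GET /v1/models; otherwise rely on the
# table above or ask your administrator which models are configured.
for model in client.models.list():
    print(model.id)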
Credit Consumption Rules
Each API request consumes credits based on the number of tokens processed. The system tracks both input tokens (your prompt) and output tokens (the model's response) separately.
- Credits are deducted after each successful request
- If your credits reach 0, further requests will be rejected
- Failed requests (backend errors) do not consume credits
- Streaming and non-streaming requests consume credits the same way
- Output tokens cost 5× as much as input tokens (see the formula below)
Credit Calculation
Credits consumed per request are calculated using this formula:
credits = (input_tokens × 0.2) + (output_tokens × 1.0)
Examples
| Scenario | Input Tokens | Output Tokens | Credits Used |
|---|---|---|---|
| Short Q&A | 50 | 100 | 110 (50×0.2 + 100×1.0) |
| Long conversation | 2,000 | 500 | 900 (2000×0.2 + 500×1.0) |
| Document summary | 5,000 | 300 | 1,300 (5000×0.2 + 300×1.0) |
| Code generation | 200 | 1,000 | 1,040 (200×0.2 + 1000×1.0) |
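The formula is easy to reproduce; a small Python helper that matches the figures in the table above:

def credits_used(input_tokens: int, output_tokens: int) -> float:
    # credits = (input_tokens × 0.2) + (output_tokens × 1.0)
    return input_tokens * 0.2 + output_tokens * 1.0

assert credits_used(50, 100) == 110       # short Q&A
assert credits_used(2_000, 500) == 900    # long conversation
assert credits_used(5_000, 300) == 1_300  # document summary
assert credits_used(200, 1_000) == 1_040  # code generation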
To save credits, keep prompts concise and use max_tokens to cap output length. Conversation history counts as input tokens, so trimming the context you resend reduces credit usage.
Balance Management
You can check your current credit balance on the Dashboard after logging in with your API Key.
What you can see
- Credits Balance — Current remaining credits
- Usage History — Detailed log of each request with token counts and credits consumed
- Account Status — Whether your key is active
To recharge credits, contact your administrator. Credits are added by the admin and reflected immediately in your balance.
cURL
# Non-streaming request
curl http://YOUR_SERVER:PORT/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-prize-xxxxxxxxxxxx" \
-d '{
"model": "deepseek-chat",
"stream": false,
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello!"}
]
}'
# Streaming request (-N disables curl's output buffering)
curl http://YOUR_SERVER:PORT/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-prize-xxxxxxxxxxxx" \
-N \
-d '{
"model": "deepseek-chat",
"stream": true,
"messages": [
{"role": "user", "content": "Tell me a joke"}
]
}'
Python (OpenAI SDK)
from openai import OpenAI
client = OpenAI(
api_key="sk-prize-xxxxxxxxxxxx",
base_url="http://YOUR_SERVER:PORT/v1"
)
# Non-streaming
response = client.chat.completions.create(
model="deepseek-chat",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello!"}
],
stream=False
)
print(response.choices[0].message.content)
# Streaming
stream = client.chat.completions.create(
model="deepseek-chat",
messages=[
{"role": "user", "content": "Tell me a story"}
],
stream=True
)
for chunk in stream:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="")
Node.js (OpenAI SDK)
import OpenAI from 'openai';
const client = new OpenAI({
apiKey: 'sk-prize-xxxxxxxxxxxx',
baseURL: 'http://YOUR_SERVER:PORT/v1',
});
// Non-streaming
const response = await client.chat.completions.create({
model: 'deepseek-chat',
messages: [
{ role: 'system', content: 'You are a helpful assistant.' },
{ role: 'user', content: 'Hello!' },
],
stream: false,
});
console.log(response.choices[0].message.content);
// Streaming
const stream = await client.chat.completions.create({
model: 'deepseek-chat',
messages: [{ role: 'user', content: 'Tell me a story' }],
stream: true,
});
for await (const chunk of stream) {
process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
Error Codes
| HTTP Status | Error Type | Description | Solution |
|---|---|---|---|
| 401 | invalid_api_key | API Key is missing, invalid, or disabled | Check your API Key and ensure it is active |
| 401 | insufficient_credits | Account credits have been exhausted | Contact your admin to recharge credits |
| 4xx/5xx | backend_error | The backend LLM provider returned an error | Check your request parameters and try again |
| 502 | proxy_error | Gateway could not connect to the backend | Wait and retry; contact your admin if the problem persists |
Error Response Format
{
"error": {
"message": "Insufficient credits",
"type": "insufficient_credits",
"code": "credits_exhausted"
}
}
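When calling the gateway through the OpenAI Python SDK, these HTTP errors surface as the SDK's standard exception types; a sketch (the exact error body depends on the gateway, so treat the mapping as indicative):

import openai
from openai import OpenAI

client = OpenAI(
    api_key="sk-prize-xxxxxxxxxxxx",
    base_url="http://YOUR_SERVER:PORT/v1",
)

try:
    response = client.chat.completions.create(
        model="deepseek-chat",
        messages=[{"role": "user", "content": "Hello!"}],
        stream=False,
    )
    print(response.choices[0].message.content)
except openai.AuthenticationError as err:
    # HTTP 401: invalid_api_key or insufficient_credits (see the table above)
    print("Authentication or credit problem:", err)
except openai.APIStatusError as err:
    # Any other non-2xx status, e.g. backend_error or proxy_error
    print(f"Gateway returned HTTP {err.status_code}: {err.message}")
except openai.APIConnectionError as err:
    # The gateway itself could not be reached
    print("Connection problem:", err)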