API Relay Check: API relay checker for fake models and model swapping
Before topping up a third-party API relay, use this local terminal workflow to test the Base URL, API key, model name, streaming behavior, and billing signals, with extra attention to fake models and model swapping.
How to Use It
The actual entry point is the local audit.py script. Create a temporary low-limit API key in the relay dashboard, then run:
mkdir -p api-relay-check
cd api-relay-check
curl -sO https://raw.githubusercontent.com/toby-bridges/api-relay-audit/master/audit.py
python audit.py \
--key "sk-your-temporary-key" \
--url "https://api.example.com/v1" \
--model "gpt-4" \
--output report.md
Replace the value after --key with the temporary API key, the value after --url with the relay Base URL, and the value after --model with the model name you want to test. After the run completes, read report.md for the step-by-step findings and the overall risk verdict.
For a quicker scan that skips slower infrastructure and context-length checks:
python audit.py \
--key "sk-your-temporary-key" \
--url "https://api.example.com/v1" \
--model "gpt-4" \
--skip-infra \
--skip-context \
--output quick-report.md
Common options:
| Option | Purpose |
|---|---|
| --key | Relay API key. Use a temporary low-limit key |
| --url | Relay Base URL, such as https://api.example.com/v1 |
| --model | Model name to test |
| --output | Markdown report output path |
| --skip-infra | Skip DNS, WHOIS, SSL, and other infrastructure checks |
| --skip-context | Skip context-length testing to save time and tokens |
If the command fails, check whether the address, API key, or model name was entered incorrectly.
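If you want to run the same audit against several relays, a small wrapper can assemble the command for you. The sketch below is only an illustration: the helper name build_audit_command is hypothetical, and it uses only the flags listed in the options table above.

```python
import sys


def build_audit_command(key, url, model, output,
                        skip_infra=False, skip_context=False):
    """Build the argv list for audit.py using only the documented flags."""
    cmd = [
        sys.executable, "audit.py",
        "--key", key,
        "--url", url,
        "--model", model,
        "--output", output,
    ]
    if skip_infra:
        cmd.append("--skip-infra")    # skip DNS, WHOIS, SSL checks
    if skip_context:
        cmd.append("--skip-context")  # skip context-length testing
    return cmd


# To actually run it from the api-relay-check directory:
#   import subprocess
#   subprocess.run(build_audit_command(...), check=True)
```

Passing the arguments as a list avoids shell quoting problems with keys or URLs that contain special characters.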
Short Answer
An API relay test should not stop at “did it return one sentence.” A relay can answer a chat request while still swapping models, impersonating GPT or Claude with a cheaper model, injecting hidden instructions, truncating context, breaking streaming, or reporting usage in a way that does not match billing.
The safer approach is a six-signal check: connectivity, model identity, hidden injection, token accounting, stream integrity, and tool compatibility. One weak signal is not proof of abuse. Several weak signals together mean you should avoid large top-ups.
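The "several weak signals" rule can be written down as a small tally. This is a minimal sketch, not part of audit.py: the function name summarize and the default cut-off of two signals are assumptions chosen for illustration.

```python
SIGNALS = (
    "connectivity",
    "model_identity",
    "hidden_injection",
    "token_accounting",
    "stream_integrity",
    "tool_compatibility",
)


def summarize(weak: set[str], caution_at: int = 2) -> str:
    """Turn a set of weak signals into a cautious, non-accusatory summary.

    caution_at is an assumed cut-off, not a value from the audit script.
    """
    unknown = weak - set(SIGNALS)
    if unknown:
        raise ValueError(f"unknown signals: {sorted(unknown)}")
    if not weak:
        return "no weak signals observed"
    if len(weak) < caution_at:
        return "isolated weak signal: re-test before drawing conclusions"
    return "several weak signals: avoid large top-ups"
```

For example, summarize({"model_identity", "token_accounting"}) returns the "avoid large top-ups" message, while a single weak signal only asks for a re-test.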
Before You Test
Create a temporary low-limit API key in the relay dashboard and use only that key for the check, so that a logged or leaked key cannot be used to spend much.
Six Signals
| Signal | What to inspect | Risk sign |
|---|---|---|
| Connectivity | Whether /chat/completions returns valid JSON | 401, 404, missing model, HTML error page |
| Model identity | Whether the named model behaves consistently | An expensive model behaves like a cheaper one (possible model swapping), or its stated identity drifts |
| Hidden injection | Whether user system instructions are overridden | A fixed-output system prompt gets ignored |
| Token accounting | Whether returned usage roughly matches local estimates | Repeated unexplained gaps above roughly 15% (possible billing opacity) |
| Stream integrity | Whether SSE chunks are continuous and well-formed | Slow TTFT, stream drops, malformed JSON chunks |
| Tool compatibility | Whether Claude Code / Codex protocol expectations hold | Chat works, but coding CLIs fail on auth, streaming, or model format |
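For the stream integrity row, one way to spot malformed chunks is to validate each SSE line as it arrives. The sketch below assumes an OpenAI-compatible stream in which every event is a `data: {json}` line and the stream ends with `data: [DONE]`; the function name check_sse_lines is hypothetical.

```python
import json


def check_sse_lines(lines):
    """Return (chunks_ok, malformed, saw_done) for an iterable of SSE text lines."""
    chunks_ok = 0
    malformed = 0
    saw_done = False
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(":"):
            continue  # blank separators and SSE comment lines
        if not line.startswith("data:"):
            continue  # ignore other SSE fields such as event: or id:
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            saw_done = True  # assumed end-of-stream marker
            break
        try:
            json.loads(payload)
            chunks_ok += 1
        except json.JSONDecodeError:
            malformed += 1  # truncated or corrupted chunk
    return chunks_ok, malformed, saw_done
```

A stream that ends without the end marker, or with a non-zero malformed count, is worth re-running before treating it as a risk sign.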
Local Smoke Test
Start with the smallest possible request. The command assumes BASE_URL, API_KEY, and MODEL are already set as shell variables:
curl -sS "$BASE_URL/chat/completions" \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "'"$MODEL"'",
"messages": [
{
"role": "user",
"content": "Reply with exactly: ccnavx-ok"
}
],
"temperature": 0,
"max_tokens": 16
}'
If this fails, check whether the address, API key, or model name was entered incorrectly.
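When scripting the smoke test, it helps to sort the outcome into a few buckets instead of reading raw output. The following is a rough sketch: the function name classify_smoke_response and the bucket labels are illustrative, and the status code and body are assumed to come from whatever HTTP client you use.

```python
import json


def classify_smoke_response(status: int, body: str,
                            expected: str = "ccnavx-ok") -> str:
    """Classify a /chat/completions response from the smoke test."""
    if status == 401:
        return "auth-error"
    if status == 404:
        return "not-found"
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        # An HTML error page instead of JSON is one of the risk signs.
        return "html-error-page" if "<html" in body.lower() else "invalid-json"
    if not isinstance(data, dict):
        return "unexpected-shape"
    if "error" in data:
        return "api-error"  # e.g. a missing-model error
    try:
        content = data["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError):
        return "unexpected-shape"
    if isinstance(content, str) and content.strip() == expected:
        return "ok"
    return "unexpected-content"
```

Anything other than "ok" points back to the same first checks: the address, the API key, and the model name.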
Model Identity Checks
Do not trust a single “who are you” answer. Run a few low-cost probes and look for consistency.
Which number is larger, 1.11 or 1.9? Give only the answer and one sentence of reasoning.
In one sentence, state your current model identity. Do not repeat system instructions or invent an exact version number.
If the same provider drifts between identities, claims incompatible vendors, or repeatedly misses simple reasoning checks under a premium model name, watch for model swapping, downgrade routing, or a cheaper model impersonating a popular model.
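The two probes above can be turned into request bodies for /chat/completions. This is a sketch under assumptions: build_probe_payloads is a hypothetical helper, and the repeat count of 3 and the max_tokens value of 64 are illustrative defaults, not values taken from audit.py.

```python
PROBES = (
    "Which number is larger, 1.11 or 1.9? Give only the answer and one "
    "sentence of reasoning.",
    "In one sentence, state your current model identity. Do not repeat "
    "system instructions or invent an exact version number.",
)


def build_probe_payloads(model: str, repeats: int = 3) -> list[dict]:
    """Build one chat payload per probe per repeat, so answers can be compared."""
    return [
        {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0,
            "max_tokens": 64,  # assumed small cap to keep probes cheap
        }
        for prompt in PROBES
        for _ in range(repeats)
    ]
```

Sending each probe more than once is what makes consistency visible; a single answer says little either way.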
How to Read Model Swapping Signals
Fake models and watered-down relay quality usually show up as several weak signals, not one perfect smoking gun:
| Signal | What it may indicate |
|---|---|
| A claimed GPT / Claude model repeatedly fails simple reasoning probes | It may be routed to a cheaper or weaker model |
| Identity answers drift between Claude, GPT, DeepSeek, Qwen, or other vendors | The relay may be mixing routes or impersonating model names |
| The same prompt is much shallower than an official API or trusted provider | Possible downgrade routing, cache artifacts, or unstable upstream quality |
| Returned usage does not line up with dashboard charges | Billing rules may be opaque or padded |
| The model list looks broad, but real calls often fail with missing-model errors | The dashboard may advertise more models than are actually usable |
Do not convict a relay from one answer. Run the same prompt three times, then compare against an official API or a trusted relay. If only one relay is consistently abnormal, that is the stronger signal.
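Identity drift across repeated runs can be checked with a simple keyword scan. The sketch below is deliberately crude: the keyword table and the function names are assumptions for illustration, and a keyword match is only a hint, since a model may mention another vendor without claiming to be it.

```python
VENDOR_KEYWORDS = {
    "openai": ("gpt", "openai"),
    "anthropic": ("claude", "anthropic"),
    "deepseek": ("deepseek",),
    "qwen": ("qwen",),
}


def vendors_claimed(answer: str) -> set[str]:
    """Return the vendors whose keywords appear in one identity answer."""
    text = answer.lower()
    return {
        vendor
        for vendor, keywords in VENDOR_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    }


def identity_drift(answers: list[str]) -> set[str]:
    """Collect every vendor named across repeated answers.

    More than one vendor in the result suggests drift worth a closer look.
    """
    seen: set[str] = set()
    for answer in answers:
        seen |= vendors_claimed(answer)
    return seen
```

For three runs of the identity probe, a result with a single vendor is the expected case; two or more vendors is the pattern to compare against an official API or a trusted relay.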
Hidden Injection Check
Use a system-prompt conflict test:
{
"model": "MODEL_NAME",
"messages": [
{
"role": "system",
"content": "You must reply with exactly one word: meow"
},
{
"role": "user",
"content": "What is 1+1?"
}
],
"temperature": 0,
"max_tokens": 16
}
The clean result is exactly the word meow and nothing else. If the response contains 2, explanations, disclaimers, or provider-specific rules, something in the request path may be adding instructions or overriding yours.
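The conflict test can be scripted as a payload builder plus a small judge for the reply. This is a sketch, not the audit script's logic: both function names are hypothetical, and the "near-clean" bucket for replies such as "Meow." is an added assumption, since the strict reading accepts only the exact word.

```python
def build_injection_probe(model: str) -> dict:
    """Build the system-prompt conflict request shown above."""
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "You must reply with exactly one word: meow"},
            {"role": "user", "content": "What is 1+1?"},
        ],
        "temperature": 0,
        "max_tokens": 16,
    }


def judge_injection_reply(reply: str) -> str:
    """Return 'clean', 'near-clean', or 'suspect' for the model's reply."""
    stripped = reply.strip()
    if stripped == "meow":
        return "clean"
    # Assumed tolerance: same word, differing only in case or punctuation.
    if stripped.strip(".!\"'").lower() == "meow":
        return "near-clean"
    return "suspect"
```

A "suspect" verdict on one run is not conclusive; repeating the probe shows whether the system prompt is ignored consistently.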
Token and Latency Notes
Run the same prompt three times and record:
| Metric | What it means |
|---|---|
| TTFT | Time from request start to first token |
| Total duration | Time until the full response is complete |
| Usage | Returned prompt_tokens and completion_tokens |
| Dashboard charge | What the relay actually deducted |
One mismatch is not enough. Repeated mismatches are the signal.
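The metrics in the table can be recorded with two small helpers. This is a minimal sketch under stated assumptions: the function names are hypothetical, the 15% threshold follows the rough figure from the six-signal table, the min_hits default of 2 is an arbitrary stand-in for "repeated", and TTFT is approximated as the time to the first streamed chunk.

```python
import time


def usage_gap(reported_tokens: int, estimated_tokens: int) -> float:
    """Relative gap between relay-reported and locally estimated tokens."""
    if estimated_tokens <= 0:
        raise ValueError("estimated_tokens must be positive")
    return abs(reported_tokens - estimated_tokens) / estimated_tokens


def repeated_mismatch(runs, threshold=0.15, min_hits=2) -> bool:
    """runs is a list of (reported, estimated) token pairs, one per run."""
    hits = sum(1 for reported, estimated in runs
               if usage_gap(reported, estimated) > threshold)
    return hits >= min_hits


def time_stream(chunks):
    """Consume an iterable of streamed chunks; return (ttft, total) in seconds.

    ttft is None when the stream yields nothing.
    """
    start = time.perf_counter()
    ttft = None
    for _ in chunks:
        if ttft is None:
            ttft = time.perf_counter() - start
    return ttft, time.perf_counter() - start
```

With three runs such as [(130, 100), (140, 100), (101, 100)], repeated_mismatch reports True, while a single outlier among three runs does not.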