Evaluator Access Flow
Zur-lix is AI context optimization infrastructure — Stripe for AI token optimization. This page walks a developer-friend or evaluator from "I have a key" to "I see real savings on the dashboard," safely, in four short steps.
Zur-lix is in controlled rollout preparation, not a public launch. Savings shown are informational estimates (corpus-measured, not guaranteed). Billing is not active and no charge is authorized. Access is by invited evaluator key only.
What you need
- An evaluator API key shared with you privately (looks like
cp_live_…). Treat it as a secret. - A terminal with
curland an environment you canexporta variable into. - A modern browser to log into the dashboard.
- Five minutes. Nothing else needs to be installed.
- Production base URL
https://api.zur-lix.com- Dashboard
- /dashboard
- Health probe (no auth)
- /health
1Set your key locally
In a terminal, export the key into the
COMPRESSION_PLAY_API_KEY environment
variable. Replace the placeholder below with the value you were
given. Do not echo or print the key after this.
export COMPRESSION_PLAY_API_KEY="cp_live_..."
The same key is used for API requests and for dashboard login. You will never need to paste it into a URL, a query string, a screenshot, or a chat message.
2Send one safe test request
Post a small, harmless prompt to the OpenAI-compatible endpoint. The bearer token is read from the environment variable so it never appears literally on the command line.
curl -sS https://api.zur-lix.com/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $COMPRESSION_PLAY_API_KEY" \
-d '{
"model": "gpt-4o-mini",
"messages": [
{
"role": "user",
"content": "Reply with one sentence confirming this Zur-lix evaluator request reached the API."
}
]
}'
You should get back a normal OpenAI Chat Completions JSON
response with a one-sentence confirmation in
choices[0].message.content. Anything
else (network error, 401, 5xx) is worth reporting in your
feedback — see the last section.
- Primary auth header
X-API-Key: $COMPRESSION_PLAY_API_KEY— this is the primary credential header. TheAuthorization: Bearerform shown in the curl above is an accepted alternative (it keeps the key off the command line by reading it from the environment). Send one of them, not both.- No X-Customer-Id
- You never send an
X-Customer-Idheader. Your customer identity is derived from the key itself, so the customer id is not an authentication input.
For the Anthropic-compatible surface, replace
/v1/chat/completions with
/v1/messages and use the Anthropic
request shape — same auth header, same base URL. Its auth
contract is validated: an invalid or missing key returns
401 before any provider call, just like
the OpenAI surface. Live customer/provider use of either endpoint
stays gated by the current no-customer-execution posture and needs a
separate owner approval; nothing on this page authorizes it.
3Open the dashboard
Open /dashboard in your browser. Paste
the same COMPRESSION_PLAY_API_KEY
value into the login form. The dashboard exchanges it for a
short-lived session token internally; you never see another
credential prompt.
The dashboard shows a few panels:
- Headline KPIs — total requests, tokens, savings rate, latency, cost.
- Daily Token Savings + Requests by Model — charts. The model doughnut uses stable per-provider colors (OpenAI blue, Anthropic purple/orange, Other gray).
- Direct Savings Sources — per-layer
decomposition of
tokens_saved. These rows are not additive on top of the headline figure; they explain what produced it. - Routing Decisions — pin / swap counts, top model pairs, recent swaps.
- Provider Cache Savings + cache timeseries + cost accuracy — the cache lane.
4What success looks like
Signs the system is working for you
- The curl request returns 200 with a normal chat completion.
- The dashboard accepts your key and lands on the Overview.
- Your request count and tokens-processed KPIs are non-zero on a recent window (24h / 7d / 30d).
- Other panels load without errors. Some may legitimately show "No data yet" on a brand-new account — that is normal until enough traffic accumulates.
One small request will not produce visible cache or ranking-drop savings. Cache and stable-context reuse take multiple requests with overlapping system prompts or RAG prefixes. Direct compression savings depend on the prompt content. Don't worry if your first request shows zero in the per-layer panel — that is the expected baseline.
Choose Your Response Mode
Zur-lix lets every caller choose how much detail to receive
alongside the normal provider response. The choice is
per-request: a header you set on a single call. Compression
behaviour, the upstream provider call, and the metrics in the
dashboard are identical in every mode — only the
customer-facing compression block
changes shape. None of the modes below are realised-savings
claims; the dashboard remains the authoritative proof surface.
A. Standard response (default)
No special header required. Returns the existing full
backward-compatible response, including the verbose
compression.analysis technical
breakdown. Best when you are integrating and want maximum
backward compatibility with code written before Phase 22B.
B. Autopilot minimal response (request-scoped)
Set the header
X-Zurlix-Autopilot-Minimal: true on a
single request. The route returns a smaller receipt-style
compression block that contains only
the fields normal Autopilot users actually need:
request_idcompression_modeprofilesavingslatency
The minimal-mode response intentionally does not include:
analysis— the full engine report.input— the original prompt text.output— the post-engine text.demo— the sales-demo rehash.
Best when you want Zur-lix to optimize without showing the full
technical breakdown to your end users. The
savings object is a receipt field, not
a realised-savings claim — it is the same value already
exposed today by the standard response.
X-Zurlix-Autopilot-Minimal: true is
request-scoped: it does not globally change other traffic, and
the broad global rollout is not required for a
caller to try the minimal response. Other callers on the same
account see the standard response unless they too send the
header. The normal provider response fields like
choices and
usage remain present unchanged.
Try minimal Autopilot — curl
curl -sS https://api.zur-lix.com/v1/chat/completions \
-H "Authorization: Bearer $COMPRESSION_PLAY_API_KEY" \
-H "X-Zurlix-Autopilot-Minimal: true" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-mini",
"messages": [
{
"role": "user",
"content": "Reply with exactly: autopilot mode ok"
}
],
"max_tokens": 20
}'
Expected receipt-style shape
{
"compression": {
"request_id": "...",
"compression_mode": "...",
"profile": "...",
"savings": { },
"latency": { }
}
}
C. Explain response (request-scoped)
Set the header
X-Zurlix-Explain: true on a single
request. The route returns the full technical breakdown,
including compression.analysis and
the detailed
compression_summary. Best for
debugging, enterprise review, evaluator review, and
trust-building when you want the engine to show its work.
Try Explain — curl
curl -sS https://api.zur-lix.com/v1/chat/completions \
-H "Authorization: Bearer $COMPRESSION_PLAY_API_KEY" \
-H "X-Zurlix-Explain: true" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-mini",
"messages": [
{
"role": "user",
"content": "Reply with exactly: explain mode ok"
}
],
"max_tokens": 20
}'
The Explain response includes
compression.analysis and may include
the detailed
compression_summary. This payload is
intended for debugging, audits, and technical review — the
same field set evaluators have always seen.
Header precedence
If both headers are present,
X-Zurlix-Explain: true
wins and returns the full technical response.
The minimal Autopilot header is silently ignored in that
case. This precedence is the same in every deployment regardless
of any operator-side env flag.
D. Analyzer preflight
When you want context intelligence before sending a
large handoff prompt to a model, hit
POST /v1/analyze/context. The
analyzer is analysis-only: no provider call, no completion, no
mutation. See the “Context Intelligence Analyzer”
section below for the full contract, response shape, and a curl
example.
E. Dashboard proof
Use /dashboard for aggregate proof and cost / observed-metrics visibility across all your traffic. The dashboard is the place to look at receipt fields summed over time — the per-request receipt blocks above are not realised-savings claims on their own.
Context Intelligence Analyzer
Use this when you want to inspect a large handoff prompt
before paying to send it to a model. The analyzer reads
your messages, classifies the context (carryover handoff, repeated
state, task-relevance buckets), and returns a structured
analysis_summary — without ever
calling OpenAI, Anthropic, or any other provider.
- Endpoint
POST /v1/analyze/context- Auth
- Same
Authorization: Bearer $COMPRESSION_PLAY_API_KEYheader as/v1/chat/completions - Mode
analysis_only— deterministic, analysis-only telemetry
What it returns
Top-level keys in the response body:
mode— the constant"analysis_only"so consumers can branch deterministically.analysis_summary— a compact headline answer:result,confidence,primary_signal, plus three boolean roll-ups (carryover_detected,repeated_state_detected,task_budget_candidates_detected) and a shortrecommendationstring.carryover_handoff— whether the prompt looks like a project-state / agent-continuation handoff bundle, plus the per-signal counts that drove the decision.context_delta— how much of the prompt looks like repeated state vs. new delta. Reports approximate counts, never realised savings.task_relevance— per-block role classification (must-keep / constraint / validation / active context / history / background) + closed-enumtask_familyandrisk_level+ candidate budget-trim block count.caveats— three short safety statements pinning the analysis-only contract.
What it does NOT do
- It does not call OpenAI.
- It does not call Anthropic.
- It does not generate a completion.
- It does not mutate your prompt — no content was changed.
- It does not persist analyzer results.
- It does not claim realised savings. Approximate token counts are heuristic estimates.
Try it — curl
curl -sS https://api.zur-lix.com/v1/analyze/context \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $COMPRESSION_PLAY_API_KEY" \
-d '{
"model": "gpt-4o-mini",
"messages": [
{
"role": "user",
"content": "You are continuing a project handoff.\n\nDo not commit.\nDo not push.\n\nRequired work:\n1. Confirm baseline.\n2. Analyze context before sending."
}
]
}'
Expected response shape
Top-level fields you can check at a glance (the full response also
contains the carryover_handoff,
context_delta, and
task_relevance detail blocks plus the
caveats array):
{
"mode": "analysis_only",
"analysis_summary": {
"result": "context_budget_opportunity_detected",
"confidence": "high",
"primary_signal": "combined"
}
}
Analyzer success looks like
- HTTP 200.
modeequals"analysis_only".analysis_summary.resultis present and is one of:no_context_budget_opportunity_detected,carryover_detected,repeated_state_detected,task_budget_opportunity_detected, orcontext_budget_opportunity_detected.analysis_summary.confidenceis one oflow,medium,high.- No provider call was made.
- No completion was generated.
- No content was changed.
no_context_budget_opportunity_detected
is also a perfectly valid result — short or simple prompts
return that, and it is exactly what the analyzer should say. The
analyzer is a measurement surface, not a compressor.
Troubleshooting
401·invalid_api_key- The key was missing, mistyped, or sent in a malformed header. The
OpenAI surface returns
{"error":{"code":"invalid_api_key", …}}; the Anthropic surface returns the equivalentauthentication_errorenvelope. Check that$COMPRESSION_PLAY_API_KEYis set, that you send exactly one auth header (X-API-KeyorAuthorization: Bearer), and that no stray quotes or whitespace crept into the value. - Cloudflare error
1010 - The edge's browser-integrity / client-shape check rejected the
request. It is triggered by browser-impersonating clients, not by
plain API clients. Fix it by calling the API with
curlor a standard HTTP library with a normal client shape — do not route the call through a browser automation tool or send spoofed browser headers. A cleancurlrequest like the one above is the reference client shape. - Scripting the call
- Stay curl-first for evaluation. If you must
script it, use a maintained HTTP client
(
requestsorhttpx) and set the auth header explicitly. Avoid hand-rollingurllibrequests — it is easy to misformat the auth header or send an unusual client shape that the edge rejects.
What not to do
Please avoid
- Pasting your API key into a screenshot, Slack message, GitHub issue, shared doc, support ticket, or a screen recording. Treat the key like a password.
- Putting the key in a URL query string. Always use the
Authorization: Bearerheader. - Running synthetic load generators, paid eval harnesses, or scripted flooding against the production endpoint. Evaluation traffic should be organic and small.
- Sharing the key with anyone outside the people invited to evaluate. Each evaluator should use their own key.
- Treating the page-1 quickstart as a benchmark. Zur-lix's savings curve is a function of repeated, context-rich traffic, not single one-off requests.
How to report feedback
When something looks off — or just to confirm the system worked — share a short note with these fields. None of them require sharing private content.
- The endpoint you called (e.g.
/v1/chat/completionsor/v1/messages). - The model you requested (e.g.
gpt-4o-mini). - Approximate time of your request, in your local timezone.
- Whether dashboard login worked the first time.
- Which dashboard panels loaded vs which showed "No data yet" or an error.
- Anything that felt confusing or unclear — copy welcome.
Do not include the API key, raw prompt content, or raw response bodies unless we explicitly ask and you've redacted anything sensitive. The team can correlate from the timestamp + endpoint + model alone.
Privacy
Zur-lix does not persist raw prompt or response text into telemetry. Internal capture rows store integer counts and a one-way HMAC-based stable-context fingerprint that uses a server-side pepper — the fingerprint cannot be reversed back into prompt content. Operator diagnostics expose only aggregate integer counters; no per-tenant identifiers, no content.
Beta evaluator details
Zur-lix private beta. Controlled unpaid beta. Invitation only. The first wave is a 30-day window starting on the date your invitation is issued.
- Cost: free during this evaluation window. Payment collection is not available during the controlled unpaid beta; no checkout flow is exposed; no invoice is generated. Savings shown in the dashboard are informational, corpus-measured estimates and are not guaranteed.
- Support and feedback: beta support [email protected]; security questions [email protected]; new beta access requests [email protected]. Response target during weekdays in the 30-day window: 24 hours.
- Per-key caps for this wave: 30 requests per minute, 20,000 tokens per minute, monthly cost ceiling USD 5 per tester. The operator may pause or end the evaluation at any time.
- Not authorized in this wave: paid beta, public self-serve, global rollout, live billing, customer charges, partnership claims, or any compliance certification claim against the SOC-2 / GDPR / UK GDPR / EU AI Act / ISO 42001 / PCI / PCI DSS / HIPAA frameworks.
Use Zur-lix from your existing LLM workflow
Connect your existing LLM client to Zur-lix using your API key and the endpoint base URL. Same request shape, same response shape; only the base URL changes.
Python — OpenAI-compatible client
import os
from openai import OpenAI
client = OpenAI(
api_key=os.environ["COMPRESSION_PLAY_API_KEY"],
base_url="https://api.zur-lix.com/v1",
)
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[
{"role": "user", "content": "say hi from Zur-lix"},
],
)
print(response.choices[0].message.content)
JavaScript — OpenAI-compatible client
import OpenAI from "openai";
const client = new OpenAI({
apiKey: process.env.COMPRESSION_PLAY_API_KEY,
baseURL: "https://api.zur-lix.com/v1",
});
const response = await client.chat.completions.create({
model: "gpt-4o-mini",
messages: [
{ role: "user", content: "say hi from Zur-lix" },
],
});
console.log(response.choices[0].message.content);
cURL — OpenAI-compatible endpoint
curl -sS https://api.zur-lix.com/v1/chat/completions \
-H "Authorization: Bearer $COMPRESSION_PLAY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-mini",
"messages": [
{"role": "user", "content": "say hi from Zur-lix"}
]
}'
cURL — Anthropic-compatible endpoint
curl -sS https://api.zur-lix.com/v1/messages \
-H "x-api-key: $COMPRESSION_PLAY_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-H "Content-Type: application/json" \
-d '{
"model": "claude-haiku-4-5",
"max_tokens": 64,
"messages": [
{"role": "user", "content": "say hi from Zur-lix"}
]
}'
Set COMPRESSION_PLAY_API_KEY in
your shell to the key you were handed at invitation. Treat the
key like a password — never paste it into a shared chat,
ticket, log, screenshot, screen recording, or commit. If a key
leaks, contact Zur-lix security at
[email protected]
for an immediate rotation.
Check the Zur-lix dashboard for usage and estimated savings
After making a few requests through Zur-lix from your normal LLM workflow, sign into the dashboard with the same API key to review:
- prompt / completion / total token counts attributed to your account;
- estimated token savings on your own traffic during the evaluation window (corpus-measured, informational only, not guaranteed);
- estimated cost savings derived from the same telemetry;
- routing decisions when multiple models are available.
The dashboard's savings figures are informational estimates computed against the unmodified prompt cost. They are not a promise of a fixed percentage reduction, and they are not customer-billed numbers.