Zur-lix^™

Evaluator Access Flow

Zur-lix is AI context optimization infrastructure — Stripe for AI token optimization. This page walks a developer-friend or evaluator from "I have a key" to "I see real savings on the dashboard," safely, in four short steps.

Zur-lix is in controlled rollout preparation, not a public launch. Savings shown are informational estimates (corpus-measured, not guaranteed). Billing is not active and no charge is authorized. Access is by invited evaluator key only.

What you need

An evaluator API key shared with you privately (looks like cp_live_…). Treat it as a secret.
A terminal with curl and an environment you can export a variable into.
A modern browser to log into the dashboard.
Five minutes. Nothing else needs to be installed.

Production base URL: https://api.zur-lix.com
Dashboard: /dashboard
Health probe (no auth): /health

1Set your key locally

In a terminal, export the key into the COMPRESSION_PLAY_API_KEY environment variable. Replace the placeholder below with the value you were given. Do not echo or print the key after this.

export COMPRESSION_PLAY_API_KEY="cp_live_..."

The same key is used for API requests and for dashboard login. You will never need to paste it into a URL, a query string, a screenshot, or a chat message.

2Send one safe test request

Post a small, harmless prompt to the OpenAI-compatible endpoint. The bearer token is read from the environment variable so it never appears literally on the command line.

curl -sS https://api.zur-lix.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $COMPRESSION_PLAY_API_KEY" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "Reply with one sentence confirming this Zur-lix evaluator request reached the API."
      }
    ]
  }'

You should get back a normal OpenAI Chat Completions JSON response with a one-sentence confirmation in choices[0].message.content. Anything else (network error, 401, 5xx) is worth reporting in your feedback — see the last section.

Primary auth header: X-API-Key: $COMPRESSION_PLAY_API_KEY — this is the primary credential header. The Authorization: Bearer form shown in the curl above is an accepted alternative (it keeps the key off the command line by reading it from the environment). Send one of them, not both.
No X-Customer-Id: You never send an X-Customer-Id header. Your customer identity is derived from the key itself, so the customer id is not an authentication input.

For the Anthropic-compatible surface, replace /v1/chat/completions with /v1/messages and use the Anthropic request shape — same auth header, same base URL. Its auth contract is validated: an invalid or missing key returns 401 before any provider call, just like the OpenAI surface. Live customer/provider use of either endpoint stays gated by the current no-customer-execution posture and needs a separate owner approval; nothing on this page authorizes it.

3Open the dashboard

Open /dashboard in your browser. Paste the same COMPRESSION_PLAY_API_KEY value into the login form. The dashboard exchanges it for a short-lived session token internally; you never see another credential prompt.

The dashboard shows a few panels:

Headline KPIs — total requests, tokens, savings rate, latency, cost.
Daily Token Savings + Requests by Model — charts. The model doughnut uses stable per-provider colors (OpenAI blue, Anthropic purple/orange, Other gray).
Direct Savings Sources — per-layer decomposition of tokens_saved. These rows are not additive on top of the headline figure; they explain what produced it.
Routing Decisions — pin / swap counts, top model pairs, recent swaps.
Provider Cache Savings + cache timeseries + cost accuracy — the cache lane.

4What success looks like

Signs the system is working for you

The curl request returns 200 with a normal chat completion.
The dashboard accepts your key and lands on the Overview.
Your request count and tokens-processed KPIs are non-zero on a recent window (24h / 7d / 30d).
Other panels load without errors. Some may legitimately show "No data yet" on a brand-new account — that is normal until enough traffic accumulates.

One small request will not produce visible cache or ranking-drop savings. Cache and stable-context reuse take multiple requests with overlapping system prompts or RAG prefixes. Direct compression savings depend on the prompt content. Don't worry if your first request shows zero in the per-layer panel — that is the expected baseline.

Choose Your Response Mode

Zur-lix lets every caller choose how much detail to receive alongside the normal provider response. The choice is per-request: a header you set on a single call. Compression behaviour, the upstream provider call, and the metrics in the dashboard are identical in every mode — only the customer-facing compression block changes shape. None of the modes below are realised-savings claims; the dashboard remains the authoritative proof surface.

A. Standard response (default)

No special header required. Returns the existing full backward-compatible response, including the verbose compression.analysis technical breakdown. Best when you are integrating and want maximum backward compatibility with the legacy verbose response format.

B. Autopilot minimal response (request-scoped)

Set the header X-Zurlix-Autopilot-Minimal: true on a single request. The route returns a smaller receipt-style compression block that contains only the fields normal Autopilot users actually need:

request_id
compression_mode
profile
savings
latency

The minimal-mode response intentionally does not include:

analysis — the full engine report.
input — the original prompt text.
output — the post-engine text.
demo — the sales-demo rehash.

Best when you want Zur-lix to optimize without showing the full technical breakdown to your end users. The savings object is a receipt field, not a realised-savings claim — it is the same value already exposed today by the standard response.

X-Zurlix-Autopilot-Minimal: true is request-scoped: it does not globally change other traffic, and the broad global rollout is not required for a caller to try the minimal response. Other callers on the same account see the standard response unless they too send the header. The normal provider response fields like choices and usage remain present unchanged.

Try minimal Autopilot — curl

curl -sS https://api.zur-lix.com/v1/chat/completions \
  -H "Authorization: Bearer $COMPRESSION_PLAY_API_KEY" \
  -H "X-Zurlix-Autopilot-Minimal: true" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "Reply with exactly: autopilot mode ok"
      }
    ],
    "max_tokens": 20
  }'

Expected receipt-style shape

{
  "compression": {
    "request_id": "...",
    "compression_mode": "...",
    "profile": "...",
    "savings": { },
    "latency": { }
  }
}

C. Explain response (request-scoped)

Set the header X-Zurlix-Explain: true on a single request. The route returns the full technical breakdown, including compression.analysis and the detailed compression_summary. Best for debugging, enterprise review, evaluator review, and trust-building when you want the engine to show its work.

Try Explain — curl

curl -sS https://api.zur-lix.com/v1/chat/completions \
  -H "Authorization: Bearer $COMPRESSION_PLAY_API_KEY" \
  -H "X-Zurlix-Explain: true" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "Reply with exactly: explain mode ok"
      }
    ],
    "max_tokens": 20
  }'

The Explain response includes compression.analysis and may include the detailed compression_summary. This payload is intended for debugging, audits, and technical review — the same field set evaluators have always seen.

Header precedence

If both headers are present, X-Zurlix-Explain: true wins and returns the full technical response. The minimal Autopilot header is silently ignored in that case. This precedence is the same in every deployment regardless of any operator-side env flag.

D. Analyzer preflight

When you want context intelligence before sending a large handoff prompt to a model, hit POST /v1/analyze/context. The analyzer is analysis-only: no provider call, no completion, no mutation. See the “Context Intelligence Analyzer” section below for the full contract, response shape, and a curl example.

E. Dashboard proof

Use /dashboard for aggregate proof and cost / observed-metrics visibility across all your traffic. The dashboard is the place to look at receipt fields summed over time — the per-request receipt blocks above are not realised-savings claims on their own.

Context Intelligence Analyzer

Use this when you want to inspect a large handoff prompt before paying to send it to a model. The analyzer reads your messages, classifies the context (carryover handoff, repeated state, task-relevance buckets), and returns a structured analysis_summary — without ever calling OpenAI, Anthropic, or any other provider.

Endpoint: POST /v1/analyze/context
Auth: Same Authorization: Bearer $COMPRESSION_PLAY_API_KEY header as /v1/chat/completions
Mode: analysis_only — deterministic, analysis-only telemetry

What it returns

Top-level keys in the response body:

mode — the constant "analysis_only" so consumers can branch deterministically.
analysis_summary — a compact headline answer: result, confidence, primary_signal, plus three boolean roll-ups (carryover_detected, repeated_state_detected, task_budget_candidates_detected) and a short recommendation string.
carryover_handoff — whether the prompt looks like a project-state / agent-continuation handoff bundle, plus the per-signal counts that drove the decision.
context_delta — how much of the prompt looks like repeated state vs. new delta. Reports approximate counts, never realised savings.
task_relevance — per-block role classification (must-keep / constraint / validation / active context / history / background) + closed-enum task_family and risk_level + candidate budget-trim block count.
caveats — three short safety statements pinning the analysis-only contract.

What it does NOT do

It does not call OpenAI.
It does not call Anthropic.
It does not generate a completion.
It does not mutate your prompt — no content was changed.
It does not persist analyzer results.
It does not claim realised savings. Approximate token counts are heuristic estimates.

Try it — curl

curl -sS https://api.zur-lix.com/v1/analyze/context \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $COMPRESSION_PLAY_API_KEY" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "You are continuing a project handoff.\n\nDo not commit.\nDo not push.\n\nRequired work:\n1. Confirm baseline.\n2. Analyze context before sending."
      }
    ]
  }'

Expected response shape

Top-level fields you can check at a glance (the full response also contains the carryover_handoff, context_delta, and task_relevance detail blocks plus the caveats array):

{
  "mode": "analysis_only",
  "analysis_summary": {
    "result": "context_budget_opportunity_detected",
    "confidence": "high",
    "primary_signal": "combined"
  }
}

Analyzer success looks like

HTTP 200.
mode equals "analysis_only".
analysis_summary.result is present and is one of: no_context_budget_opportunity_detected, carryover_detected, repeated_state_detected, task_budget_opportunity_detected, or context_budget_opportunity_detected.
analysis_summary.confidence is one of low, medium, high.
No provider call was made.
No completion was generated.
No content was changed.

no_context_budget_opportunity_detected is also a perfectly valid result — short or simple prompts return that, and it is exactly what the analyzer should say. The analyzer is a measurement surface, not a compressor.

Troubleshooting

401 · invalid_api_key: The key was missing, mistyped, or sent in a malformed header. The OpenAI surface returns {"error":{"code":"invalid_api_key", …}}; the Anthropic surface returns the equivalent authentication_error envelope. Check that $COMPRESSION_PLAY_API_KEY is set, that you send exactly one auth header (X-API-Key or Authorization: Bearer), and that no stray quotes or whitespace crept into the value.
Cloudflare error 1010: The edge's browser-integrity / client-shape check rejected the request. It is triggered by browser-impersonating clients (headless browsers driven by automation tools, scrapers spoofing a Chrome user-agent) — not by ordinary browser tabs and not by plain API clients. Two clean options that do not hit this check: (1) the official Browser Launchpad on zur-lix.com, which is CORS-allowlisted and sends a real browser request from a real user gesture; (2) a plain curl request like the one above, which is the reference client shape for terminal and server-side callers.
Scripting the call: For an interactive evaluation, the Browser Launchpad on zur-lix.com is the fastest path — paste your key, pick a model, send a prompt. For a scripted or programmatic flow, use a maintained HTTP client (curl, requests, or httpx) and set the auth header explicitly. Avoid hand-rolling urllib requests — it is easy to misformat the auth header or send an unusual client shape that the edge rejects.

What not to do

Please avoid

Pasting your API key into a screenshot, Slack message, GitHub issue, shared doc, support ticket, or a screen recording. Treat the key like a password.
Putting the key in a URL query string. Always use the Authorization: Bearer header.
Running synthetic load generators, paid eval harnesses, or scripted flooding against the production endpoint. Evaluation traffic should be organic and small.
Sharing the key with anyone outside the people invited to evaluate. Each evaluator should use their own key.
Treating the page-1 quickstart as a benchmark. Zur-lix's savings curve is a function of repeated, context-rich traffic, not single one-off requests.

How to report feedback

When something looks off — or just to confirm the system worked — share a short note with these fields. None of them require sharing private content.

The endpoint you called (e.g. /v1/chat/completions or /v1/messages).
The model you requested (e.g. gpt-4o-mini).
Approximate time of your request, in your local timezone.
Whether dashboard login worked the first time.
Which dashboard panels loaded vs which showed "No data yet" or an error.
Anything that felt confusing or unclear — copy welcome.

Do not include the API key, raw prompt content, or raw response bodies unless we explicitly ask and you've redacted anything sensitive. The team can correlate from the timestamp + endpoint + model alone.

Privacy

Zur-lix does not persist raw prompt or response text into telemetry. Internal capture rows store integer counts and a one-way HMAC-based stable-context fingerprint that uses a server-side pepper — the fingerprint cannot be reversed back into prompt content. Operator diagnostics expose only aggregate integer counters; no per-tenant identifiers, no content.

Beta evaluator details

Zur-lix private beta. Controlled unpaid beta. Invitation only. The first wave is a 30-day window starting on the date your invitation is issued.

Cost: free during this evaluation window. Payment collection is not available during the controlled unpaid beta; no checkout flow is exposed; no invoice is generated. Savings shown in the dashboard are informational, corpus-measured estimates and are not guaranteed.
Support and feedback: beta support support@zur-lix.com; security questions security@zur-lix.com; new beta access requests beta@zur-lix.com. Response target during weekdays in the 30-day window: 24 hours.
Per-key caps for this wave: 30 requests per minute, 20,000 tokens per minute, monthly cost ceiling USD 5 per tester. The operator may pause or end the evaluation at any time.
Not authorized in this wave: paid beta, public self-serve, global rollout, live billing, customer charges, partnership claims, or any compliance certification claim against the SOC-2 / GDPR / UK GDPR / EU AI Act / ISO 42001 / PCI / PCI DSS / HIPAA frameworks.

Use Zur-lix from your existing LLM workflow

Connect your existing LLM client to Zur-lix using your API key and the endpoint base URL. Same request shape, same response shape; only the base URL changes.

Python — OpenAI-compatible client

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["COMPRESSION_PLAY_API_KEY"],
    base_url="https://api.zur-lix.com/v1",
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "say hi from Zur-lix"},
    ],
)
print(response.choices[0].message.content)

JavaScript — OpenAI-compatible client

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.COMPRESSION_PLAY_API_KEY,
  baseURL: "https://api.zur-lix.com/v1",
});

const response = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [
    { role: "user", content: "say hi from Zur-lix" },
  ],
});
console.log(response.choices[0].message.content);

cURL — OpenAI-compatible endpoint

curl -sS https://api.zur-lix.com/v1/chat/completions \
  -H "Authorization: Bearer $COMPRESSION_PLAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "say hi from Zur-lix"}
    ]
  }'

cURL — Anthropic-compatible endpoint

curl -sS https://api.zur-lix.com/v1/messages \
  -H "x-api-key: $COMPRESSION_PLAY_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-haiku-4-5",
    "max_tokens": 64,
    "messages": [
      {"role": "user", "content": "say hi from Zur-lix"}
    ]
  }'

Set COMPRESSION_PLAY_API_KEY in your shell to the key you were handed at invitation. Treat the key like a password — never paste it into a shared chat, ticket, log, screenshot, screen recording, or commit. If a key leaks, contact Zur-lix security at security@zur-lix.com for an immediate rotation.

Check the Zur-lix dashboard for usage and estimated savings

After making a few requests through Zur-lix from your normal LLM workflow, sign into the dashboard with the same API key to review:

prompt / completion / total token counts attributed to your account;
estimated token savings on your own traffic during the evaluation window (corpus-measured, informational only, not guaranteed);
estimated cost savings derived from the same telemetry;
routing decisions when multiple models are available.

The dashboard's savings figures are informational estimates computed against the unmodified prompt cost. They are not a promise of a fixed percentage reduction, and they are not customer-billed numbers.