How it works
One env var. Same SDK. Half the bill.
You already pay for an LLM. You probably also pay to re-derive the same context every turn: the agent re-reads the same files, re-runs the same memory search, and repeats the same tool-call lookups before it can answer. TES intercepts the request, fetches the context the model would have asked for, injects it as a preamble, and forwards the call. The model answers from what's in front of it.
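As a conceptual sketch (this is not TES's actual implementation, and `injectPreamble` is a hypothetical name), the preamble step amounts to prepending retrieved context to the message array before forwarding:

```typescript
type Message = { role: "user" | "assistant"; content: string };

// Hypothetical sketch: prepend retrieved context as a preamble message so the
// model can answer from it instead of re-deriving it with tool calls.
function injectPreamble(messages: Message[], retrieved: string[]): Message[] {
  if (retrieved.length === 0) return messages; // nothing retrieved: forward as-is
  const preamble: Message = {
    role: "user",
    content: "Context retrieved for this turn:\n" + retrieved.join("\n---\n"),
  };
  return [preamble, ...messages]; // original request is otherwise untouched
}

const out = injectPreamble(
  [{ role: "user", content: "Refactor the auth middleware" }],
  ["// middleware/auth.ts (excerpt)"],
);
```

The key property is that the original messages are forwarded unmodified; the preamble is pure addition.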
Proxy mode — request flow
1. Your app sends an unmodified SDK call.
2. llm.api.pentatonic.com retrieves context (memory, files, prior turns) and injects it as a preamble.
3. The upstream model (Anthropic · OpenAI · MiniMax · your own) generates the answer.
4. Your app receives a response with an identical shape.
Token accounting on a real coding turn
Without TES vs. with TES (preamble injected): 95.7% fewer tokens, 0 tool calls, same answer.
The env-var swap
Before / after — the only diff
Before
# .env
ANTHROPIC_API_KEY=sk-ant-...

import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
const response = await client.messages.create({
model: "claude-sonnet-4",
messages: [
{ role: "user", content: "Refactor the auth middleware" },
],
});

After
# .env (two lines added)
ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_BASE_URL=https://llm.api.pentatonic.com
TES_API_KEY=tes_<clientId>_<random>

import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic({
defaultHeaders: { "X-TES-Token": process.env.TES_API_KEY },
});
const response = await client.messages.create({
model: "claude-sonnet-4",
messages: [
{ role: "user", content: "Refactor the auth middleware" },
],
});
// same response shape, ~70% fewer input tokens.

Same flip works for every supported provider
# Anthropic SDK (TypeScript / Python)
ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_BASE_URL=https://llm.api.pentatonic.com
TES_API_KEY=tes_<clientId>_<random>
# Pass TES_API_KEY as the X-TES-Token default header on your client.

Anthropic, OpenAI, MiniMax, and any OpenAI-compatible endpoint. Bring your own upstream — TES proxies it.
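For OpenAI-compatible SDKs the flip can be expressed as a small config overlay. This is a sketch: `withTes` is a hypothetical helper, the option names `baseURL` and `defaultHeaders` match what the official `openai` Node client accepts, and the exact TES base URL for non-Anthropic upstreams is an assumption to confirm in the docs.

```typescript
// Hypothetical config overlay: everything TES needs from an OpenAI-compatible
// client is a new base URL plus one default header; nothing else changes.
interface ClientConfig {
  apiKey: string;
  baseURL?: string;
  defaultHeaders?: Record<string, string>;
}

function withTes(config: ClientConfig, tesKey: string): ClientConfig {
  return {
    ...config,
    baseURL: "https://llm.api.pentatonic.com", // proxy instead of the provider
    defaultHeaders: { ...config.defaultHeaders, "X-TES-Token": tesKey },
  };
}

// Usage sketch:
//   new OpenAI(withTes({ apiKey: process.env.OPENAI_API_KEY! },
//                      process.env.TES_API_KEY!))
const cfg = withTes({ apiKey: "sk-demo" }, "tes_demo_key");
```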
Three things to know before you ship the diff
Same SDK, same model, same response shape.
TES is a transparent proxy. Your code does not change. Drop in the env var, ship the diff.
Quality is preserved.
We don't swap models, summarise responses, or drop context. We add the right context up-front so the model doesn't waste tokens deriving it.
You can opt out per request.
Send X-TES-Mode: passthrough on any request and we forward it to the upstream provider untouched. Useful for A/B-testing the bill.
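A sketch of how the per-request opt-out could be wired: `tesHeaders` is a hypothetical helper, and the second-argument `{ headers }` options bag is how the Anthropic TypeScript SDK accepts extra per-request headers.

```typescript
// Hypothetical helper: build per-request headers. With passthrough enabled,
// the proxy forwards the request to the upstream provider untouched.
function tesHeaders(tesKey: string, passthrough = false): Record<string, string> {
  const headers: Record<string, string> = { "X-TES-Token": tesKey };
  if (passthrough) headers["X-TES-Mode"] = "passthrough";
  return headers;
}

// A/B the bill by alternating modes per request, e.g. with the Anthropic SDK:
//   client.messages.create(params, { headers: tesHeaders(key, i % 2 === 1) })
const proxied = tesHeaders("tes_demo_key");
const direct = tesHeaders("tes_demo_key", true);
```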
Who this is for
If you have an LLM bill, this is for you
Solo devs
Burning your personal credit card on Codex, Claude Code, Cursor, or your own agent. Sign up, swap the env var, watch tomorrow's bill drop.
Teams
Developer LLM bills are now a finance line item. Central API key, per-seat dashboard, one invoice. Same compatibility surface across every framework.
Enterprises
Six- or seven-figure annual AI spend. Volume per-token rate, dedicated routing, SLA, audit log. Procurement-friendly contract that scales with your usage.
What we don't do
The shortest list on the site
- We don't swap your model.
- We don't read your responses.
- We don't change response shape.
- We don't make tool calls on your behalf unless you explicitly enable retrieval policies.
If you want to dig deeper into the retrieval policies, the memory layer that powers them, or the request log we keep for your debugging, see the memory layer or the docs.
Get started
Change one environment variable. Watch the bill drop.
Free tier covers your weekend project. Pro is per-token, $20/mo minimum. No credit card needed for the free tier.