How it works

One env var. Same SDK. Half the bill.

You already pay for an LLM. You probably also pay for the same context to be re-derived every turn — the agent re-reads the same files, re-runs the same memory search, re-issues the same tool calls for the same lookups before it can answer. TES intercepts the request, fetches the context the model would have asked for, injects it as a preamble, and forwards the call. The model answers from what's in front of it.
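Conceptually, each request goes through three steps, sketched below. This is an illustration only, not TES's implementation: retrieveContext, buildPreamble, and UPSTREAM_URL are hypothetical names, and the real retrieval runs server-side.

// Conceptual sketch only: retrieveContext, buildPreamble, and UPSTREAM_URL
// are hypothetical names; the actual retrieval logic is server-side.
type Message = { role: "user" | "assistant"; content: string };
type MessagesRequest = { model: string; messages: Message[] };

declare function retrieveContext(req: MessagesRequest): Promise<string>;
declare function buildPreamble(context: string): Message;

const UPSTREAM_URL = "https://api.anthropic.com"; // or any configured upstream

async function proxyRequest(req: MessagesRequest): Promise<Response> {
  // 1. Fetch the memory, files, and prior turns the model would otherwise
  //    re-derive through tool calls.
  const context = await retrieveContext(req);

  // 2. Inject that context as a preamble message ahead of the user's turn.
  const withPreamble: MessagesRequest = {
    ...req,
    messages: [buildPreamble(context), ...req.messages],
  };

  // 3. Forward the call; the response comes back in the provider's own shape.
  return fetch(`${UPSTREAM_URL}/v1/messages`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(withPreamble),
  });
}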

Proxy mode — request flow

Your app
  │  unmodified SDK call
  ▼
llm.api.pentatonic.com
  │  retrieve(memory, files, prior turns) → inject preamble
  ▼
Upstream model (Anthropic · OpenAI · MiniMax · your own)
  │  identical response shape
  ▼
Your app

Token accounting on a real coding turn

                                  Tokens    Tool calls
  Without TES                     10,500    17
  With TES (preamble injected)       450     0
95.7% fewer tokens, 0 tool calls — same answer.
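Sanity check on that headline number: (10,500 - 450) / 10,500 ≈ 0.957, i.e. 95.7% fewer tokens on this turn.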

The env-var swap

Before / after — the only diff

Before

.env
# .env
ANTHROPIC_API_KEY=sk-ant-...
agent.ts
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-sonnet-4",
  messages: [
    { role: "user", content: "Refactor the auth middleware" },
  ],
});

After

.env
# .env — two lines added
ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_BASE_URL=https://llm.api.pentatonic.com
TES_API_KEY=tes_<clientId>_<random>
agent.ts (one constructor option added)
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  defaultHeaders: { "X-TES-Token": process.env.TES_API_KEY },
});

const response = await client.messages.create({
  model: "claude-sonnet-4",
  messages: [
    { role: "user", content: "Refactor the auth middleware" },
  ],
});

// same response shape, ~70% fewer input tokens.

Same flip works for every supported provider

# Anthropic SDK (TypeScript / Python)
ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_BASE_URL=https://llm.api.pentatonic.com
TES_API_KEY=tes_<clientId>_<random>
# Pass TES_API_KEY as the X-TES-Token default header on your client.

Anthropic, OpenAI, MiniMax, and any OpenAI-compatible endpoint. Bring your own upstream — TES proxies it.
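For the OpenAI SDK the flip has the same shape. A sketch, with one assumption flagged: it reuses the host from the Anthropic example for OPENAI_BASE_URL; check the docs for the exact OpenAI-compatible endpoint.

# OpenAI SDK (TypeScript / Python)
OPENAI_API_KEY=sk-...
OPENAI_BASE_URL=https://llm.api.pentatonic.com   # assumed: same host as above
TES_API_KEY=tes_<clientId>_<random>
# Pass TES_API_KEY as the X-TES-Token default header on your client:

import OpenAI from "openai";

// The OpenAI SDK reads OPENAI_BASE_URL from the environment; only the
// TES header needs to be set on the client.
const client = new OpenAI({
  defaultHeaders: { "X-TES-Token": process.env.TES_API_KEY },
});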

Three things to know before you ship the diff

Same SDK, same model, same response shape.

TES is a transparent proxy: your call sites and response handling don't change. Add the env vars and one default header, ship the diff.

Quality is preserved.

We don't swap models, summarise responses, or drop context. We add the right context up-front so the model doesn't waste tokens deriving it.

You can opt out per request.

Send X-TES-Mode: passthrough on any request and we forward to the upstream provider untouched. Useful for A/B-testing the bill.
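With the Anthropic TypeScript SDK, that's a per-request header in the second argument to create(). A sketch reusing the client from the diff above:

// Forwarded to the upstream untouched; compare this turn's token count
// against the same call without the override.
const raw = await client.messages.create(
  {
    model: "claude-sonnet-4",
    messages: [{ role: "user", content: "Refactor the auth middleware" }],
  },
  { headers: { "X-TES-Mode": "passthrough" } },
);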

Who this is for

If you have an LLM bill, this is for you

Solo devs

Burning your personal credit card on Codex, Claude Code, Cursor, or your own agent. Sign up, swap the env var, watch tomorrow's bill drop.

Teams

Developer LLM bills are now a finance line item. Central API key, per-seat dashboard, one invoice. Same compatibility surface across every framework.

Enterprises

Six- or seven-figure annual AI spend. Volume per-token rate, dedicated routing, SLA, audit log. Procurement-friendly contract that scales with your usage.

What we don't do

The shortest list on the site

  • We don't swap your model.
  • We don't read your responses.
  • We don't change response shape.
  • We don't make tool calls on your behalf unless you explicitly enable retrieval policies.

If you want to dig deeper into the retrieval policies, the memory layer that powers them, or the request log we keep for your debugging, see the memory layer or the docs.

Get started

Change one environment variable. Watch the bill drop.

Free tier covers your weekend project. Pro is per-token, $20/mo minimum. No credit card needed for the free tier.