Benchmarks

Numbers, not adjectives.

We publish the median input-token reduction, the added proxy latency, and the per-workload split. The methodology, the raw request samples, and the upstream comparison are open — point us at a workload and we'll add it to the table.


Methodology

How we measure

  • Input-token reduction — measured against the same prompts run directly against the upstream provider, with the same model and the same response constraints. We compare upstream-reported usage on both legs.
  • Latency overhead — p95 of the added wall-clock time between the request leaving the client and the response arriving back, attributable to the TES retrieval-and-preamble step. Excludes time spent at the upstream provider.
  • Quality preservation — we don't claim quality wins. We claim quality preservation: same upstream model, no response rewrites. Per-workload eval scripts available on request.
  • Workload mix — coding-assistant traffic, RAG-heavy retrieval, agentic multi-tool turns, conversational chat. Each is reported separately because the savings profile differs.
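The two headline numbers above reduce to simple arithmetic over paired samples. Here is a minimal sketch of that computation; the record fields and values are illustrative placeholders, not the actual benchmark schema or published figures.

```python
import statistics

# Hypothetical paired samples: upstream-reported input tokens for the same
# prompt sent direct vs. through the proxy, plus the added wall-clock
# overhead in milliseconds. All values are made up for illustration.
samples = [
    {"workload": "coding", "direct_tokens": 4200, "proxy_tokens": 1900, "overhead_ms": 38},
    {"workload": "coding", "direct_tokens": 3100, "proxy_tokens": 1500, "overhead_ms": 41},
    {"workload": "rag",    "direct_tokens": 9800, "proxy_tokens": 3600, "overhead_ms": 55},
]

def token_reduction(sample):
    # Fractional reduction in upstream-reported input tokens on the proxy leg.
    return 1 - sample["proxy_tokens"] / sample["direct_tokens"]

def p95(values):
    # Nearest-rank 95th percentile of the overhead samples.
    ranked = sorted(values)
    idx = max(0, round(0.95 * len(ranked)) - 1)
    return ranked[idx]

median_reduction = statistics.median(token_reduction(s) for s in samples)
latency_p95 = p95([s["overhead_ms"] for s in samples])
```

Because the savings profile differs by workload, the published table groups these records by the `workload` field before computing each median.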

Run it on your workload

Want a benchmark on your own traffic?

Spin up a free account, run a day of mirrored traffic through TES, and compare your invoices. We'll help you set up the comparison.
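The invoice comparison is back-of-envelope arithmetic over one day of mirrored traffic. A minimal sketch, assuming a placeholder per-token price and made-up daily totals (neither is a quoted rate or a real measurement):

```python
# Illustrative price per million input tokens, in USD. Substitute your
# provider's actual rate when comparing real invoices.
PRICE_PER_M_INPUT_TOKENS = 3.00

def daily_cost(total_input_tokens):
    # Input-token cost for one day of traffic at the assumed rate.
    return total_input_tokens / 1_000_000 * PRICE_PER_M_INPUT_TOKENS

direct_tokens = 180_000_000  # hypothetical day of direct traffic
proxy_tokens = 95_000_000    # same prompts mirrored through TES

savings = daily_cost(direct_tokens) - daily_cost(proxy_tokens)
```

Mirroring the same prompts down both legs keeps the comparison apples-to-apples: same model, same day, same traffic, only the input-token counts differ.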