Benchmarks
Numbers, not adjectives.
We publish the median input-token reduction, the added proxy latency, and the per-workload split. The methodology, the raw request samples, and the upstream comparison are open — point us at a workload and we'll add it to the table.
Methodology
How we measure
- Input-token reduction — measured against the same prompts run directly against the upstream provider, with the same model and the same response constraints. We compare the upstream-reported usage on both legs.
- Latency overhead — p95 added wall-clock time between the request leaving the client and the response arriving back, attributable to the TES retrieval and preamble step. Upstream provider time is excluded.
- Quality preservation — we don't claim quality wins. We claim quality preservation: same upstream model, no response rewrites. Per-workload eval scripts available on request.
- Workload mix — coding-assistant traffic, RAG-heavy retrieval, agentic multi-tool turns, conversational chat. Each is reported separately because the savings profile differs.
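The two headline numbers above reduce to simple statistics over paired request samples. A minimal sketch of both computations, assuming each sample pairs the upstream-reported input-token counts for the same prompt on the direct and proxied legs (the data shapes and sample values here are illustrative, not our published figures):

```python
import math
from statistics import median

def median_token_reduction(paired_samples):
    """Median per-request input-token reduction, as a fraction.

    Each sample is (direct_tokens, proxied_tokens) for the same prompt,
    taken from the provider's own usage reporting on both legs."""
    reductions = [
        (direct - proxied) / direct
        for direct, proxied in paired_samples
        if direct > 0
    ]
    return median(reductions)

def p95(values):
    """p95 via the nearest-rank method: the smallest sample at or
    above which 95% of observations fall."""
    ordered = sorted(values)
    index = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return ordered[index]

# Illustrative paired samples: (direct_tokens, proxied_tokens).
samples = [(1200, 700), (900, 600), (1500, 800), (1100, 770)]
print(median_token_reduction(samples))  # → 0.375

# Illustrative added wall-clock milliseconds per request,
# with upstream provider time already subtracted out.
added_latency_ms = [12, 9, 15, 40, 11, 13, 10, 14, 9, 12]
print(p95(added_latency_ms))  # → 40
```

Note that nearest-rank p95 deliberately reports an actual observed sample rather than an interpolated value, so a single slow outlier shows up undiluted.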
Run it on your workload
Want a benchmark on your own traffic?
Spin up a free account, run a day of mirrored traffic through TES, and compare your invoices. We'll help you set up the comparison.
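The mirrored-traffic comparison boils down to sending each prompt down both legs and tallying the provider-reported input tokens on each. A minimal sketch, where `send_direct` and `send_via_tes` are hypothetical client callables standing in for your real direct and proxied API clients (the usage-dict shape is an assumption for illustration):

```python
def mirror_and_compare(prompts, send_direct, send_via_tes):
    """Send each prompt down both legs and tally upstream-reported
    input tokens, so the two invoices can be compared like-for-like.

    send_direct / send_via_tes are hypothetical callables that return
    the provider's usage dict for one prompt."""
    totals = {"direct": 0, "tes": 0}
    for prompt in prompts:
        totals["direct"] += send_direct(prompt)["input_tokens"]
        totals["tes"] += send_via_tes(prompt)["input_tokens"]
    return totals

# Stand-in clients for illustration only; swap in real API calls.
fake_direct = lambda p: {"input_tokens": len(p.split()) * 10}
fake_tes = lambda p: {"input_tokens": len(p.split()) * 6}
print(mirror_and_compare(["hello world", "one two three"],
                         fake_direct, fake_tes))
# → {'direct': 50, 'tes': 30}
```

The key point is that both legs see identical prompts, so any difference in the totals is attributable to the proxy rather than to workload drift.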