The optimization layer for Claude Code

Cut your Claude Code bill by 34–51% on average
without changing your workflow.

Fermat's Last Token runs alongside Claude Code to reduce token spend, compress context, and optimize tool calls while preserving output quality, without changing how you work.

claude code · /usage
zsh
// before — vanilla claude code
Session · TypeScript bundle
  Agent cost:     $60.62
  Cost / ticket:  $12.12
  Agent turns:    21
  Quality:        0.849
// after — with fermat
Session · TypeScript bundle
  Agent cost:     $29.79   ▾51%
  Cost / ticket:  $5.96    ▾51%
  Agent turns:    15       ▾30%
  Quality:        0.931    ▴+0.08
saved per session: $30.83

Measured on the SessionBench TypeScript bundle — 5 SWE-Atlas refactoring tickets (grafana) run as one resumed session, with claude-opus-4-8, averaged over 5 paired sessions.

Built by researchers and developers from
Stanford, Amazon Web Services, Apple, Accenture
§ 02 The method

Three small theorems that, together, compress your bill.

Theorem 1.

Semantic context pruning

Older context is safely optimized while recent instructions, tool relationships, and task-critical state stay intact. Seamlessly integrates with Anthropic's prompt caching.

Theorem 2.

Optimized agent tools

Purpose-built tools that do more per call, reduce repeated reads, and keep edits tied to the files the agent actually inspected.

Theorem 3.

Quality-preserving compression

Tool outputs and prior assistant prose are compressed before cache writes, with fail-soft passthrough when compression is unsafe.
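As an illustration only, fail-soft passthrough can be sketched in a few lines of Python. Everything named here (`compress_fail_soft`, `collapse_blank_runs`, `keeps_all_paths`) is hypothetical and stands in for whatever Fermat actually does internally, which this page does not describe. The sketch shows just the shape of the idea: try to compress, and fall back to the original text if the compressor fails, saves nothing, or drops something a safety check cares about.

```python
from typing import Callable


def compress_fail_soft(
    text: str,
    compress: Callable[[str], str],
    is_safe: Callable[[str, str], bool],
) -> str:
    """Return compressed text, or the untouched original if anything looks unsafe."""
    try:
        candidate = compress(text)
    except Exception:
        # Compressor crashed: pass the original through unchanged.
        return text
    # Reject results that save nothing or that fail the safety check.
    if len(candidate) >= len(text) or not is_safe(text, candidate):
        return text
    return candidate


def collapse_blank_runs(text: str) -> str:
    """Toy compressor (assumption): drop blank lines and trailing whitespace."""
    lines = [line.rstrip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


def keeps_all_paths(original: str, candidate: str) -> bool:
    """Toy safety check (assumption): every token containing '/' must survive."""
    paths = [tok for tok in original.split() if "/" in tok]
    return all(path in candidate for path in paths)
```

Calling `compress_fail_soft(tool_output, collapse_blank_runs, keeps_all_paths)` returns the shorter text when the check passes and the original string otherwise, so a bad compression never costs more than a missed saving.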

§ 03 Measured, not promised

47% cheaper on average.
Zero quality lost.

SessionBench runs the same multi-ticket sessions twice, once on vanilla Claude Code and once through Fermat, on real refactoring tasks from Scale AI's SWE-Atlas. Across 15 paired sessions on three benchmark bundles (grafana Go, grafana TypeScript, and scapy Python), 150 ticket runs in total, Fermat was cheaper in every run, and average quality improved on all three bundles.

Benchmark              Avg. Vanilla   Avg. Fermat   Saved   Avg. Quality
grafana · Go           $33.08         $21.68        34%     0.82 → 0.89
grafana · TypeScript   $60.62         $29.79        51%     0.85 → 0.93
scapy · Python         $61.31         $30.38        50%     0.82 → 0.92
Overall · 15 sessions  $51.67         $27.29        47%     0.83 → 0.92

Per-session agent token cost, averaged over K=5 runs per benchmark · claude-opus-4-8 · built on SWE-Atlas.
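The percentages can be rechecked from the published averages. The short Python sketch below uses only the dollar figures reported for the three benchmarks. It assumes the overall row is the plain mean of the three per-benchmark averages, which holds if each benchmark contributes the same number of sessions (K=5). The names `rows` and `saved_pct` are illustrative.

```python
# Published per-session averages: (vanilla USD, Fermat USD).
rows = {
    "grafana · Go": (33.08, 21.68),
    "grafana · TypeScript": (60.62, 29.79),
    "scapy · Python": (61.31, 30.38),
}


def saved_pct(vanilla: float, fermat: float) -> int:
    """Percentage saved relative to the vanilla cost, rounded to a whole percent."""
    return round(100 * (vanilla - fermat) / vanilla)


per_benchmark = {name: saved_pct(v, f) for name, (v, f) in rows.items()}

# Equal session counts per benchmark, so a plain mean gives the overall row.
avg_vanilla = sum(v for v, _ in rows.values()) / len(rows)
avg_fermat = sum(f for _, f in rows.values()) / len(rows)
overall = saved_pct(avg_vanilla, avg_fermat)

for name, pct in per_benchmark.items():
    print(f"{name}: {pct}% saved")
print(f"Overall: ${avg_vanilla:.2f} vs ${avg_fermat:.2f}, {overall}% saved")
```

Recomputing from these rounded inputs gives an overall Fermat average one cent below the published $27.29. That gap is presumably because the published figure was averaged from unrounded session costs.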

Full methodology, raw logs & reproduction repo
§ 04 Install

One line.
Zero config.

Fermat runs Claude Code through a local wrapper and lightweight proxy, with built-in optimized tools and high-fidelity compression. Nothing changes about how you prompt, use tools, or review diffs; you just waste fewer tokens.

  • Works with your existing Claude Code setup.
  • Choose between metered billing and flat-rate (unlimited access).
  • Org-level support with centralized access control and billing.
  • Compatible with all Opus, Sonnet & Haiku models.
terminal
$ curl -fsSL https://downloads.quotientlabs.com/fermat/install.sh | bash
↳ then run Claude Code CLI as normal
$20/mo
Unlimited usage, unlimited savings.
5% metered
Pay just 5% of what you saved. No matter what, your wallet comes out on top.
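To compare the two plans, a rough break-even can be worked out in Python. This is a sketch under stated assumptions: the metered fee is exactly 5% of measured monthly savings with no minimum, the flat plan is exactly $20 per month, and taxes or other charges are ignored. The $30.83 figure is the per-session saving from the TypeScript example at the top of this page. All function and constant names are illustrative.

```python
import math

FLAT_MONTHLY_USD = 20.00  # flat-rate plan, per month
METERED_SHARE = 0.05      # metered plan: share of savings paid as the fee


def metered_fee(monthly_savings_usd: float) -> float:
    """Fee on the metered plan for a given amount saved in a month (assumed model)."""
    return METERED_SHARE * monthly_savings_usd


def cheaper_plan(monthly_savings_usd: float) -> str:
    """Name the plan with the lower fee for this level of monthly savings."""
    fee = metered_fee(monthly_savings_usd)
    if fee < FLAT_MONTHLY_USD:
        return "metered"
    if fee > FLAT_MONTHLY_USD:
        return "flat"
    return "either"


# Monthly savings at which both plans cost the same.
break_even_savings = FLAT_MONTHLY_USD / METERED_SHARE

# Sessions per month needed to reach that point at $30.83 saved per session.
sessions_to_break_even = math.ceil(break_even_savings / 30.83)

print(f"Break-even at ${break_even_savings:.0f} saved per month")
print(f"About {sessions_to_break_even} sessions like the TypeScript example")
```

Under these assumptions, metered billing costs less below about $400 of monthly savings, and the flat rate costs less above it.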
§ 05 Marginalia

Footnotes.

i. Does this actually keep quality the same?

Yes, and we measure it. Our SessionBench suite runs multi-ticket sessions on both Fermat and vanilla Claude Code, grading every ticket with its own test suite plus an LLM rubric. Across 15 paired sessions on real SWE-Atlas refactoring tasks (Go, TypeScript, and Python), Fermat matched or beat vanilla on quality in 13 of 15 sessions and was never far off in the remaining two. The overall average quality score rose from 0.83 to 0.92. Full methodology and raw logs are in the benchmark repo.

ii. What does Claude Code itself see?

The same protocol as before. Fermat sits between your CLI and the Anthropic API, rewriting outbound and inbound context. From Claude's perspective it's an ordinary request, just with leaner, denser context.

iii. Will this work with my existing Claude Code setup?

Yes. Fermat is designed to integrate seamlessly into your existing Claude Code workflow, including your normal shell, projects, tools, custom skills, and API usage — all you need to do is run Claude Code from your terminal as normal. You can also use vanilla Claude Code at any time by running claude --vanilla.

iv. How can I see my savings?

Manage your account, add members to your organization, and track your savings at console.quotientlabs.com.

v. What if I'm on a subscription plan?

Fermat works the same whether you're on a monthly Anthropic subscription (Max 5x/20x) or on API billing. On API billing, Fermat directly saves you dollars; on a flat-rate subscription, your usage limits stretch much further.

vi. Why 'Fermat's Last Token'?

Because the proof of how much context Claude actually needs is small enough to fit in the margin. Also, we are unserious people doing serious work.

Q.E.D.

Stop paying for tokens Claude never reads.

Install Fermat