Semantic context pruning
Older context is safely optimized while recent instructions, tool relationships, and task-critical state stay intact. Seamlessly integrates with Anthropic's prompt caching.
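To make the pruning idea concrete, here is a minimal sketch. It assumes conversation history shaped like Anthropic's Messages API: a list of `role`/`content` dicts whose content blocks include `tool_use` and `tool_result`. The function `prune_history`, the `keep_recent` window and the `STUB` placeholder are hypothetical illustrations, not Fermat's actual policy. Bodies of older `tool_result` blocks are replaced with a placeholder, while every `tool_use_id`, all text, and the most recent turns stay exactly as they were.

```python
import copy
from typing import Any

Message = dict[str, Any]

# Hypothetical placeholder; a real pruner might keep a compressed summary instead.
STUB = "[older tool output elided]"


def prune_history(messages: list[Message], keep_recent: int = 4) -> list[Message]:
    """Replace tool_result bodies in all but the last `keep_recent` messages.

    Text and tool_use blocks are never touched, and each stubbed tool_result
    keeps its tool_use_id, so every tool_use still has its matching result.
    """
    pruned = copy.deepcopy(messages)  # never mutate the caller's history
    cutoff = max(len(pruned) - keep_recent, 0)
    for msg in pruned[:cutoff]:
        content = msg.get("content")
        if not isinstance(content, list):
            continue  # plain-string content is ordinary text: leave it alone
        for block in content:
            if block.get("type") == "tool_result":
                block["content"] = STUB
    return pruned
```

Because the output depends only on its input, the same history always prunes to the same bytes, which prefix-based prompt caching relies on. A sliding window still changes the prefix whenever a turn ages out, so any real policy has to weigh how often it re-prunes against how much cached prefix it keeps reusing.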
Fermat's Last Token runs alongside Claude Code to reduce token spend, compress context, and optimize tool calls while preserving output quality, without changing how you work.
Session · TypeScript bundle
Metric          Vanilla Claude Code   Fermat
Agent cost      $60.62                $29.79 (▾51%)
Cost / ticket   $12.12                $5.96 (▾51%)
Agent turns     21                    15 (▾30%)
Quality         0.849                 0.931 (▴+0.08)
Measured on the SessionBench TypeScript bundle: 5 SWE-Atlas refactoring tickets from the grafana repo, run as one resumed session with claude-opus-4-8 and averaged over 5 paired sessions.
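The card's derived figures can be checked from its raw numbers. This short sketch assumes 5 tickets per session, as the caption states, and takes the first card row as vanilla Claude Code and the second as Fermat.

```python
TICKETS = 5  # tickets per session in the TypeScript bundle

# Per-session figures from the card (first row: vanilla, second row: Fermat).
vanilla = {"cost": 60.62, "quality": 0.849}
fermat = {"cost": 29.79, "quality": 0.931}


def pct_drop(before: float, after: float) -> float:
    """Percentage decrease from `before` to `after`."""
    return 100 * (before - after) / before


print(f"cost/ticket: ${vanilla['cost'] / TICKETS:.2f} -> ${fermat['cost'] / TICKETS:.2f}")
# cost/ticket: $12.12 -> $5.96
print(f"cost change: -{pct_drop(vanilla['cost'], fermat['cost']):.0f}%")
# cost change: -51%
print(f"quality change: +{fermat['quality'] - vanilla['quality']:.2f}")
# quality change: +0.08
```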
Purpose-built tools that do more per call, reduce repeated reads, and keep edits tied to the files the agent actually inspected.
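One way such tools could cut repeated reads and tie edits to inspected files is sketched below. `ReadTracker`, its `read`/`edit` methods and the stub message are hypothetical names for illustration, not Fermat's tool API. A repeat read of unchanged bytes returns a one-line note instead of the whole file, and an edit is refused unless the agent has read the file's current contents.

```python
import hashlib
from pathlib import Path


class ReadTracker:
    """Hypothetical read/edit tool pair that dedupes repeat reads and only
    allows edits to files whose current contents the agent has seen."""

    def __init__(self) -> None:
        self._seen: dict[Path, str] = {}  # resolved path -> sha256 of last content seen

    @staticmethod
    def _digest(text: str) -> str:
        return hashlib.sha256(text.encode()).hexdigest()

    def read(self, path: str) -> str:
        p = Path(path).resolve()
        text = p.read_text()
        digest = self._digest(text)
        if self._seen.get(p) == digest:
            # Same bytes as the last read: send a short note, not the file again.
            return f"[{path} unchanged since last read]"
        self._seen[p] = digest
        return text

    def edit(self, path: str, old: str, new: str) -> None:
        p = Path(path).resolve()
        text = p.read_text()
        if self._seen.get(p) != self._digest(text):
            raise PermissionError(f"{path} must be read (again) before editing")
        if text.count(old) != 1:
            raise ValueError(f"expected exactly one match for the edit in {path}")
        updated = text.replace(old, new, 1)
        p.write_text(updated)
        self._seen[p] = self._digest(updated)  # the agent knows what it just wrote
```

Hashing contents rather than comparing timestamps means an edit is refused whenever the file changed on disk after the agent's last read, even if something else (a formatter, another tool) made the change.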
Tool outputs and prior assistant prose are compressed before cache writes, with fail-soft passthrough when compression is unsafe.
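The fail-soft rule can be sketched as a wrapper that accepts a compressor's output only when it is clearly safe to use. `compress_fail_soft`, its `must_keep` list of substrings that must survive, and the toy `drop_blank_lines` compressor are assumptions for illustration; the checks Fermat actually runs are not described here. Any exception, any output that isn't shorter, or any lost required substring falls back to passing the original text through unchanged.

```python
from typing import Callable, Iterable

Compressor = Callable[[str], str]


def compress_fail_soft(text: str, compressor: Compressor,
                       must_keep: Iterable[str] = ()) -> str:
    """Return compressor(text) only when it is safe; otherwise return text.

    "Safe" here means: the compressor did not raise, returned a shorter
    string, and kept every `must_keep` substring that was in the input.
    """
    try:
        out = compressor(text)
    except Exception:
        return text  # fail soft: a compressor error means passthrough
    if not isinstance(out, str) or len(out) >= len(text):
        return text  # no savings, so keep the original bytes
    if any(s in text and s not in out for s in must_keep):
        return text  # unsafe: something that had to survive was dropped
    return out


def drop_blank_lines(text: str) -> str:
    """Toy compressor for illustration: removes blank lines."""
    return "\n".join(line for line in text.splitlines() if line.strip())
```

Doing this before a block is first written to the prompt cache matters because the cache matches exact prefixes: rewriting a block after it was cached would change the prefix and miss the cache.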
SessionBench runs the same multi-ticket sessions twice, once on vanilla Claude Code and once through Fermat, on real refactoring tasks from Scale AI's SWE-Atlas. Across 15 paired sessions in three repos (150 ticket runs in total), Fermat was cheaper in every run while average quality held or improved.
Per-session agent token cost, averaged over K=5 runs per benchmark · claude-opus-4-8 · built on SWE-Atlas.
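The protocol's bookkeeping is easy to reproduce. This sketch assumes three bundles labeled by the languages named elsewhere on this page (Go, TypeScript, Python), K=5 paired sessions per bundle, and 5 tickets per session as in the TypeScript bundle; the labels and the helper `count_matched_or_better` are illustrative, not part of the benchmark repo. The run count matches the 150 ticket runs quoted above, and the helper shows how a quality tally is counted per paired session rather than per ticket.

```python
from itertools import product

# Assumed layout: 3 bundles x 5 paired sessions x 5 tickets, each run on both arms.
BUNDLES = ["go", "typescript", "python"]
ARMS = ["vanilla", "fermat"]
K = 5
TICKETS_PER_SESSION = 5

runs = list(product(BUNDLES, range(K), ARMS, range(TICKETS_PER_SESSION)))
paired_sessions = {(bundle, k) for bundle, k, _, _ in runs}
print(len(paired_sessions), len(runs))  # 15 150


def count_matched_or_better(pairs: list[tuple[float, float]]) -> int:
    """Count paired sessions where Fermat's quality is >= vanilla's.

    Each pair is (vanilla_quality, fermat_quality) for the same bundle and session index.
    """
    return sum(1 for vanilla_q, fermat_q in pairs if fermat_q >= vanilla_q)
```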
Full methodology, raw logs & reproduction repo →
Fermat runs Claude Code through a local wrapper and a lightweight proxy, with built-in optimized tools and high-fidelity compression. Nothing changes about how you prompt, use tools, or review diffs. You just waste fewer tokens!
$ curl -fsSL https://downloads.quotientlabs.com/fermat/install.sh | bash
Fermat preserves output quality, and we measure it. Our SessionBench suite runs multi-ticket sessions on both Fermat and vanilla Claude Code, grading every ticket with its own test suite plus an LLM rubric. Across 15 paired sessions on real SWE-Atlas refactoring tasks (Go, TypeScript and Python), quality held or improved: Fermat matched or beat vanilla on quality in 13 of 15 runs (and was never far off in the remaining two), raising the overall average quality score from 0.83 to 0.92. Full methodology and raw logs are in the benchmark repo.
Fermat uses the same protocol as before. It sits between your CLI and the Anthropic API, rewriting outbound and inbound context; from Claude's perspective, each call is an ordinary request, just with leaner, denser context.
Fermat is designed to integrate seamlessly into your existing Claude Code workflow, including your normal shell, projects, tools, custom skills, and API usage: just run Claude Code from your terminal as usual. To switch back to vanilla Claude Code at any time, run claude --vanilla.
Manage your account, add members to your organization, and track your savings at console.quotientlabs.com.
Fermat works the same whether you're on a monthly Claude subscription (Max 5x/20x) or API billing. On API billing, Fermat saves you dollars directly; on a flat-rate plan, the same token savings make your usage limits go much further.
Why "Fermat's Last Token"? Because the proof of how much context Claude actually needs is small enough to fit in the margin. Also, we are unserious people doing serious work.