tokenspy
You're spending $800/month on LLMs. Which function is burning it?
Find out in one line. No proxy. No signup. No traffic rerouting.
The Problem
You get an OpenAI invoice. It says $800 this month. You have no idea which function caused it.
def run_pipeline(query):
    docs = fetch_and_summarize(query)   # ← costs $600?
    entities = extract_entities(docs)   # ← or this one?
    return generate_report(entities)    # ← or this one?
Langfuse and Helicone force you to reroute traffic through their proxy. Sign up. Configure. Break your local setup.
tokenspy takes 1 line. No proxy. No signup. Runs entirely on your machine.
The Fix
Output
╔══════════════════════════════════════════════════════════════════════╗
║ tokenspy cost report ║
║ total: $0.0523 · 18,734 tokens · 3 calls ║
╠══════════════════════════════════════════════════════════════════════╣
║ ║
║ fetch_and_summarize $0.038 ████████████░░░░ 73% ║
║ └─ gpt-4o $0.038 ████████████░░░░ 73% ║
║ └─ 12,000 tokens ║
║ ║
║ generate_report $0.011 ████░░░░░░░░░░░░ 21% ║
║ └─ gpt-4o $0.011 ████░░░░░░░░░░░░ 21% ║
║ └─ 3,600 tokens ║
║ ║
║ extract_entities $0.003 █░░░░░░░░░░░░░░░ 6% ║
║ └─ gpt-4o-mini $0.003 █░░░░░░░░░░░░░░░ 6% ║
║ └─ 3,134 tokens ║
║ ║
╠══════════════════════════════════════════════════════════════════════╣
║ Optimization hints ║
║ ║
║ 🔴 fetch_and_summarize [gpt-4o] ║
║ Switch to gpt-4o-mini — 94% cheaper (~$540/month savings) ║
║ ║
╚══════════════════════════════════════════════════════════════════════╝
Now you know: fetch_and_summarize accounts for 73% of your spend. Fix that one function and cut your bill by roughly $540/month.
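The percentages and bars in the report follow from simple arithmetic on the per-function costs. The sketch below reproduces them from the figures shown in the report. The `bar` helper, the 16-cell width, and rounding up to the next cell are assumptions inferred from the sample output, not tokenspy's actual rendering code. The savings estimate assumes the $800/month bill from the intro and the 94% price difference quoted in the hint.

```python
import math

# Figures copied from the sample report above.
costs = {
    "fetch_and_summarize": 0.038,
    "generate_report": 0.011,
    "extract_entities": 0.003,
}
total = 0.0523


def bar(share, width=16):
    """Render a share in [0, 1] as a fixed-width bar (assumed: round up to a full cell)."""
    filled = math.ceil(share * width)
    return "█" * filled + "░" * (width - filled)


# One (name, percent, bar) row per function, most expensive first.
rows = []
for name, cost in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
    share = cost / total
    rows.append((name, round(share * 100), bar(share)))

for name, percent, cells in rows:
    print(f"{name:<22} {cells} {percent}%")

# Projected monthly savings if the top function moves to a model that is 94% cheaper,
# scaled to an assumed $800/month bill.
top_share = costs["fetch_and_summarize"] / total
monthly_savings = 800 * top_share * 0.94
print(f"estimated savings: ${monthly_savings:.0f}/month")
```

The estimate lands in the same range as the ~$540/month quoted in the hint. The exact figure depends on how the share is rounded and on the real input/output token mix.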
Features
Flame Graph
Visual cost breakdown by function — instantly see which function is eating your budget.
No Proxy
Pure in-process monkey-patching. No traffic rerouting, no signup, no configuration.
Zero Dependencies
Works with whatever SDK you already have. No extra packages required.
Budget Alerts
Set per-function cost budgets. Warn or raise when exceeded.
Persistence
SQLite storage across sessions. Tag calls with git commits. Query history via CLI.
GitHub Actions
Cost diff per PR. Catch regressions before they merge. Native GHA annotations.
LangChain
Drop-in callback handler for chains, models, and LangGraph agents.
Streaming
Fully supported. Token counts captured after stream completes — zero code changes.
Comparison
| | Langfuse | Helicone | LiteLLM Proxy | tokenspy |
|---|---|---|---|---|
| Requires proxy / gateway | ✅ yes | ✅ yes | ✅ yes | ❌ no |
| Requires signup | ✅ yes | ✅ yes | ❌ no | ❌ no |
| Local-first | ❌ no | ❌ no | ⚡ partial | ✅ yes |
| Zero dependencies | ❌ no | ❌ no | ❌ no | ✅ yes |
| Flame graph output | ❌ no | ❌ no | ❌ no | ✅ yes |
| @decorator API | ❌ no | ❌ no | ❌ no | ✅ yes |
| Streaming support | ✅ yes | ✅ yes | ✅ yes | ✅ yes |
| Budget alerts | ⚡ partial | ⚡ partial | ❌ no | ✅ yes |
| LangChain integration | ✅ yes | ✅ yes | ✅ yes | ✅ yes |
| CLI history/report | ❌ no | ❌ no | ❌ no | ✅ yes |
| GitHub Actions cost diff | ❌ no | ❌ no | ❌ no | ✅ yes |
| Git commit cost tracking | ❌ no | ❌ no | ❌ no | ✅ yes |
| Optimization hints | ❌ no | ⚡ partial | ❌ no | ✅ yes |
| Works offline | ❌ no | ❌ no | ⚡ partial | ✅ yes |
Next Steps
Get tokenspy installed and your first report in 2 minutes.
Full walkthrough with annotated output.
Complete symbol table for every public function and class.
The monkey-patching architecture explained.