Quickstart¶
This walkthrough shows tokenspy on a realistic multi-function pipeline in under 3 minutes.
The scenario¶
You have a pipeline with three functions that each make LLM calls. You want to know which one costs the most.
import openai
import tokenspy
client = openai.OpenAI()
# (1) Decorate each function you want to profile
@tokenspy.profile
def fetch_and_summarize(query: str) -> str:
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": f"Summarize recent news about: {query}"}]
)
return response.choices[0].message.content
@tokenspy.profile
def extract_entities(text: str) -> list[str]:
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": f"List named entities in: {text}"}]
)
return response.choices[0].message.content.split("\n")
@tokenspy.profile
def generate_report(entities: list[str]) -> str:
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": f"Write a report on: {entities}"}]
)
return response.choices[0].message.content
# (2) Run your pipeline normally
docs = fetch_and_summarize("Q3 earnings")
entities = extract_entities(docs)
report = generate_report(entities)
# (3) See the cost breakdown
tokenspy.report()
Output — terminal flame graph¶
╔══════════════════════════════════════════════════════════════════════╗
║ tokenspy cost report ║
║ total: $0.0523 · 18,734 tokens · 3 calls ║
╠══════════════════════════════════════════════════════════════════════╣
║ ║
║ fetch_and_summarize $0.038 ████████████░░░░ 73% ║
║ └─ gpt-4o $0.038 ████████████░░░░ 73% ║
║ └─ 12,000 tokens ║
║ ║
║ generate_report $0.011 ████░░░░░░░░░░░░ 21% ║
║ └─ gpt-4o $0.011 ████░░░░░░░░░░░░ 21% ║
║ └─ 3,600 tokens ║
║ ║
║ extract_entities $0.003 █░░░░░░░░░░░░░░░ 6% ║
║ └─ gpt-4o-mini $0.003 █░░░░░░░░░░░░░░░ 6% ║
║ └─ 3,134 tokens ║
║ ║
╠══════════════════════════════════════════════════════════════════════╣
║ Optimization hints ║
║ ║
║ 🔴 fetch_and_summarize [gpt-4o] ║
║ Switch to gpt-4o-mini — 94% cheaper (~$540/month savings) ║
║ ║
║ 🟡 fetch_and_summarize [gpt-4o] ║
║ Avg input: 12,000 tokens. Trim context or limit retrieval. ║
╚══════════════════════════════════════════════════════════════════════╝
Result: fetch_and_summarize burns 73% of the budget. Switching it to gpt-4o-mini saves ~$540/month.
HTML report¶
For a richer visual:
Programmatic access¶
Access the raw numbers from your code:
data = tokenspy.stats()
print(data["total_cost_usd"]) # 0.0523
print(data["by_function"]) # {"fetch_and_summarize": 0.038, ...}
print(data["by_model"]) # {"gpt-4o": 0.049, "gpt-4o-mini": 0.003}
What's next?¶
| Goal | Guide |
|---|---|
| Profile an async function | Decorator → |
| Profile a block of code | Context Manager → |
| Set a cost budget | Budget Alerts → |
| Save history across runs | Persistence → |
| Profile LangChain chains | LangChain → |
| Catch cost regressions in CI | GitHub Actions → |