Cut Your AI Costs 50-80%

Stop loading entire files into your prompts. TokenShrink gives your AI only what it needs, compressed.

pip install tokenshrink[compression]
50-80%
Token Reduction
<50ms
Search Latency
~200ms
Compression Time

โŒ Without TokenShrink

  • Load entire file (5000 tokens)
  • $0.15 per query
  • Slow responses
  • Hit context limits
  • Irrelevant info confuses the model

โœ“ With TokenShrink

  • Load relevant chunks (200 tokens)
  • $0.03 per query
  • Fast responses
  • Stay under limits
  • Focused, accurate answers

Simple. Fast. Effective.

๐Ÿ”

Semantic Search

FAISS + MiniLM embeddings find exactly what's relevant to your query.

๐Ÿ—œ๏ธ

Smart Compression

LLMLingua-2 removes redundancy while preserving meaning.

โšก

Drop-in Ready

Works with OpenAI, Anthropic, LangChain, AutoGen โ€” any LLM stack.

๐ŸŽ

Apple Silicon

Native MPS acceleration on Mac. CUDA on NVIDIA.

๐Ÿ“

Auto-Indexing

Detects changed files automatically. No manual maintenance.

๐Ÿ”“

Open Source

MIT licensed. Self-host anywhere. No vendor lock-in.

from tokenshrink import TokenShrink

# Initialize and index your docs
ts = TokenShrink()
ts.index("./docs")

# Get compressed context for your LLM
result = ts.query("What are the API rate limits?")

print(result.context)   # Ready for your prompt
print(result.savings)   # "Saved 65% (1200 โ†’ 420 tokens)"