Cut Your AI Costs 50-80%
Stop loading entire files into your prompts. TokenShrink gives your AI only what it needs, compressed.
pip install tokenshrink[compression]
50-80%
Token Reduction
<50ms
Search Latency
~200ms
Compression Time
โ Without TokenShrink
- Load entire file (5000 tokens)
- $0.15 per query
- Slow responses
- Hit context limits
- Irrelevant info confuses the model
โ With TokenShrink
- Load relevant chunks (200 tokens)
- $0.03 per query
- Fast responses
- Stay under limits
- Focused, accurate answers
Simple. Fast. Effective.
Semantic Search
FAISS + MiniLM embeddings find exactly what's relevant to your query.
Smart Compression
LLMLingua-2 removes redundancy while preserving meaning.
Drop-in Ready
Works with OpenAI, Anthropic, LangChain, AutoGen โ any LLM stack.
Apple Silicon
Native MPS acceleration on Mac. CUDA on NVIDIA.
Auto-Indexing
Detects changed files automatically. No manual maintenance.
Open Source
MIT licensed. Self-host anywhere. No vendor lock-in.
from tokenshrink import TokenShrink # Initialize and index your docs ts = TokenShrink() ts.index("./docs") # Get compressed context for your LLM result = ts.query("What are the API rate limits?") print(result.context) # Ready for your prompt print(result.savings) # "Saved 65% (1200 โ 420 tokens)"