Metadata-Version: 2.4
Name: antaris-context
Version: 4.9.16
Summary: Context window optimization for AI agents. Zero dependencies.
Author-email: Antaris Analytics <dev@antarisanalytics.com>
License: Apache-2.0
Project-URL: Homepage, https://github.com/Antaris-Analytics/antaris-context
Project-URL: Repository, https://github.com/Antaris-Analytics/antaris-context
Keywords: ai,context,optimization,tokens,agents,llm
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: license-file

# antaris-context

Context window management and token budget tracking for AI applications. It provides message compression, selectable content strategies (recency, relevance, hybrid, budget), and budget enforcement, all with zero external dependencies.

```bash
pip install antaris-context
```

**Version:** 4.9.16  
**Dependencies:** None (stdlib only)  
**Python:** 3.9+

## Core Components

### ContextManager
Primary interface for context window management with token budget tracking.

```python
from antaris_context import ContextManager

# Basic usage
manager = ContextManager(total_budget=8000)
manager.add_turn("user", "How do I implement JWT auth?")
manager.add_turn("assistant", "Use the following approach...")

print(f"Token usage: {manager.get_total_used()}/{manager.total_budget}")
print(f"Over budget: {manager.is_over_budget()}")

# Get detailed usage report
report = manager.get_usage_report()
print(f"Utilization: {report['utilization']:.1%}")
```

### ContextWindow
Lower-level context window with turn-based operations.

```python
from antaris_context import ContextWindow

window = ContextWindow(budget=4000)
window.add_turn("user", "Write a Python function")
window.add_turn("assistant", "def process_data():\n    return data.upper()")

total_used = window.get_total_used()
over_budget = window.is_over_budget()
usage = window.get_usage_report()
```

### MessageCompressor
Compress messages and tool outputs to fit within token limits.

```python
from antaris_context import MessageCompressor

compressor = MessageCompressor(level='moderate')

# Compress message list
messages = [
    {"role": "user", "content": "Long user message..."},
    {"role": "assistant", "content": "Long response..."}
]
compressed = compressor.compress_message_list(messages, max_content_length=500)

# Compress tool output
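# Build a multi-line output so the line-based limits below have an effect
long_output = "\n".join(f"line {i}" for i in range(10000))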
compressed_output = compressor.compress_tool_output(
    long_output, 
    max_lines=50, 
    keep_first=10, 
    keep_last=10
)

# Get compression statistics
stats = compressor.get_compression_stats()
print(f"Compression ratio: {stats['compression_ratio']:.2f}")
print(f"Bytes saved: {stats['bytes_saved']}")
```
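
The `max_lines`, `keep_first`, and `keep_last` parameters suggest head-and-tail truncation. The sketch below shows that idea using only the standard library. `truncate_lines` and its marker line are hypothetical and are not taken from the library's implementation.

```python
def truncate_lines(text: str, max_lines: int = 50,
                   keep_first: int = 10, keep_last: int = 10) -> str:
    """Keep the head and tail of a long output and elide the middle."""
    lines = text.splitlines()
    if len(lines) <= max_lines:
        return text  # short enough, return unchanged
    omitted = len(lines) - keep_first - keep_last
    marker = f"... [{omitted} lines omitted] ..."
    tail = lines[-keep_last:] if keep_last > 0 else []
    return "\n".join(lines[:keep_first] + [marker] + tail)


sample = "\n".join(f"line {i}" for i in range(200))
short = truncate_lines(sample)
print(len(short.splitlines()))  # 21: 10 head lines + 1 marker + 10 tail lines
```

Keeping both ends is a common choice for tool output because the command header and the final error or summary usually sit at the edges.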

## Selection Strategies

### ContextStrategy (Base)
Abstract base for all selection strategies.

### RecencyStrategy
Prioritize recent content over older content.

```python
from antaris_context import ContextManager

manager = ContextManager(total_budget=8000)
manager.set_strategy('recency', prefer_high_priority=True)

# Add content with priorities
manager.add_content('conversation', old_messages, priority='normal')
manager.add_content('conversation', recent_messages, priority='important')

# Recent content is kept when budget is exceeded
manager.optimize_context()
```

### RelevanceStrategy
Select content based on semantic relevance to a query.

```python
manager.set_strategy('relevance')

# Add content with relevance query
manager.add_content('memory', memory_items, query="authentication JWT")

# Content relevant to the query is prioritized
result = manager.optimize_context(query="JWT authentication Flask")
```

### HybridStrategy
Combine recency and relevance scoring.

```python
manager.set_strategy('hybrid', recency_weight=0.4, relevance_weight=0.6)

manager.add_content('conversation', messages, query="JWT Flask")
result = manager.optimize_context(
    query="JWT authentication", 
    target_utilization=0.85
)
```
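
To make the weighting concrete, here is a minimal sketch of a weighted recency-plus-relevance score. It assumes keyword overlap as a stand-in for relevance and list position as a stand-in for recency; `keyword_relevance` and `hybrid_scores` are hypothetical helpers, and the library's actual scoring may differ.

```python
def keyword_relevance(text: str, query: str) -> float:
    """Fraction of query words that appear in the text (0.0 to 1.0)."""
    query_words = set(query.lower().split())
    if not query_words:
        return 0.0
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words)


def hybrid_scores(items, query, recency_weight=0.4, relevance_weight=0.6):
    """Score items ordered oldest to newest; the newest item gets recency 1.0."""
    n = len(items)
    scores = []
    for position, text in enumerate(items):
        recency = (position + 1) / n
        relevance = keyword_relevance(text, query)
        scores.append(recency_weight * recency + relevance_weight * relevance)
    return scores


history = [
    "JWT tokens are signed with a secret",
    "The weather is nice today",
    "Flask routes use decorators",
]
scores = hybrid_scores(history, "JWT Flask")
ranked = sorted(zip(scores, history), reverse=True)
for score, text in ranked:
    print(f"{score:.2f}  {text}")
```

With these weights, an old but relevant item outranks a more recent item that shares no words with the query.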

### BudgetStrategy
Allocate content based on section budget limits.

```python
manager.set_strategy('budget', approach='balanced')

# Set section budgets
manager.set_section_budgets({
    'system': 1000,
    'memory': 2000,
    'conversation': 4000,
    'tools': 1000
})

manager.optimize_context()
```

## Advanced Compression

### ImportanceWeightedCompressor
Preserve high-importance content during compression.

```python
from antaris_context import ImportanceWeightedCompressor, CompressionResult

compressor = ImportanceWeightedCompressor(
    keep_top_n=5,
    compress_middle=True,
    drop_threshold=0.1
)

# Compress with importance scores
content_items = [
    {"text": "Critical system info", "importance": 0.9},
    {"text": "Regular conversation", "importance": 0.5},
    {"text": "Debug output", "importance": 0.2}
]

result: CompressionResult = compressor.compress(content_items, target_size=2000)
print(f"Items kept: {result.items_kept}")
print(f"Items compressed: {result.items_compressed}")
print(f"Items dropped: {result.items_dropped}")
print(f"Final size: {result.final_size}")
```
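
The three parameters above imply a three-way split: keep the top items, compress the middle tier, and drop anything below the threshold. The sketch below shows one way that could work. `triage_by_importance` is a hypothetical helper, and using truncation as the "compression" step is an assumption.

```python
def triage_by_importance(items, keep_top_n=5, drop_threshold=0.1, max_chars=40):
    """Split items into kept, compressed, and dropped lists by importance."""
    ranked = sorted(items, key=lambda item: item["importance"], reverse=True)
    kept, compressed, dropped = [], [], []
    for rank, item in enumerate(ranked):
        if item["importance"] < drop_threshold:
            dropped.append(item)
        elif rank < keep_top_n:
            kept.append(item)
        else:
            # Middle tier: keep a truncated copy instead of the full text
            compressed.append({**item, "text": item["text"][:max_chars]})
    return kept, compressed, dropped


items = [
    {"text": "Critical system info", "importance": 0.9},
    {"text": "Regular conversation " * 10, "importance": 0.5},
    {"text": "Debug output", "importance": 0.05},
]
kept, compressed, dropped = triage_by_importance(items, keep_top_n=1)
print(len(kept), len(compressed), len(dropped))  # 1 1 1
```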

### SemanticChunker
Split text at sentence boundaries with configurable overlap.

```python
from antaris_context import SemanticChunker, SemanticChunk

chunker = SemanticChunker(
    min_chunk_size=100,
    max_chunk_size=500,
    overlap_sentences=2
)

chunks: list[SemanticChunk] = chunker.chunk(long_text)
for chunk in chunks:
    print(f"Chunk {chunk.index}: {len(chunk.text)} chars")
    print(f"Sentences: {chunk.sentence_count}")
    print(f"Overlap: {chunk.overlap_size}")
```
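
Sentence-boundary chunking with overlap can be sketched in a few lines of standard-library Python. `chunk_sentences` is a hypothetical helper and not the library's code: it splits on `.`, `!`, or `?` followed by whitespace, and it ignores `min_chunk_size` for brevity.

```python
import re


def chunk_sentences(text, max_chunk_size=500, overlap_sentences=2):
    """Group sentences into chunks of roughly max_chunk_size characters.

    The last overlap_sentences sentences of each chunk are repeated at the
    start of the next one. A chunk can exceed the limit when the overlap
    plus a single sentence is already too long.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > max_chunk_size:
            chunks.append(" ".join(current))
            # Carry the tail of the finished chunk into the next one
            current = current[-overlap_sentences:] if overlap_sentences else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


demo = chunk_sentences("One. Two. Three. Four. Five.",
                       max_chunk_size=15, overlap_sentences=1)
print(demo)  # ['One. Two.', 'Two. Three.', 'Three. Four.', 'Four. Five.']
```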

## Context Profiling

### ContextProfiler
Analyze token usage patterns and identify optimization opportunities.

```python
from antaris_context import ContextManager, ContextProfiler

profiler = ContextProfiler()
manager = ContextManager(total_budget=8000)

# Track usage over multiple operations
profiler.start_session("conversation_flow")
manager.add_turn("user", "Question 1")
manager.add_turn("assistant", "Answer 1")
profiler.record_usage(manager)

manager.add_turn("user", "Question 2")
manager.optimize_context()
profiler.record_usage(manager)

# Get analysis
analysis = profiler.analyze_patterns()
print(f"Average utilization: {analysis['avg_utilization']:.1%}")
print(f"Peak usage: {analysis['peak_usage']} tokens")

for section, stats in analysis['section_stats'].items():
    print(f"{section}: avg={stats['avg_tokens']}, max={stats['max_tokens']}")

# Get recommendations
recommendations = profiler.get_optimization_recommendations()
for rec in recommendations:
    print(f"- {rec['description']} (impact: {rec['impact']})")
```

## Hard Budget Enforcement *(v4.2.0)*

`render_hard_limited(budget_tokens)` enforces a strict token ceiling. It trims content to fit within the budget and raises `ContextBudgetExceeded` if fitting is impossible (e.g., mandatory system content alone exceeds the budget).

```python
from antaris_context import ContextManager, ContextBudgetExceeded

ctx = ContextManager(total_budget=1000)
try:
    messages = ctx.render_hard_limited(budget_tokens=500)
except ContextBudgetExceeded as e:
    print(f"Over budget: {e.used} tokens used, {e.budget} budget")
```

Unlike `optimize_context()`, which uses soft limits and may return over-budget results, `render_hard_limited()` guarantees the returned message list never exceeds `budget_tokens`. Use it when you must not exceed a model's context limit.
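
The following sketch shows what a hard limit means in practice. It assumes system messages are mandatory and that the oldest turns are dropped first. `render_within`, `BudgetExceeded`, and `estimate_tokens` are hypothetical names for illustration and do not reproduce the library's trimming logic.

```python
class BudgetExceeded(Exception):
    """Raised when mandatory content alone does not fit the budget."""


def estimate_tokens(text: str) -> int:
    # Roughly 4 characters per token, rounded up
    return (len(text) + 3) // 4


def render_within(messages, budget_tokens):
    """Drop the oldest non-system messages until the total fits budget_tokens."""
    system = [m for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]
    used = sum(estimate_tokens(m["content"]) for m in system)
    if used > budget_tokens:
        raise BudgetExceeded(
            f"system content needs {used} tokens, budget is {budget_tokens}"
        )
    kept = []
    # Walk newest to oldest so the most recent turns survive
    for message in reversed(others):
        cost = estimate_tokens(message["content"])
        if used + cost > budget_tokens:
            break
        kept.append(message)
        used += cost
    return system + kept[::-1]


messages = [
    {"role": "system", "content": "Be brief."},   # 3 tokens
    {"role": "user", "content": "a" * 40},        # 10 tokens
    {"role": "assistant", "content": "b" * 40},   # 10 tokens
    {"role": "user", "content": "c" * 40},        # 10 tokens
]
trimmed = render_within(messages, budget_tokens=25)
print([m["role"] for m in trimmed])  # ['system', 'assistant', 'user']
```

Calling `render_within(messages, budget_tokens=2)` raises `BudgetExceeded`, because the system message alone needs 3 tokens.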

## Budget Enforcement

By default, budget enforcement uses soft limits: the manager tracks usage and reports when the budget is exceeded, but it does not hard-truncate content.

```python
manager = ContextManager(total_budget=8000, strict_budget=False)

# Add content that exceeds budget
manager.add_content('conversation', large_message_history)
manager.add_content('tools', debug_output)

# Check status
if manager.is_over_budget():
    usage = manager.get_usage_report()
    print(f"Over budget by {usage['overage']} tokens")
    print(f"Utilization: {usage['utilization']:.1%}")

# Optimize to fit within budget
result = manager.optimize_context(target_utilization=0.85)
if result.success:
    print(f"Optimized: {result.tokens_freed} tokens freed")
else:
    print(f"Could not reach target: {result.final_utilization:.1%}")
```
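
The arithmetic behind these numbers is simple. In the sketch below, the `utilization` and `overage` keys mirror the report keys used above, while `usage_summary` and `tokens_to_free` are hypothetical names added for illustration.

```python
def usage_summary(used_tokens, total_budget, target_utilization=0.85):
    """Compute utilization, overage, and the tokens to free to reach the target."""
    target_tokens = round(total_budget * target_utilization)
    return {
        "utilization": used_tokens / total_budget,
        "overage": max(0, used_tokens - total_budget),
        "tokens_to_free": max(0, used_tokens - target_tokens),
    }


summary = usage_summary(used_tokens=9200, total_budget=8000)
print(f"Utilization: {summary['utilization']:.1%}")  # Utilization: 115.0%
print(f"Overage: {summary['overage']}")              # Overage: 1200
print(f"To free: {summary['tokens_to_free']}")       # To free: 2400
```

A target of 0.85 on an 8000-token budget is 6800 tokens, so a context at 9200 tokens must shed 2400 tokens to reach the target, not only the 1200 by which it exceeds the budget.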

## Section Organization

Organize content into logical sections with individual budget allocations.

```python
manager = ContextManager(total_budget=8000)

# Set section budgets
manager.set_section_budgets({
    'system': 1200,    # System prompts, rules
    'memory': 1800,    # Long-term memory items
    'conversation': 4000,  # Chat history
    'tools': 1000      # Tool outputs, debug info
})

# Add content to sections
manager.add_content('system', "You are a helpful assistant.", priority='critical')
manager.add_content('memory', recalled_memories, priority='important')
manager.add_content('conversation', chat_history, priority='normal')
manager.add_content('tools', tool_outputs, priority='optional')

# Check section usage
for section, usage in manager.get_section_usage().items():
    budget = manager.section_budgets[section]
    print(f"{section}: {usage}/{budget} tokens")
```

## Crash-Safe Persistence

### atomic_write_json
Write JSON files atomically to prevent corruption during crashes.

```python
from antaris_context import atomic_write_json
import json
import time

data = {
    'context_state': manager.export_snapshot(),
    'timestamp': time.time(),
    'version': '4.2.0'
}

# Atomic write (temporary file + rename)
atomic_write_json('context_snapshot.json', data)

# If the process crashes mid-write, the previous file is left intact,
# so a reader should never see a partially written snapshot
try:
    with open('context_snapshot.json', 'r') as f:
        restored_data = json.load(f)
except json.JSONDecodeError:
    print("Snapshot could not be parsed")
```
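
The "temporary file + rename" pattern mentioned in the comment above can be sketched with the standard library alone. `write_json_atomically` is a hypothetical stand-in that illustrates the general technique and is not the source of `atomic_write_json`.

```python
import json
import os
import tempfile


def write_json_atomically(path, data):
    """Write JSON to a temp file in the same directory, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle)
            handle.flush()
            os.fsync(handle.fileno())  # push bytes to disk before the rename
        # Same-directory rename; atomic on POSIX filesystems
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "context_snapshot.json")
    write_json_atomically(target, {"version": 1})
    with open(target, encoding="utf-8") as handle:
        restored = json.load(handle)
print(restored)  # {'version': 1}
```

Creating the temporary file in the target's own directory keeps both paths on the same filesystem, which the rename needs in order to replace the file in one step.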

## Cross-Session Persistence

Save and restore context state between application sessions.

```python
# Save snapshot with importance filtering
snapshot = manager.export_snapshot(include_importance_above=0.3)

# Restore from snapshot
new_manager = ContextManager.from_snapshot(snapshot)

# Named snapshots for specific states
manager.save_snapshot("pre_optimization")
manager.save_snapshot("post_compression")

# Restore named snapshot
manager.restore_snapshot("pre_optimization")

# List available snapshots
snapshots = manager.list_snapshots()
for name in snapshots:
    print(f"Snapshot: {name}")
```

## Configuration

```python
# JSON configuration
config = {
    "total_budget": 8000,
    "compression_level": "moderate",
    "strategy": "hybrid",
    "strategy_params": {
        "recency_weight": 0.4,
        "relevance_weight": 0.6
    },
    "section_budgets": {
        "system": 1000,
        "memory": 2000,
        "conversation": 4000,
        "tools": 1000
    },
    "auto_optimize": True,
    "target_utilization": 0.85
}

manager = ContextManager.from_config(config)

# Save current configuration
current_config = manager.export_config()
atomic_write_json('context_config.json', current_config)
```

## Complete Example

```python
from antaris_context import (
    ContextManager, MessageCompressor, ContextProfiler, 
    atomic_write_json
)

# Initialize with profiling
profiler = ContextProfiler()
manager = ContextManager(total_budget=8000)
compressor = MessageCompressor('moderate')

# Set strategy and budgets
manager.set_strategy('hybrid', recency_weight=0.3, relevance_weight=0.7)
manager.set_section_budgets({
    'system': 1200,
    'memory': 1800,
    'conversation': 4000,
    'tools': 1000
})

# Add system prompt (critical priority)
manager.add_content('system', 
    "You are a Python coding assistant. Provide working examples.",
    priority='critical'
)

# Add conversation history with compression
# (load_chat_history() is a placeholder for your own loader)
chat_history = load_chat_history()
compressed_history = compressor.compress_message_list(
    chat_history, 
    max_content_length=300
)
manager.add_content('conversation', compressed_history, priority='normal')

# Add relevant memories
# (load_memories() is a placeholder for your own loader)
memories = load_memories()
manager.add_content('memory', memories, 
    query="Python authentication JWT", 
    priority='important'
)

# Process current query
current_query = "How do I add JWT authentication to Flask?"
manager.add_turn("user", current_query)

# Optimize context
profiler.start_session("query_processing")
result = manager.optimize_context(
    query=current_query, 
    target_utilization=0.85
)
profiler.record_usage(manager)

if result.success:
    # Render for LLM
    messages = manager.render_messages(format='openai')
    
    # Process with LLM (not included in this library)
    # response = openai_client.chat.completions.create(...)
    
    # Add response and save state
    manager.add_turn("assistant", "Use Flask-JWT-Extended...")
    
    # Save snapshot
    snapshot = manager.export_snapshot()
    atomic_write_json('session_state.json', snapshot)
    
    print(f"Context optimized: {result.tokens_freed} tokens freed")
else:
    print(f"Optimization incomplete: {result.final_utilization:.1%} utilization")

# Get profiling results
analysis = profiler.analyze_patterns()
print(f"Session efficiency: {analysis['efficiency_score']:.2f}")
```

## Token Estimation

Uses character-based approximation (4 characters per token) for fast budget calculations. For exact counts, plug in your model's tokenizer:

```python
import tiktoken

# Default approximation is sufficient for budget management
estimated = manager._estimate_tokens("Hello world")  # ~3 tokens

# Optional: plug in an exact tokenizer (tiktoken is a separate install)
enc = tiktoken.encoding_for_model("gpt-4")
manager._estimate_tokens = lambda text: len(enc.encode(text))
actual = manager._estimate_tokens("Hello world")  # 2 tokens (exact)
```
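
For reference, a character-based estimator of this kind fits in a few lines. The sketch below assumes the count is rounded up, which matches the `~3 tokens` figure for an 11-character string; `estimate_tokens` is a hypothetical function, and the library's rounding may differ.

```python
def estimate_tokens(text: str, chars_per_token: int = 4) -> int:
    """Approximate the token count as ceil(len(text) / chars_per_token)."""
    if not text:
        return 0
    return -(-len(text) // chars_per_token)  # ceiling division


print(estimate_tokens("Hello world"))  # 3 (11 characters / 4, rounded up)
```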

## Performance Characteristics

- **Token estimation:** ~100,000 characters/second
- **Message compression:** ~50,000 characters/second  
- **Strategy selection:** ~10,000 messages/second
- **Context optimization:** ~1,000 content items/second
- **Memory usage:** Linear in content size
- **Time complexity:** O(n log n) for relevance ranking, O(n) for other operations

## Limitations

- **Token estimation is approximate:** Use actual tokenizer for exact counts
- **No LLM calls:** Compression is structural, not semantic (unless using pluggable summarizer)
- **Single-threaded:** Not designed for concurrent access
- **Memory bound:** All content held in memory during processing
- **No distributed contexts:** Manages single context windows only

## Testing

```bash
git clone https://github.com/Antaris-Analytics/antaris-context.git
cd antaris-context
python -m pytest tests/ -v --cov=antaris_context
```

150 tests, 95% coverage, zero external dependencies.

## Integration with Antaris Suite

```python
# With antaris-memory
from antaris_memory import MemoryClient
memory_client = MemoryClient()
manager.set_memory_client(memory_client)

# With antaris-router  
from antaris_router import Router
router = Router()
hints = router.get_routing_hints(query)
manager.set_router_hints(hints)

# With antaris-guard
from antaris_guard import ContentFilter
content_filter = ContentFilter()  # avoid shadowing the built-in filter()
safe_content = content_filter.scan(content)
manager.add_content('conversation', safe_content)
```

## License

Apache 2.0 License with explicit patent grant clause.
