Metadata-Version: 2.4
Name: antaris-context
Version: 5.0.1
Summary: Context window optimization for AI agents. Zero dependencies.
Author-email: Antaris Analytics <dev@antarisanalytics.com>
License: Apache-2.0
Project-URL: Homepage, https://github.com/Antaris-Analytics/antaris-context
Project-URL: Repository, https://github.com/Antaris-Analytics/antaris-context
Keywords: ai,context,optimization,tokens,agents,llm
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: license-file

# antaris-context

> **Context window optimization for AI agents — zero dependencies, production-ready.**

[![PyPI version](https://img.shields.io/pypi/v/antaris-context)](https://pypi.org/project/antaris-context/)
[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
[![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](LICENSE)

`antaris-context` is the context window management layer of the [antaris-suite](https://github.com/Antaris-Analytics-LLC/antaris-suite). It gives AI agents precise control over what fits into an LLM's context window — handling compression, prioritization, turn lifecycle, adaptive budgets, and cross-session sharing, all with zero external dependencies.

---

## 📋 Table of Contents

1. [Installation](#-installation)
2. [Quick Start](#-quick-start)
3. [Core Concepts](#-core-concepts)
4. [ContextManager](#-contextmanager)
5. [Section Templates](#-section-templates)
6. [Content Management](#-content-management)
7. [Selection Strategies](#-selection-strategies)
8. [Compression Levels](#-compression-levels)
9. [Turn Lifecycle Management](#-turn-lifecycle-management)
10. [Rendering for APIs](#-rendering-for-apis)
11. [optimize_context()](#-optimize_context)
12. [Adaptive Budgets](#-adaptive-budgets)
13. [Cascade Overflow](#-cascade-overflow)
14. [Cross-Session Context Sharing](#-cross-session-context-sharing)
15. [In-Process Snapshots](#-in-process-snapshots)
16. [Config Persistence](#-config-persistence)
17. [Reports & Analysis](#-reports--analysis)
18. [Truncation Strategies](#-truncation-strategies)
19. [Importance-Weighted Compression](#-importance-weighted-compression)
20. [Integration: antaris-memory](#-integration-antaris-memory)
21. [Integration: antaris-router](#-integration-antaris-router)
22. [All Exports](#-all-exports)
23. [Exception Reference](#-exception-reference)
24. [Full Example: Agent with Tools](#-full-example-agent-with-tools)

---

## 📦 Installation

```bash
pip install antaris-context
```

**Version:** `5.0.1`  
**Requirements:** Python 3.9+ · Zero external dependencies · stdlib only

---

## 🚀 Quick Start

```python
from antaris_context import ContextManager

# Create a context manager with an 8,000 token budget
cm = ContextManager(total_budget=8000, template="agent_with_tools")

# Add content to sections
cm.add_content("system", "You are a helpful assistant.")
cm.add_content("memory", "User prefers concise answers.", priority="important")
cm.add_content("conversation", [
    {"role": "user", "content": "What's the capital of France?"},
    {"role": "assistant", "content": "Paris."},
])

# Add a new turn
cm.add_turn(role="user", content="And Germany?")

# Render for OpenAI
messages = cm.render(provider="openai", system_prompt="You are a helpful assistant.")

# Check budget
print(cm.get_usage_report())
```

---

## 🧠 Core Concepts

`antaris-context` models a context window as a set of **named sections**, each with its own token budget. Content is added to sections with an associated priority. When the window fills up, the manager compresses, drops, or summarizes lower-priority content to fit within budget.

| Concept | Description |
|---|---|
| **Section** | A named slot in the context (e.g. `system`, `memory`, `conversation`, `tools`) |
| **Budget** | Token limit assigned to each section |
| **Priority** | Determines what survives compression (`critical` > `important` > `normal` > `optional`) |
| **Strategy** | Algorithm for selecting which content to keep (`recency`, `relevance`, `hybrid`, `budget`) |
| **Compression Level** | How aggressively content is compressed (`light`, `moderate`, `aggressive`) |
| **Turn** | A single message (`user` or `assistant`) added with `add_turn()` and tracked through lifecycle management |

---

## 🏗 ContextManager

The central class. Manages all sections, budgets, strategies, and content.

### Constructor

```python
from antaris_context import ContextManager

cm = ContextManager(
    total_budget=8000,     # int — total token budget across all sections
    config_file=None,      # str | None — path to a JSON config file
    template=None,         # str | None — pre-built section layout (see Section Templates)
)
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `total_budget` | `int` | `8000` | Total token ceiling for the context window |
| `config_file` | `str \| None` | `None` | Load config from a JSON file on construction |
| `template` | `str \| None` | `None` | Apply a named section template on construction |

### Available Templates (class method)

```python
templates = ContextManager.get_available_templates()
# Returns: dict mapping template name → section budget breakdown
```

---

## 📐 Section Templates

Templates are pre-configured section budget layouts designed for common agent patterns. Apply one at construction time or later.

```python
# At construction
cm = ContextManager(total_budget=8000, template="agent_with_tools")

# After construction
cm.apply_template("rag_pipeline")
```

| Template | `system` | `memory` | `conversation` | `tools` |
|---|---|---|---|---|
| `chatbot` | 800 | 1500 | 5000 | 700 |
| `agent_with_tools` | 1200 | 2000 | 3500 | 1300 |
| `rag_pipeline` | 600 | 1000 | 4500 | 1900 |
| `code_assistant` | 1000 | 1800 | 4000 | 1200 |
| `balanced` | 1000 | 2000 | 4000 | 1000 |

> **Tip:** Templates are starting points. Override individual section budgets with `set_section_budget()` after applying.
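
Every row in the table sums to 8,000 tokens, which matches the default `total_budget`. The standalone sketch below checks that arithmetic and shows each section's share of one template. It uses only the numbers from the table; `template_totals` and `template_shares` are hypothetical helper names, not part of the library, and how the library treats templates under a different `total_budget` is not covered here.

```python
# Section budgets copied from the template table above.
TEMPLATES = {
    "chatbot":          {"system": 800,  "memory": 1500, "conversation": 5000, "tools": 700},
    "agent_with_tools": {"system": 1200, "memory": 2000, "conversation": 3500, "tools": 1300},
    "rag_pipeline":     {"system": 600,  "memory": 1000, "conversation": 4500, "tools": 1900},
    "code_assistant":   {"system": 1000, "memory": 1800, "conversation": 4000, "tools": 1200},
    "balanced":         {"system": 1000, "memory": 2000, "conversation": 4000, "tools": 1000},
}


def template_totals(templates):
    """Return {template_name: sum of its section budgets}."""
    return {name: sum(sections.values()) for name, sections in templates.items()}


def template_shares(sections):
    """Return each section's fraction of the template total."""
    total = sum(sections.values())
    return {name: budget / total for name, budget in sections.items()}


totals = template_totals(TEMPLATES)
shares = template_shares(TEMPLATES["agent_with_tools"])
print(totals)
print(shares)
```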

---

## 📝 Content Management

### `add_content()`

Add content to a named section.

```python
cm.add_content(
    section,              # str — target section name
    content,              # str | List[Dict] — text or message list
    priority="normal",    # "critical"|"important"|"normal"|"optional"
    compress=None,        # bool | None — override auto-compression decision
    query=None,           # str | None — query for relevance-based selection within message lists
)
```

**Priority tiers:**

| Priority | Behavior |
|---|---|
| `"critical"` | Never dropped or compressed — always kept in full |
| `"important"` | Compressed last, dropped last |
| `"normal"` | Default — compressed and dropped as needed |
| `"optional"` | First to be compressed and dropped |

**Examples:**

```python
# Add a plain string
cm.add_content("system", "You are an expert Python developer.", priority="critical")

# Add a message list (conversation history)
cm.add_content("conversation", [
    {"role": "user", "content": "Explain decorators."},
    {"role": "assistant", "content": "A decorator wraps a function..."},
], priority="normal")

# Add with relevance-based selection
cm.add_content(
    "memory",
    long_memory_text,
    priority="important",
    query="Python decorators",  # keeps most relevant chunks
)
```

### `add_section()`

Create a custom section with its own budget and priority.

```python
cm.add_section(
    name,             # str — unique section name
    content,          # str | List[Dict] — initial content
    priority=5,       # int, 1–10 — 1=dropped first, 10=kept longest
    budget=None,      # int | None — token budget for this section
)
```

```python
# Add a custom "retrieved_docs" section
cm.add_section("retrieved_docs", rag_chunks, priority=7, budget=1500)
```

### `set_section_budget()`

Override the token budget for an existing section.

```python
cm.set_section_budget("memory", 2000)
cm.set_section_budget("conversation", 4500)
```

### Reading & Clearing Content

```python
# Get all content items in a section
items = cm.get_section_content("conversation")

# Get available token budget
avail = cm.get_available_budget("memory")       # int: remaining tokens for section
avail_all = cm.get_available_budget()           # dict: {section_name: available_tokens}

# Check if any section is over budget
over = cm.is_over_budget()   # bool

# Clear one section
cm.clear_section("memory")

# Clear all content everywhere
cm.clear_all_content()
```

---

## 🎯 Selection Strategies

Strategies determine how content is selected when the context is too full. Set them anytime.

```python
cm.set_strategy(name, **kwargs)
```

### Recency Strategy

Keep the most recent content. Best for chatbots and conversation-heavy agents.

```python
cm.set_strategy("recency")
```

### Relevance Strategy

Keep content most semantically relevant to a given query. Best for RAG pipelines.

```python
cm.set_strategy("relevance")
# Pass query at optimize time:
result = cm.optimize_context(query="Python decorators and closures")
```

### Hybrid Strategy *(default)*

Weighted combination of recency and relevance. Balances freshness with topical fit.

```python
cm.set_strategy("hybrid", recency_weight=0.4, relevance_weight=0.6)
# Default weights: recency=0.4, relevance=0.6
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `recency_weight` | `float` | `0.4` | Weight given to how recent content is |
| `relevance_weight` | `float` | `0.6` | Weight given to query relevance |
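
To make the weighting concrete, here is a minimal standalone sketch of a weighted-sum score. It assumes recency and relevance are each already normalized to the range 0 to 1 and combined linearly; the library's actual scoring may differ, and `hybrid_score` is a hypothetical name used only for illustration.

```python
def hybrid_score(recency, relevance, recency_weight=0.4, relevance_weight=0.6):
    """Weighted sum of two scores, each assumed to lie in [0, 1]."""
    return recency_weight * recency + relevance_weight * relevance


# Hypothetical items: (label, recency score, relevance score)
items = [
    ("old but on-topic", 0.1, 0.9),
    ("recent but off-topic", 0.9, 0.1),
    ("recent and on-topic", 0.8, 0.8),
]

# Highest combined score first.
ranked = sorted(items, key=lambda it: hybrid_score(it[1], it[2]), reverse=True)
for label, rec, rel in ranked:
    print(f"{label}: {hybrid_score(rec, rel):.2f}")
```

With the default weights, an on-topic but old item outranks a recent but off-topic one, which is the trade-off the `relevance_weight` default expresses.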

### Budget Strategy

Strict budget enforcement — aggressively trims to stay within limits regardless of content value.

```python
cm.set_strategy("budget")
```

### Strategy Class Reference

You can also import the strategy classes directly for inspection or subclassing:

```python
from antaris_context import (
    ContextStrategy,      # Base class
    RecencyStrategy,
    RelevanceStrategy,
    HybridStrategy,
    BudgetStrategy,
)
```

---

## 🗜 Compression Levels

Control how aggressively content is compressed when the context is full.

```python
cm.set_compression_level("light")       # Minimal compression — preserve as much as possible
cm.set_compression_level("moderate")    # Default — balanced compression
cm.set_compression_level("aggressive")  # Maximum compression — fit more, lose more detail
```

| Level | Description |
|---|---|
| `"light"` | Light trimming only; content shape is mostly preserved |
| `"moderate"` | Default; balances token savings against information loss |
| `"aggressive"` | Maximum compression; strips everything non-essential |

---

## 🔄 Turn Lifecycle Management

Turns represent individual conversation messages. Sprint 12 introduced full turn lifecycle management: adding, compacting, and summarizing long conversation histories.

### Adding Turns

```python
cm.add_turn(role="user", content="What is recursion?", section="conversation")
cm.add_turn(role="assistant", content="Recursion is...", section="conversation")

print(cm.turn_count)   # int: total turns added
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `role` | `str` | required | `"user"` or `"assistant"` |
| `content` | `str` | required | Message text |
| `section` | `str` | `"conversation"` | Target section |

### Retention Policy

Define how many turns to keep verbatim before summarizing or truncating.

```python
cm.set_retention_policy(
    keep_last_n_verbatim=10,   # Keep the last N turns exactly as-is
    summarize_older=True,      # Summarize (or truncate) turns older than N
    max_turns=100,             # Hard cap on total turns
)
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `keep_last_n_verbatim` | `int` | `10` | Number of most-recent turns to preserve verbatim |
| `summarize_older` | `bool` | `True` | Whether to summarize or just truncate older turns |
| `max_turns` | `int` | `100` | Maximum turn count before compaction is forced |

### Compacting Turns

Compact older turns to free up context space.

```python
n = cm.compact_older_turns(keep_last=20)
# Returns: int — number of turns compacted
```

**Compaction behavior:**

- **Without a summarizer:** Older turns are truncated to 120 characters each.
- **With a summarizer:** The custom summarizer function is called with the older turn text.
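
The two behaviors can be sketched without the library as follows. This is an illustration under assumptions: turns are plain strings, the summarizer is applied to each older turn separately, and `compact_turns` is a hypothetical function name. The library may instead pass the older turns to the summarizer as one block of text.

```python
from typing import Callable, List, Optional


def compact_turns(
    turns: List[str],
    keep_last: int,
    summarizer: Optional[Callable[[str], str]] = None,
    max_chars: int = 120,
) -> List[str]:
    """Compact every turn except the last `keep_last`, which stay verbatim."""
    if keep_last <= 0:
        older, recent = list(turns), []
    else:
        older, recent = turns[:-keep_last], turns[-keep_last:]

    if summarizer is not None:
        # Assumption: the summarizer is called once per older turn.
        compacted = [summarizer(text) for text in older]
    else:
        # No summarizer: cut each older turn to at most `max_chars` characters.
        compacted = [text[:max_chars] for text in older]
    return compacted + recent


history = ["a" * 300, "b" * 300, "short recent turn"]
truncated = compact_turns(history, keep_last=1)
summarized = compact_turns(history, keep_last=1, summarizer=lambda text: "summary")
print([len(t) for t in truncated])
print(summarized)
```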

### Custom Summarizer

Plug in any LLM to summarize old turns during compaction.

```python
def my_summarizer(text: str) -> str:
    # Call your LLM here
    return llm.summarize(text)

cm.set_summarizer(my_summarizer)

# Lambda form (e.g., calling an existing LLM client)
cm.set_summarizer(lambda text: my_llm_client.summarize(text, max_tokens=200))
```

After setting a summarizer, `compact_older_turns()` will use it instead of truncating.

---

## 🖨 Rendering for APIs

Convert the context manager's contents into a message list ready to send to an LLM API.

### `render()`

Standard render — returns a list of messages, or raises if over budget.

```python
# OpenAI format
messages = cm.render(
    provider="openai",
    system_prompt="You are a helpful assistant.",
)

# Anthropic format
messages = cm.render(
    provider="anthropic",
)
```

Both return a list of `{"role": ..., "content": ...}` dicts compatible with their respective APIs.

| Parameter | Type | Description |
|---|---|---|
| `provider` | `str` | `"openai"` or `"anthropic"` |
| `system_prompt` | `str \| None` | System prompt to prepend (OpenAI only) |

### `render_hard_limited()`

Render with a hard token ceiling. Trims oldest non-system messages first. Raises `ContextBudgetExceeded` if trimming still can't fit within the budget.

```python
from antaris_context import ContextBudgetExceeded

try:
    messages = cm.render_hard_limited(budget_tokens=4000)
except ContextBudgetExceeded as e:
    print(f"Context too large: {e.used} tokens used, budget is {e.budget}")
    # Fall back or reduce content
```

| Parameter | Type | Description |
|---|---|---|
| `budget_tokens` | `int` | Maximum token count for the rendered output |

**Trim order:** Oldest non-system messages are removed first, preserving recency and critical system context.
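
The trim order can be illustrated with a small standalone sketch. It is not the library's implementation: `estimate_tokens` uses a rough four-characters-per-token heuristic, and `BudgetExceeded` is a local stand-in that mirrors the `used` and `budget` attributes of `ContextBudgetExceeded`.

```python
class BudgetExceeded(Exception):
    """Local stand-in for ContextBudgetExceeded (same used/budget attributes)."""

    def __init__(self, used, budget):
        super().__init__(f"{used} tokens used, budget is {budget}")
        self.used = used
        self.budget = budget


def estimate_tokens(text):
    """Rough heuristic (an assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def trim_to_budget(messages, budget_tokens):
    """Drop the oldest non-system messages until the total fits the budget."""
    kept = list(messages)

    def total():
        return sum(estimate_tokens(m["content"]) for m in kept)

    while total() > budget_tokens:
        # Index of the oldest message that is not a system message, if any.
        idx = next((i for i, m in enumerate(kept) if m["role"] != "system"), None)
        if idx is None:
            # Only system messages remain and they still do not fit.
            raise BudgetExceeded(total(), budget_tokens)
        del kept[idx]
    return kept


msgs = [
    {"role": "system", "content": "s" * 40},
    {"role": "user", "content": "a" * 40},
    {"role": "assistant", "content": "b" * 40},
    {"role": "user", "content": "c" * 40},
]
trimmed = trim_to_budget(msgs, budget_tokens=25)
print([m["role"] for m in trimmed])
```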

---

## ⚙️ optimize_context()

The main optimization loop. Analyzes utilization across all sections and takes actions (compress, drop, reorder) to hit the target.

```python
result = cm.optimize_context(
    query=None,                  # str | None — used by relevance-aware strategies
    target_utilization=0.85,     # float — target fraction of total budget to use
)
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `query` | `str \| None` | `None` | Passed to relevance strategy for scoring |
| `target_utilization` | `float` | `0.85` | Fraction of total budget to aim for (0.0–1.0) |
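
As a worked example of what `target_utilization` means in tokens, the sketch below converts the fraction into a token ceiling and computes how much would have to be freed. It assumes the target is simply `total_budget * target_utilization`; both helper names are hypothetical and the library's internal arithmetic may differ.

```python
def target_tokens(total_budget: int, target_utilization: float = 0.85) -> int:
    """Token count that corresponds to the target fraction of the budget."""
    if not 0.0 <= target_utilization <= 1.0:
        raise ValueError("target_utilization must be between 0.0 and 1.0")
    return round(total_budget * target_utilization)


def tokens_to_free(used: int, total_budget: int, target_utilization: float = 0.85) -> int:
    """How many tokens must go to reach the target (0 if already under it)."""
    return max(0, used - target_tokens(total_budget, target_utilization))


print(target_tokens(8000))          # ceiling for the default 85% target
print(tokens_to_free(7500, 8000))   # tokens to free when 7,500 are in use
```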

### CompressionResult

`optimize_context()` returns a `CompressionResult` that supports both attribute and dict access:

```python
result = cm.optimize_context(query="machine learning")

# Attribute access
print(result.success)             # bool — did optimization succeed?
print(result.compression_ratio)   # float — ratio of final to original size
print(result.tokens_saved)        # int
print(result.original_tokens)     # int
print(result.final_tokens)        # int
print(result.sections_dropped)    # List[str] — sections that were dropped
print(result.sections_compressed) # List[str] — sections that were compressed
print(result.actions_taken)       # List[str] — human-readable log of actions
print(result.initial_state)       # dict — section states before optimization
print(result.final_state)         # dict — section states after optimization

# Dict access (identical data)
print(result["compression_ratio"])
print(result["tokens_saved"])
```

### Optimization with Integrations

When integrations are configured, `optimize_context()` uses them automatically:

- **With `antaris-memory`:** Searches memory with the query, boosts matching content items to `"important"` priority before optimization.
- **With `antaris-router`:** Uses router-hinted `target_utilization` if set.

---

## 📊 Adaptive Budgets

Let the context manager learn from usage patterns and automatically reallocate budgets across sections over time.

### Enable Adaptive Budgets

```python
cm.enable_adaptive_budgets(
    enabled=True,
    reallocation_threshold=0.3,   # float — minimum utilization difference to trigger reallocation
)
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `enabled` | `bool` | `True` | Toggle adaptive budget reallocation on/off |
| `reallocation_threshold` | `float` | `0.3` | How much utilization must differ before triggering reallocation |

### Tracking & Applying

```python
# Snapshot current usage (call periodically — needs 10+ snapshots for suggestions)
cm.track_usage()

# Get reallocation suggestions (returns None if insufficient data)
suggestions = cm.suggest_adaptive_reallocation()
if suggestions:
    print(suggestions)  # dict: {section_name: suggested_new_budget}

# Apply suggestions automatically
cm.apply_adaptive_reallocation(auto_apply=True)
```

**How it works:** The manager tracks section utilization over time. Sections consistently under-using their budget donate tokens to sections that regularly overflow. Requires at least 10 `track_usage()` snapshots before making suggestions.
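
The idea can be sketched in plain Python. This is one possible reading, not the library's algorithm: it averages usage over the recorded snapshots, picks the least and most utilized sections, and moves half of the donor's average unused budget when their utilization differs by at least the threshold. The function name and the "half of the slack" rule are assumptions.

```python
def suggest_reallocation(budgets, history, threshold=0.3, min_snapshots=10):
    """Suggest new budgets, or return None if there is too little data or no clear gap.

    budgets: {section: budget_tokens}
    history: list of {section: used_tokens} snapshots
    """
    if len(history) < min_snapshots:
        return None

    n = len(history)
    avg_used = {name: sum(snap[name] for snap in history) / n for name in budgets}
    avg_util = {name: avg_used[name] / budgets[name] for name in budgets}

    donor = min(avg_util, key=avg_util.get)
    receiver = max(avg_util, key=avg_util.get)
    if avg_util[receiver] - avg_util[donor] < threshold:
        return None

    # Assumption: move half of the donor's average unused budget.
    transfer = max(0, int((budgets[donor] - avg_used[donor]) // 2))
    suggestion = dict(budgets)
    suggestion[donor] -= transfer
    suggestion[receiver] += transfer
    return suggestion


budgets = {"memory": 2000, "conversation": 4000}
history = [{"memory": 400, "conversation": 4000} for _ in range(10)]
print(suggest_reallocation(budgets, history))
```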

---

## 🌊 Cascade Overflow

When a section exceeds its budget, cascade overflow pulls slack from other sections to cover it — without displacing any existing content.

```python
redistributed = cm.cascade_overflow("conversation")
# Returns: int — number of tokens redistributed
```

| Parameter | Type | Description |
|---|---|---|
| `section` | `str` | The overflowing section to cover |

**Rules:**
- Only transfers **unused** budget (slack) from other sections
- Never evicts content from donor sections
- Returns `0` if no slack is available
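
These rules can be illustrated with a small standalone function. It is a sketch under assumptions, not the library's code: budgets and usage are plain dicts, donors are visited in dict order, and this local `cascade_overflow` mutates `budgets` in place.

```python
def cascade_overflow(budgets, used, section):
    """Move unused budget from other sections to cover `section`'s overflow.

    Mutates `budgets` in place and returns the number of tokens moved.
    """
    deficit = used[section] - budgets[section]
    if deficit <= 0:
        return 0  # section is not over budget

    moved = 0
    for donor in budgets:
        if donor == section or moved >= deficit:
            continue
        # Slack is budget the donor is not using; content is never evicted.
        slack = max(0, budgets[donor] - used[donor])
        take = min(slack, deficit - moved)
        budgets[donor] -= take
        budgets[section] += take
        moved += take
    return moved


budgets = {"system": 1200, "memory": 2000, "conversation": 3500}
used = {"system": 1000, "memory": 1900, "conversation": 3900}
moved = cascade_overflow(budgets, used, "conversation")
print(moved, budgets)
```

In this example only 300 tokens of slack exist, so the section stays 100 tokens over budget even after the cascade.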

---

## 🔗 Cross-Session Context Sharing

Export a full context snapshot — including content, turns, and configuration — and restore it in any new session.

### Export

```python
# Export everything
snapshot = cm.export_snapshot(include_importance_above=0.0)

# Export only high-importance items (reduces snapshot size)
snapshot = cm.export_snapshot(include_importance_above=0.7)
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `include_importance_above` | `float` | `0.0` | Only include items with importance score above this threshold |

The snapshot is a serializable dict containing: section configs, budgets, content items, turn history, retention policy, and section priorities.

### Restore

```python
# Restore into a fresh ContextManager instance
cm2 = ContextManager.from_snapshot(snapshot)
```

**What is restored:**
- Section configuration and budgets
- All content items (filtered by importance threshold at export time)
- Turn history and turn count
- Retention policy
- Section priority assignments

**Use cases:** Persist context across process restarts, share context between workers, checkpoint/restore long-running agents.
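
For persistence across process restarts, one simple option is to write the snapshot dict to a JSON file. The helpers below are a sketch that assumes the snapshot is JSON-serializable (the text above says "serializable" without naming a format); `save_snapshot_file` and `load_snapshot_file` are hypothetical names, and the sample dict is a placeholder rather than the real snapshot layout.

```python
import json
from pathlib import Path


def save_snapshot_file(snapshot: dict, path: str) -> None:
    """Write a snapshot dict to disk as UTF-8 JSON."""
    Path(path).write_text(json.dumps(snapshot, indent=2), encoding="utf-8")


def load_snapshot_file(path: str) -> dict:
    """Read a snapshot dict back from a JSON file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Placeholder data standing in for the result of cm.export_snapshot().
example_snapshot = {"budgets": {"system": 1200}, "turns": [{"role": "user", "content": "hi"}]}
```

With the library installed, the intended flow would be `save_snapshot_file(cm.export_snapshot(), "ctx.json")` in one process and `ContextManager.from_snapshot(load_snapshot_file("ctx.json"))` in the next.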

---

## 💾 In-Process Snapshots

In-memory structural snapshots for checkpoint/restore within a single process. These do **not** include content — only configuration structure.

```python
# Save current structure
cm.save_snapshot("checkpoint-1")

# Restore a saved snapshot
ok = cm.restore_snapshot("checkpoint-1")   # bool — True if snapshot found

# List all saved snapshots
snaps = cm.list_snapshots()
# Returns: [{"name": "checkpoint-1", "timestamp": "2025-01-01T12:00:00"}, ...]
```

> For full content persistence across sessions, use `export_snapshot()` / `from_snapshot()` instead.

---

## 🗂 Config Persistence

Save and load the context manager's configuration (structure, budgets, strategy settings) as JSON.

```python
# Save config to file
cm.save_config("./ctx_config.json")

# Load config from file
cm.load_config("./ctx_config.json")

# Export full state as JSON string (structure only, no content)
state_json = cm.export_state()

# Import state from JSON string
cm.import_state(state_json)
```

**What is saved:** Section definitions, budget allocations, strategy settings, compression level, truncation strategy, auto-compress flag, retention policy.

**What is NOT saved:** Content items, turn history, in-process snapshots.

---

## 📈 Reports & Analysis

### Usage Report

Get a full breakdown of token usage across all sections.

```python
report = cm.get_usage_report()
```

**Report structure:**

```python
{
    "sections": {
        "system": {
            "budget": 1200,
            "used": 847,
            "utilization": 0.706,
            "item_count": 1,
        },
        "conversation": { ... },
        # ... all sections
    },
    "total_budget": 8000,
    "total_used": 5231,
    "overall_utilization": 0.654,
    "over_budget": False,
    "configuration": {
        "strategy": "hybrid",
        "compression_level": "moderate",
        "auto_compress": True,
        "truncation_strategy": "oldest_first",
    },
    "compression_stats": { ... },
}
```

### Context Analysis

Deep analysis via `ContextProfiler`.

```python
analysis = cm.analyze_context(log_analysis=True)
# log_analysis=True prints analysis to stdout
# Returns detailed analysis dict
```

The `ContextProfiler` can also be used standalone:

```python
from antaris_context import ContextProfiler

profiler = ContextProfiler()
```

---

## ✂️ Truncation Strategies

Truncation strategies determine the order in which content is dropped when the context is over budget. Set via config.

```python
cm.load_config({
    "truncation_strategy": "oldest_first"   # or "lowest_priority" or "smart_summary_markers"
})
```

| Strategy | Behavior |
|---|---|
| `"oldest_first"` | *(default)* Removes the oldest non-critical items first — preserves recency |
| `"lowest_priority"` | Removes `optional` first, then `normal`, then `important`; never removes `critical` |
| `"smart_summary_markers"` | Preserves items tagged as summaries or headers; drops regular content first |
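
The `"lowest_priority"` order can be sketched as follows. This is a standalone illustration rather than the library's code: items are plain dicts with hypothetical `priority` and `tokens` keys, and `drop_lowest_priority` is an assumed name.

```python
# Drop order from the table above; "critical" is intentionally absent.
PRIORITY_DROP_ORDER = ["optional", "normal", "important"]


def drop_lowest_priority(items, tokens_to_free):
    """Drop items tier by tier until enough tokens are freed.

    Returns (kept_items, freed_tokens). Critical items are never dropped.
    """
    kept = list(items)
    freed = 0
    for level in PRIORITY_DROP_ORDER:
        for item in [i for i in kept if i["priority"] == level]:
            if freed >= tokens_to_free:
                return kept, freed
            kept.remove(item)
            freed += item["tokens"]
    return kept, freed


items = [
    {"id": "a", "priority": "critical", "tokens": 500},
    {"id": "b", "priority": "normal", "tokens": 300},
    {"id": "c", "priority": "optional", "tokens": 200},
    {"id": "d", "priority": "important", "tokens": 400},
]
kept, freed = drop_lowest_priority(items, tokens_to_free=400)
print([i["id"] for i in kept], freed)
```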

---

## 🏋️ Importance-Weighted Compression

Low-level compression utilities for fine-grained control over which content items to keep.

### ImportanceWeightedCompressor

Keeps the top-N most important items and compresses the middle tier.

```python
from antaris_context import ImportanceWeightedCompressor

compressor = ImportanceWeightedCompressor(
    keep_top_n=3,           # int — always keep the N highest-importance items
    compress_middle=True,   # bool — compress mid-tier items instead of dropping
)

result = compressor.compress_items(content_items)
```

**Result dict:**

```python
{
    "kept": [...],           # List of items kept verbatim
    "compressed": [...],     # List of compressed items
    "tokens_saved": 412,     # int
}
```

### SemanticChunker

Split long text into semantically coherent chunks.

```python
from antaris_context import SemanticChunker, SemanticChunk

chunker = SemanticChunker()
chunks = chunker.chunk("A very long document text...")
# Returns: List[SemanticChunk]
```

Each `SemanticChunk` represents a semantically self-contained piece of the original text, useful for relevance scoring and selective inclusion.

---

## 🧩 Integration: antaris-memory

Connect to `antaris-memory` to make `optimize_context()` memory-aware. When a query is provided, the optimizer will search memory for relevant hits and boost matching context items to `"important"` priority before compressing.

```python
import antaris_memory as mem
from antaris_context import ContextManager

memory = mem.MemorySystem(...)   # your memory instance
cm = ContextManager(total_budget=8000, template="agent_with_tools")
cm.set_memory_client(memory)

# Now optimize_context() uses memory search to boost relevant items
result = cm.optimize_context(query="Python async programming")
# → searches memory for "Python async programming"
# → extracts keywords from hits
# → boosts matching context items to "important"
# → runs standard optimization
```

### MemoryClient Protocol

Any object implementing the following interface satisfies the protocol:

```python
from antaris_context import MemoryClient

class MyMemoryBackend:
    def search(self, query: str, limit: int = 5) -> list:
        ...  # return list of memory items
```

Any `antaris_memory.MemorySystem` instance satisfies this protocol automatically.
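
For testing without `antaris-memory`, a tiny in-memory backend with the same `search` signature is enough. The sketch below is self-contained: `SearchableMemory` is a local stand-in for the protocol, `KeywordMemory` is a hypothetical class, and returning dicts with a `"content"` key is an assumption taken from how hits are read in the full example later in this README.

```python
from typing import List, Protocol, runtime_checkable


@runtime_checkable
class SearchableMemory(Protocol):
    """Local stand-in mirroring the documented `search` signature."""

    def search(self, query: str, limit: int = 5) -> list: ...


class KeywordMemory:
    """Naive backend: ranks notes by how many query words they share."""

    def __init__(self, notes: List[str]):
        self.notes = notes

    def search(self, query: str, limit: int = 5) -> list:
        terms = set(query.lower().split())
        scored = []
        for note in self.notes:
            overlap = len(terms & set(note.lower().split()))
            if overlap:
                scored.append((overlap, note))
        # Stable sort: ties keep their original order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [{"content": note} for _, note in scored[:limit]]


memory = KeywordMemory([
    "python async tips",
    "cooking pasta",
    "async python event loop basics",
])
hits = memory.search("python async")
print(hits)
```

An instance like this could then be passed to `cm.set_memory_client(memory)`, provided the library only relies on `search()`.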

---

## 🔀 Integration: antaris-router

Connect to `antaris-router` to let routing decisions influence context optimization. Router hints can shift budget between sections and override target utilization.

```python
from antaris_context import ContextManager

cm = ContextManager(total_budget=8000, template="agent_with_tools")

cm.set_router_hints({
    "boost_section": "tools",       # str — shifts 10% of budget from other sections to this one
    "target_utilization": 0.7,      # float — overrides optimize_context default
    "task_type": "code",            # str — informational; used by routing logic
})

# optimize_context now uses router-hinted target_utilization
result = cm.optimize_context(query="write a sorting algorithm")
```

**Hint keys:**

| Key | Type | Effect |
|---|---|---|
| `"boost_section"` | `str` | Shifts 10% of total budget from other sections to this section |
| `"target_utilization"` | `float` | Overrides the `target_utilization` parameter in `optimize_context()` |
| `"task_type"` | `str` | Informational — logged and available for downstream use |
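
The effect of `"boost_section"` can be approximated with the sketch below. It assumes the 10% is taken from the other sections in proportion to their current budgets, which the table does not specify; this local `boost_section` function is illustrative and not the library's implementation.

```python
def boost_section(budgets, section, fraction=0.10):
    """Return new budgets with roughly `fraction` of the total moved to `section`."""
    total = sum(budgets.values())
    donors = {name: b for name, b in budgets.items() if name != section}
    donor_total = sum(donors.values())
    if donor_total == 0:
        return dict(budgets)  # nothing to take from

    shift = min(round(total * fraction), donor_total)
    boosted = dict(budgets)
    taken = 0
    for name, budget in donors.items():
        # Proportional share; floor division may move a few tokens fewer
        # than the nominal shift.
        take = shift * budget // donor_total
        boosted[name] -= take
        taken += take
    boosted[section] += taken
    return boosted


agent_with_tools = {"system": 1200, "memory": 2000, "conversation": 3500, "tools": 1300}
boosted = boost_section(agent_with_tools, "tools")
print(boosted)
```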

---

## 📦 All Exports

```python
from antaris_context import (
    # Core
    ContextManager,
    ContextWindow,
    MessageCompressor,

    # Strategies
    ContextStrategy,
    RecencyStrategy,
    RelevanceStrategy,
    HybridStrategy,
    BudgetStrategy,

    # Analysis
    ContextProfiler,
    CompressionResult,

    # Compression utilities
    ImportanceWeightedCompressor,
    SemanticChunker,
    SemanticChunk,

    # Integration protocols
    MemoryClient,

    # Exceptions
    ContextBudgetExceeded,
)
```

---

## 🚨 Exception Reference

### `ContextBudgetExceeded`

Raised by `render_hard_limited()` when the context cannot be trimmed to fit within the specified budget.

```python
from antaris_context import ContextBudgetExceeded

try:
    messages = cm.render_hard_limited(budget_tokens=4000)
except ContextBudgetExceeded as e:
    print(f"Used: {e.used} tokens")
    print(f"Budget: {e.budget} tokens")
    print(f"Overflow: {e.used - e.budget} tokens over limit")
```

| Attribute | Type | Description |
|---|---|---|
| `e.used` | `int` | Actual token count of the trimmed context |
| `e.budget` | `int` | The hard limit that was exceeded |

---

## 🤖 Full Example: Agent with Tools

A complete, realistic example combining all major features.

```python
import antaris_memory as mem
from antaris_context import ContextManager, ContextBudgetExceeded

# --- Setup ---
memory_system = mem.MemorySystem(...)

cm = ContextManager(total_budget=16000, template="agent_with_tools")
cm.set_memory_client(memory_system)
cm.set_strategy("hybrid", recency_weight=0.3, relevance_weight=0.7)
cm.set_compression_level("moderate")
cm.set_router_hints({"boost_section": "tools", "target_utilization": 0.8})

# Set retention policy for long conversations
cm.set_retention_policy(
    keep_last_n_verbatim=20,
    summarize_older=True,
    max_turns=200,
)

# Register a real LLM summarizer
cm.set_summarizer(lambda text: my_llm.summarize(text, max_tokens=300))

# Enable adaptive budgets
cm.enable_adaptive_budgets(enabled=True, reallocation_threshold=0.25)

# --- System & Tools ---
cm.add_content("system", SYSTEM_PROMPT, priority="critical")
cm.add_content("tools", TOOL_DEFINITIONS, priority="important")

# --- Per-Turn Loop ---
for user_message in incoming_messages:

    # Add memory context for this query
    cm.clear_section("memory")
    memory_hits = memory_system.search(user_message, limit=5)
    for hit in memory_hits:
        cm.add_content("memory", hit["content"], priority="important")

    # Add user turn
    cm.add_turn(role="user", content=user_message)

    # Compact if needed
    if cm.turn_count > 50:
        cm.compact_older_turns(keep_last=30)

    # Optimize before sending
    result = cm.optimize_context(query=user_message)
    if not result.success:
        print(f"Optimization warning: {result.actions_taken}")

    # Track usage for adaptive budgets
    cm.track_usage()

    # Render and call LLM
    try:
        messages = cm.render_hard_limited(budget_tokens=14000)
    except ContextBudgetExceeded as e:
        print(f"Hard limit hit: {e.used}/{e.budget}, falling back to optimize")
        cm.optimize_context(target_utilization=0.75)
        messages = cm.render(provider="openai", system_prompt=SYSTEM_PROMPT)

    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
    )
    assistant_reply = response.choices[0].message.content

    # Add assistant turn
    cm.add_turn(role="assistant", content=assistant_reply)

# --- End of Session ---

# Apply adaptive reallocation based on observed patterns
suggestions = cm.suggest_adaptive_reallocation()
if suggestions:
    print("Adaptive reallocation suggestions:", suggestions)
    cm.apply_adaptive_reallocation(auto_apply=True)

# Export snapshot for next session
snapshot = cm.export_snapshot(include_importance_above=0.5)
save_to_db(snapshot)   # persist however you like

# --- Next Session: Restore ---
snapshot = load_from_db()
cm_new = ContextManager.from_snapshot(snapshot)
# Ready to continue — all turns, content, and config restored
```

---

## 🔗 Related Packages

| Package | Description |
|---|---|
| [`antaris-memory`](https://pypi.org/project/antaris-memory/) | Long-term semantic memory for AI agents |
| [`antaris-router`](https://pypi.org/project/antaris-router/) | Intelligent request routing with model selection |
| [`antaris-pipeline`](https://pypi.org/project/antaris-pipeline/) | Multi-step agent pipeline orchestration |
| [`antaris-guard`](https://pypi.org/project/antaris-guard/) | Input/output safety filtering |
| [`antaris-suite`](https://pypi.org/project/antaris-suite/) | All-in-one bundle of the antaris ecosystem |

---

## 📄 License

Apache-2.0. See [LICENSE](LICENSE) for details.

---

*Built by [Antaris Analytics](https://antarisanalytics.ai)*
