Metadata-Version: 2.4
Name: antaris-router
Version: 5.0.1
Summary: File-based model router for LLM cost optimization. Zero dependencies.
Author-email: Antaris Analytics <dev@antarisanalytics.com>
License: Apache-2.0
Project-URL: Homepage, https://github.com/Antaris-Analytics/antaris-router
Project-URL: Documentation, https://router.antarisanalytics.ai
Project-URL: Repository, https://github.com/Antaris-Analytics/antaris-router
Project-URL: Issues, https://github.com/Antaris-Analytics/antaris-router/issues
Keywords: ai,llm,router,cost,optimization,models,deterministic
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: license-file

# antaris-router ⚡

**Adaptive LLM model routing for cost optimization — zero dependencies, stdlib only.**

Route every prompt to the right model at the right cost. `antaris-router` classifies task complexity, selects the optimal model from your registry, enforces SLAs, tracks provider health, and continuously improves routing quality through outcome feedback.

[![PyPI version](https://img.shields.io/pypi/v/antaris-router)](https://pypi.org/project/antaris-router/)
[![Python](https://img.shields.io/pypi/pyversions/antaris-router)](https://pypi.org/project/antaris-router/)
[![Zero dependencies](https://img.shields.io/badge/dependencies-zero-brightgreen)](https://pypi.org/project/antaris-router/)

---

## 📦 Installation

```bash
pip install antaris-router
```

**Version:** `5.0.1`
**Dependencies:** None — pure Python stdlib only.

---

## 🗺️ Table of Contents

1. [Why antaris-router?](#-why-antaris-router)
2. [Tier System](#-tier-system)
3. [Quick Start](#-quick-start)
4. [v2.0 API — AdaptiveRouter (Semantic)](#-v20-api--adaptiverouter-semantic-self-improving)
5. [v1.0 API — Router (Keyword-based)](#-v10-api--router-keyword-based-production)
6. [RoutingDecision Fields](#-routingdecision-fields)
7. [Explainability — explain()](#-explainability--explain)
8. [Confidence-Gated Escalation](#-confidence-gated-escalation)
9. [SLA Configuration & Enforcement](#-sla-configuration--enforcement)
10. [Provider Health Tracking](#-provider-health-tracking)
11. [A/B Testing](#-ab-testing)
12. [Cost Forecasting](#-cost-forecasting)
13. [Cost Tracking & Analytics](#-cost-tracking--analytics)
14. [Model Registry](#-model-registry)
15. [SemanticClassifier (v2.0)](#-semanticclassifier-v20)
16. [QualityTracker (v2.0)](#-qualitytracker-v20)
17. [ClassificationResult & Signals](#-classificationresult--signals)
18. [Full API Reference](#-full-api-reference)
19. [Complete Exports](#-complete-exports)
20. [Migration: v1.0 → v2.0](#-migration-v10--v20)

---

## 🎯 Why antaris-router?

LLM costs are asymmetric. A one-line question routed to `claude-opus` can cost 50–100× more than it needs to. `antaris-router` fixes that:

| Without routing | With antaris-router |
|---|---|
| Every request → one expensive model | Each request → cheapest capable model |
| No visibility into cost breakdown | Real-time cost tracking + forecasting |
| Silent model failures | Provider health tracking + auto-failover |
| Blind prompt-to-model mapping | TF-IDF semantic classification (v2.0) |
| No quality signal loop | Outcome feedback → self-improving routing |

---

## 🏆 Tier System

`antaris-router` classifies every prompt into one of five complexity tiers. Each tier maps to a cost bracket, ensuring you always pay proportionally to task complexity.

| Tier | Char Range | Typical Tasks | Strategy |
|------|-----------|---------------|----------|
| `trivial` | ≤ 50 chars | Simple Q&A, single-word lookups | Cheapest model |
| `simple` | 50–200 chars | Basic tasks, short explanations | Low-cost model |
| `moderate` | 200–1,000 chars | Standard tasks, multi-step answers | Mid-tier model |
| `complex` | 1,000–3,000 chars | Analysis, architecture, code review | Powerful model |
| `expert` | 3,000+ chars | Highest complexity, long-form reasoning | Most capable model |

Tier boundaries are based on character count combined with keyword signals, code detection, and structural complexity analysis. The v2.0 `AdaptiveRouter` additionally uses TF-IDF semantic classification and improves tier accuracy over time through outcome feedback.
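
As a rough illustration of the character ranges in the table, the length component alone can be sketched as below. This is a simplified sketch, not the library's classifier: `TIER_LIMITS` and `tier_by_length` are hypothetical names, upper bounds are assumed to be inclusive, and keyword, code, and structural signals are ignored.

```python
# Length-only tier lookup; thresholds copied from the tier table above.
# The real classifier also weighs keywords, code detection, and structure.
TIER_LIMITS = [
    ("trivial", 50),
    ("simple", 200),
    ("moderate", 1_000),
    ("complex", 3_000),
]


def tier_by_length(prompt: str) -> str:
    """Return the tier whose character range contains len(prompt)."""
    for tier, max_chars in TIER_LIMITS:
        if len(prompt) <= max_chars:  # assumption: upper bounds are inclusive
            return tier
    return "expert"


print(tier_by_length("What is 2 + 2?"))  # trivial
print(tier_by_length("x" * 1_250))       # complex
```

In practice a short prompt can still land in a higher tier when the other signals point that way, so treat the length lookup as a starting point only.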

---

## ⚡ Quick Start

### v2.0 — AdaptiveRouter (recommended for new projects)

```python
from antaris_router import AdaptiveRouter, ModelConfig

router = AdaptiveRouter("./routing_data", ab_test_rate=0.05)

router.register_model(ModelConfig(
    name="gpt-4o-mini",
    tier_range=("trivial", "moderate"),
    cost_per_1k_input=0.00015,
    cost_per_1k_output=0.0006,
))
router.register_model(ModelConfig(
    name="claude-sonnet-4-6",
    tier_range=("simple", "complex"),
    cost_per_1k_input=0.003,
    cost_per_1k_output=0.015,
))
router.register_model(ModelConfig(
    name="claude-opus-4-6",
    tier_range=("complex", "expert"),
    cost_per_1k_input=0.015,
    cost_per_1k_output=0.075,
))

result = router.route("Implement a distributed task queue with priority scheduling")

print(result.model)          # "claude-sonnet-4-6"
print(result.tier)           # "complex"
print(result.confidence)     # 0.87
print(result.estimated_cost) # 0.00234

# Feed outcome back to improve future routing
router.report_outcome(result.prompt_hash, quality_score=0.9, success=True)

# Session analytics
analytics = router.get_analytics()
print(analytics)
# {
#   "total_routed": 42,
#   "tier_distribution": {"trivial": 5, "simple": 12, "moderate": 15, "complex": 8, "expert": 2},
#   "avg_quality": 0.88,
#   "model_usage": {"gpt-4o-mini": 17, "claude-sonnet-4-6": 21, "claude-opus-4-6": 4},
#   "cost_savings": 0.142
# }
```

### v1.0 — Router (production-proven, keyword-based)

```python
from antaris_router import Router

router = Router(enable_cost_tracking=True)

decision = router.route("Explain async/await in Python with examples")
print(decision.model)    # "claude-sonnet-4-6"
print(decision.tier)     # "moderate"
print(decision.confidence) # 0.82

print(router.explain(decision))
```

---

## 🤖 v2.0 API — AdaptiveRouter (Semantic, Self-Improving)

`AdaptiveRouter` is the next-generation router. It uses TF-IDF vectorization for semantic classification, learns from outcome feedback, and persists routing state across sessions.

### Constructor

```python
router = AdaptiveRouter(
    workspace="./routing_data",  # directory for persisted state
    ab_test_rate=0.05,           # fraction of routes used for A/B exploration (0.0–1.0)
)
```

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `workspace` | `str` | required | Path to directory for persisted routing data, quality history, and TF-IDF model |
| `ab_test_rate` | `float` | `0.05` | Fraction of routing decisions used for A/B exploration. Set `0.0` to disable. |

The workspace directory is created automatically if it doesn't exist.

### `register_model(config: ModelConfig)`

Register a model with its tier range and cost parameters.

```python
router.register_model(ModelConfig(
    name="gpt-4o-mini",
    tier_range=("trivial", "moderate"),   # (min_tier, max_tier)
    cost_per_1k_input=0.00015,
    cost_per_1k_output=0.0006,
))
```

**`ModelConfig` fields:**

| Field | Type | Description |
|-------|------|-------------|
| `name` | `str` | Model identifier (e.g. `"gpt-4o-mini"`) |
| `tier_range` | `Tuple[str, str]` | `(min_tier, max_tier)` — tiers this model handles |
| `cost_per_1k_input` | `float` | Cost in USD per 1K input tokens |
| `cost_per_1k_output` | `float` | Cost in USD per 1K output tokens |

**Tier range semantics:** A model registered with `tier_range=("simple", "complex")` is eligible for `simple`, `moderate`, and `complex` prompts. The router selects the lowest-cost eligible model for each tier.
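
The eligibility rule can be sketched in a few lines of plain Python. `SketchModel`, `is_eligible`, and `cheapest_eligible` are hypothetical stand-ins rather than library APIs, and ranking by the sum of the two per-1K prices is an assumption about what "lowest-cost" means.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

TIERS = ["trivial", "simple", "moderate", "complex", "expert"]


@dataclass
class SketchModel:  # hypothetical stand-in, not the library's ModelConfig
    name: str
    tier_range: Tuple[str, str]
    cost_per_1k_input: float
    cost_per_1k_output: float


def is_eligible(model: SketchModel, tier: str) -> bool:
    """A model handles every tier between its min and max tier, inclusive."""
    low, high = (TIERS.index(t) for t in model.tier_range)
    return low <= TIERS.index(tier) <= high


def cheapest_eligible(models: List[SketchModel], tier: str) -> Optional[str]:
    eligible = [m for m in models if is_eligible(m, tier)]
    if not eligible:
        return None
    # Assumption: "lowest-cost" is ranked by combined per-1K price.
    best = min(eligible, key=lambda m: m.cost_per_1k_input + m.cost_per_1k_output)
    return best.name


MODELS = [
    SketchModel("gpt-4o-mini", ("trivial", "moderate"), 0.00015, 0.0006),
    SketchModel("claude-sonnet-4-6", ("simple", "complex"), 0.003, 0.015),
    SketchModel("claude-opus-4-6", ("complex", "expert"), 0.015, 0.075),
]

print(cheapest_eligible(MODELS, "complex"))  # claude-sonnet-4-6
print(cheapest_eligible(MODELS, "expert"))   # claude-opus-4-6
```

With the Quick Start registry, a `complex` prompt has two eligible models and the cheaper one wins, which matches the Quick Start output.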

### `route(prompt: str) → RoutingResult`

Classify the prompt and select the optimal model.

```python
result = router.route("Summarize the following contract clause: ...")
```

**Returns `RoutingResult`:**

| Field | Type | Description |
|-------|------|-------------|
| `model` | `str` | Selected model name |
| `tier` | `str` | Classified complexity tier |
| `confidence` | `float` | Classification confidence (0.0–1.0) |
| `prompt_hash` | `str` | SHA-256 hash of prompt (used for outcome feedback) |
| `estimated_cost` | `float` | Estimated cost in USD for this request |

### `report_outcome(prompt_hash: str, quality_score: float, success: bool)`

Feed outcome back to the router to improve future routing decisions. This is the core self-improvement loop.

```python
router.report_outcome(
    result.prompt_hash,
    quality_score=0.9,  # 0.0–1.0, how good the model's response was
    success=True,       # whether the request succeeded at all
)
```

The router uses outcome history to:
- Detect tier misclassifications (e.g. a `moderate` prompt that consistently gets poor quality → escalate to `complex`)
- Track per-model quality trends across tier assignments
- Improve TF-IDF weights over time
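
One way to picture the first bullet is a small outcome log keyed by prompt hash. The sketch below is hypothetical throughout: `OutcomeLog`, its thresholds, and the choice to hash the UTF-8 prompt text with SHA-256 hex are assumptions for illustration, not the router's internals.

```python
import hashlib
from collections import defaultdict
from statistics import mean

TIERS = ["trivial", "simple", "moderate", "complex", "expert"]


def prompt_hash(prompt: str) -> str:
    # Assumption: hex SHA-256 of the UTF-8 prompt text.
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


class OutcomeLog:
    """Hypothetical in-memory outcome store keyed by prompt hash."""

    def __init__(self, min_quality: float = 0.6, min_samples: int = 3):
        self.min_quality = min_quality
        self.min_samples = min_samples
        self.routed = {}                 # prompt_hash -> tier it was routed at
        self.scores = defaultdict(list)  # prompt_hash -> quality scores

    def record_route(self, prompt: str, tier: str) -> str:
        key = prompt_hash(prompt)
        self.routed[key] = tier
        return key

    def report_outcome(self, key: str, quality_score: float, success: bool) -> None:
        # A failed request counts as quality 0.0 in this sketch.
        self.scores[key].append(quality_score if success else 0.0)

    def suggested_tier(self, key: str) -> str:
        tier = self.routed[key]
        scores = self.scores[key]
        if len(scores) >= self.min_samples and mean(scores) < self.min_quality:
            # Consistently poor quality: move up one tier, capped at "expert".
            return TIERS[min(TIERS.index(tier) + 1, len(TIERS) - 1)]
        return tier


log = OutcomeLog()
key = log.record_route("Refactor this module for thread safety", "moderate")
for score in (0.4, 0.5, 0.3):
    log.report_outcome(key, quality_score=score, success=True)
print(log.suggested_tier(key))  # complex
```

Requiring a minimum number of samples before escalating keeps one bad response from shifting the tier.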

### `get_analytics() → Dict`

Aggregate routing stats for the current session.

```python
analytics = router.get_analytics()
# {
#   "total_routed": int,
#   "tier_distribution": {"trivial": int, "simple": int, ...},
#   "avg_quality": float,
#   "model_usage": {"model-name": int, ...},
#   "cost_savings": float   # USD saved vs always using most capable model
# }
```

---

## 🔧 v1.0 API — Router (Keyword-based, Production)

`Router` is the production-proven keyword-based router. It includes SLA enforcement, confidence-gated escalation, provider health tracking, A/B testing, and cost forecasting. Use `Router` for stability and `AdaptiveRouter` for semantic accuracy.

### Constructor

```python
from antaris_router import Router, SLAConfig

router = Router(
    config_path=None,                  # optional path to JSON config file
    enable_cost_tracking=True,         # track per-model cost usage
    low_confidence_threshold=0.0,      # 0.0 = never escalate (default)
    escalation_model=None,             # model to escalate to when confidence is low
    escalation_strategy="always",      # "always" | "log_only" | "ask"
    sla=None,                          # SLAConfig instance
    fallback_chain=None,               # ordered list of fallback model names
    classifier=None,                   # inject custom classifier (e.g. SemanticClassifier)
)
```

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `config_path` | `str \| None` | `None` | Path to JSON config file. If `None`, uses built-in defaults. |
| `enable_cost_tracking` | `bool` | `True` | Track cost per model, per session. Required for `cost_report()`, `savings_estimate()`. |
| `low_confidence_threshold` | `float` | `0.0` | Confidence below this triggers escalation. `0.0` = disabled. |
| `escalation_model` | `str \| None` | `None` | Model name to escalate to on low confidence. |
| `escalation_strategy` | `str` | `"always"` | Escalation behavior: `"always"` swaps model, `"log_only"` logs but keeps model, `"ask"` signals user to confirm. |
| `sla` | `SLAConfig \| None` | `None` | SLA constraints to enforce during routing. |
| `fallback_chain` | `List[str] \| None` | `None` | Ordered fallback models, used when `route()` is called with `auto_scale=True`. |
| `classifier` | `object \| None` | `None` | Custom classifier to inject (e.g. `SemanticClassifier`). Replaces built-in keyword classifier. |

### `route(...) → RoutingDecision`

Route a prompt to the optimal model.

```python
decision = router.route(
    prompt="text to route",            # required
    context=None,                      # optional: additional context dict
    prefer=None,                       # preferred provider: "claude" | "openai" | etc.
    min_tier=None,                     # minimum tier floor: "simple"|"moderate"|"complex"|"expert"
    capability=None,                   # required capability: "vision"|"code"|etc.
    estimate_tokens=(100, 50),         # (input_tokens, output_tokens) for cost estimation
    ab_test=None,                      # A/B test config from create_ab_test()
    prefer_healthy=False,              # skip degraded/down providers
    auto_scale=False,                  # fall back through fallback_chain if primary is degraded or over-budget
)
```

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `prompt` | `str` | required | The prompt text to classify and route |
| `context` | `dict \| None` | `None` | Additional context for routing decisions |
| `prefer` | `str \| None` | `None` | Preferred provider name. Router respects this if an eligible model exists. |
| `min_tier` | `str \| None` | `None` | Force minimum complexity tier. E.g. `"complex"` ensures at least a complex-tier model. |
| `capability` | `str \| None` | `None` | Required model capability. Only models with this capability are considered. |
| `estimate_tokens` | `Tuple[int, int]` | `(100, 50)` | `(input_tokens, output_tokens)` used for cost estimation in `decision.estimated_cost`. |
| `ab_test` | `ABTest \| None` | `None` | A/B test object from `create_ab_test()`. Enables variant-based routing. |
| `prefer_healthy` | `bool` | `False` | If `True`, degraded or down providers are skipped. Falls through to next eligible model. |
| `auto_scale` | `bool` | `False` | If `True` and primary model is degraded or over-budget, routes through `fallback_chain` in order. |
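
To show how `estimate_tokens` could feed a cost estimate, here is a minimal sketch of per-1K token pricing. `estimate_cost` is a hypothetical helper, and it is an assumption that `decision.estimated_cost` is computed this way; the prices are the Quick Start values for `claude-sonnet-4-6`.

```python
def estimate_cost(estimate_tokens, cost_per_1k_input, cost_per_1k_output):
    """USD estimate for one request from (input_tokens, output_tokens)."""
    input_tokens, output_tokens = estimate_tokens
    input_cost = (input_tokens / 1000) * cost_per_1k_input
    output_cost = (output_tokens / 1000) * cost_per_1k_output
    return input_cost + output_cost


# Default (100, 50) estimate at the Quick Start prices for claude-sonnet-4-6
cost = estimate_cost((100, 50), cost_per_1k_input=0.003, cost_per_1k_output=0.015)
print(f"${cost:.5f}")  # $0.00105
```

Passing a tuple closer to your real traffic, for example `(500, 250)`, should give a more useful estimate than the default.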

---

## 📋 RoutingDecision Fields

Every `Router.route()` call (v1.0 API) returns a `RoutingDecision` object with full decision transparency. `AdaptiveRouter.route()` returns the smaller `RoutingResult` described above.

```python
decision = router.route("Design a microservices platform for high-throughput event processing")

decision.model              # str: "claude-sonnet-4-6"
decision.provider           # str: "anthropic"
decision.tier               # str: "complex"
decision.confidence         # float: 0.85
decision.reasoning          # List[str]: ["Input length 1,250 chars → complex range", ...]
decision.estimated_cost     # float: 0.00525 (USD)
decision.fallback_models    # List[str]: ["claude-opus-4-6", "gpt-4o"]
decision.classification     # ClassificationResult object
decision.confidence_basis   # str: "keyword_density" | "composite" | "rule_based"
decision.evidence           # List[str]: human-readable decision signals
decision.escalated          # bool: True if confidence-gated escalation triggered
decision.original_confidence  # float: pre-escalation confidence (if escalated)
decision.escalation_reason  # str: why escalation triggered (if escalated)
decision.ab_variant         # str: "a" | "b" if A/B test active
decision.explanation        # str: full human-readable explanation
decision.supports_streaming # bool: whether selected model supports streaming
decision.sla_compliant      # bool: whether decision satisfies all SLA constraints
decision.sla_breaches       # List[str]: e.g. ["latency_exceeded", "budget_exceeded"]
decision.sla_adjustments    # List[str]: e.g. ["routed_to_cheaper_model_due_to_budget_sla"]

decision.selected_model     # property alias for decision.model
decision.to_dict()          # Dict: all fields serialized to a plain dict
```

### Complete Field Reference

| Field | Type | Description |
|-------|------|-------------|
| `model` | `str` | Name of the selected model |
| `provider` | `str` | Provider name: `"anthropic"`, `"openai"`, etc. |
| `tier` | `str` | Complexity tier: `trivial/simple/moderate/complex/expert` |
| `confidence` | `float` | Classification confidence 0.0–1.0 |
| `reasoning` | `List[str]` | Ordered list of reasons why this model was chosen |
| `estimated_cost` | `float` | Estimated USD cost for this specific request |
| `fallback_models` | `List[str]` | Ordered list of alternative models considered |
| `classification` | `ClassificationResult` | Raw classification output including signals |
| `confidence_basis` | `str` | How confidence was computed: `"keyword_density"`, `"composite"`, `"rule_based"` |
| `evidence` | `List[str]` | Human-readable signals that drove the decision |
| `escalated` | `bool` | `True` if confidence-gated escalation triggered. With `"always"` the model is replaced; with `"ask"` the original model is kept. |
| `original_confidence` | `float` | Confidence before escalation (populated only when `escalated=True`) |
| `escalation_reason` | `str` | Human-readable reason escalation triggered |
| `ab_variant` | `str` | `"a"` or `"b"` when an A/B test is active, `""` otherwise |
| `explanation` | `str` | Full plain-English explanation of the routing decision |
| `supports_streaming` | `bool` | Whether the selected model supports streaming responses |
| `sla_compliant` | `bool` | Whether the decision satisfies all active SLA constraints |
| `sla_breaches` | `List[str]` | Which SLA constraints were breached (if any) |
| `sla_adjustments` | `List[str]` | Routing adjustments made to satisfy SLA constraints |
| `selected_model` | property | Alias for `model` |

### `to_dict()` Output

```python
d = decision.to_dict()
# {
#   "model": "claude-sonnet-4-6",
#   "provider": "anthropic",
#   "tier": "complex",
#   "confidence": 0.85,
#   "reasoning": [...],
#   "estimated_cost": 0.00525,
#   "fallback_models": [...],
#   "confidence_basis": "keyword_density",
#   "evidence": [...],
#   "escalated": False,
#   "original_confidence": 0.0,
#   "escalation_reason": "",
#   "ab_variant": "",
#   "explanation": "Model selected: claude-sonnet-4-6 ...",
#   "supports_streaming": True,
#   "sla_compliant": True,
#   "sla_breaches": [],
#   "sla_adjustments": []
# }
```

---

## 🔍 Explainability — `explain()`

Every routing decision can be explained in plain English. Use `explain()` for debugging, auditing, or displaying routing logic to users.

```python
explanation = router.explain(decision)
print(explanation)
```

**Example output:**

```
Model selected: claude-sonnet-4-6 (confidence: 85%)
Basis: keyword density
Reasoning: Input classified as 'complex' task (85% confidence). Length 1,250 chars falls in
complex range (1,000–3,000). Strong signal keywords detected: "microservices", "architecture",
"distributed".
Estimated cost: $0.003000 per 1K tokens (this request: $0.005250).
Evidence: length: 1250 chars → complex range (≤3000), keyword match: 3 'complex'-tier keywords
(microservices, architecture, distributed), structural_complexity: 2
Alternatives considered: claude-opus-4-6 (more capable, 5.0x cost), gpt-4o-mini (cheaper, reduced quality)
```

**When escalation occurred:**

```
Model selected: claude-opus-4-6 (confidence: 45%)
[Escalated from original confidence 45%: Low confidence below threshold 0.60. Original model: claude-sonnet-4-6]
Basis: composite
Reasoning: Input classified as 'moderate' task (45% confidence)...
```

**`explain()` sections:**

| Section | Always shown | Description |
|---------|-------------|-------------|
| `Model selected: X (confidence: Y%)` | ✅ | Selected model and final confidence |
| `[Escalated from ...]` | Only if escalated | Pre-escalation state and trigger reason |
| `Basis: X` | ✅ | Confidence computation method |
| `Reasoning: ...` | ✅ | Human-readable classification narrative |
| `Estimated cost: ...` | ✅ | Per-1K and per-request cost |
| `Evidence: ...` | ✅ | Raw signals that drove classification |
| `Alternatives considered: ...` | ✅ | Other models with relative cost factor |

---

## 🚦 Confidence-Gated Escalation

When the classifier is uncertain about a prompt's complexity, `antaris-router` can automatically escalate to a more capable model rather than risk a low-quality response.

### Configuration

```python
router = Router(
    low_confidence_threshold=0.6,          # escalate if confidence < 0.6
    escalation_model="claude-opus-4-6",    # which model to escalate to
    escalation_strategy="always",          # escalation behavior
)
```

### Escalation Strategies

| Strategy | Behavior | Use Case |
|----------|----------|----------|
| `"always"` | Replaces selected model with `escalation_model` | Production: trust the router's escalation |
| `"log_only"` | Logs the low-confidence event, keeps original model | Monitoring: observe without changing behavior |
| `"ask"` | Sets `decision.escalated=True` + `escalation_reason`, keeps original model | Human-in-the-loop: surface uncertainty to user |

### Usage

```python
router = Router(
    low_confidence_threshold=0.6,
    escalation_model="claude-opus-4-6",
    escalation_strategy="always",
)

decision = router.route("What does this cryptic error mean in this context?")

if decision.escalated:
    print(f"Escalated! Original confidence: {decision.original_confidence:.2f}")
    print(f"Reason: {decision.escalation_reason}")
    print(f"Using: {decision.model}")  # claude-opus-4-6
```

### Strategy: `"ask"` — Human-in-the-Loop

When `escalation_strategy="ask"`, the router signals uncertainty without changing the model. Use this to prompt users to confirm the routing decision:

```python
router = Router(
    low_confidence_threshold=0.65,
    escalation_model="claude-opus-4-6",
    escalation_strategy="ask",
)

decision = router.route("some ambiguous prompt")

if decision.escalated:
    # Present choice to user
    print(f"Router is uncertain (confidence: {decision.original_confidence:.0%}).")
    print(f"Suggested escalation: {decision.escalation_reason}")
    print(f"Upgrade to claude-opus-4-6? Current model: {decision.model}")
```

### Escalation Decision Fields

When `decision.escalated is True`:

```python
decision.escalated           # True
decision.original_confidence # e.g. 0.48 — confidence before escalation
decision.escalation_reason   # e.g. "Low confidence below threshold 0.60. Original model: claude-sonnet-4-6"
decision.model               # escalation_model (if strategy="always"), else original model
```
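
The three strategies can be summarized in a short sketch. `SketchDecision` and `apply_escalation` are hypothetical names, not library code, and leaving every field untouched under `"log_only"` is an assumption, since the table only says that the original model is kept.

```python
from dataclasses import dataclass


@dataclass
class SketchDecision:  # hypothetical stand-in for RoutingDecision
    model: str
    confidence: float
    escalated: bool = False
    original_confidence: float = 0.0
    escalation_reason: str = ""


def apply_escalation(decision, threshold, escalation_model, strategy="always"):
    """Apply one of the three documented strategies to a low-confidence decision."""
    if threshold <= 0.0 or decision.confidence >= threshold:
        return decision  # escalation disabled, or confidence is high enough

    reason = (
        f"Low confidence below threshold {threshold:.2f}. "
        f"Original model: {decision.model}"
    )
    if strategy == "log_only":
        # Assumption: log_only leaves every decision field untouched.
        print(f"[low-confidence] {reason}")
        return decision

    decision.escalated = True
    decision.original_confidence = decision.confidence
    decision.escalation_reason = reason
    if strategy == "always":
        decision.model = escalation_model  # "ask" keeps the original model
    return decision


d = apply_escalation(SketchDecision("claude-sonnet-4-6", 0.45), 0.6, "claude-opus-4-6")
print(d.model, d.escalated)  # claude-opus-4-6 True
```

Calling the same function with `strategy="ask"` sets `escalated` and the reason but leaves `model` as `claude-sonnet-4-6`, which is the state a confirmation prompt would read.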

---

## 📊 SLA Configuration & Enforcement

`antaris-router` enforces Service Level Agreements on latency, budget, and response quality. When constraints are breached, the router adjusts model selection automatically.

### Setup

```python
from antaris_router import Router, SLAConfig

sla = SLAConfig(
    max_latency_ms=200,           # max acceptable latency per request
    budget_per_hour_usd=5.00,     # hourly spend cap in USD
    min_quality_score=0.7,        # minimum acceptable quality (0.0–1.0)
    auto_escalate_on_breach=True, # automatically adjust routing on SLA breach
)

router = Router(
    sla=sla,
    fallback_chain=["claude-sonnet-4-6", "claude-haiku-3-5", "gpt-4o-mini"],
)
```

### `SLAConfig` Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `max_latency_ms` | `float` | Maximum acceptable request latency in milliseconds |
| `budget_per_hour_usd` | `float` | Maximum spend per hour in USD |
| `min_quality_score` | `float` | Minimum acceptable quality score (0.0–1.0) |
| `auto_escalate_on_breach` | `bool` | If `True`, router adjusts model selection to restore SLA compliance |

### Routing With SLA

```python
decision = router.route("prompt", auto_scale=True)

# SLA compliance info on every decision
print(decision.sla_compliant)    # True / False
print(decision.sla_breaches)     # ["budget_exceeded", "latency_exceeded"]
print(decision.sla_adjustments)  # ["routed_to_cheaper_model_due_to_budget_sla"]
```
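
Conceptually, the compliance fields boil down to comparing observed values against the `SLAConfig` limits. The sketch below assumes that shape; `check_sla` is a hypothetical helper, and the `"quality_below_minimum"` label is invented for illustration (only `"latency_exceeded"` and `"budget_exceeded"` appear in this README).

```python
def check_sla(latency_ms, hourly_spend_usd, quality_score,
              max_latency_ms, budget_per_hour_usd, min_quality_score):
    """Return (sla_compliant, sla_breaches) for one observed request."""
    breaches = []
    if latency_ms > max_latency_ms:
        breaches.append("latency_exceeded")
    if hourly_spend_usd > budget_per_hour_usd:
        breaches.append("budget_exceeded")
    if quality_score < min_quality_score:
        breaches.append("quality_below_minimum")  # hypothetical label
    return (not breaches, breaches)


# Limits from the Setup example: 200 ms, $5.00/hour, quality >= 0.7
print(check_sla(245.0, 3.42, 0.85, 200, 5.00, 0.7))
# (False, ['latency_exceeded'])
```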

### `get_sla_report(since_hours=1.0) → Dict`

Aggregate SLA compliance report over a time window.

```python
report = router.get_sla_report(since_hours=1.0)
# {
#   "compliance_rate": 0.94,
#   "breaches": {
#     "latency": 3,
#     "cost": 1,
#     "quality": 2
#   },
#   "adjustments_made": 4,
#   "cost_savings_usd": 0.87,
#   "avg_latency_ms": 142.3,
#   "budget_utilization": 0.68,
#   "total_requests": 150
# }
```

| Field | Description |
|-------|-------------|
| `compliance_rate` | Fraction of requests fully SLA-compliant (0.0–1.0) |
| `breaches.latency` | Count of latency SLA breaches |
| `breaches.cost` | Count of budget SLA breaches |
| `breaches.quality` | Count of quality SLA breaches |
| `adjustments_made` | Count of routing adjustments made to restore SLA compliance |
| `cost_savings_usd` | USD saved through SLA-driven model downgrade |
| `avg_latency_ms` | Average request latency over the window |
| `budget_utilization` | Fraction of hourly budget consumed (0.0–1.0) |
| `total_requests` | Total requests in the time window |

### `check_budget_alert() → Dict`

Real-time budget status and spend projection.

```python
alert = router.check_budget_alert()
# {
#   "status": "warning",            # "ok" | "warning" | "critical"
#   "hourly_spend_usd": 3.42,
#   "budget_usd": 5.00,
#   "utilization": 0.684,
#   "projected_hourly_usd": 4.89,
#   "recommendation": "Consider routing moderate tasks to gpt-4o-mini to reduce spend"
# }
```

| Status | Trigger |
|--------|---------|
| `"ok"` | Utilization below warning threshold |
| `"warning"` | Approaching budget limit |
| `"critical"` | At or over budget limit |
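
The status levels can be read as thresholds on budget utilization. The following sketch assumes exactly that; `budget_status` and its `warning_at=0.6` cutoff are hypothetical, since the actual warning threshold is not stated here.

```python
def budget_status(hourly_spend_usd, budget_usd, warning_at=0.6):
    """Classify spend against the hourly budget.

    warning_at is a hypothetical threshold, not a documented library value.
    """
    utilization = hourly_spend_usd / budget_usd
    if utilization >= 1.0:
        status = "critical"   # at or over the budget limit
    elif utilization >= warning_at:
        status = "warning"    # approaching the budget limit
    else:
        status = "ok"
    return {"status": status, "utilization": round(utilization, 3)}


print(budget_status(3.42, 5.00))  # {'status': 'warning', 'utilization': 0.684}
```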

### `record_sla_quality(model, score)`

Record an observed quality score for a completed request. Used to track quality SLA compliance.

```python
router.record_sla_quality("claude-sonnet-4-6", score=0.85)
```

### `get_cost_optimizations(estimate_tokens) → List[Dict]`

Get actionable cost optimization suggestions based on current routing patterns.

```python
suggestions = router.get_cost_optimizations(estimate_tokens=(100, 50))
# [
#   {
#     "suggestion": "Route 'moderate' prompts to gpt-4o-mini instead of claude-sonnet-4-6",
#     "estimated_savings_usd_per_day": 2.34,
#     "tradeoff": "Slightly lower quality for moderate tasks (est. -0.05 quality score)"
#   },
#   {
#     "suggestion": "Enable confidence-gated escalation to reduce expert-tier misrouting",
#     "estimated_savings_usd_per_day": 0.89,
#     "tradeoff": "Adds ~10ms classification overhead per request"
#   }
# ]
```

---

## 🏥 Provider Health Tracking

Track real-time health of each provider/model. Route around degraded providers automatically.

### Recording Events

```python
# After a successful call
router.record_provider_event(
    "claude-sonnet-4-6",
    event="success",
    latency_ms=245.0,
)

# After an error
router.record_provider_event(
    "claude-sonnet-4-6",
    event="error",
    details="rate_limited",
)

# After a timeout
router.record_provider_event("gpt-4o", event="timeout")
```

**Event types:**

| Event | Description |
|-------|-------------|
| `"success"` | Request completed successfully. `latency_ms` recorded. |
| `"error"` | Request failed. `details` string (e.g. `"rate_limited"`, `"context_exceeded"`) |
| `"timeout"` | Request timed out. |

### `get_provider_health(model) → Dict`

```python
health = router.get_provider_health("claude-sonnet-4-6")
# {
#   "model": "claude-sonnet-4-6",
#   "status": "healthy",       # "healthy" | "degraded" | "down"
#   "success_rate_1h": 0.97,
#   "avg_latency_ms": 231.4,
#   "recent_errors": ["rate_limited"],
#   "last_seen": 1741500000.0  # Unix timestamp
# }
```

| Status | Meaning |
|--------|---------|
| `"healthy"` | High success rate, normal latency |
| `"degraded"` | Elevated error rate or latency — still usable but non-preferred |
| `"down"` | No recent successes — excluded from routing |
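
A status like this can be derived from a window of recent events. The sketch below is one possible reading, not the library's logic: `provider_status` and its `degraded_below=0.9` cutoff are hypothetical, latency is ignored for brevity, and treating an empty window as healthy is an assumption.

```python
def provider_status(events, degraded_below=0.9):
    """Derive a status from recent event names ("success", "error", "timeout").

    degraded_below is a hypothetical success-rate cutoff.
    """
    if not events:
        return "healthy"  # assumption: no data means no evidence of trouble
    successes = events.count("success")
    if successes == 0:
        return "down"     # no recent successes
    success_rate = successes / len(events)
    return "healthy" if success_rate >= degraded_below else "degraded"


print(provider_status(["success"] * 97 + ["error"] * 3))  # healthy
print(provider_status(["success", "error", "timeout"]))   # degraded
print(provider_status(["timeout", "error"]))              # down
```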

### Health-Aware Routing

```python
# Skip degraded/down providers entirely
decision = router.route("prompt", prefer_healthy=True)
```

When `prefer_healthy=True`:
- Models with status `"degraded"` or `"down"` are skipped
- Router falls through to next eligible model in cost order
- If all eligible models are degraded, falls back to least-degraded option

**Combining with `auto_scale`:**

```python
decision = router.route(
    "prompt",
    prefer_healthy=True,
    auto_scale=True,           # use fallback_chain when primary is unavailable
)
```

---

## 🧪 A/B Testing

Run controlled routing experiments to compare strategies — cost-optimized vs quality-first — with configurable traffic splits.

### Creating an A/B Test

```python
ab_test = router.create_ab_test(
    name="quality-vs-cost",
    strategy_a="cost_optimized",   # baseline strategy
    strategy_b="quality_first",    # bumps tier one level for B variant
    split=0.5,                     # 50/50 split; 0.3 = 30% to B
)
```

| Parameter | Type | Description |
|-----------|------|-------------|
| `name` | `str` | Human-readable test name |
| `strategy_a` | `str` | Baseline strategy: `"cost_optimized"` |
| `strategy_b` | `str` | Experimental strategy: `"quality_first"` bumps tier by one level |
| `split` | `float` | Fraction of traffic routed to strategy B (0.0–1.0) |

### Running the Test

```python
decision = router.route("Summarize the quarterly earnings report", ab_test=ab_test)

print(decision.ab_variant)     # "a" or "b"
print(decision.model)          # varies by variant

if decision.ab_variant == "b":
    # B variant gets one tier higher → more capable model
    print("Quality-first routing applied")
```

### Strategies

| Strategy | Behavior |
|----------|----------|
| `"cost_optimized"` | Standard routing — cheapest eligible model for detected tier |
| `"quality_first"` | Bumps detected tier up by one level (e.g. `moderate` → `complex`) for higher quality |
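
The split and the tier bump can be sketched independently of the router. `assign_variant` and `effective_tier` are hypothetical helpers; random assignment per request and capping the bump at `expert` are assumptions.

```python
import random

TIERS = ["trivial", "simple", "moderate", "complex", "expert"]


def assign_variant(split: float, rng: random.Random) -> str:
    """Send a `split` fraction of traffic to variant "b", the rest to "a"."""
    return "b" if rng.random() < split else "a"


def effective_tier(detected_tier: str, strategy: str) -> str:
    """quality_first bumps the tier one level; cost_optimized leaves it alone."""
    if strategy == "quality_first":
        # Assumption: the bump is capped at "expert".
        return TIERS[min(TIERS.index(detected_tier) + 1, len(TIERS) - 1)]
    return detected_tier


rng = random.Random(0)  # seeded only to make the sketch repeatable
variants = [assign_variant(0.3, rng) for _ in range(10_000)]
share_b = variants.count("b") / len(variants)
print(f"share routed to B: {share_b:.2f}")          # close to 0.30
print(effective_tier("moderate", "quality_first"))  # complex
```

If the same user should always see the same variant, a hash of a stable identifier could replace the random draw.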

### Collecting Results

Track `ab_variant` alongside actual quality scores to measure the tradeoff:

```python
# In your application
decision = router.route(prompt, ab_test=ab_test)
response = call_llm(decision.model, prompt)
quality = evaluate(response)

router.record_sla_quality(decision.model, quality)

# Store for analysis
results.append({
    "variant": decision.ab_variant,
    "model": decision.model,
    "cost": decision.estimated_cost,
    "quality": quality,
})
```

---

## 💰 Cost Forecasting

Project future LLM costs based on current routing distribution and expected traffic.

### `forecast_cost(...) → Dict`

```python
forecast = router.forecast_cost(
    requests_per_hour=1000,
    avg_input_tokens=500,
    avg_output_tokens=200,
)
```

| Parameter | Type | Description |
|-----------|------|-------------|
| `requests_per_hour` | `int` | Expected request volume per hour |
| `avg_input_tokens` | `int` | Average input tokens per request |
| `avg_output_tokens` | `int` | Average output tokens per request |

**Returns:**

```python
# {
#   "hourly_cost_usd": 1.24,
#   "daily_cost_usd": 29.76,
#   "monthly_cost_usd": 892.80,
#   "breakdown_by_model": {
#     "gpt-4o-mini": {
#       "requests_pct": 0.45,
#       "cost_per_request_usd": 0.000105,
#       "hourly_cost_usd": 0.047
#     },
#     "claude-sonnet-4-6": {
#       "requests_pct": 0.40,
#       "cost_per_request_usd": 0.002100,
#       "hourly_cost_usd": 0.840
#     },
#     "claude-opus-4-6": {
#       "requests_pct": 0.15,
#       "cost_per_request_usd": 0.013500,
#       "hourly_cost_usd": 0.203
#     }
#   },
#   "optimization_tip": "Routing 10% of simple tasks from claude-sonnet-4-6 to gpt-4o-mini would save ~$4.20/day"
# }
```

| Field | Description |
|-------|-------------|
| `hourly_cost_usd` | Projected USD spend per hour |
| `daily_cost_usd` | Projected USD spend per day |
| `monthly_cost_usd` | Projected USD spend per month |
| `breakdown_by_model` | Per-model cost decomposition |
| `optimization_tip` | Actionable recommendation to reduce costs |

**Use forecasting to:**
- Set `SLAConfig.budget_per_hour_usd` based on realistic projections
- Identify which models dominate cost
- Plan budget before scaling traffic
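
The arithmetic behind a projection of this shape can be sketched as follows. `forecast` here is a standalone hypothetical function, not `router.forecast_cost`; the traffic mix is made up, the prices are the Quick Start values, and the 30-day month is an assumption that happens to match the daily-to-monthly ratio in the sample output. The figures are illustrative and are not meant to reproduce that sample.

```python
def forecast(requests_per_hour, avg_input_tokens, avg_output_tokens, mix):
    """mix maps model name -> (share_of_requests, cost_per_1k_input, cost_per_1k_output)."""
    breakdown = {}
    for name, (share, cost_in, cost_out) in mix.items():
        per_request = (avg_input_tokens / 1000) * cost_in + (avg_output_tokens / 1000) * cost_out
        breakdown[name] = {
            "requests_pct": share,
            "cost_per_request_usd": per_request,
            "hourly_cost_usd": requests_per_hour * share * per_request,
        }
    hourly = sum(m["hourly_cost_usd"] for m in breakdown.values())
    return {
        "hourly_cost_usd": hourly,
        "daily_cost_usd": hourly * 24,
        "monthly_cost_usd": hourly * 24 * 30,  # assumption: 30-day month
        "breakdown_by_model": breakdown,
    }


# Hypothetical traffic mix at the Quick Start prices
mix = {
    "gpt-4o-mini": (0.45, 0.00015, 0.0006),
    "claude-sonnet-4-6": (0.40, 0.003, 0.015),
    "claude-opus-4-6": (0.15, 0.015, 0.075),
}
result = forecast(1000, 500, 200, mix)
print(f"${result['hourly_cost_usd']:.2f} per hour")  # $5.26 per hour
```

Even at 15% of requests, the most expensive model accounts for most of the hourly spend in this mix, which is the kind of pattern the per-model breakdown is meant to surface.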

---

## 📈 Cost Tracking & Analytics

### `log_usage(decision, input_tokens, output_tokens) → float`

Log actual token usage for a completed request. Returns the actual cost in USD.

```python
cost = router.log_usage(decision, input_tokens=500, output_tokens=200)
print(f"Request cost: ${cost:.6f}")
```

### `cost_report(period) → Dict`

Returns an aggregate cost report for the given period (`"day"`, `"week"`, or `"month"`).

```python
report = router.cost_report(period="week")    # "day" | "week" | "month"
# {
#   "period": "week",
#   "total_cost_usd": 42.18,
#   "by_model": {
#     "gpt-4o-mini": {"requests": 8420, "cost_usd": 3.14},
#     "claude-sonnet-4-6": {"requests": 3210, "cost_usd": 28.44},
#     "claude-opus-4-6": {"requests": 380, "cost_usd": 10.60}
#   },
#   "avg_cost_per_request_usd": 0.00351
# }
```
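
To make the report fields concrete, here is a minimal sketch of how per-request costs could be rolled up into the same shape. It assumes each logged request is reduced to a `(model, cost_usd)` pair; `summarize_usage` is a hypothetical helper and does not reflect how `CostTracker` stores records internally.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def summarize_usage(records: Iterable[Tuple[str, float]]) -> Dict[str, object]:
    """Aggregate (model, cost_usd) pairs into a cost_report-like dict."""
    by_model = defaultdict(lambda: {"requests": 0, "cost_usd": 0.0})
    for model, cost_usd in records:
        by_model[model]["requests"] += 1
        by_model[model]["cost_usd"] += cost_usd

    total_requests = sum(m["requests"] for m in by_model.values())
    total_cost = sum(m["cost_usd"] for m in by_model.values())
    return {
        "total_cost_usd": total_cost,
        "by_model": dict(by_model),
        # Guard against division by zero when nothing has been logged yet
        "avg_cost_per_request_usd": total_cost / total_requests if total_requests else 0.0,
    }


summary = summarize_usage([
    ("gpt-4o-mini", 0.0002),
    ("gpt-4o-mini", 0.0004),
    ("claude-sonnet-4-6", 0.0030),
])
```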

### `savings_estimate(comparison_model) → Dict`

Calculate how much routing saved compared with sending every request to a single reference model.

```python
savings = router.savings_estimate(comparison_model="gpt-4o")
# {
#   "comparison_model": "gpt-4o",
#   "actual_cost_usd": 42.18,
#   "comparison_cost_usd": 187.40,
#   "savings_usd": 145.22,
#   "savings_pct": 0.775
# }
```

A `savings_pct` of `0.775` means the router saved 77.5% vs routing every request to `gpt-4o`.
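
The relationship between these fields is simple arithmetic. The sketch below reproduces it with a hypothetical helper, `savings_vs_baseline`; the rounding to two and three decimals is an assumption chosen to match the example output above.

```python
from typing import Dict


def savings_vs_baseline(actual_cost_usd: float, comparison_cost_usd: float) -> Dict[str, float]:
    """Savings relative to sending every request to one reference model."""
    savings = comparison_cost_usd - actual_cost_usd
    # Fraction of the baseline spend that was avoided (0.0 if there is no baseline)
    pct = savings / comparison_cost_usd if comparison_cost_usd else 0.0
    return {
        "savings_usd": round(savings, 2),
        "savings_pct": round(pct, 3),
    }


result = savings_vs_baseline(actual_cost_usd=42.18, comparison_cost_usd=187.40)
```

Note that `savings_pct` is expressed relative to the comparison cost, so a negative value would mean routing cost more than the reference model.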

### `routing_analytics() → Dict`

Full aggregate analytics on routing decisions.

```python
analytics = router.routing_analytics()
# {
#   "total_decisions": 12010,
#   "avg_confidence": 0.831,
#   "tier_distribution": {
#     "trivial": 1205, "simple": 3802, "moderate": 4510,
#     "complex": 2101, "expert": 392
#   },
#   "tier_percentages": {
#     "trivial": 10.0, "simple": 31.7, "moderate": 37.6,
#     "complex": 17.5, "expert": 3.3
#   },
#   "model_usage": {
#     "gpt-4o-mini": 5007,
#     "claude-sonnet-4-6": 5902,
#     "claude-opus-4-6": 1101
#   },
#   "provider_usage": {
#     "openai": 5007,
#     "anthropic": 7003
#   },
#   "most_used_model": "claude-sonnet-4-6",
#   "most_used_provider": "anthropic"
# }
```
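
The distribution and percentage fields can be derived from a plain list of decisions. The following sketch shows one way to do that with `collections.Counter`; `summarize_decisions` and the `(tier, model)` tuple format are assumptions made for illustration, not the router's internal representation.

```python
from collections import Counter
from typing import Dict, List, Tuple


def summarize_decisions(decisions: List[Tuple[str, str]]) -> Dict[str, object]:
    """Summarize (tier, model) pairs into analytics-style counts."""
    tiers = Counter(tier for tier, _ in decisions)
    models = Counter(model for _, model in decisions)
    total = len(decisions)
    return {
        "total_decisions": total,
        "tier_distribution": dict(tiers),
        # Percentages are rounded to one decimal, as in the example output
        "tier_percentages": {t: round(100 * n / total, 1) for t, n in tiers.items()},
        "model_usage": dict(models),
        "most_used_model": models.most_common(1)[0][0] if models else None,
    }


stats = summarize_decisions([
    ("simple", "gpt-4o-mini"),
    ("moderate", "claude-sonnet-4-6"),
    ("moderate", "claude-sonnet-4-6"),
    ("complex", "claude-opus-4-6"),
])
```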

---

## 🗂️ Model Registry

### `get_model_info(model_name) → ModelInfo`

```python
info = router.get_model_info("claude-sonnet-4-6")

info.name                                     # "claude-sonnet-4-6"
info.provider                                 # "anthropic"
info.cost_per_1k_input                        # 0.003
info.cost_per_1k_output                       # 0.015
info.capabilities                             # ["text", "code", "vision"]
info.max_tokens                               # 200000
info.supports_streaming                       # True
info.has_capability("vision")                 # True → bool
info.calculate_cost(500, 200)                 # → float: cost for 500 input + 200 output tokens
```

### `ModelInfo` Fields

| Field | Type | Description |
|-------|------|-------------|
| `name` | `str` | Model identifier |
| `provider` | `str` | Provider: `"anthropic"`, `"openai"`, etc. |
| `cost_per_1k_input` | `float` | USD per 1,000 input tokens |
| `cost_per_1k_output` | `float` | USD per 1,000 output tokens |
| `capabilities` | `List[str]` | E.g. `["text", "code", "vision"]` |
| `max_tokens` | `int` | Maximum context window in tokens |
| `supports_streaming` | `bool` | Whether model supports streaming responses |
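
Because prices are quoted per 1,000 tokens, the cost of a request is a weighted sum of its input and output token counts. The dataclass below is a stand-alone sketch of that calculation; `ModelInfoSketch` is a hypothetical stand-in, and the real `ModelInfo.calculate_cost` may differ in detail.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ModelInfoSketch:
    """Illustrative stand-in for ModelInfo; not the library class."""
    name: str
    provider: str
    cost_per_1k_input: float   # USD per 1,000 input tokens
    cost_per_1k_output: float  # USD per 1,000 output tokens
    capabilities: List[str] = field(default_factory=list)

    def has_capability(self, capability: str) -> bool:
        return capability in self.capabilities

    def calculate_cost(self, input_tokens: int, output_tokens: int) -> float:
        # Scale each token count to thousands before applying the per-1k price
        return (
            (input_tokens / 1000) * self.cost_per_1k_input
            + (output_tokens / 1000) * self.cost_per_1k_output
        )


sonnet = ModelInfoSketch(
    name="claude-sonnet-4-6",
    provider="anthropic",
    cost_per_1k_input=0.003,
    cost_per_1k_output=0.015,
    capabilities=["text", "code", "vision"],
)
cost = sonnet.calculate_cost(1000, 1000)  # 0.003 for input + 0.015 for output
```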

### `list_models_for_tier(tier) → List[Dict]`

List all models eligible for a given tier, ordered by cost, cheapest first.

```python
models = router.list_models_for_tier("moderate")
# [
#   {"name": "gpt-4o-mini", "provider": "openai", "cost": 0.000105, "capabilities": [...], "max_tokens": 128000},
#   {"name": "claude-sonnet-4-6", "provider": "anthropic", "cost": 0.00210, "capabilities": [...], "max_tokens": 200000},
# ]
```

### `save_state(path)`

Persist router state (cost history, health data, analytics) to disk.

```python
router.save_state("./router_state")
```

---

## 🧠 SemanticClassifier (v2.0)

`SemanticClassifier` replaces the built-in keyword classifier with TF-IDF semantic classification. It can be injected into the v1.0 `Router` for semantic accuracy without migrating to `AdaptiveRouter`.

### Usage

```python
from antaris_router import SemanticClassifier, Router

sem = SemanticClassifier(workspace="./routing_data")
router = Router(classifier=sem)

decision = router.route("Design a microservices platform with event-driven architecture")
```

The `SemanticClassifier` persists its TF-IDF model to `workspace/` and improves with each classified prompt. It is the same classifier used internally by `AdaptiveRouter`.

### Constructor

```python
sem = SemanticClassifier(workspace="./routing_data")
```

| Parameter | Type | Description |
|-----------|------|-------------|
| `workspace` | `str` | Directory to persist TF-IDF model and vocabulary |

### How it Works

1. **Tokenization** — Prompt is tokenized and stopwords removed
2. **TF-IDF vectorization** — Term frequency × inverse document frequency weights computed
3. **Tier classification** — Vector compared to learned per-tier centroids
4. **Confidence scoring** — Distance to centroids determines confidence score
5. **Feedback loop** — `report_outcome()` adjusts centroid weights over time
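
Steps 1 to 4 can be sketched with the standard library alone. This is a toy illustration of TF-IDF plus nearest-centroid classification, not the code inside `SemanticClassifier`: the stopword list, the training prompts, the smoothed IDF formula, and the use of cosine similarity as the confidence score are all assumptions, and the feedback loop in step 5 is left out.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

STOPWORDS = {"a", "an", "the", "is", "in", "of", "to", "with", "for", "and"}

# Toy labelled prompts, made up for this sketch
TRAINING = {
    "simple": ["what is the capital of france", "translate hello to spanish"],
    "complex": ["design a distributed microservices architecture",
                "implement a distributed rate limiter"],
}


def tokenize(text: str) -> List[str]:
    """Step 1: lowercase, split on non-alphanumerics, drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


DOCS = [tokenize(p) for prompts in TRAINING.values() for p in prompts]
DOC_FREQ = Counter(term for doc in DOCS for term in set(doc))
# Smoothed inverse document frequency (one common variant)
IDF = {t: math.log((1 + len(DOCS)) / (1 + n)) + 1.0 for t, n in DOC_FREQ.items()}


def vectorize(text: str) -> Dict[str, float]:
    """Step 2: sparse TF-IDF vector; terms never seen in training are ignored."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1
    return {t: (n / total) * IDF[t] for t, n in counts.items() if t in IDF}


def centroid(prompts: List[str]) -> Dict[str, float]:
    """Average the TF-IDF vectors of all prompts belonging to one tier."""
    acc: Dict[str, float] = {}
    for prompt in prompts:
        for term, weight in vectorize(prompt).items():
            acc[term] = acc.get(term, 0.0) + weight
    return {t: w / len(prompts) for t, w in acc.items()}


CENTROIDS = {tier: centroid(prompts) for tier, prompts in TRAINING.items()}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def classify(prompt: str) -> Tuple[str, float]:
    """Steps 3 and 4: pick the closest centroid; its similarity is the confidence."""
    vec = vectorize(prompt)
    scores = {tier: cosine(vec, c) for tier, c in CENTROIDS.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


tier, confidence = classify("design a distributed event-driven architecture")
```

The prompt shares "design", "distributed", and "architecture" with the `complex` examples and nothing with the `simple` ones, so it lands in the `complex` tier with a non-zero confidence.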

### Injecting into v1.0 Router

```python
sem = SemanticClassifier(workspace="./routing_data")

router = Router(
    classifier=sem,
    low_confidence_threshold=0.6,
    escalation_model="claude-opus-4-6",
    escalation_strategy="always",
)

decision = router.route("Implement OAuth2 with PKCE in a distributed system")
```

This gives you semantic classification accuracy with all v1.0 features (SLA, health tracking, A/B testing, cost tracking).
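
As a rough mental model of the escalation parameters, the sketch below swaps in the escalation model whenever classifier confidence falls under the threshold. This behavior is inferred from the parameter names and is an assumption; `pick_model` is a hypothetical helper, and the actual semantics of `escalation_strategy` values are not covered here.

```python
def pick_model(
    selected_model: str,
    confidence: float,
    low_confidence_threshold: float = 0.6,
    escalation_model: str = "claude-opus-4-6",
) -> str:
    """Return the escalation model when confidence is below the threshold.

    Assumed behavior for illustration only; not taken from the Router source.
    """
    if confidence < low_confidence_threshold:
        return escalation_model
    return selected_model


confident = pick_model("gpt-4o-mini", confidence=0.83)
uncertain = pick_model("gpt-4o-mini", confidence=0.41)
```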

---

## 📊 QualityTracker (v2.0)

`QualityTracker` stores per-prompt outcome data and model performance history. It is used internally by `AdaptiveRouter` and is also available as a standalone component.

### Usage

```python
from antaris_router import QualityTracker

tracker = QualityTracker("./routing_data")

# Record an outcome
tracker.record_outcome(
    prompt_hash,          # str: from RoutingResult.prompt_hash
    quality_score=0.9,    # float: 0.0–1.0
    success=True,         # bool: did the request succeed
    model="claude-sonnet-4-6",
)

# Query model performance history
history = tracker.get_model_performance("claude-sonnet-4-6")
# {
#   "model": "claude-sonnet-4-6",
#   "avg_quality": 0.87,
#   "success_rate": 0.96,
#   "total_outcomes": 3820,
#   "quality_by_tier": {"simple": 0.91, "moderate": 0.88, "complex": 0.84}
# }
```

### `record_outcome(prompt_hash, quality_score, success, model)`

| Parameter | Type | Description |
|-----------|------|-------------|
| `prompt_hash` | `str` | SHA-256 hash from `RoutingResult.prompt_hash` |
| `quality_score` | `float` | Quality 0.0–1.0. Source: human rating, LLM eval, or downstream metric |
| `success` | `bool` | Whether the request succeeded (True) or errored (False) |
| `model` | `str` | Model name that handled the request |
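
In normal use you take `prompt_hash` from the routing result rather than computing it yourself. For reference, a SHA-256 hex digest of a prompt can be produced with `hashlib` as sketched below; whether the library normalizes, salts, or truncates the prompt before hashing is not specified here, so treat `sha256_of_prompt` as a hypothetical helper whose digests may not match the router's.

```python
import hashlib


def sha256_of_prompt(prompt: str) -> str:
    """Hex-encoded SHA-256 digest of the UTF-8 encoded prompt."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


digest = sha256_of_prompt("Implement OAuth2 with PKCE in a distributed system")
```

A hash like this lets outcomes be keyed by prompt without storing the prompt text itself.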

### `get_model_performance(model) → Dict`

Aggregate quality history for a model across all tracked outcomes.
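
The aggregate fields shown in the usage example can be computed from a list of recorded outcomes. The sketch below is a stand-alone illustration under assumptions: `model_performance` is a hypothetical function, outcomes are plain dicts, and each outcome is assumed to carry a `tier` field (which `record_outcome` does not take directly, so the tracker would have to resolve it some other way, for example from the prompt hash).

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def model_performance(outcomes: List[dict], model: str) -> Dict[str, object]:
    """Aggregate quality and success statistics for one model."""
    rows = [o for o in outcomes if o["model"] == model]
    if not rows:
        return {"model": model, "total_outcomes": 0}

    scores_by_tier = defaultdict(list)
    for o in rows:
        scores_by_tier[o["tier"]].append(o["quality_score"])

    return {
        "model": model,
        "avg_quality": mean(o["quality_score"] for o in rows),
        "success_rate": sum(1 for o in rows if o["success"]) / len(rows),
        "total_outcomes": len(rows),
        "quality_by_tier": {t: mean(s) for t, s in scores_by_tier.items()},
    }


outcomes = [
    {"model": "claude-sonnet-4-6", "tier": "simple", "quality_score": 0.9, "success": True},
    {"model": "claude-sonnet-4-6", "tier": "simple", "quality_score": 0.7, "success": True},
    {"model": "claude-sonnet-4-6", "tier": "complex", "quality_score": 0.8, "success": False},
    {"model": "gpt-4o-mini", "tier": "simple", "quality_score": 0.6, "success": True},
]
perf = model_performance(outcomes, "claude-sonnet-4-6")
```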

---

## 🔬 ClassificationResult & Signals

`ClassificationResult` is the raw output of the classifier, accessible via `decision.classification`.

```python
decision = router.route("Build a Redis-backed distributed rate limiter in Go")

clf = decision.classification

clf.tier         # "complex"
clf.confidence   # 0.83
clf.signals      # dict (see below)
```

### `ClassificationResult.signals`

```python
clf.signals = {
    "length": 51,                           # raw character count
    "keyword_matches": {
        "trivial": 0,
        "simple": 1,
        "complex": 2,                       # matched "distributed", "rate limiter"
    },
    "has_code": False,                      # whether prompt contains code
    "code_indicators": 0,                   # count of code-related patterns
    "structural_complexity": 2,             # heuristic complexity score
}
```

| Signal | Type | Description |
|--------|------|-------------|
| `length` | `int` | Raw character count of the prompt |
| `keyword_matches` | `Dict[str, int]` | Per-tier keyword match counts |
| `has_code` | `bool` | Whether prompt contains code blocks or inline code |
| `code_indicators` | `int` | Count of code-related patterns (functions, syntax, etc.) |
| `structural_complexity` | `int` | Heuristic score: nesting, multi-part requests, etc. |
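
To give a feel for how signals like these might be extracted, here is a small regex-based sketch covering `length`, `has_code`, and `code_indicators`. The patterns are hypothetical and far simpler than anything a production classifier would use; they are not the patterns `TaskClassifier` applies.

```python
import re
from typing import Dict

# Hypothetical code-detection patterns for this sketch only
CODE_PATTERNS = [
    r"`[^`]+`",                           # inline or fenced code in backticks
    r"\b(?:def|function|class)\s+\w+",    # function or class definitions
]


def basic_signals(prompt: str) -> Dict[str, object]:
    """Compute a subset of classifier-style signals from raw prompt text."""
    code_indicators = sum(len(re.findall(p, prompt)) for p in CODE_PATTERNS)
    return {
        "length": len(prompt),            # raw character count
        "has_code": code_indicators > 0,
        "code_indicators": code_indicators,
    }


plain = basic_signals("Build a Redis-backed distributed rate limiter in Go")
with_code = basic_signals("Fix this: `def add(a, b): return a + b`")
```

The first prompt contains no backticks or definitions, so `has_code` is `False`; the second matches both patterns once each.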

---

## 📚 Full API Reference

### `Router` Methods

| Method | Signature | Description |
|--------|-----------|-------------|
| `route` | `(prompt, context, prefer, min_tier, capability, estimate_tokens, ab_test, prefer_healthy, auto_scale) → RoutingDecision` | Route a prompt to the optimal model |
| `explain` | `(decision: RoutingDecision) → str` | Generate plain-English explanation of a routing decision |
| `log_usage` | `(decision, input_tokens, output_tokens) → float` | Log actual usage, returns cost in USD |
| `cost_report` | `(period: str) → Dict` | Aggregate cost report. Period: `"day"/"week"/"month"` |
| `savings_estimate` | `(comparison_model: str) → Dict` | Cost savings vs always using comparison model |
| `routing_analytics` | `() → Dict` | Full routing analytics (tiers, models, confidence) |
| `get_model_info` | `(model_name: str) → ModelInfo` | Model metadata from registry |
| `list_models_for_tier` | `(tier: str) → List[Dict]` | All eligible models for a tier |
| `save_state` | `(path: str)` | Persist router state to disk |
| `record_provider_event` | `(model, event, latency_ms, details) → None` | Record provider health event |
| `get_provider_health` | `(model: str) → Dict` | Current health status for a model |
| `create_ab_test` | `(name, strategy_a, strategy_b, split) → ABTest` | Create A/B test configuration |
| `forecast_cost` | `(requests_per_hour, avg_input_tokens, avg_output_tokens) → Dict` | Project future costs |
| `get_sla_report` | `(since_hours: float) → Dict` | SLA compliance report |
| `check_budget_alert` | `() → Dict` | Real-time budget status |
| `record_sla_quality` | `(model: str, score: float) → None` | Record quality score for SLA tracking |
| `get_cost_optimizations` | `(estimate_tokens: Tuple[int, int]) → List[Dict]` | Cost optimization suggestions |

### `AdaptiveRouter` Methods

| Method | Signature | Description |
|--------|-----------|-------------|
| `register_model` | `(config: ModelConfig) → None` | Register a model with tier range and costs |
| `route` | `(prompt: str) → RoutingResult` | Classify and route a prompt |
| `report_outcome` | `(prompt_hash, quality_score, success) → None` | Feed outcome back for self-improvement |
| `get_analytics` | `() → Dict` | Session-level routing analytics |

---

## 📦 Complete Exports

```python
from antaris_router import (
    # ── v2.0 API ──────────────────────────────────────────────────────────────
    AdaptiveRouter,       # Semantic, self-improving router
    RoutingResult,        # Result object from AdaptiveRouter.route()
    ModelConfig,          # Model registration config for AdaptiveRouter
    SemanticClassifier,   # TF-IDF classifier (injectable into v1.0 Router)
    SemanticResult,       # Result object from SemanticClassifier
    TFIDFVectorizer,      # Low-level TF-IDF vectorizer
    QualityTracker,       # Outcome feedback tracker
    QualityDecision,      # Quality decision record

    # ── v1.0 API ──────────────────────────────────────────────────────────────
    Router,               # Keyword-based production router
    RoutingDecision,      # Decision object from Router.route()
    TaskClassifier,       # Built-in keyword classifier
    ClassificationResult, # Classification output with signals
    ModelRegistry,        # Internal model registry
    ModelInfo,            # Model metadata object
    CostTracker,          # Cost tracking component
    UsageRecord,          # Per-request usage record
    Config,               # Router configuration

    # ── Sprint 5 — SLA ────────────────────────────────────────────────────────
    SLAConfig,            # SLA constraint configuration
    SLAMonitor,           # SLA enforcement monitor
    SLARecord,            # Per-request SLA record
)
```

---

## 🔄 Migration: v1.0 → v2.0

| Feature | v1.0 `Router` | v2.0 `AdaptiveRouter` |
|---------|--------------|----------------------|
| Classification | Keyword matching | TF-IDF semantic |
| Self-improvement | ❌ | ✅ via `report_outcome()` |
| Persistence | `save_state()` | Automatic to workspace |
| SLA enforcement | ✅ | ❌ (use Router + SemanticClassifier) |
| Provider health | ✅ | ❌ (use Router + SemanticClassifier) |
| A/B testing | ✅ | Built-in `ab_test_rate` |
| Cost tracking | ✅ | Basic (via analytics) |
| Explainability | ✅ `explain()` | Via `result.confidence` + analytics |
| `RoutingDecision` | Full object | Lightweight `RoutingResult` |

**Recommended migration path:**

```python
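from antaris_router import AdaptiveRouter, Router, SemanticClassifier, SLAConfig

# Option A: full v2.0 (new project, accuracy-first)
router = AdaptiveRouter("./routing_data")

# Option B: best of both (semantic accuracy + full v1.0 features)
sla = SLAConfig(
    max_latency_ms=300,
    budget_per_hour_usd=10.00,
    min_quality_score=0.75,
    auto_escalate_on_breach=True,
)
sem = SemanticClassifier(workspace="./routing_data")
router = Router(
    classifier=sem,            # semantic classification
    sla=sla,                   # + SLA enforcement
    enable_cost_tracking=True  # + cost tracking
)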
```

Option B lets you adopt semantic classification incrementally without losing any v1.0 production features.

---

## 🧩 Advanced Patterns

### Full Production Setup

```python
from antaris_router import Router, SLAConfig, SemanticClassifier

sem = SemanticClassifier(workspace="./routing_data")

sla = SLAConfig(
    max_latency_ms=300,
    budget_per_hour_usd=10.00,
    min_quality_score=0.75,
    auto_escalate_on_breach=True,
)

router = Router(
    classifier=sem,
    sla=sla,
    fallback_chain=["claude-sonnet-4-6", "claude-haiku-3-5", "gpt-4o-mini"],
    low_confidence_threshold=0.6,
    escalation_model="claude-opus-4-6",
    escalation_strategy="always",
    enable_cost_tracking=True,
)

def route_and_call(prompt: str) -> str:
    decision = router.route(
        prompt,
        estimate_tokens=(len(prompt) // 4, 200),
        prefer_healthy=True,
        auto_scale=True,
    )

    # Log decision for audit
    print(router.explain(decision))

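    # Call your LLM here (call_llm is a placeholder for your own client code)
    response = call_llm(decision.model, prompt)

    # Record actual usage (token counts approximated as ~4 characters per token)
    router.log_usage(decision, input_tokens=len(prompt)//4, output_tokens=len(response)//4)

    # Record provider health (replace 242.0 with the latency you measured)
    router.record_provider_event(decision.model, event="success", latency_ms=242.0)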

    return response
```
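
The example above passes a fixed `latency_ms` and estimates tokens inline. In practice you would probably measure the call and centralize the estimate. The helpers below are a hedged sketch of that: `timed_call` and `estimate_tokens` are hypothetical names, `fake_llm` stands in for a real client, and the four-characters-per-token ratio is only the rough heuristic already used in the example.

```python
import time
from typing import Callable, Tuple


def timed_call(fn: Callable[..., str], *args, **kwargs) -> Tuple[str, float]:
    """Run fn and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    latency_ms = (time.perf_counter() - start) * 1000.0
    return result, latency_ms


def estimate_tokens(text: str) -> int:
    """Rough token estimate: about 4 characters per token, at least 1."""
    return max(1, len(text) // 4)


def fake_llm(model: str, prompt: str) -> str:
    # Stand-in for a real LLM client call
    return f"[{model}] echo: {prompt}"


response, latency_ms = timed_call(fake_llm, "gpt-4o-mini", "Summarize this paragraph")
input_tokens = estimate_tokens("Summarize this paragraph")
```

The measured `latency_ms` can then be passed to `record_provider_event`, and the token estimates to `log_usage`, in place of the hardcoded values.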

### Periodic Reporting

```python
import time

# Every hour
while True:
    time.sleep(3600)

    report = router.get_sla_report(since_hours=1.0)
    alert = router.check_budget_alert()
    analytics = router.routing_analytics()

    print(f"SLA compliance: {report['compliance_rate']:.1%}")
    print(f"Budget: {alert['status']} ({alert['utilization']:.1%} used)")
    print(f"Most used model: {analytics['most_used_model']}")

    if alert["status"] == "critical":
        # Trigger alerts, adjust SLA config, etc.
        pass

    router.save_state("./router_state")
```

---

## 🏗️ Architecture

```
antaris-router
├── Router (v1.0)                 ← Production keyword-based router
│   ├── TaskClassifier            ← Built-in keyword classification
│   │   └── ClassificationResult  ← With signals: length, keywords, code
│   ├── ModelRegistry             ← Model metadata + capability index
│   ├── CostTracker               ← Per-session/period cost tracking
│   ├── SLAMonitor                ← Constraint enforcement + reporting
│   └── RoutingDecision           ← Full decision object
│
├── AdaptiveRouter (v2.0)         ← Self-improving semantic router
│   ├── SemanticClassifier        ← TF-IDF vectorizer + tier centroids
│   ├── TFIDFVectorizer           ← Low-level TF-IDF implementation
│   ├── QualityTracker            ← Outcome feedback + model performance
│   └── RoutingResult             ← Lightweight result object
│
└── Shared
    ├── SLAConfig                 ← SLA constraint definition
    ├── SLARecord                 ← Per-request SLA record
    └── ModelInfo                 ← Model metadata (costs, caps, streaming)
```

---

## 📄 License

Licensed under Apache-2.0. Part of the [antaris-suite](https://github.com/Antaris-Analytics-LLC/antaris-suite), adaptive AI infrastructure for LLM cost optimization.

© Antaris Analytics LLC
