Metadata-Version: 2.4
Name: caphound-sdk
Version: 0.2.0
Summary: CapHound SDK — LLM cost visibility gateway client for Python
License-Expression: MIT
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: httpx>=0.25.0
Provides-Extra: openai
Requires-Dist: openai>=1.0.0; extra == "openai"
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.30.0; extra == "anthropic"
Provides-Extra: gemini
Requires-Dist: google-generativeai>=0.5.0; extra == "gemini"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.23.0; extra == "dev"
Requires-Dist: openai>=1.0.0; extra == "dev"
Requires-Dist: anthropic>=0.30.0; extra == "dev"

# caphound-sdk

**The control layer between your code and every LLM.**

CapHound intercepts, attributes, and governs every AI API call in your stack. Setup takes one line of code.

- **Cost attribution** — know exactly what every feature, customer, and team spends on AI
- **Zero migration** — wraps the official OpenAI, Anthropic, and Gemini SDKs, so existing calls keep working unchanged
- **Fail-open reliability** — a built-in circuit breaker sends calls directly to the provider whenever the gateway is unreachable
- **Real-time control** — budget tracking and alerts in a single dashboard, with policy enforcement and guardrails in development

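```bash
pip install caphound-sdk
# Optionally pull in a provider SDK via extras: [openai], [anthropic], or [gemini]
pip install "caphound-sdk[openai]"
```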

---

## 10-Second Setup

```python
# Before
from openai import OpenAI
client = OpenAI()

# After
from openai import OpenAI
from caphound_sdk import wrap_openai

client = wrap_openai(OpenAI(),
    api_key="caphound_live_...",
    gateway_url="https://api.caphound.ai",
)
```

That's it. Every call now flows through CapHound. Same API. Same types. Same behavior.
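One way a wrapper can keep the exact API and response types is a delegating proxy. The sketch below is hypothetical and is not CapHound's implementation (the real wrappers also route calls through the gateway, which this omits); `TrackingProxy` and `FakeClient` are illustrative names. It forwards every attribute lookup to the wrapped client, times each method call, and returns the provider's response object untouched.

```python
import time
from typing import Any, Callable


class TrackingProxy:
    """Hypothetical wrapper: forwards attribute access to the wrapped object
    and records latency for every method call, returning results unchanged."""

    def __init__(self, target: Any, record: Callable[[str, float], None], path: str = "") -> None:
        self._target = target
        self._record = record
        self._path = path

    def __getattr__(self, name: str) -> Any:
        attr = getattr(self._target, name)
        path = f"{self._path}.{name}" if self._path else name
        if callable(attr):
            def timed(*args: Any, **kwargs: Any) -> Any:
                start = time.perf_counter()
                try:
                    return attr(*args, **kwargs)  # same arguments, same return value
                finally:
                    self._record(path, time.perf_counter() - start)
            return timed
        if isinstance(attr, (str, bytes, int, float, bool, type(None))):
            return attr  # plain values pass through unchanged
        # Nested namespaces such as client.chat.completions are proxied too
        return TrackingProxy(attr, self._record, path)


# Illustrative stand-in for a provider SDK client
class _Completions:
    def create(self, **kwargs: Any) -> dict:
        return {"model": kwargs["model"], "content": "Hi!"}


class _Chat:
    completions = _Completions()


class FakeClient:
    chat = _Chat()
    base_url = "https://example.invalid"


calls: list[tuple[str, float]] = []
client = TrackingProxy(FakeClient(), lambda path, secs: calls.append((path, secs)))
response = client.chat.completions.create(model="gpt-4o", messages=[])
# response is the original dict; calls[0][0] == "chat.completions.create"
```

Because only the final method call is intercepted, call sites and return types stay exactly as the underlying SDK defines them.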

---

## Drop-in Replacement

CapHound wraps the official SDKs you already use. No new interfaces. No abstraction layers. No migration.

```python
# Your existing code doesn't change
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
```

The response is identical. The types are identical. The only difference: CapHound now tracks cost, latency, and attribution automatically.

---

## Multi-Provider Support

### OpenAI

```python
from openai import OpenAI
from caphound_sdk import wrap_openai

client = wrap_openai(OpenAI(),
    api_key="caphound_live_...",
    gateway_url="https://api.caphound.ai",
    feature="chat",
)
```

### Anthropic

```python
from anthropic import Anthropic
from caphound_sdk import wrap_anthropic

client = wrap_anthropic(Anthropic(),
    api_key="caphound_live_...",
    gateway_url="https://api.caphound.ai",
    feature="chat",
)

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)
```

### Google Gemini

```python
import google.generativeai as genai
from caphound_sdk import wrap_gemini

genai.configure(api_key="...")  # your Google AI API key

# Assumes wrap_gemini accepts the configured google.generativeai module
gemini = wrap_gemini(genai,
    api_key="caphound_live_...",
    gateway_url="https://api.caphound.ai",
    feature="chat",
)

model = gemini.GenerativeModel("gemini-2.0-flash")
response = model.generate_content("Hello!")
```

### Direct Client

```python
import asyncio

from caphound_sdk import CapHoundClient, ChatCompletionOptions

caphound = CapHoundClient(
    api_key="caphound_live_...",
    gateway_url="https://api.caphound.ai",
)

response = caphound.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    options=ChatCompletionOptions(
        feature="chat",
        customer_id="cust_123",
        team="growth",
    ),
)

# Async support built-in: await acreate() from inside a coroutine
async def main():
    return await caphound.chat.completions.acreate(...)  # same arguments as create()

response = asyncio.run(main())
```

---

## Attribution

Tag every request. Know exactly where your AI spend goes.

```python
from caphound_sdk import caphound_feature, caphound_customer_id

with caphound_feature("search"), caphound_customer_id("cust_456"):
    # All calls in this block are tagged automatically
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Find products..."}],
    )
```

| Tag | Required | Description |
|-----|----------|-------------|
| `feature` | Yes | Product area — `"chat"`, `"search"`, `"agents"` |
| `customer_id` | No | Per-customer cost tracking |
| `team` | No | Engineering team responsible |
| `version` | No | Application version for spend-by-release |

`feature` is required for attribution, but a missing tag never blocks a request: untagged requests are recorded under `"untagged"`.
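Scoped tags like these are commonly built on Python's `contextvars`, which keep values isolated per thread and per asyncio task. The sketch below assumes that approach; `feature_tag`, `customer_tag`, and `current_tags` are hypothetical names, and the SDK's own `caphound_feature` and `caphound_customer_id` may work differently. It also shows the `"untagged"` fallback described above.

```python
from contextlib import contextmanager
from contextvars import ContextVar
from typing import Iterator, Optional

# Hypothetical context variables holding the active attribution tags
_feature: ContextVar[Optional[str]] = ContextVar("feature", default=None)
_customer_id: ContextVar[Optional[str]] = ContextVar("customer_id", default=None)


@contextmanager
def _scoped(var: ContextVar[Optional[str]], value: str) -> Iterator[None]:
    token = var.set(value)
    try:
        yield
    finally:
        var.reset(token)  # restore the outer value, even if the block raises


def feature_tag(name: str):
    return _scoped(_feature, name)


def customer_tag(customer_id: str):
    return _scoped(_customer_id, customer_id)


def current_tags() -> dict:
    """Tags a wrapper would attach to the outgoing request."""
    tags = {"feature": _feature.get() or "untagged"}
    customer = _customer_id.get()
    if customer is not None:
        tags["customer_id"] = customer
    return tags


with feature_tag("search"), customer_tag("cust_456"):
    inside = current_tags()   # {'feature': 'search', 'customer_id': 'cust_456'}
outside = current_tags()      # {'feature': 'untagged'}
```

Using `ContextVar.reset` with the saved token means nested blocks restore the outer tag on exit instead of clearing it.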

---

## Reliability

CapHound is designed to be invisible when things go wrong.

**Circuit breaker** — if the gateway is unreachable, the SDK calls the LLM provider directly. Your application never sees an error caused by the gateway; errors returned by the provider itself still surface as usual.

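| Setting | Default | `circuit_breaker` key |
|---------|---------|-----------------------|
| Failure threshold | 3 consecutive failures within 10 s | `failure_threshold`, `window_seconds` |
| Recovery probe | After 30 s | `recovery_timeout_seconds` |
| Local event queue | Up to 500 events | `max_queue_size` |
| Recovery | Queued events flushed automatically on reconnect | |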

The SDK fails open. Always. Your LLM calls never depend on CapHound being available.
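The table describes a classic fail-open circuit breaker. Below is a minimal, hypothetical sketch of that behavior using the documented defaults; `FailOpenBreaker`, `call_gateway`, `call_provider`, and `flush` are illustrative names rather than SDK APIs, and treating the threshold as "consecutive failures inside a sliding window" is our reading of the table. While the breaker is open, requests go straight to the provider and metadata events are buffered in a bounded queue that is flushed once a recovery probe succeeds.

```python
import time
from collections import deque
from typing import Any, Callable, Deque, Optional


class FailOpenBreaker:
    """Hypothetical fail-open circuit breaker using the defaults from the table."""

    def __init__(self, failure_threshold: int = 3, window_seconds: float = 10.0,
                 recovery_timeout_seconds: float = 30.0, max_queue_size: int = 500,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.window_seconds = window_seconds
        self.recovery_timeout_seconds = recovery_timeout_seconds
        self.clock = clock
        self.failures: Deque[float] = deque()  # timestamps of consecutive failures
        self.opened_at: Optional[float] = None
        # Bounded local queue: when full, the oldest event is dropped
        self.queue: Deque[dict] = deque(maxlen=max_queue_size)

    def is_open(self) -> bool:
        return self.opened_at is not None

    def call(self, request: dict, call_gateway: Callable[[dict], Any],
             call_provider: Callable[[dict], Any], flush: Callable[[list], None]) -> Any:
        now = self.clock()
        if self.opened_at is not None and now - self.opened_at < self.recovery_timeout_seconds:
            self._enqueue(request, now)  # open: skip the gateway entirely
            return call_provider(request)
        try:
            result = call_gateway(request)  # closed, or a recovery probe after the timeout
        except ConnectionError:  # stand-in for the real client's transport errors
            self._record_failure(now)
            self._enqueue(request, now)
            return call_provider(request)  # fail open: the caller still gets a response
        # Success closes the breaker and flushes events buffered while it was open
        self.failures.clear()
        self.opened_at = None
        if self.queue:
            flush(list(self.queue))
            self.queue.clear()
        return result

    def _record_failure(self, now: float) -> None:
        self.failures.append(now)
        while now - self.failures[0] > self.window_seconds:
            self.failures.popleft()  # forget failures older than the window
        if self.opened_at is not None or len(self.failures) >= self.failure_threshold:
            self.opened_at = now  # open, or re-open after a failed probe

    def _enqueue(self, request: dict, now: float) -> None:
        # Metadata only: the prompt itself is never buffered
        self.queue.append({"model": request.get("model"), "ts": now})
```

Injecting `clock` makes the timing behavior testable without real sleeps.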

---

## Architecture

```
Your Application
      ↓
  CapHound SDK          ← attribution tags injected here
      ↓
  CapHound Gateway      ← cost calculated, policies evaluated
      ↓
  LLM Provider          ← OpenAI / Anthropic / Gemini
      ↓
  Control Center        ← real-time dashboards, alerts, controls
```

CapHound never stores prompts or responses. Requests pass through the gateway on their way to the provider, but only metadata (cost, latency, and attribution tags) is retained.
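Per-request cost is usually input tokens times the input price plus output tokens times the output price. The sketch below shows a metadata-only usage event with that calculation; `UsageEvent`, its field names, and the per-million-token prices are placeholders for illustration, not CapHound's schema or any provider's actual rates.

```python
from dataclasses import dataclass
from decimal import Decimal

# Placeholder prices in USD per 1M tokens as (input, output); NOT real rates
PRICES_PER_MILLION = {
    "example-model": (Decimal("2.50"), Decimal("10.00")),
}


@dataclass(frozen=True)
class UsageEvent:
    """Hypothetical metadata record: no prompt or response text is kept."""
    model: str
    feature: str
    input_tokens: int
    output_tokens: int
    latency_ms: float

    def cost_usd(self) -> Decimal:
        input_price, output_price = PRICES_PER_MILLION[self.model]
        total = self.input_tokens * input_price + self.output_tokens * output_price
        return total / Decimal(1_000_000)


event = UsageEvent("example-model", "search", input_tokens=1_200, output_tokens=300, latency_ms=840.0)
cost = event.cost_usd()  # (1,200 * 2.50 + 300 * 10.00) / 1,000,000 = 0.006 USD
```

`Decimal` avoids the rounding drift that binary floats accumulate when many small per-request costs are summed.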

---

## Why CapHound Exists

AI costs are exploding. Teams have no idea which features, customers, or models are driving spend — until the invoice arrives.

- **No attribution** — you can't optimize what you can't see
- **No control** — a single runaway feature can burn through your budget overnight
- **No governance** — finance asks "what are we spending on AI?" and engineering guesses

CapHound fixes this at the infrastructure level, not with dashboards bolted on after the fact.

---

## Control Center

Every request tracked through the SDK appears in the [CapHound Control Center](https://control.caphound.ai) within 30 seconds.

- Cost breakdown by feature, customer, team, model
- Budget utilization and alerts
- Anomaly detection
- Coming: policy enforcement, spend limits, model governance
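Cost breakdowns and budget utilization reduce to a group-by and a ratio. The sketch below works over in-memory event dicts; `cost_by`, `budget_status`, the `cost_usd` field, the budget figure, and the 80% alert threshold are assumptions for illustration, not Control Center behavior or defaults.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Iterable


def cost_by(events: Iterable[dict], dimension: str) -> dict:
    """Sum cost per value of a tag such as 'feature', 'customer_id', 'team', or 'model'."""
    totals: dict = defaultdict(Decimal)
    for event in events:
        # Events missing the tag are grouped under "untagged"
        totals[event.get(dimension, "untagged")] += event["cost_usd"]
    return dict(totals)


def budget_status(spent: Decimal, budget: Decimal, alert_at: Decimal = Decimal("0.8")) -> tuple:
    """Return (utilization, alert); alert fires once spend reaches alert_at * budget."""
    utilization = spent / budget
    return utilization, utilization >= alert_at


events = [
    {"feature": "search", "team": "growth", "cost_usd": Decimal("0.006")},
    {"feature": "chat", "team": "growth", "cost_usd": Decimal("0.004")},
    {"feature": "search", "cost_usd": Decimal("0.010")},
]
by_feature = cost_by(events, "feature")  # search: 0.016, chat: 0.004
by_team = cost_by(events, "team")        # growth: 0.010, untagged: 0.010
utilization, alert = budget_status(sum(by_feature.values(), Decimal(0)), budget=Decimal("0.02"))
# total spend 0.020 against a 0.02 budget: utilization 1, alert True
```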

---

## Configuration

```python
from caphound_sdk import CapHoundClient

client = CapHoundClient(
    api_key="caphound_live_...",            # Required
    gateway_url="https://api.caphound.ai",  # Required
    debug=False,                            # Optional: enable debug logging
    circuit_breaker={                       # Optional: override defaults
        "failure_threshold": 3,
        "window_seconds": 10,
        "recovery_timeout_seconds": 30,
        "max_queue_size": 500,
    },
)
```
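If you override only some circuit-breaker settings, it can help to merge them onto the documented defaults and reject typos before constructing the client. A minimal sketch: `resolve_circuit_breaker` is a hypothetical helper, not part of `caphound_sdk`, and this README does not say whether the SDK itself fills in omitted keys.

```python
from typing import Mapping, Optional

# Defaults as documented in the Reliability section
CIRCUIT_BREAKER_DEFAULTS = {
    "failure_threshold": 3,
    "window_seconds": 10,
    "recovery_timeout_seconds": 30,
    "max_queue_size": 500,
}


def resolve_circuit_breaker(overrides: Optional[Mapping[str, int]] = None) -> dict:
    """Merge overrides onto the defaults, rejecting unknown keys and non-positive values."""
    overrides = dict(overrides or {})
    unknown = set(overrides) - set(CIRCUIT_BREAKER_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown circuit_breaker keys: {sorted(unknown)}")
    merged = {**CIRCUIT_BREAKER_DEFAULTS, **overrides}
    for key, value in merged.items():
        if value <= 0:
            raise ValueError(f"{key} must be positive, got {value}")
    return merged


config = resolve_circuit_breaker({"max_queue_size": 1000})
# every default is kept except max_queue_size, which becomes 1000
```

The result can then be passed as `circuit_breaker=config` to `CapHoundClient`.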

---

## Roadmap

CapHound is building the complete AI control layer.

| Phase | Status | What it does |
|-------|--------|-------------|
| **Visibility** | Live | Cost tracking, attribution, dashboards |
| **Control** | Building | Budget enforcement, policy engine, guardrails |
| **Optimization** | Planned | Smart model routing, auto-pacing, cost-aware decisions |

The SDK you install today gets smarter with every release. No migration required.

---

## Requirements

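- Python 3.10+
- `httpx>=0.25.0` (installed automatically)
- The provider SDK you wrap: `openai>=1.0.0`, `anthropic>=0.30.0`, or `google-generativeai>=0.5.0` (installable via the `openai`, `anthropic`, or `gemini` extras)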

## Links

- **[caphound.ai](https://caphound.ai)**: Product
- **[control.caphound.ai](https://control.caphound.ai)**: Control Center
- **[GitHub](https://github.com/caphound/caphound)**: Source
