Metadata-Version: 2.4
Name: thintokens
Version: 1.1.0
Summary: Reduce LLM token costs by 15-87% — intelligent payload compression for OpenAI, Anthropic, and any LLM API
License-Expression: MIT
License-File: LICENSE
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown

# thintokens

Reduce LLM token costs by 15–87% — intelligent payload compression for OpenAI,
Anthropic, Google, and any LLM API. Pass structured data through ThinTokens
before sending it to a language model and pay significantly less per request.

**Zero external dependencies.** Pure Python standard library, Python 3.8+.

---

## Install

```bash
pip install thintokens
```

---

## Setup

```bash
export THINTOKENS_API_URL="https://your-endpoint"
export THINTOKENS_API_KEY="tt_live_..."
```

Or pass credentials directly in code (see examples below).
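
If you prefer to read the two variables above yourself and pass them explicitly, a small helper can fail early when one is missing. This is a minimal sketch: `load_credentials` is a hypothetical helper, not part of thintokens. Passing the values explicitly avoids relying on which environment variables the library reads on its own.

```python
import os


def load_credentials(env=None):
    """Return (url, api_key) from the environment, or raise if either is unset.

    Hypothetical helper for illustration; not part of the thintokens package.
    """
    env = os.environ if env is None else env
    url = env.get("THINTOKENS_API_URL")
    api_key = env.get("THINTOKENS_API_KEY")
    missing = [
        name
        for name, value in (
            ("THINTOKENS_API_URL", url),
            ("THINTOKENS_API_KEY", api_key),
        )
        if not value
    ]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return url, api_key
```

The returned pair can then be passed as `thintokens.thin(data, api_key=api_key, url=url)`.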

---

## Quick start

```python
import thintokens
from openai import OpenAI

# Your data — could be a product catalog, user records, log entries, anything
data = [
    {"id": 1, "name": "Alice", "status": "active", "plan": "pro", "revenue": 420.00},
    {"id": 2, "name": "Bob",   "status": "active", "plan": "pro", "revenue": 310.00},
    {"id": 3, "name": "Carol", "status": "trial",  "plan": "free","revenue": 0.00},
    # ... hundreds more rows
]

result = thintokens.thin(
    data,
    api_key="tt_live_...",
    url="https://your-endpoint",
    model="gpt-4o",
)

# result acts as a string — pass it straight into your LLM
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": f"Summarise this customer data:\n{result}"}],
)

print(f"Saved {result.savings_pct}% — ${result.usd_saved:.4f} on this request")
# Saved 43% — $0.0022 on this request
```

---

## `thin()` — compress a payload

```python
thintokens.thin(
    data,
    api_key,
    model       = None,
    url         = None,
    track       = True,
    keep        = None,
    drop        = None,
    truncate    = None,
    precision   = None,
    auto_prune  = False,
    redact      = None,
    cache_align = False,
    auto_route  = False,
    fail_open   = False,
)
```

### Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `data` | `dict \| list \| str \| bytes` | required | Payload to compress. Dicts and lists are serialised automatically. |
| `api_key` | `str` | required | Your ThinTokens API key (`tt_live_...`). |
| `model` | `str` | `None` | LLM model name for accurate USD savings. See [supported models](#supported-models). Omit to use a generic $1.00/1M rate. |
| `url` | `str` | `None` | API endpoint. Falls back to `THINTOKENS_API_URL` env var. |
| `track` | `bool` | `True` | Write this request's savings to the local ledger (`~/.thintokens/ledger.json` by default; override with the `THINTOKENS_LEDGER_PATH` env var). |
| `keep` | `str` | `None` | Dot-notation paths to keep — everything else is dropped. Example: `"user.id,user.name,orders"`. |
| `drop` | `str` | `None` | Dot-notation paths to drop. Example: `"description,audit_log,metadata"`. |
| `truncate` | `int` | `None` | Truncate all strings to this many characters. |
| `precision` | `int` | `None` | Round all floats to N decimal places. `2` turns `3.14159` → `3.14`. |
| `auto_prune` | `bool` | `False` | Detect and drop near-constant fields across array rows (entropy analysis). Safe for homogeneous arrays — product catalogs, log records, CRM data. |
| `redact` | `str \| list` | `None` | Replace PII with stable tokens before the data leaves your system. Pass `"all"` or a list: `["email", "phone", "ssn", "cc", "ip"]`. |
| `cache_align` | `bool` | `False` | Reorder object keys by stability so LLM provider prompt-caches share the longest possible prefix. Boosts cache hit rates on repeated similar payloads. |
| `auto_route` | `bool` | `False` | Let the server scan the payload and intelligently decide which engines to activate. Detects PII, array structure, and ROI automatically. Explicit parameters always override router decisions. |
| `fail_open` | `bool` | `False` | If `True`, silently return the original payload when the API is unreachable or returns a 5xx error — your app keeps working with no savings. Auth errors (401) and bad requests (400) still raise. Recommended for production. |
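
The documented effect of `truncate` and `precision` can be pictured with a local sketch. This only illustrates the behaviour described in the table; it is not the service's implementation, and `shrink_values` is a hypothetical name.

```python
def shrink_values(value, truncate=None, precision=None):
    """Recursively truncate strings and round floats in a JSON-like value.

    Illustrative only: mirrors the documented effect of truncate= and
    precision=, not the actual thintokens implementation.
    """
    if isinstance(value, dict):
        return {k: shrink_values(v, truncate, precision) for k, v in value.items()}
    if isinstance(value, list):
        return [shrink_values(v, truncate, precision) for v in value]
    if isinstance(value, str) and truncate is not None:
        return value[:truncate]
    if isinstance(value, float) and precision is not None:
        return round(value, precision)
    return value  # ints, bools, and None pass through unchanged


sample = {"pi": 3.14159, "note": "hello world", "n": [1.23456, 7]}
print(shrink_values(sample, truncate=5, precision=2))
# {'pi': 3.14, 'note': 'hello', 'n': [1.23, 7]}
```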

### Returns

A `ThinResult` object. `str(result)` or `f"{result}"` gives the compressed text.

### Raises

`ThinTokensError` if the API returns a non-200 status (unless `fail_open=True`
absorbs server errors), or if no URL is configured.

---

## `ThinResult`

| Field | Type | Description |
|-------|------|-------------|
| `text` | `str` | Compressed output. `str(result)` returns this. Contains PII tokens if `redact=` was used. |
| `content` | `str` *(property)* | Like `text`, but PII tokens are automatically restored. Identical to `text` when `redact=` is not used. |
| `model` | `str` | Model used for pricing. |
| `tokens_in` | `int` | Estimated token count of the original input. |
| `tokens_out` | `int` | Estimated token count of the compressed output. |
| `tokens_saved` | `int` | Tokens eliminated this request. |
| `savings_pct` | `int` | Percentage saved, e.g. `43`. |
| `usd_saved` | `float` | Dollar amount saved for this request. |
| `chunks` | `int` | API calls made. Always `1` unless auto-chunking fired. |
| `auto_pruned_fields` | `list[str]` | Fields dropped by entropy pruning. Empty when `auto_prune=False`. |
| `redacted_count` | `int` | Number of unique PII values replaced. `0` when `redact=` not used. |
| `redaction_map` | `dict` | Maps token → original value, e.g. `{"[E1]": "alice@example.com"}`. |
| `cache_aligned_keys` | `int` | Number of keys reordered for cache alignment. `0` when `cache_align=False`. |
| `route_info` | `dict` | What the semantic router activated, e.g. `{"mode": "auto", "roi": 0.34, "pii": ["email"]}`. `mode` is `"fallback"` when `fail_open=True` returned the original payload. |

### `.content` — automatic PII restore

When you use `redact=`, the compressed text contains tokens like `[E1]`, `[PH1]`.
Use `.content` to get the text with original values restored automatically:

```python
result = thintokens.thin(data, api_key=..., url=..., redact=["email", "phone"])

# Send tokens to the LLM — real PII never leaves your system
reply = call_llm(str(result))       # LLM sees [E1], [PH1]

# Get the reply with real values restored
clean_reply = result.rehydrate(reply)

# Or just read result.content for the compressed text with values already restored
print(result.content)               # alice@example.com, 555-1234 visible
```

### `.rehydrate(text)` — restore PII in any string

```python
result = thintokens.thin(data, api_key=..., url=..., redact="all")
llm_output = call_llm(str(result))
restored = result.rehydrate(llm_output)
```
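
Conceptually, rehydration is a token-to-value substitution driven by `redaction_map`. The sketch below shows one way to do that in a single pass. `rehydrate_text` is a hypothetical stand-in written for illustration, not the library's method, and the real `.rehydrate()` may behave differently in edge cases.

```python
import re


def rehydrate_text(text, redaction_map):
    """Replace every token found in redaction_map with its original value.

    Illustrative stand-in for ThinResult.rehydrate(); not the library's code.
    """
    if not redaction_map:
        return text
    # Longest tokens first, so a longer token is never shadowed by a shorter one.
    tokens = sorted(redaction_map, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(token) for token in tokens))
    # One pass over the text: restored values are never scanned again.
    return pattern.sub(lambda match: redaction_map[match.group(0)], text)


mapping = {"[E1]": "alice@example.com", "[PH1]": "555-1234"}
print(rehydrate_text("Contact [E1] or call [PH1].", mapping))
# Contact alice@example.com or call 555-1234.
```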

---

## Feature examples

### Drop or keep fields

```python
# Keep only what the LLM needs
result = thintokens.thin(data, api_key=..., url=...,
    keep="user.id,user.name,orders.status,orders.total")

# Drop fields the LLM doesn't need
result = thintokens.thin(data, api_key=..., url=...,
    drop="audit_log,description,created_at,metadata")
```

### Entropy pruning — drop near-constant fields automatically

```python
# Great for homogeneous arrays where many fields have the same value across rows
result = thintokens.thin(data, api_key=..., url=..., auto_prune=True)
print(result.auto_pruned_fields)  # ["currency", "region", "source_system"]
```
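
To build intuition for what "near-constant" means, the sketch below computes the Shannon entropy of each field across rows and flags the fields that fall below a cutoff. The `0.2` bit threshold and both function names are assumptions made for illustration; the service's actual criteria are not documented here.

```python
import math
from collections import Counter


def field_entropy(rows, field):
    """Shannon entropy (in bits) of one field's values across all rows."""
    counts = Counter(repr(row.get(field)) for row in rows)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def near_constant_fields(rows, threshold=0.2):
    """Fields whose entropy is at or below the (assumed) threshold."""
    fields = sorted({key for row in rows for key in row})
    return [f for f in fields if field_entropy(rows, f) <= threshold]


rows = [{"id": i, "currency": "USD", "region": "EU"} for i in range(10)]
print(near_constant_fields(rows))
# ['currency', 'region']
```

A field with one value in every row has zero entropy, while a unique `id` per row has the maximum possible entropy for that row count, so it is never flagged.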

### PII redaction

```python
result = thintokens.thin(
    customer_records,
    api_key=..., url=...,
    redact=["email", "phone"],   # or redact="all"
)

# result.text / str(result) → compressed with [E1], [PH1] tokens
# result.content             → compressed with original values restored
# result.redaction_map       → {"[E1]": "alice@example.com", "[PH1]": "555-1234"}

llm_answer = call_llm(str(result))           # LLM never sees real PII
final      = result.rehydrate(llm_answer)    # restore values in the LLM's reply
```

### Cache alignment — boost LLM prompt-cache hits

```python
# Reorders keys so stable fields (status, region) come before volatile ones (id, timestamp)
# → every row starts with the same prefix → LLM provider caches more aggressively
result = thintokens.thin(data, api_key=..., url=..., cache_align=True)
print(result.cache_aligned_keys)  # e.g. 7 (number of keys reordered)
```
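
The idea of ordering keys by stability can be sketched locally. Here "stability" is approximated by the number of distinct values a key takes across rows, which is an assumption; `align_keys` is a hypothetical helper and the service may rank keys differently.

```python
def align_keys(rows):
    """Reorder each row's keys so the least-varying keys come first.

    Illustrative only: stability is approximated by distinct-value count.
    """
    keys = []
    for row in rows:
        for key in row:
            if key not in keys:
                keys.append(key)  # remember first-seen order for tie-breaking
    distinct = {k: len({repr(r.get(k)) for r in rows}) for k in keys}
    order = sorted(keys, key=lambda k: distinct[k])  # stable sort keeps ties in place
    return [{k: row[k] for k in order if k in row} for row in rows]


rows = [
    {"id": 1, "ts": "08:00", "status": "active", "region": "EU"},
    {"id": 2, "ts": "08:05", "status": "active", "region": "EU"},
]
print(list(align_keys(rows)[0]))
# ['status', 'region', 'id', 'ts']
```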

### Auto-routing — let the server decide

```python
# Server scans the payload and activates PII, pruning, and alignment as needed
result = thintokens.thin(data, api_key=..., url=..., auto_route=True)
print(result.route_info)
# {"mode": "auto", "roi": 0.34, "pii": ["email", "phone"], "auto_prune": True, "cache_align": True}
```

### Fail-open for production

```python
# If ThinTokens is unreachable, your app continues with the original data
result = thintokens.thin(data, api_key=..., url=..., fail_open=True)

if result.route_info.get("mode") == "fallback":
    print("ThinTokens unavailable — running without compression")

# str(result) still works in both cases — your LLM call is unaffected
call_llm(str(result))
```

### Combine features

```python
result = thintokens.thin(
    data,
    api_key     = "tt_live_...",
    url         = "https://your-endpoint",
    model       = "claude-3.5-sonnet",
    drop        = "audit_log,raw_payload",
    auto_prune  = True,
    redact      = ["email", "phone"],
    cache_align = True,
    fail_open   = True,
)
```

---

## `get_total_savings()` — cumulative ledger

```python
thintokens.get_total_savings(ledger_path=None)
```

Returns all-time savings from the local ledger across every `thin()` call made
with `track=True` (the default).

```python
summary = thintokens.get_total_savings()
print(summary)
# ThinTokens All-Time Savings
#   Requests      : 142
#   Tokens saved  : 1,204,800
#   USD saved     : $3.0120
#   Since         : 2025-01-15T08:00:00Z
#   Last request  : 2025-05-04T14:22:10Z
```

| Field | Type | Description |
|-------|------|-------------|
| `total_tokens_saved` | `int` | All-time tokens saved. |
| `total_usd_saved` | `float` | All-time USD saved. |
| `request_count` | `int` | Total tracked requests. |
| `first_request` | `str \| None` | ISO 8601 timestamp of first tracked request. |
| `last_request` | `str \| None` | ISO 8601 timestamp of most recent request. |

---

## Auto-chunking

Payloads larger than 8 MB are automatically split into safe-sized batches,
sent one at a time, and the results are merged into a single `ThinResult`.
No extra code required. The `chunks` field tells you how many API calls ran.

```python
result = thintokens.thin(huge_list, api_key=..., url=...)
print(result.chunks)   # e.g. 3 — split into 3 calls, merged transparently
```
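
For payloads that must be split by hand, or simply to see what batching by size looks like, here is a minimal sketch that groups list items under a byte budget. The greedy strategy and the name `chunk_by_bytes` are assumptions for illustration; the library's own splitting logic may differ.

```python
import json


def chunk_by_bytes(items, max_bytes):
    """Greedily group items so each batch's compact JSON stays within max_bytes.

    An item larger than max_bytes still gets a batch of its own, since a
    single item cannot be split here.
    """
    batches, current, size = [], [], 2  # 2 bytes for the enclosing "[]"
    for item in items:
        encoded = json.dumps(item, separators=(",", ":")).encode("utf-8")
        item_size = len(encoded) + 1  # +1 for the separating comma
        if current and size + item_size > max_bytes:
            batches.append(current)
            current, size = [], 2
        current.append(item)
        size += item_size
    if current:
        batches.append(current)
    return batches


items = [{"i": n} for n in range(10)]
print(len(chunk_by_bytes(items, max_bytes=20)))
# 5
```

With the 8 MB limit mentioned above, the call would be `chunk_by_bytes(huge_list, 8 * 1024 * 1024)`.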

---

## `ThinTokensError`

Raised on non-200 API responses or missing URL configuration. The `status`
attribute holds the HTTP status code, or `0` when no URL is configured.

```python
try:
    result = thintokens.thin(data, api_key="bad_key", url="...")
except thintokens.ThinTokensError as e:
    print(e.status)  # 401
    print(str(e))    # "HTTP 401: Unauthorized"
```

| Code | Meaning |
|------|---------|
| `0`   | No API URL configured. Set `THINTOKENS_API_URL` or pass `url=`. |
| `400` | Empty body or invalid JSON. |
| `401` | Invalid or missing API key. |
| `413` | Payload too large to auto-chunk (no top-level array). Use `drop=` or split manually. |
| `429` | Rate limit exceeded (100 requests/minute). |
| `500` | Server error — use `fail_open=True` to absorb this in production. |
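
A `429` is usually worth retrying after a pause. The sketch below retries with exponential backoff whenever the raised exception carries `status == 429`. `call_with_backoff` is a hypothetical helper, and the retry count and delays are arbitrary assumptions. In real code you would catch `thintokens.ThinTokensError` rather than the broad `Exception` used here to keep the example self-contained.

```python
import time


def call_with_backoff(fn, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying with exponential backoff on status 429.

    Any other error, or a 429 after the final retry, is re-raised unchanged.
    """
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception as exc:  # in practice: thintokens.ThinTokensError
            if getattr(exc, "status", None) != 429 or attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

Usage would look like `call_with_backoff(lambda: thintokens.thin(data, api_key=..., url=...))`.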

---

## Supported models

Pass the model name (or any substring) to `model=` for accurate USD savings.

| Provider | Examples |
|----------|---------|
| OpenAI | `gpt-4o`, `gpt-4o-mini`, `o1`, `o3`, `o3-mini` |
| Anthropic | `claude-opus-4`, `claude-3.5-sonnet`, `claude-3-haiku`, `claude-haiku-4` |
| Google | `gemini-2.5-pro`, `gemini-2.5-flash`, `gemini-1.5-pro`, `gemini-1.5-flash` |
| Meta | `llama-3.1-405b`, `llama-3.1-70b`, `llama-3.1-8b` |
| Alibaba | `qwen-max`, `qwen-plus`, `qwen-turbo` |

Omit `model` to fall back to a generic **$1.00 / 1M token** rate.
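
As a rough guide to how a per-million-token rate turns into dollars, the arithmetic can be sketched as below. The formula is an assumption about how `usd_saved` is derived, and the `2.50` rate in the second call is a hypothetical figure, not taken from the library's pricing table.

```python
def usd_saved(tokens_saved, usd_per_million=1.00):
    """Dollar value of saved tokens at a given price per 1M tokens."""
    return tokens_saved * usd_per_million / 1_000_000


print(f"${usd_saved(1_204_800):.4f}")        # generic $1.00 / 1M rate
# $1.2048
print(f"${usd_saved(1_204_800, 2.50):.4f}")  # hypothetical $2.50 / 1M rate
# $3.0120
```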

---

## Environment variables

| Variable | Description |
|----------|-------------|
| `THINTOKENS_API_URL` | API endpoint URL. Required if `url=` is not passed to `thin()`. |
| `THINTOKENS_LEDGER_PATH` | Override ledger file location. Default: `~/.thintokens/ledger.json`. |
