Metadata-Version: 2.4
Name: hermitcrab
Version: 0.1.0
Summary: Track and analyze LLM token usage across providers with local billing ledger and OpenAI-compatible routing
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: fastapi<1.0.0,>=0.115.0
Requires-Dist: httpx<1.0.0,>=0.27.0
Requires-Dist: pydantic<3.0.0,>=2.8.0
Requires-Dist: pydantic-settings<3.0.0,>=2.3.0
Requires-Dist: uvicorn<1.0.0,>=0.30.0

# Hermit LLM Proxy

Hermit is an OpenAI-compatible proxy that accepts client requests, ignores the client-supplied `model`, and routes every call using proxy-side configuration.

It also records detailed token traffic so you can analyze:

- prompt tokens by request and provider
- completion tokens by request and provider
- total tokens over time buckets
- latency and request status

## Features

- OpenAI-compatible `POST /v1/chat/completions`
- Server-side model enforcement through an active saved profile
- Upstream routing to OpenAI-compatible providers such as OpenAI, OpenRouter, DeepSeek, and Ollama
- SQLite traffic ledger with per-request token accounting
- Analytics endpoints for totals and time series
- Non-streaming proxy mode for the initial implementation
- Curses TUI for editing providers, models, base URLs, and secrets
- Command-line profile management and token traffic reporting
- Persistent CLI-configurable log level for proxy runtime output
- Persistent profile history so previously used models can be re-activated or deleted

Supported provider names:

- `openai`
- `openrouter`
- `anthropic` for OpenAI-compatible gateways only
- `deepseek`
- `ollama`
- `vllm`
- `lmstudio`
- `google-gemini` or `gemini`
- `xai-grok`, `xai`, or `grok`
- `alibaba-qwen` or `qwen`
- `minimax`
- `glm`
- `github-models`
- `github-copilot`
- `copilot-proxy`
- `xiaomi` or `mimo`
- `amazon-bedrock` or `bedrock`
- `nvidia`
- `huggingface` or `hugging-face`
- `openai-compatible`

Profile validation behavior:

- Saving or updating a profile runs a quick upstream chat-completions probe by default.
- Activating a profile from the API, CLI, or TUI also runs the quick test before switching.
- The probe uses a tiny non-streaming request against the configured provider/model to catch bad base URLs, API keys, model names, or incompatible endpoints early.
- CLI and API support `skip_test` / `--skip-test` when you need to save a profile without a live upstream check.

OpenAI-compatible mode:

- Profiles can enable `allow_client_model_override`, which makes Hermit forward the caller's `model` value instead of forcing the saved profile model.
- When this is disabled, Hermit keeps the original server-enforced model behavior.
- Hermit now exposes `POST /v1/chat/completions`, `POST /v1/embeddings`, and `GET /v1/models`.
- Streaming chat completions are relayed as Server-Sent Events and Hermit tries to capture stream usage when the upstream emits usage chunks.

## Configuration

Environment variables:

- `HERMIT_DATABASE_URL` default: `sqlite:///./hermit.db`
- `HERMIT_DEFAULT_PROVIDER` default: `ollama`
- `HERMIT_DEFAULT_MODEL` default: `deepseek-r1:32b`
- `HERMIT_DEFAULT_BASE_URL` default: `http://localhost:11434/v1`
- `HERMIT_DEFAULT_API_KEY` default: empty
- `HERMIT_DEFAULT_MAX_TOKENS` default: `2048`
- `HERMIT_REQUEST_TIMEOUT_SECONDS` default: `120`

Runtime log levels:

- Hermit stores a persistent default log level in SQLite app state.
- Supported values: `critical`, `error`, `warning`, `info`, `debug`
- `hermit-proxy server` uses the persisted value unless you pass `--log-level` for a one-off override.

Optional provider-specific overrides:

- `HERMIT_OPENAI_BASE_URL`
- `HERMIT_OPENAI_API_KEY`
- `HERMIT_OPENAI_MODEL`
- `HERMIT_OPENROUTER_BASE_URL`
- `HERMIT_OPENROUTER_API_KEY`
- `HERMIT_OPENROUTER_MODEL`
- `HERMIT_DEEPSEEK_BASE_URL`
- `HERMIT_DEEPSEEK_API_KEY`
- `HERMIT_DEEPSEEK_MODEL`
- `HERMIT_OLLAMA_BASE_URL`
- `HERMIT_OLLAMA_API_KEY`
- `HERMIT_OLLAMA_MODEL`
- `HERMIT_VLLM_BASE_URL`
- `HERMIT_VLLM_API_KEY`
- `HERMIT_VLLM_MODEL`
- `HERMIT_LMSTUDIO_BASE_URL`
- `HERMIT_LMSTUDIO_API_KEY`
- `HERMIT_LMSTUDIO_MODEL`
- `HERMIT_GOOGLE_GEMINI_BASE_URL`
- `HERMIT_GOOGLE_GEMINI_API_KEY`
- `HERMIT_GOOGLE_GEMINI_MODEL`
- `HERMIT_XAI_GROK_BASE_URL`
- `HERMIT_XAI_GROK_API_KEY`
- `HERMIT_XAI_GROK_MODEL`
- `HERMIT_ALIBABA_QWEN_BASE_URL`
- `HERMIT_ALIBABA_QWEN_API_KEY`
- `HERMIT_ALIBABA_QWEN_MODEL`
- `HERMIT_MINIMAX_BASE_URL`
- `HERMIT_MINIMAX_API_KEY`
- `HERMIT_MINIMAX_MODEL`
- `HERMIT_GLM_BASE_URL`
- `HERMIT_GLM_API_KEY`
- `HERMIT_GLM_MODEL`
- `HERMIT_GITHUB_MODELS_BASE_URL`
- `HERMIT_GITHUB_MODELS_API_KEY`
- `HERMIT_GITHUB_MODELS_MODEL`
- `HERMIT_GITHUB_COPILOT_BASE_URL`
- `HERMIT_GITHUB_COPILOT_API_KEY`
- `HERMIT_GITHUB_COPILOT_MODEL`
- `HERMIT_COPILOT_PROXY_BASE_URL`
- `HERMIT_COPILOT_PROXY_API_KEY`
- `HERMIT_COPILOT_PROXY_MODEL`
- `HERMIT_OPENAI_COMPATIBLE_BASE_URL`
- `HERMIT_OPENAI_COMPATIBLE_API_KEY`
- `HERMIT_OPENAI_COMPATIBLE_MODEL`
- `HERMIT_XIAOMI_BASE_URL`
- `HERMIT_XIAOMI_API_KEY`
- `HERMIT_XIAOMI_MODEL`
- `HERMIT_AMAZON_BEDROCK_BASE_URL`
- `HERMIT_AMAZON_BEDROCK_API_KEY`
- `HERMIT_AMAZON_BEDROCK_MODEL`
- `HERMIT_NVIDIA_BASE_URL`
- `HERMIT_NVIDIA_API_KEY`
- `HERMIT_NVIDIA_MODEL`
- `HERMIT_HUGGINGFACE_BASE_URL`
- `HERMIT_HUGGINGFACE_API_KEY`
- `HERMIT_HUGGINGFACE_MODEL`

On first run, Hermit creates a bootstrap saved profile from the default env values. After that, the proxy uses the active saved profile in the SQLite database. The incoming `model` value is stored only for audit/debug visibility.

Database behavior:

- Every CLI command, the TUI, and the HTTP server read and write the SQLite database selected by `HERMIT_DATABASE_URL`.
- If `HERMIT_DATABASE_URL` is not set, Hermit uses `sqlite:///./hermit.db` in the current working directory.
- If you want multiple independent environments, point each one at a different database file.
- The active profile, request ledger, auth profiles, and app config such as log level are all stored there.
- Changing the active profile is a hot switch. New requests use the new model immediately without restarting the server.

Default base URLs by provider:

- `openai`: `https://api.openai.com/v1`
- `openrouter`: `https://openrouter.ai/api/v1`
- `deepseek`: `https://api.deepseek.com/v1`
- `ollama`: `http://localhost:11434/v1`
- `vllm`: `http://localhost:8000/v1`
- `lmstudio`: `http://localhost:1234/v1`
- `google-gemini`: `https://generativelanguage.googleapis.com/v1beta/openai`
- `xai-grok`: `https://api.x.ai/v1`
- `alibaba-qwen`: `https://dashscope.aliyuncs.com/compatible-mode/v1`
- `minimax`: `https://api.minimax.io/v1`
- `glm`: `https://api.z.ai/api/paas/v4`
- `github-models`: `https://models.github.ai`
- `github-copilot`: `https://api.githubcopilot.com`
- `copilot-proxy`: `http://127.0.0.1:4141/v1`
- `xiaomi`: `https://api.xiaomimimo.com/v1`
- `amazon-bedrock`: `https://bedrock-mantle.us-east-1.api.aws/v1`
- `nvidia`: `https://integrate.api.nvidia.com/v1`
- `huggingface`: `https://router.huggingface.co/v1`
- `openai-compatible`: `http://localhost:8000/v1`

Secrets are currently stored in SQLite as plain text. If that is not acceptable for your environment, the next step is to add OS keychain integration or encrypted-at-rest secret storage.

GitHub Copilot notes:

- `github-models` uses the public GitHub Models inference API.
- `github-copilot` now uses a separate auth-profile path and exchanges a GitHub token for a Copilot access token at runtime.
- `copilot-proxy` is a separate local-bridge provider for setups that expose a Copilot-compatible `/v1` endpoint on localhost.
- Native device login is supported, but you must provide your own GitHub OAuth client ID via `--client-id` or `GITHUB_OAUTH_CLIENT_ID`.

## Run

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
hermit-proxy server --reload
```

### TUI

```bash
hermit-proxy tui
```

Keys:

- `Tab` switch between Profiles and Analytics
- `Enter` activate the selected saved profile
- `a` add a saved profile
- `e` edit a saved profile
- `d` delete a saved profile
- `r` refresh the dashboard
- `q` quit

### CLI

Quick operator flow:

```bash
HERMIT_DATABASE_URL=sqlite:///./hermit.db hermit-proxy profiles list
HERMIT_DATABASE_URL=sqlite:///./hermit.db hermit-proxy profiles current
HERMIT_DATABASE_URL=sqlite:///./hermit.db hermit-proxy usage
HERMIT_DATABASE_URL=sqlite:///./hermit.db hermit-proxy report --view recent --requests-limit 20
```

Use `usage` when you only need total up/down tokens. Use `report` when you need per-model, time-series, or recent-request detail.

List saved profiles:

```bash
hermit-proxy profiles list
```

List auth profiles:

```bash
hermit-proxy auth list
```

Store a GitHub token for native `github-copilot`:

```bash
hermit-proxy auth login-github-copilot \
  --name work-copilot \
  --github-token YOUR_GITHUB_TOKEN
```

Start a native GitHub device-login flow for `github-copilot`:

```bash
hermit-proxy auth login-github-copilot \
  --name work-copilot \
  --device \
  --client-id YOUR_GITHUB_OAUTH_CLIENT_ID
```

Add and activate a profile:

```bash
hermit-proxy profiles add \
  --name openrouter-sonnet \
  --provider openrouter \
  --base-url https://openrouter.ai/api/v1 \
  --model anthropic/claude-sonnet-4 \
  --api-key YOUR_KEY \
  --allow-client-model-override \
  --activate
```

Hot-switch to another saved model without restarting the proxy:

```bash
hermit-proxy profiles activate openrouter-sonnet
hermit-proxy profiles activate openrouter-sonnet --skip-test
```

`profiles activate` changes the active model for new requests immediately. In-flight requests keep using the old upstream connection.

Inspect or change the default proxy log level:

```bash
hermit-proxy config get log-level
hermit-proxy config set log-level debug
hermit-proxy server --log-level warning
```

Examples for newly supported providers:

```bash
hermit-proxy profiles add \
  --name local-vllm \
  --provider vllm \
  --base-url http://localhost:8000/v1 \
  --model meta-llama/Llama-3.3-70B-Instruct \
  --activate

hermit-proxy profiles add \
  --name lmstudio-local \
  --provider lmstudio \
  --base-url http://localhost:1234/v1 \
  --model qwen2.5-7b-instruct

hermit-proxy profiles add \
  --name gemini-flash \
  --provider google-gemini \
  --base-url https://generativelanguage.googleapis.com/v1beta/openai \
  --model gemini-2.5-flash \
  --api-key YOUR_KEY

hermit-proxy profiles add \
  --name grok-fast \
  --provider xai-grok \
  --base-url https://api.x.ai/v1 \
  --model grok-4-fast-reasoning \
  --api-key YOUR_KEY

hermit-proxy profiles add \
  --name qwen-plus \
  --provider alibaba-qwen \
  --base-url https://dashscope.aliyuncs.com/compatible-mode/v1 \
  --model qwen-plus \
  --api-key YOUR_KEY

hermit-proxy profiles add \
  --name glm-4.5 \
  --provider glm \
  --base-url https://api.z.ai/api/paas/v4 \
  --model glm-4.5 \
  --api-key YOUR_KEY

hermit-proxy profiles add \
  --name xiaomi-mimo \
  --provider xiaomi \
  --base-url https://api.xiaomimimo.com/v1 \
  --model mimo-v2-flash \
  --api-key YOUR_KEY

hermit-proxy profiles add \
  --name bedrock-mantle \
  --provider amazon-bedrock \
  --base-url https://bedrock-mantle.us-east-1.api.aws/v1 \
  --model us.amazon.nova-lite-v1:0 \
  --api-key YOUR_KEY

hermit-proxy profiles add \
  --name nvidia-hosted \
  --provider nvidia \
  --base-url https://integrate.api.nvidia.com/v1 \
  --model nvidia/llama-3.3-nemotron-super-49b-v1 \
  --api-key YOUR_KEY

hermit-proxy profiles add \
  --name hf-router \
  --provider huggingface \
  --base-url https://router.huggingface.co/v1 \
  --model meta-llama/Meta-Llama-3.1-8B-Instruct \
  --api-key YOUR_KEY

hermit-proxy profiles add \
  --name github-models-gpt41 \
  --provider github-models \
  --base-url https://models.github.ai \
  --model openai/gpt-4.1 \
  --api-key YOUR_GITHUB_TOKEN

hermit-proxy profiles add \
  --name github-copilot-native \
  --provider github-copilot \
  --base-url https://api.githubcopilot.com \
  --model gpt-4.1 \
  --auth-profile-id 1

hermit-proxy profiles add \
  --name local-copilot-proxy \
  --provider copilot-proxy \
  --base-url http://127.0.0.1:4141/v1 \
  --model claude-sonnet-4

hermit-proxy profiles add \
  --name generic-openai-compat \
  --provider openai-compatible \
  --base-url http://localhost:4000/v1 \
  --model default-model \
  --api-key YOUR_KEY \
  --allow-client-model-override
```

Switch to a previously saved profile by id or exact name:

```bash
hermit-proxy profiles activate 2
hermit-proxy profiles activate openrouter-sonnet
```

Update or delete a saved profile:

```bash
hermit-proxy profiles update 2 \
  --name openrouter-sonnet \
  --provider openrouter \
  --base-url https://openrouter.ai/api/v1 \
  --model anthropic/claude-sonnet-4 \
  --api-key NEW_KEY

hermit-proxy profiles delete 2
```

Skip the automatic quick test if needed:

```bash
hermit-proxy profiles add \
  --name staged-profile \
  --provider ollama \
  --base-url http://localhost:11434/v1 \
  --model deepseek-r1:32b \
  --default-max-tokens 2048 \
  --skip-test
```

Show the current active profile:

```bash
hermit-proxy profiles current
```

Run a quick test without saving:

```bash
hermit-proxy profiles test --saved openrouter-sonnet

hermit-proxy profiles test \
  --name temp-test \
  --provider google-gemini \
  --base-url https://generativelanguage.googleapis.com/v1beta/openai \
  --model gemini-2.5-flash \
  --api-key YOUR_KEY
```

Print token analysis reports:

```bash
hermit-proxy usage
hermit-proxy report
hermit-proxy report --view summary
hermit-proxy report --view by-model
hermit-proxy report --view timeseries --bucket day --since 2026-04-01T00:00:00
hermit-proxy report --view recent --requests-limit 20
```

`hermit-proxy usage` prints the shortest totals view: `up`, `down`, `total`, `requests`, and `successful`.

Examples:

- `hermit-proxy usage`
  Quick total prompt/completion tokens from the current database.
- `hermit-proxy report --view summary`
  One-line totals with average latency.
- `hermit-proxy report --view by-model`
  Aggregate token usage by provider and upstream model.
- `hermit-proxy report --view recent --requests-limit 20`
  Inspect the most recent request rows and verify that a model switch took effect.
- `hermit-proxy report --view timeseries --bucket hour`
  Analyze traffic volume over time.

Use `hermit-proxy report --help` for a full explanation of each token-analysis view.

## API

### Proxy request

`POST /v1/chat/completions`

This accepts an OpenAI-style payload and forwards it upstream. When `allow_client_model_override` is disabled on the active profile, Hermit forces the saved profile model. When it is enabled, Hermit forwards the caller's `model`.

Direct Anthropic-native request translation is not implemented yet. The current upstream client is built for OpenAI-compatible provider APIs.

`github-copilot` support in Hermit uses GitHub's public Models inference API shape and headers. It does not integrate with the private Copilot chat service used inside GitHub clients.

### Analytics

- `GET /health`
- `GET /analytics/summary`
- `GET /analytics/by-model`
- `GET /analytics/timeseries?bucket=hour&since=2026-04-01T00:00:00`
- `GET /analytics/requests?limit=100`

### Profile management

- `GET /profiles`
- `POST /profiles`
- `POST /profiles/test`
- `PUT /profiles/{profile_id}`
- `POST /profiles/{profile_id}/activate`
- `DELETE /profiles/{profile_id}`
- `GET /auth-profiles`
- `POST /auth-profiles`
- `POST /auth-profiles/github-copilot/device/start`
- `POST /auth-profiles/github-copilot/device/complete`

### OpenAI-compatible endpoints

- `POST /v1/chat/completions`
- `POST /v1/embeddings`
- `GET /v1/models`
