Metadata-Version: 2.4
Name: kulu-audio-eval
Version: 0.0.3
Summary: Voice agent audio evaluation: cutout detection, latency metrics, and failure inference
License: Proprietary
License-File: LICENSE
Keywords: audio,evaluation,voice-agent,speech,latency
Author: Kulu
Author-email: team@kulu.ai
Requires-Python: >=3.11,<4.0
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Multimedia :: Sound/Audio :: Analysis
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Dist: click (>=8.3.0,<9.0.0)
Requires-Dist: elevenlabs (>=2.32.0,<3.0.0)
Requires-Dist: google-genai (>=1.63.0,<2.0.0)
Requires-Dist: librosa (>=0.11.0,<0.12.0)
Requires-Dist: matplotlib (>=3.10.0,<4.0.0)
Requires-Dist: numpy (>=1.26.0)
Requires-Dist: pandas (>=2.3.0,<3.0.0)
Requires-Dist: python-dotenv (>=1.2.0,<2.0.0)
Requires-Dist: rich (>=13.0.0,<14.0.0)
Requires-Dist: seaborn (>=0.13.0,<0.14.0)
Requires-Dist: soundfile (>=0.13.0,<0.14.0)
Requires-Dist: tqdm (>=4.67.0,<5.0.0)
Description-Content-Type: text/markdown

# kulu-audio-eval

Voice agent audio evaluation library. Analyses 2-speaker conversation recordings (agent + user) and returns structured quality metrics.

## Features

- **Cutout detection** — mid-speech energy dropouts (`cutout_mid`) and transcript-level cut-off words (`cutout_word`)
- **Response latency** — per-turn and aggregate stats (mean, median, p95, min, max) from user speech end → agent speech start
- **Failure inference** — Gemini-powered classification: wrong language, repetition, incomplete answer, hallucination, irrelevant response
- **Transcription & translation** — ElevenLabs STT with speaker diarization; optional Gemini translation to English
- **Output files** — CSVs, waveform + latency plots, per-turn charts written to `output_dir`
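As a rough illustration of the `cutout_mid` idea (not the package's actual implementation), the configurable thresholds in the table below amount to flagging sufficiently long runs of frames whose energy falls far below the recording's 95th-percentile level. A minimal numpy sketch, with a hypothetical `find_dropouts` helper:

```python
import numpy as np

def find_dropouts(rms_db, frame_ms, silent_offset_db=-40.0, min_gap_ms=1000.0):
    """Return (start_ms, end_ms) spans of sustained silence.

    A frame counts as silent when its RMS level falls more than
    |silent_offset_db| below the clip's 95th-percentile level; runs of
    silent frames shorter than min_gap_ms are ignored.
    """
    threshold = np.percentile(rms_db, 95) + silent_offset_db
    silent = rms_db < threshold
    spans, start = [], None
    for i, is_silent in enumerate(silent):
        if is_silent and start is None:
            start = i
        elif not is_silent and start is not None:
            if (i - start) * frame_ms >= min_gap_ms:
                spans.append((start * frame_ms, i * frame_ms))
            start = None
    if start is not None and (len(silent) - start) * frame_ms >= min_gap_ms:
        spans.append((start * frame_ms, len(silent) * frame_ms))
    return spans

# Toy frame-level dB trace: speech, a 1.5 s dropout, speech (100 ms frames)
levels = np.array([0.0] * 20 + [-60.0] * 15 + [0.0] * 20)
print(find_dropouts(levels, frame_ms=100.0))  # [(2000.0, 3500.0)]
```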

## Installation

```bash
pip install kulu-audio-eval
```

Requires Python 3.11+. An [ElevenLabs](https://elevenlabs.io) API key is required for STT; a [Google Gemini](https://ai.google.dev) key is optional and enables failure inference and translation.

## Usage

### Python API

```python
from kulu_audio_eval import evaluate

result = evaluate(
    audio_path="recording.oga",
    elevenlabs_api_key="elabs_xxx",
    gemini_api_key="AIza...",   # optional — skips failure inference and translation if omitted
    output_dir="out/recording", # optional — defaults to out/<audio-stem>
)

# Cutout report
print(result["cutout_report"]["total_count"])       # 2
print(result["cutout_report"]["cutouts"])           # [{"type": "cutout_mid", "start_ms": 86572, "end_ms": 88652}, ...]

# Latency report
print(result["latency_report"]["mean_ms"])          # 1437.0
print(result["latency_report"]["percentile_95_ms"]) # 2800.0
print(result["latency_report"]["per_turn"])         # [{"turn_number": 2, "latency_ms": 1548}, ...]

# Failure inference
print(result["failure_report"]["count"])            # 3
print(result["failure_report"]["failures"])         # [{"turn": 9, "type": "repetition", "reason": "..."}, ...]

# Output files written to result["output_dir"]
```
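The per-turn latency list lends itself to simple post-processing. A sketch that flags slow turns from a result shaped like the one above (the stand-in dict and the 2000 ms cutoff are illustrative, not part of the API):

```python
# Stand-in for a real evaluate() result, using the documented shape
result = {
    "latency_report": {
        "per_turn": [
            {"turn_number": 2, "latency_ms": 1548},
            {"turn_number": 5, "latency_ms": 3200},
            {"turn_number": 7, "latency_ms": 410},
        ]
    }
}

SLOW_MS = 2000  # arbitrary cutoff for this example
slow_turns = [
    t["turn_number"]
    for t in result["latency_report"]["per_turn"]
    if t["latency_ms"] > SLOW_MS
]
print(slow_turns)  # [5]
```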

### CLI

```bash
# Single file
kulu-eval evaluate recording.oga

# Pass API keys directly (no .env needed)
kulu-eval evaluate recording.oga --elevenlabs-key elabs_xxx --gemini-key AIza...

# Custom output directory
kulu-eval evaluate recording.oga -o results/recording

# All files in the in/ directory, 8 parallel workers
kulu-eval evaluate -j 8
```

### Return value shape

```python
{
    "output_dir": "out/recording",
    "cutout_report": {
        "total_count": 2,
        "total_duration_ms": 4144.0,
        "cutouts": [
            {"type": "cutout_mid", "start_ms": 86572.0, "end_ms": 88652.0},
            {"type": "cutout_word", "start_ms": 387640.0, "end_ms": 388180.0},
        ],
    },
    "latency_report": {
        "count": 18,
        "mean_ms": 1437.0,
        "median_ms": 1548.0,
        "min_ms": 20.0,
        "max_ms": 10560.0,
        "std_dev_ms": 2100.0,
        "percentile_25_ms": 400.0,
        "percentile_75_ms": 2200.0,
        "percentile_95_ms": 2800.0,
        "percentile_99_ms": 9000.0,
        "per_turn": [
            {"turn_number": 2, "user_end": 82.3, "agent_start": 83.9, "latency_ms": 1548},
            ...
        ],
    },
    "failure_report": {
        "count": 3,
        "failures": [
            {"turn": 9, "type": "repetition", "reason": "Agent stuck in goodbye loop."},
            ...
        ],
    },
}
```
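One convenient way to digest the failure report is to count failures by type. The sample data below mirrors the shape above; the `type` slugs are illustrative stand-ins for the categories listed under Features, not an exhaustive list:

```python
from collections import Counter

# Sample failures shaped like failure_report["failures"] above
failures = [
    {"turn": 9, "type": "repetition", "reason": "Agent stuck in goodbye loop."},
    {"turn": 12, "type": "repetition", "reason": "Repeated closing phrase."},
    {"turn": 14, "type": "wrong_language", "reason": "Replied in the wrong language."},
]

by_type = Counter(f["type"] for f in failures)
print(by_type.most_common())  # [('repetition', 2), ('wrong_language', 1)]
```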

## Configuration

All options can be passed as keyword arguments to `evaluate()` or set via environment variables / `.env`:

| Parameter | Env var | Default | Description |
|-----------|---------|---------|-------------|
| — | `ELEVENLABS_API_KEY` | — | ElevenLabs API key (required) |
| — | `GEMINI_API_KEY` | — | Gemini API key (optional) |
| `DIARIZER_NUM_SPEAKERS` | same | `2` | Expected number of speakers |
| `CUTOUT_THRESHOLD_MS` | same | `1000` | Min gap duration to count as `cutout_mid` |
| `CUTOUT_SILENT_THRESHOLD_DB` | same | `-40` | dB below 95th-percentile RMS = silence |
| `TRANSLATION_TARGET_LANGUAGE` | same | `en` | Translation target; empty string disables |
| `FAILURE_INFERENCE_USE_CACHE` | same | `true` | Cache Gemini results between runs |
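For example, a `.env` file that supplies both keys and overrides two defaults might look like this (the key values are placeholders):

```
ELEVENLABS_API_KEY=elabs_xxx
GEMINI_API_KEY=AIza...
CUTOUT_THRESHOLD_MS=1500
FAILURE_INFERENCE_USE_CACHE=false
```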

## License

Proprietary — internal use only. Copyright © 2026 Frontier Interactions LTD.

