Metadata-Version: 2.4
Name: speech-mine
Version: 0.1.0
Summary: A tool for extracting and analyzing speech data from audio files with known speaker counts and content.
Project-URL: Homepage, https://github.com/your-org/speech-mine
Project-URL: Repository, https://github.com/your-org/speech-mine
Project-URL: Issues, https://github.com/your-org/speech-mine/issues
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.11
Requires-Dist: faster-whisper>=1.2.0
Requires-Dist: pandas>=2.3.2
Requires-Dist: pyannote-audio>=3.3.2
Requires-Dist: pydub>=0.25.1
Requires-Dist: pytest>=8.4.2
Requires-Dist: pyyaml>=6.0.0
Requires-Dist: rapidfuzz>=3.14.1
Requires-Dist: tqdm>=4.67.1
Description-Content-Type: text/markdown


# speech-mine

A toolkit for speaker diarization and transcript analysis. Extract speaker-labeled transcripts from audio, format them into readable scripts, search them with fuzzy matching, and split audio into chunks before processing.

## Modules

| Module | Description | Docs |
|--------|-------------|------|
| `extract` | Transcribe audio with speaker diarization | [→](docs/extract.md) |
| `format` | Format CSV transcripts into readable scripts | [→](docs/format.md) |
| `chunk` | Split audio into segments via YAML config | [→](docs/chunk.md) |
| `search` | Fuzzy search transcripts by word or phrase | [→](docs/search.md) |

## Installation

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
git clone <repository-url>
cd speech-mine
uv sync
```

See [docs/installation.md](docs/installation.md) for setting up library dependencies and configuring your Hugging Face token.

## Quick Start

```bash
# 1. Extract a transcript
uv run speech-mine extract interview.mp3 output.csv \
  --hf-token YOUR_TOKEN \
  --num-speakers 2 \
  --compute-type float32

# 2. Format into a readable script
uv run speech-mine format output.csv script.txt

# 3. Search it
uv run speech-mine search "topic of interest" output.csv --pretty
```
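
For intuition about what the fuzzy matching in step 3 does, here is a minimal sketch that scores a query against each transcript segment and keeps the close matches. It is not the tool's implementation: the package lists `rapidfuzz` as a dependency, whereas this sketch uses the standard library's `difflib` so that it runs without installing anything. The `SEGMENTS` rows, the function names, and the `0.8` threshold are all assumptions made for illustration, and the real CSV layout may differ.

```python
from difflib import SequenceMatcher

# Hypothetical transcript rows (speaker label, text); the real CSV columns may differ.
SEGMENTS = [
    ("SPEAKER_00", "Thanks for joining us today."),
    ("SPEAKER_01", "Happy to talk about the budget proposal."),
    ("SPEAKER_00", "Let's start with the timeline."),
]


def best_window_score(query: str, text: str) -> float:
    """Best similarity (0.0 to 1.0) between the query and any word window of equal length."""
    q_words = query.lower().split()
    t_words = text.lower().split()
    if not q_words or not t_words:
        return 0.0
    size = min(len(q_words), len(t_words))
    normalized_query = " ".join(q_words)
    best = 0.0
    # Slide a window of `size` words across the segment and keep the best ratio.
    for start in range(len(t_words) - size + 1):
        window = " ".join(t_words[start:start + size])
        best = max(best, SequenceMatcher(None, normalized_query, window).ratio())
    return best


def fuzzy_search(query: str, segments, threshold: float = 0.8):
    """Return (score, speaker, text) tuples at or above the threshold, best first."""
    hits = []
    for speaker, text in segments:
        score = best_window_score(query, text)
        if score >= threshold:
            hits.append((score, speaker, text))
    return sorted(hits, reverse=True)


if __name__ == "__main__":
    # A misspelled query still finds the segment that mentions the budget proposal.
    for score, speaker, text in fuzzy_search("budgit proposal", SEGMENTS):
        print(f"{score:.2f} {speaker}: {text}")
```

Comparing the query against word windows rather than the whole segment keeps a short phrase from being penalized for appearing inside a long sentence.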

## Documentation

```bash
# Serve docs locally
uv run mkdocs serve
```

Or browse the `docs/` folder directly.

## License

TBD
