Metadata-Version: 2.4
Name: intellema-vdk
Version: 0.2.2
Summary: A Voice Development Kit for different Voice Agent Platforms
Author: Intellema
License: MIT License
        
        Copyright (c) 2026 Intellema
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: requests>=2.31.0
Requires-Dist: httpx>=0.24.0
Provides-Extra: livekit
Requires-Dist: livekit-api>=1.1.0; extra == "livekit"
Requires-Dist: boto3>=1.28.0; extra == "livekit"
Provides-Extra: retell
Requires-Dist: retell-sdk>=2.0.0; extra == "retell"
Requires-Dist: twilio>=8.0.0; extra == "retell"
Requires-Dist: boto3>=1.28.0; extra == "retell"
Provides-Extra: stt
Requires-Dist: openai>=1.0.0; extra == "stt"
Provides-Extra: tts
Requires-Dist: together>=1.0.0; extra == "tts"
Requires-Dist: openai>=1.0.0; extra == "tts"
Provides-Extra: audio
Requires-Dist: pyaudio>=0.2.13; extra == "audio"
Provides-Extra: all
Requires-Dist: livekit-api>=1.1.0; extra == "all"
Requires-Dist: retell-sdk>=2.0.0; extra == "all"
Requires-Dist: twilio>=8.0.0; extra == "all"
Requires-Dist: boto3>=1.28.0; extra == "all"
Requires-Dist: openai>=1.0.0; extra == "all"
Requires-Dist: together>=1.0.0; extra == "all"
Requires-Dist: pyaudio>=0.2.13; extra == "all"
Dynamic: license-file

# Intellema VDK

Intellema VDK is a unified Voice Development Kit that simplifies integration with voice agent platforms like LiveKit and Retell AI. Build scalable voice applications with a consistent, provider-agnostic API.

## Features

- **Voice Providers**: LiveKit and Retell AI support with unified interface
- **Outbound Calling**: Initiate phone calls via SIP trunks
- **Speech-to-Text**: Transcribe audio with OpenAI Whisper
- **Text-to-Speech**: Streaming TTS via Together AI (low latency) or OpenAI
- **Recording & Streaming**: Save to S3 or stream to RTMP
- **Participant Management**: Access tokens, muting, and removing (kicking) participants
- **Real-time Messaging**: Send data packets during calls

## Quick Start

### Installation

```bash
# Minimal installation (core dependencies only)
pip install intellema-vdk

# Install with specific provider support
# (quote the argument so shells such as zsh don't treat the brackets as a glob)
pip install "intellema-vdk[livekit]"    # LiveKit voice provider
pip install "intellema-vdk[retell]"     # Retell voice provider
pip install "intellema-vdk[stt]"        # Speech-to-Text features
pip install "intellema-vdk[tts]"        # Text-to-Speech features
pip install "intellema-vdk[audio]"      # Audio playback (PyAudio)

# Install all features
pip install "intellema-vdk[all]"
```

**Requirements:** Python 3.8+

**Note on PyAudio:** The `audio` extra requires PortAudio to be installed on your system:
- **Windows**: Usually works with `pip install pyaudio`, or use `pipwin install pyaudio`
- **macOS**: `brew install portaudio && pip install pyaudio`
- **Linux**: `sudo apt-get install portaudio19-dev && pip install pyaudio`

The package will automatically install required dependencies when you first use a feature.
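
To see which optional extras are already importable in your environment, a small standard-library check can help. This is a sketch, not part of the VDK. The extra-to-module mapping is derived from the package's `Requires-Dist` entries, and the top-level module names (`livekit`, `retell`, `together`, `pyaudio`, and so on) are assumed to be what those distributions install.

```python
import importlib.util
from typing import Dict, List

# Top-level modules normally provided by each extra's dependencies.
# Mapping derived from Requires-Dist; module names are assumptions.
EXTRA_MODULES: Dict[str, List[str]] = {
    "livekit": ["livekit", "boto3"],
    "retell": ["retell", "twilio", "boto3"],
    "stt": ["openai"],
    "tts": ["together", "openai"],
    "audio": ["pyaudio"],
}


def installed_extras() -> Dict[str, bool]:
    """Report, per extra, whether every one of its modules can be found."""
    return {
        extra: all(importlib.util.find_spec(mod) is not None for mod in modules)
        for extra, modules in EXTRA_MODULES.items()
    }


if __name__ == "__main__":
    for extra, ok in installed_extras().items():
        print(f"{extra:8} {'installed' if ok else 'missing'}")
```

`find_spec` locates top-level modules without importing them, so the check is cheap and has no side effects.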

### Minimal Example

```python
import asyncio
from intellema_vdk import VoiceClient

async def main() -> None:
    client = VoiceClient("livekit")  # or "retell"
    try:
        call_id: str = await client.start_outbound_call(
            phone_number="+15551234567",
            prompt_content="Hello from Intellema VDK!"
        )
        print(f"Call started: {call_id}")
    finally:
        # Always close the client, even if starting the call fails
        await client.close()

if __name__ == "__main__":
    asyncio.run(main())
```
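
The examples pass phone numbers in E.164 format: `+`, a country code, then the subscriber number, with at most 15 digits in total. If numbers come from user input, a pre-flight check like the following can catch formatting mistakes before a call is placed. `normalize_e164` is a hypothetical helper, not a VDK function. It only validates the format, not whether the number is reachable.

```python
import re

# E.164: "+", a country code starting with 1-9, at most 15 digits in total.
E164_PATTERN = re.compile(r"\+[1-9]\d{1,14}")


def normalize_e164(raw: str) -> str:
    """Strip common separators and validate the result as E.164.

    Raises ValueError if the cleaned string is not a valid E.164 number.
    """
    # Remove spaces, dashes, dots and parentheses used for readability
    cleaned = re.sub(r"[\s\-.()]", "", raw)
    if not E164_PATTERN.fullmatch(cleaned):
        raise ValueError(f"Not an E.164 phone number: {raw!r}")
    return cleaned
```

For example: `await client.start_outbound_call(phone_number=normalize_e164(user_input), prompt_content="Hello!")`.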

### Configuration

Create a `.env` file with your credentials:

```bash
# LiveKit (if using)
LIVEKIT_URL=wss://your-livekit-server.com
LIVEKIT_API_KEY=your_api_key
LIVEKIT_API_SECRET=your_api_secret
SIP_OUTBOUND_TRUNK_ID=your_trunk_id

# Retell + Twilio (if using)
TWILIO_ACCOUNT_SID=your_sid
TWILIO_AUTH_TOKEN=your_token
TWILIO_PHONE_NUMBER=+15551234567
RETELL_API_KEY=your_retell_key
RETELL_AGENT_ID=your_agent_id

# STT (OpenAI Whisper); the OpenAI TTS provider uses the same key
OPENAI_API_KEY=sk-your-key
AGENT_API_URL=https://your-agent-api.com/process  # Optional

# TTS with Together AI (OpenAI TTS uses OPENAI_API_KEY above)
TOGETHER_API_KEY=your_together_key

# Optional: AWS for recordings
AWS_ACCESS_KEY_ID=your_key
AWS_SECRET_ACCESS_KEY=your_secret
AWS_REGION=us-east-1
AWS_S3_BUCKET=your-bucket
```

See [docs/guides/configuration.md](docs/guides/configuration.md) for detailed setup.
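
Missing credentials are easier to diagnose at startup than mid-call. The sketch below loads `.env` with python-dotenv (a core dependency) and lists unset variables for a provider. The `REQUIRED_ENV` table is inferred from the template above rather than from the VDK's source, and `missing_env` is a hypothetical helper.

```python
import os
from typing import Dict, List, Mapping, Optional

# Required variables per feature, inferred from the .env template (assumption).
REQUIRED_ENV: Dict[str, List[str]] = {
    "livekit": ["LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET",
                "SIP_OUTBOUND_TRUNK_ID"],
    "retell": ["TWILIO_ACCOUNT_SID", "TWILIO_AUTH_TOKEN", "TWILIO_PHONE_NUMBER",
               "RETELL_API_KEY", "RETELL_AGENT_ID"],
    "stt": ["OPENAI_API_KEY"],
}


def missing_env(provider: str, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variables for `provider` that are unset or empty."""
    source = os.environ if env is None else env
    return [name for name in REQUIRED_ENV[provider] if not source.get(name)]


if __name__ == "__main__":
    try:
        from dotenv import load_dotenv  # python-dotenv
        load_dotenv()  # load variables from a .env file into os.environ
    except ImportError:
        pass
    for name in missing_env("livekit"):
        print(f"Missing environment variable: {name}")
```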

## Core Modules

### Voice Providers

Choose between LiveKit and Retell for voice calls. Both expose the same interface.

```python
import asyncio
from intellema_vdk import VoiceClient

async def main() -> None:
    # LiveKit for advanced features
    client = VoiceClient("livekit")
    # Retell for quick setup; same interface:
    # client = VoiceClient("retell")

    try:
        # Common interface: start a call, record it, then end it
        call_id: str = await client.start_outbound_call("+15551234567", "Hello!")
        await client.start_recording(call_id)
        await client.delete_room(call_id)
    finally:
        await client.close()

asyncio.run(main())
```

**Detailed Documentation:**
- [docs/api/providers.md](docs/api/providers.md) - Full API reference with examples
- [docs/guides/examples.md](docs/guides/examples.md) - Complete usage patterns
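
Every client in these examples must be closed with `await client.close()`. One way to keep that cleanup in a single place is an async context manager. `managed` below is a hypothetical helper built on `contextlib.asynccontextmanager`. It assumes only that the client exposes an async `close()` method, which the examples above rely on.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, Callable, TypeVar

from typing import Protocol  # Python 3.8+


class Closable(Protocol):
    async def close(self) -> None: ...


C = TypeVar("C", bound=Closable)


@asynccontextmanager
async def managed(factory: Callable[[], C]) -> AsyncIterator[C]:
    """Create a client and guarantee `await client.close()` on exit."""
    client = factory()
    try:
        yield client
    finally:
        await client.close()


if __name__ == "__main__":
    class _DummyClient:
        """Stand-in for VoiceClient, used only to demonstrate cleanup."""
        async def close(self) -> None:
            print("closed")

    async def _demo() -> None:
        async with managed(_DummyClient) as client:
            print("using", type(client).__name__)

    asyncio.run(_demo())
```

With the VDK this would read `async with managed(lambda: VoiceClient("livekit")) as client:` followed by the calls from the example above. The client is closed even if one of those calls raises.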

**Important for Retell:**
Before making calls, register your Twilio number:
```bash
python import_phone_number.py
```

### Speech-to-Text (STT)

Transcribe audio files with OpenAI Whisper, either one file at a time or a whole folder in batch:

```python
import asyncio
from intellema_vdk import STTManager

async def transcribe() -> None:
    stt = STTManager()
    try:
        # Single file
        result = await stt.transcribe_audio("recording.wav")
        print(result["text"])

        # Batch process a folder and write the transcripts to a JSON file
        results = await stt.transcribe_audio(
            "recordings/",
            batch_process=True,
            output_file="transcripts.json"
        )
    finally:
        await stt.close()

if __name__ == "__main__":
    asyncio.run(transcribe())
```

**Detailed Documentation:** [docs/api/stt.md](docs/api/stt.md)

### Text-to-Speech (TTS)

Stream text to audio in real time, with Together AI or OpenAI as the provider:

```python
from intellema_vdk import TTSStreamer

# Together AI (low latency)
tts = TTSStreamer(provider="together")

# Or OpenAI (high quality, 6 voices):
# tts = TTSStreamer(
#     provider="openai",
#     voice="nova",      # alloy, echo, fable, onyx, nova, shimmer
#     model="tts-1-hd"   # tts-1 or tts-1-hd
# )

# Placeholder for your LLM's streaming output: a list of text chunks
llm_stream = ["Hello! ", "This sentence is spoken ", "as it arrives."]

try:
    # Feed text as it's generated
    for chunk in llm_stream:
        tts.feed(chunk)
    tts.flush()  # Wait for completion
finally:
    tts.close()
```

**Detailed Documentation:** [docs/api/tts.md](docs/api/tts.md)
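
`tts.feed()` accepts arbitrary text chunks, and how `TTSStreamer` segments them internally is not documented here. If you want to control the boundaries yourself (for example, to log or filter each sentence before it is spoken), a re-chunking generator works. This is a hypothetical helper with a naive sentence rule, so abbreviations such as "e.g." will be split.

```python
import re
from typing import Iterable, Iterator

# Sentence end: ., ! or ? (plus any closing quotes/brackets) followed by whitespace.
_SENTENCE_END = re.compile(r"([.!?][\"')\]]*)\s+")


def sentences(chunks: Iterable[str]) -> Iterator[str]:
    """Re-chunk streamed text into whole sentences; flush the remainder at the end."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Emit every complete sentence currently in the buffer
        while True:
            match = _SENTENCE_END.search(buffer)
            if match is None:
                break
            yield buffer[: match.end(1)].strip()
            buffer = buffer[match.end():]
    # Whatever is left after the stream ends is the final (possibly unterminated) sentence
    if buffer.strip():
        yield buffer.strip()
```

With this helper, `for sentence in sentences(llm_stream): tts.feed(sentence)` takes the place of the chunk loop in the example above.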

**Sample Implementation:** Run the included chatbot demo:
```bash
python sample_implementation.py
```

## Advanced Usage

### Logging

Configure logging to see VDK internals:

```python
from intellema_vdk import setup_logging

setup_logging()  # INFO level by default
```

Custom configuration:

```python
import logging
from intellema_vdk import setup_logging

setup_logging(
    log_level=logging.DEBUG,
    log_format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
)
```
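
The internals of `setup_logging` are not shown here. With the standard library alone, you can raise verbosity for the VDK while quieting chatty dependencies. The logger name `intellema_vdk` is an assumption, since modules conventionally log under their package name. Recent httpx versions log every request at INFO, which is usually the noisiest output.

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
)

# Assumed logger name for VDK modules; DEBUG records propagate to the root handler.
logging.getLogger("intellema_vdk").setLevel(logging.DEBUG)

# Only show warnings and errors from HTTP/AWS client libraries.
for noisy in ("httpx", "httpcore", "botocore", "urllib3"):
    logging.getLogger(noisy).setLevel(logging.WARNING)
```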

### Recording Calls

```python
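# Inside an async function, with `client` and `call_id` from the examples above.
# LiveKit or Retell (Retell only retrieves the recording after the call ends)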
recording_id: str = await client.start_recording(
    call_id=call_id,
    upload_to_s3=True,
    wait_for_completion=False
)
```

### Streaming to RTMP

```python
# LiveKit only: start_stream is not supported for Retell phone calls
await client.start_stream(
    call_id=call_id,
    rtmp_urls=["rtmp://your-server.com/live/key"]
)
```
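
RTMP ingest URLs usually embed a secret stream key, which is better kept out of source code. The hypothetical helper below joins a server URL with a key read from an assumed `RTMP_STREAM_KEY` environment variable and rejects non-RTMP schemes. Whether a given provider also accepts `rtmps://` should be checked in its own docs.

```python
import os
from typing import Optional
from urllib.parse import urlparse


def rtmp_url(server: str, stream_key: Optional[str] = None) -> str:
    """Join an rtmp:// or rtmps:// server URL with a stream key.

    Falls back to the (hypothetical) RTMP_STREAM_KEY environment variable.
    """
    key = stream_key if stream_key is not None else os.environ.get("RTMP_STREAM_KEY", "")
    if not key:
        raise ValueError("stream key is empty")
    parsed = urlparse(server)
    if parsed.scheme not in ("rtmp", "rtmps") or not parsed.netloc:
        raise ValueError(f"expected an rtmp:// or rtmps:// URL, got {server!r}")
    return f"{server.rstrip('/')}/{key}"
```

Usage: `rtmp_urls=[rtmp_url("rtmp://your-server.com/live")]`.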

## Documentation

- **[Getting Started Guide](docs/guides/getting_started.md)** - Setup and first steps
- **[Configuration Guide](docs/guides/configuration.md)** - Environment variables
- **[Examples](docs/guides/examples.md)** - Common usage patterns
- **API Reference:**
  - [Voice Providers](docs/api/providers.md) - LiveKit & Retell
  - [STT](docs/api/stt.md) - Speech-to-Text
  - [TTS](docs/api/tts.md) - Text-to-Speech

## Important Notes

- **Retell `delete_room` Limitation**: `delete_room` only ends the call after the user next speaks, which triggers the agent to check the termination variable. For an immediate hangup, use the Twilio API directly.
- **Retell Recording**: Retell automatically records calls. The `start_recording` method retrieves the recording URL after the call ends (no need to explicitly start recording during the call). Ensure recording is enabled for your Retell agent in the dashboard.
- **Retell Audio Streaming**: Real-time audio streaming (`start_stream`) is **not supported** for Retell phone calls. Retell deprecated their Audio WebSocket API at the end of 2024. Use `start_recording()` to retrieve recordings after the call ends.
- **Type Safety**: Examples annotate key values (such as `call_id: str`) for better IDE support.
- **Async Required**: All voice and STT operations are coroutines; `await` them inside an `async def` function and start it with `asyncio.run()`.
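
For the Retell hangup limitation above, Twilio's REST API can end an in-progress call by updating its status to `completed`. Below is a minimal sketch using the `twilio` package (installed by the `retell` extra). It assumes you already have the Twilio Call SID; how to obtain it from the VDK's call ID is not covered here. The function is a hypothetical helper. Its optional `client` parameter exists only so it can be tested without network access.

```python
import os
from typing import Any, Optional


def hang_up_twilio_call(call_sid: str, client: Optional[Any] = None) -> None:
    """End a live call immediately by setting its Twilio status to "completed".

    `call_sid` is the Twilio Call SID, not the VDK call ID.
    """
    if client is None:
        from twilio.rest import Client  # provided by the `retell` extra
        client = Client(os.environ["TWILIO_ACCOUNT_SID"], os.environ["TWILIO_AUTH_TOKEN"])
    client.calls(call_sid).update(status="completed")
```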

## License

See [LICENSE](LICENSE) file for details.

