Metadata-Version: 2.4
Name: synkro
Version: 0.1.1
Summary: Generate training datasets from any document
Author: Murtaza Meerza
License-Expression: MIT
License-File: LICENSE
Keywords: dataset-generation,fine-tuning,llm,synthetic-data,training-data
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Requires-Dist: beautifulsoup4>=4.12
Requires-Dist: html2text>=2020.1
Requires-Dist: httpx>=0.25
Requires-Dist: instructor>=1.0
Requires-Dist: litellm>=1.40
Requires-Dist: mammoth>=1.6
Requires-Dist: marker-pdf>=0.2
Requires-Dist: pydantic>=2.0
Requires-Dist: rich>=13.0
Requires-Dist: typer>=0.9
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.21; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: ruff>=0.1; extra == 'dev'
Description-Content-Type: text/markdown

# Synkro

**Generate training datasets from any document.**

Synkro converts documents into training data for fine-tuning LLMs. It handles the full pipeline: analyzing documents, generating scenarios, creating responses, evaluating quality, and refining failures.

```python
from synkro.pipelines import create_pipeline
from synkro.models.openai import OpenAI
from synkro.types import DatasetType

# Create a pipeline
pipeline = create_pipeline(
    model=OpenAI.GPT_4O_MINI,
    dataset_type=DatasetType.SFT,
)

# Generate 50 SFT traces (chat messages for fine-tuning)
dataset = pipeline.generate(
    "All expenses over $50 require manager approval.",
    traces=50,
)
dataset.save("training.jsonl")
```

## Features

- **Quality evaluation**: Each response is graded for accuracy and automatically refined if it fails
- **Multiple formats**: Generate SFT (chat messages), QA (question-answer pairs), or DPO (preference pairs)
- **Customizable**: Choose different models for generation and grading, build custom pipelines
- **Modular**: Use individual components (Planner, Generator, Grader, Refiner) separately
- **Quality filtering**: Filter datasets by pass/fail status, category, and other criteria

## Table of Contents

- [Installation](#installation)
- [Quick Start](#quick-start)
- [Dataset Types](#dataset-types)
- [Evaluation & Grading](#evaluation--grading)
- [Core Concepts](#core-concepts)
- [Models & Providers](#models--providers)
- [Advanced Usage](#advanced-usage)
- [CLI Reference](#cli-reference)
- [Examples](#examples)
- [API Reference](#api-reference)
- [Troubleshooting](#troubleshooting)
- [Requirements](#requirements)
- [License](#license)

## Installation

```bash
pip install synkro
```

Or with `uv` (faster):

```bash
uv pip install synkro
```

That's it! Everything works out of the box:
- ✅ PDF parsing (`marker-pdf`)
- ✅ DOCX parsing (`mammoth`)
- ✅ URL fetching (`httpx`, `beautifulsoup4`, `html2text`)
- ✅ All LLM providers (via `litellm`)
- ✅ CLI (`synkro` command)

**Optional**: For HuggingFace integration:
```bash
pip install datasets
```

## Quick Start

### Copy-Paste This (It Works)

```python
from synkro.pipelines import create_pipeline
from synkro.models.openai import OpenAI
from synkro.types import DatasetType
from synkro.examples import EXPENSE_POLICY

# Create a pipeline
pipeline = create_pipeline(
    model=OpenAI.GPT_4O_MINI,
    grading_model=OpenAI.GPT_4O,
    dataset_type=DatasetType.SFT,
)

# Generate 20 SFT traces (takes ~2 min)
dataset = pipeline.generate(EXPENSE_POLICY, traces=20)

# Save to file (auto-names it)
dataset.save()

# View summary
print(dataset.summary())
```

### From Your Own Text

```python
from synkro.pipelines import create_pipeline
from synkro.models.openai import OpenAI
from synkro.types import DatasetType

pipeline = create_pipeline(
    model=OpenAI.GPT_4O_MINI,
    dataset_type=DatasetType.SFT,
)

dataset = pipeline.generate(
    """
    All expenses over $50 require manager approval.
    Expenses over $500 require VP approval.
    Receipts required for all purchases over $25.
    """,
    traces=50,
)
dataset.save("training.jsonl")
```

### From Files

Synkro supports multiple file formats:

```python
from synkro.pipelines import create_pipeline
from synkro.core.policy import Policy

# PDF, DOCX, TXT, MD all work
policy = Policy.from_file("handbook.pdf")

pipeline = create_pipeline()
dataset = pipeline.generate(policy)
dataset.save()
```

Supported formats:
- **PDF** (`.pdf`) - Requires `marker-pdf` (included)
- **Word** (`.docx`) - Requires `mammoth` (included)
- **Text** (`.txt`, `.md`) - Native support

### From URLs

```python
from synkro.pipelines import create_pipeline
from synkro.core.policy import Policy

policy = Policy.from_url("https://example.com/terms")

pipeline = create_pipeline()
dataset = pipeline.generate(policy)
dataset.save()
```

### What Happens Under the Hood

When you call `pipeline.generate()`, Synkro runs a 5-step pipeline:

1. **Plan**: Analyzes your document and creates a category distribution plan
2. **Generate Scenarios**: Creates diverse test scenarios for each category
3. **Generate Responses**: Produces expert responses with reasoning
4. **Grade**: Evaluates each response for accuracy and completeness
5. **Refine**: Automatically fixes failed responses (up to 3 iterations)

The result is a dataset with a typical pass rate of 80-95%.

You can customize:
- Dataset types (SFT, QA, DPO)
- Grader models (GPT-4o, Claude, local models)
- Pipeline components (use Planner, Generator, Grader, Refiner separately)
- Quality filtering (filter by pass/fail, category, length)
- Prompts and formatters

## Dataset Types

Synkro supports three output formats. Choose the one that matches your training needs.

### Quick Comparison

| Format | Structure | Best For | When to Use |
|--------|-----------|----------|-------------|
| **SFT** | Chat messages | General fine-tuning | Training chat models, instruction-following |
| **QA** | Question-answer pairs | RAG systems | Knowledge bases, retrieval training |
| **DPO** | Preference pairs | RLHF/alignment | Learning preferences, response ranking |

### SFT (Supervised Fine-Tuning)

Default format. Outputs chat messages for fine-tuning chat models.

```python
from synkro.pipelines import create_pipeline
from synkro.types import DatasetType

pipeline = create_pipeline(dataset_type=DatasetType.SFT)
dataset = pipeline.generate(policy)
dataset.save("sft.jsonl", format="sft")
```

**Format Structure:**
```json
{"messages": [
  {"role": "system", "content": "You are a domain expert..."},
  {"role": "user", "content": "What's the approval process for a $350 expense?"},
  {"role": "assistant", "content": "<reasoning>...</reasoning>\n\nFor a $350 expense..."}
]}
```
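
The assistant message in the sample above wraps its reasoning in `<reasoning>` tags. If you want to inspect or train on the final answer separately, a small helper along these lines could split the two. This is a sketch, not part of Synkro: `split_reasoning` is a hypothetical name, and it assumes the content holds at most one `<reasoning>` block.

```python
import re

# Hypothetical helper, not part of Synkro: separates the <reasoning> block
# from the final answer in an assistant message.
_REASONING_RE = re.compile(r"<reasoning>(.*?)</reasoning>\s*", re.DOTALL)

def split_reasoning(content: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is "" when no tag is present."""
    match = _REASONING_RE.search(content)
    if match is None:
        return "", content.strip()
    reasoning = match.group(1).strip()
    # Remove only the first reasoning block and keep the rest as the answer
    answer = (content[:match.start()] + content[match.end():]).strip()
    return reasoning, answer

reasoning, answer = split_reasoning(
    "<reasoning>$350 is between $50 and $500.</reasoning>\n\nManager approval is required."
)
print(reasoning)
print(answer)
```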

**Use when:**
- Fine-tuning chat models (Llama, Mistral, GPT)
- Training instruction-following models
- General fine-tuning
- Building domain-specific assistants

**Works with:**
- OpenAI fine-tuning API
- HuggingFace Transformers
- Unsloth
- Axolotl
- Most fine-tuning frameworks

### QA (Question-Answer)

Question-answer pairs for RAG systems and knowledge bases.

```python
from synkro.pipelines import create_pipeline
from synkro.types import DatasetType

pipeline = create_pipeline(dataset_type=DatasetType.QA)
dataset = pipeline.generate(policy)
dataset.save("qa.jsonl", format="qa")
```

**Format Structure:**
```json
{
  "question": "What's the approval process for a $350 expense?",
  "answer": "For a $350 expense, you need manager approval...",
  "context": "Expenses $50-$500 require manager approval..."
}
```

**Use when:**
- Training RAG systems
- Building knowledge bases
- Creating FAQs
- Retrieval-augmented generation
- Training embedding models

### DPO (Direct Preference Optimization)

Preference pairs (chosen vs rejected responses) for training models to prefer better responses.

```python
from synkro.pipelines import create_pipeline
from synkro.types import DatasetType

pipeline = create_pipeline(dataset_type=DatasetType.DPO)
dataset = pipeline.generate(policy)
dataset.save("dpo.jsonl", format="dpo")
```

**Format Structure:**
```json
{
  "prompt": "What's the approval process for a $350 expense?",
  "chosen": "For a $350 expense, you need manager approval...",
  "rejected": "Just submit it directly, no approval needed..."
}
```
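
Before training on preference pairs, it can help to sanity-check each record against the structure shown above. The following is a minimal sketch using only the standard library; `dpo_record_problems` is a hypothetical helper, not a Synkro function, and the checks are assumptions about what a usable pair looks like.

```python
import json

REQUIRED_DPO_KEYS = ("prompt", "chosen", "rejected")

def dpo_record_problems(record: dict) -> list[str]:
    """Return a list of problems found in one DPO record (empty if it looks fine)."""
    problems = []
    for key in REQUIRED_DPO_KEYS:
        value = record.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {key}")
    # A pair with identical responses carries no preference signal
    if not problems and record["chosen"].strip() == record["rejected"].strip():
        problems.append("chosen and rejected are identical")
    return problems

line = '{"prompt": "Q?", "chosen": "Good answer", "rejected": "Bad answer"}'
print(dpo_record_problems(json.loads(line)))  # prints []
```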

**Use when:**
- Training with RLHF (Reinforcement Learning from Human Feedback)
- Preference learning
- Alignment training
- Improving response quality
- Teaching models to prefer better responses

### Converting Between Formats

You can generate in one format and convert to another:

```python
from synkro.pipelines import create_pipeline
from synkro.types import DatasetType

# Generate in SFT format
pipeline = create_pipeline(dataset_type=DatasetType.SFT)
dataset = pipeline.generate(policy)

# Save in different formats
dataset.save("sft.jsonl", format="sft")
dataset.save("qa.jsonl", format="qa")  # Converts to QA
dataset.save("dpo.jsonl", format="dpo")  # Converts to DPO
```
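
How Synkro performs the conversion internally is not shown here. As a rough illustration of how the SFT and QA structures relate, the sketch below maps one SFT record onto the QA field names. `sft_to_qa` is a hypothetical function, and leaving `context` empty is an assumption, since an SFT record carries no policy excerpt.

```python
def sft_to_qa(record: dict) -> dict:
    """Map one SFT record ({"messages": [...]}) onto the QA field names."""
    by_role = {}
    for message in record["messages"]:
        # Keep the first message seen for each role
        by_role.setdefault(message["role"], message["content"])
    return {
        "question": by_role.get("user", ""),
        "answer": by_role.get("assistant", ""),
        # An SFT record carries no policy excerpt, so context stays empty here
        "context": "",
    }

sft = {"messages": [
    {"role": "system", "content": "You are a domain expert..."},
    {"role": "user", "content": "What's the approval process for a $350 expense?"},
    {"role": "assistant", "content": "For a $350 expense, you need manager approval..."},
]}
print(sft_to_qa(sft)["question"])
```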

See `examples/advanced_usage.py` for examples of all dataset types.

### HuggingFace Integration

Push datasets directly to HuggingFace Hub:

```python
# Convert to HuggingFace Dataset
hf_dataset = dataset.to_huggingface()

# Push to hub
hf_dataset.push_to_hub("my-org/policy-dataset")

# Later, load it back from the Hub
from datasets import load_dataset
local_dataset = load_dataset("my-org/policy-dataset")
```

**Note:** Requires `pip install datasets` (optional dependency)

## Evaluation & Grading

Synkro evaluates each response for quality and refines failures. Every response is checked against the policy, and failed responses are improved automatically.

### How Grading Works

Each generated response is graded on:

1. **Policy Compliance** - Does it follow the policy exactly?
2. **Proper Citations** - Are policy sections cited correctly?
3. **Complete Reasoning** - Is the logic chain complete?
4. **Actionable Recommendations** - Are suggestions specific and implementable?

```python
from synkro.pipelines import create_pipeline
from synkro.models.openai import OpenAI

# Use different models for generation vs grading
pipeline = create_pipeline(
    model=OpenAI.GPT_4O_MINI,       # Fast, cheap generation
    grading_model=OpenAI.GPT_4O,    # High-quality evaluation
)

dataset = pipeline.generate(policy, traces=50)

# Access grade results
for trace in dataset:
    if trace.grade:
        print(f"Passed: {trace.grade.passed}")
        print(f"Issues: {trace.grade.issues}")
        print(f"Feedback: {trace.grade.feedback}")
```

### Custom Grader Selection

Choose a model for evaluation:

```python
from synkro.pipelines import create_pipeline
from synkro.models.openai import OpenAI
from synkro.models.anthropic import Anthropic
from synkro.models.ollama import Ollama

# High-quality grading with GPT-4o (default)
pipeline = create_pipeline(grading_model=OpenAI.GPT_4O)

# Alternative: Use Claude for nuanced evaluation
pipeline = create_pipeline(grading_model=Anthropic.CLAUDE_35_SONNET)

# Or use local models for free evaluation
pipeline = create_pipeline(grading_model=Ollama.QWEN_25_32B)
```

**Grader options:**
- **GPT-4o** (default) - Good balance of accuracy and speed
- **Claude 3.5 Sonnet** - Good for nuanced policy evaluation
- **Local models** - Free but slower, good for testing
- **Faster models** - Use for bulk evaluation with lower quality requirements

### Evaluation Workflow

The evaluation process:

1. Grades each response against the policy
2. Identifies issues (violations, missing citations, etc.)
3. Refines failed responses (up to 3 iterations)
4. Re-grades refined responses
5. Returns all traces with grades attached

```python
from synkro.pipelines import create_pipeline

pipeline = create_pipeline(max_iterations=3)  # Try up to 3 times to fix failures

dataset = pipeline.generate(policy, traces=100)

# Check quality metrics
print(f"Pass rate: {dataset.passing_rate:.1%}")
print(f"Total traces: {len(dataset)}")

# Filter to only passing traces
high_quality = dataset.filter(passed=True)
print(f"High quality traces: {len(high_quality)}")
```

### Accessing Grade Results

Every trace includes detailed grade information:

```python
from synkro.pipelines import create_pipeline

pipeline = create_pipeline()
dataset = pipeline.generate(policy)

for trace in dataset:
    if trace.grade:
        print(f"Scenario: {trace.scenario.description}")
        print(f"Passed: {trace.grade.passed}")
        
        if trace.grade.issues:
            print("Issues found:")
            for issue in trace.grade.issues:
                print(f"  - {issue}")
        
        print(f"Feedback: {trace.grade.feedback}")
        print("---")
```

### Filtering by Quality

Filter datasets based on grade results:

```python
from synkro.pipelines import create_pipeline

pipeline = create_pipeline()
dataset = pipeline.generate(policy, traces=100)

# Only passing traces
passing = dataset.filter(passed=True)

# Only failing traces (for analysis)
failing = dataset.filter(passed=False)

# Combined with other filters
high_quality_long = dataset.filter(
    passed=True,
    min_length=500,  # Minimum response length
    category="expenses"  # Specific category
)
```
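
To make the filter semantics concrete, here is a standalone sketch of how combined criteria could behave: every criterion that is not `None` must hold. It does not use Synkro's classes; `SketchTrace` and `filter_traces` are illustrative names, and measuring `min_length` in characters of the response is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Stand-in for a graded trace; field names are illustrative only
@dataclass
class SketchTrace:
    category: str
    response: str
    passed: Optional[bool]  # None means the trace was never graded

def filter_traces(traces, passed=None, category=None, min_length=None):
    """Keep traces that satisfy every criterion that is not None."""
    kept = []
    for trace in traces:
        if passed is not None and trace.passed != passed:
            continue
        if category is not None and trace.category != category:
            continue
        if min_length is not None and len(trace.response) < min_length:
            continue
        kept.append(trace)
    return kept

traces = [
    SketchTrace("expenses", "x" * 600, True),
    SketchTrace("expenses", "short", True),
    SketchTrace("travel", "y" * 600, False),
]
print(len(filter_traces(traces, passed=True, min_length=500, category="expenses")))  # prints 1
```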

### Improving Pass Rates

If your pass rate is low, try:

1. **Stronger grading model**:
   ```python
   pipeline = create_pipeline(grading_model=OpenAI.GPT_4O)
   ```

2. **More refinement iterations**:
   ```python
   pipeline = create_pipeline(max_iterations=5)
   ```

3. **Better policy clarity**: Ensure your policy has clear, specific rules

4. **Filter to passing only**:
   ```python
   passing = dataset.filter(passed=True)
   ```

See `examples/advanced_usage.py` for examples of custom grading and evaluation workflows.

## Core Concepts

### Policy

A `Policy` represents the source document you want to generate training data from.

```python
from synkro.core.policy import Policy

# Create from text
policy = Policy(text="Your policy content here...")

# Load from file
policy = Policy.from_file("handbook.pdf")

# Load from URL
policy = Policy.from_url("https://example.com/policy")

# Access properties
print(policy.word_count)  # Word count
print(policy.char_count)  # Character count
print(policy.source)      # Source file/URL if loaded
```

**Methods and properties:**
- `from_file(path)` - Load from PDF, DOCX, TXT, or MD file
- `from_url(url)` - Fetch and parse from URL
- `word_count` - Property: number of words
- `char_count` - Property: number of characters

### Dataset

A `Dataset` contains the generated training traces.

```python
from synkro.pipelines import create_pipeline

pipeline = create_pipeline()
dataset = pipeline.generate(policy)

# Access traces
print(len(dataset))           # Number of traces
print(dataset[0])             # First trace
for trace in dataset:         # Iterate
    print(trace)

# Filter traces
passing = dataset.filter(passed=True)
by_category = dataset.filter(category="expenses")
long_responses = dataset.filter(min_length=500)

# Analyze
print(dataset.passing_rate)   # Fraction that passed (0.0-1.0)
print(dataset.categories)     # List of unique categories
print(dataset.summary())      # Human-readable summary

# Export
dataset.save("training.jsonl")
dataset.to_huggingface()      # Convert to HuggingFace Dataset
dataset.to_dict()             # Convert to Python dict
```

**Methods:**
- `filter(passed=None, category=None, min_length=None)` - Filter traces by criteria
- `save(path=None, format="sft")` - Save to JSONL file
- `to_jsonl(format="sft")` - Convert to JSONL string
- `to_huggingface()` - Convert to HuggingFace Dataset
- `to_dict()` - Convert to dictionary
- `summary()` - Get human-readable summary

**Properties:**
- `passing_rate` - Float: fraction of traces that passed (0.0-1.0)
- `categories` - List[str]: unique category names in dataset

### Trace

A `Trace` represents a single training example with a conversation.

```python
trace = dataset[0]

# Access messages
print(trace.messages)              # List of Message objects
print(trace.system_message)       # System message content
print(trace.user_message)          # User message content
print(trace.assistant_message)     # Assistant response

# Access scenario
print(trace.scenario.description)  # Scenario description
print(trace.scenario.category)     # Category name
print(trace.scenario.context)      # Additional context

# Access grade
if trace.grade:
    print(trace.grade.passed)      # True/False
    print(trace.grade.issues)      # List of issues found
    print(trace.grade.feedback)     # Feedback text
```

**Structure:**
- `messages: list[Message]` - Conversation messages
- `scenario: Scenario` - The test scenario
- `grade: GradeResult | None` - Grading result if graded

### Message

A `Message` is a single turn in a conversation.

```python
message = trace.messages[0]
print(message.role)      # "system", "user", or "assistant"
print(message.content)  # Message text
```

### Scenario

A `Scenario` describes the test case for a trace.

```python
scenario = trace.scenario
print(scenario.description)  # What the scenario tests
print(scenario.category)     # Category name (e.g., "expenses")
print(scenario.context)      # Additional context
```

### The Generation Pipeline

```mermaid
flowchart TD
    A[Policy Document] --> B[Planner]
    B --> C[Category Plan]
    C --> D[Scenario Generator]
    D --> E[Scenarios]
    E --> F[Response Generator]
    F --> G[Initial Traces]
    G --> H[Grader]
    H --> I{Pass?}
    I -->|Yes| J[Final Dataset]
    I -->|No| L{Max Iterations?}
    L -->|Yes| J
    L -->|No| K[Refiner]
    K --> H
```

The pipeline ensures quality through:
- **Category-based generation**: Scenarios are organized into categories for comprehensive coverage
- **Grading**: Each response is evaluated for accuracy
- **Refinement**: Failed responses are improved up to 3 times
- **Progress tracking**: Real-time progress bars show generation status
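
The grade-and-refine loop described above can be sketched in a few lines. This is an illustration of the control flow only, not Synkro's implementation: `grade_and_refine`, `toy_grade`, and `toy_refine` are hypothetical names, and the toy functions stand in for the LLM-backed Grader and Refiner.

```python
from typing import Callable, Tuple

def grade_and_refine(
    response: str,
    grade: Callable[[str], Tuple[bool, str]],
    refine: Callable[[str, str], str],
    max_iterations: int = 3,
) -> Tuple[str, bool, int]:
    """Grade a response and refine it until it passes or the budget runs out.

    Returns (final_response, passed, refinements_used).
    """
    passed, feedback = grade(response)
    refinements = 0
    while not passed and refinements < max_iterations:
        response = refine(response, feedback)
        refinements += 1
        # Every refined response is graded again before the next decision
        passed, feedback = grade(response)
    return response, passed, refinements

# Toy stand-ins for the LLM-backed grader and refiner
def toy_grade(text: str) -> Tuple[bool, str]:
    ok = "manager approval" in text
    return ok, ("" if ok else "Mention the manager approval rule.")

def toy_refine(text: str, feedback: str) -> str:
    return text + " This requires manager approval."

print(grade_and_refine("Submit the receipt.", toy_grade, toy_refine))
```

A response that never passes exits the loop after `max_iterations` refinements and is returned with `passed=False`, which mirrors how failing traces remain in the dataset for later filtering.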

## Models & Providers

Synkro works with any LLM provider via LiteLLM. You can use different models for generation (fast, cheap) and grading (high quality).

### OpenAI

```python
from synkro.pipelines import create_pipeline
from synkro.models.openai import OpenAI

pipeline = create_pipeline(
    model=OpenAI.GPT_4O_MINI,       # Fast, cheap generation
    grading_model=OpenAI.GPT_4O,    # High-quality grading
)
```

**Available Models:**
- `OpenAI.GPT_4O` - High quality, good for grading
- `OpenAI.GPT_4O_MINI` - Fast and cheap, good for generation
- `OpenAI.GPT_4_TURBO` - Balanced performance
- `OpenAI.GPT_4` - Legacy GPT-4
- `OpenAI.GPT_35_TURBO` - Cheapest option
- `OpenAI.O1` - Reasoning model
- `OpenAI.O1_MINI` - Smaller reasoning model
- `OpenAI.O1_PREVIEW` - Preview reasoning model

**Environment Variable:** `OPENAI_API_KEY`

### Anthropic

```python
from synkro.pipelines import create_pipeline
from synkro.models.anthropic import Anthropic

pipeline = create_pipeline(
    model=Anthropic.CLAUDE_35_HAIKU,      # Fast generation
    grading_model=Anthropic.CLAUDE_35_SONNET,  # High-quality grading
)
```

**Available Models:**
- `Anthropic.CLAUDE_35_SONNET` - Good balance of speed and quality
- `Anthropic.CLAUDE_35_HAIKU` - Fastest, cheapest
- `Anthropic.CLAUDE_3_OPUS` - Most capable (slower)
- `Anthropic.CLAUDE_3_SONNET` - Previous generation
- `Anthropic.CLAUDE_3_HAIKU` - Previous generation (fast)

**Environment Variable:** `ANTHROPIC_API_KEY`

### Google

```python
from synkro.pipelines import create_pipeline
from synkro.models.google import Google

pipeline = create_pipeline(
    model=Google.GEMINI_25_FLASH,
    grading_model=Google.GEMINI_3_PRO,
)
```

**Available Models:**
- `Google.GEMINI_3_PRO` - Most intelligent, best for grading (preview)
- `Google.GEMINI_3_FLASH` - Balanced speed and intelligence (preview)
- `Google.GEMINI_25_FLASH` - Best price-performance (stable, recommended)
- `Google.GEMINI_25_PRO` - High quality with thinking (preview)
- `Google.GEMINI_2_FLASH` - Fast workhorse model (stable)
- `Google.GEMINI_2_FLASH_LITE` - Cheapest, fastest (stable)
- `Google.GEMINI_15_PRO` - Legacy high quality
- `Google.GEMINI_15_FLASH` - Legacy fast model

**Environment Variable:** `GEMINI_API_KEY`

### Ollama (Local, Free)

Run models locally with no API costs:

```python
from synkro.pipelines import create_pipeline
from synkro.models.ollama import Ollama

# First, install and pull the model locally:
# ollama pull llama3.1:8b

pipeline = create_pipeline(
    model=Ollama.LLAMA_31_8B,
    grading_model=Ollama.QWEN_25_32B,
)
```

**Available Models:**
- `Ollama.LLAMA_31_8B` - Fast, good quality
- `Ollama.LLAMA_31_70B` - Higher quality (slower)
- `Ollama.LLAMA_32_3B` - Smallest, fastest
- `Ollama.MISTRAL_7B` - Alternative 7B model
- `Ollama.MIXTRAL_8X7B` - Mixture of experts
- `Ollama.QWEN_25_7B` - Qwen 2.5 7B
- `Ollama.QWEN_25_32B` - Qwen 2.5 32B (high quality)
- `Ollama.PHI_3` - Microsoft Phi-3
- `Ollama.GEMMA_2` - Google Gemma 2

**Setup:**
1. Install Ollama: https://ollama.ai
2. Pull a model: `ollama pull llama3.1:8b`
3. Use in Synkro: `create_pipeline(model=Ollama.LLAMA_31_8B)`

**No API key required**

### Groq (Very Fast)

Ultra-fast inference for bulk generation:

```python
from synkro.pipelines import create_pipeline
from synkro.models.groq import Groq

pipeline = create_pipeline(model=Groq.LLAMA_33_70B)  # Very fast
```

**Available Models:**
- `Groq.LLAMA_33_70B` - Latest Llama 3.3, very fast
- `Groq.LLAMA_31_70B` - Llama 3.1 70B
- `Groq.LLAMA_31_8B` - Llama 3.1 8B
- `Groq.MIXTRAL_8X7B` - Mixtral 8x7B
- `Groq.GEMMA_2_9B` - Gemma 2 9B

**Environment Variable:** `GROQ_API_KEY`

### Together AI

```python
from synkro.pipelines import create_pipeline
from synkro.models.together import Together

pipeline = create_pipeline(model=Together.LLAMA_31_70B)
```

**Available Models:**
- `Together.LLAMA_31_405B` - Largest Llama model
- `Together.LLAMA_31_70B` - High quality
- `Together.QWEN_25_72B` - Qwen 2.5 72B
- `Together.MIXTRAL_8X22B` - Mixtral 8x22B

**Environment Variable:** `TOGETHER_API_KEY`

### DeepSeek

```python
from synkro.pipelines import create_pipeline
from synkro.models.deepseek import DeepSeek

pipeline = create_pipeline(model=DeepSeek.DEEPSEEK_CHAT)
```

**Available Models:**
- `DeepSeek.DEEPSEEK_CHAT` - General purpose
- `DeepSeek.DEEPSEEK_CODER` - Code-focused

**Environment Variable:** `DEEPSEEK_API_KEY`
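
Since each hosted provider above reads its key from a different environment variable, a quick pre-flight check can catch a missing key before a long run starts. The sketch below uses only the standard library; `missing_api_keys` and the lowercase provider labels are hypothetical, while the variable names are the ones listed in this section.

```python
import os

# Environment variable expected for each provider, as listed in this section
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GEMINI_API_KEY",
    "groq": "GROQ_API_KEY",
    "together": "TOGETHER_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
}

def missing_api_keys(providers, env=None):
    """Return the env var names that are unset or empty for the given providers."""
    env = os.environ if env is None else env
    missing = []
    for provider in providers:
        var = PROVIDER_ENV_VARS.get(provider)  # None for key-less providers such as Ollama
        if var is not None and not env.get(var):
            missing.append(var)
    return missing

print(missing_api_keys(["openai", "ollama"], env={"OPENAI_API_KEY": "sk-test"}))  # prints []
```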

### Custom Model Strings

You can also use any model string directly:

```python
from synkro.pipelines import create_pipeline

pipeline = create_pipeline(
    model="groq/llama-3.3-70b-versatile",
    grading_model="gpt-4o",
)
```
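
The strings above follow LiteLLM's convention of an optional `provider/` prefix in front of the model name. If you need to log or group runs by provider, a helper like the following could split the two parts. `split_model_string` is a hypothetical name, and treating everything before the first slash as the provider is an assumption of this sketch.

```python
def split_model_string(model: str):
    """Split "provider/model" into (provider, model); provider is None if absent."""
    provider, sep, name = model.partition("/")
    if not sep:
        return None, model
    return provider, name

print(split_model_string("groq/llama-3.3-70b-versatile"))  # prints ('groq', 'llama-3.3-70b-versatile')
print(split_model_string("gpt-4o"))                        # prints (None, 'gpt-4o')
```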

### Model Selection Strategy

**For Generation (Fast, Cheap):**
- `OpenAI.GPT_4O_MINI` - Best balance (default)
- `Anthropic.CLAUDE_35_HAIKU` - Very fast
- `Groq.LLAMA_33_70B` - Ultra-fast for bulk
- `Ollama.LLAMA_31_8B` - Free, local

**For Grading (High Quality):**
- `OpenAI.GPT_4O` - Best quality (default)
- `Anthropic.CLAUDE_35_SONNET` - Excellent quality
- `Google.GEMINI_3_PRO` - Most intelligent Gemini model
- `Google.GEMINI_25_FLASH` - Best Gemini price-performance

**Cost Optimization:**
- Use GPT-4o-mini for generation (roughly an order of magnitude cheaper than GPT-4o)
- Use GPT-4o only for grading (smaller volume)
- Or use Ollama for free local generation

**Performance Tips:**
- Groq is fastest for bulk generation
- Ollama is free but slower (good for testing)
- GPT-4o-mini offers good price/performance

## Advanced Usage

### Custom Pipeline Configuration

```python
from synkro.pipelines import create_pipeline
from synkro.models.openai import OpenAI
from synkro.types import DatasetType

pipeline = create_pipeline(
    model=OpenAI.GPT_4O_MINI,
    grading_model=OpenAI.GPT_4O,
    dataset_type=DatasetType.QA,
    max_iterations=5,  # More refinement attempts
)

dataset = pipeline.generate(policy, traces=100)
```

**Parameters:**
- `model` - Model for scenarios/responses
- `grading_model` - Model for quality grading
- `dataset_type` - QA, SFT, or DPO
- `max_iterations` - Max refinement attempts (default: 3)

### Custom Graders

Create custom graders with different models or configurations:

```python
from synkro.pipelines import create_pipeline
from synkro.quality.grader import Grader
from synkro.llm.client import LLM
from synkro.models.openai import OpenAI
from synkro.models.anthropic import Anthropic

# High-quality grader with GPT-4o
grader = Grader(model=OpenAI.GPT_4O)

# Alternative: Use Claude for nuanced evaluation
claude_grader = Grader(model=Anthropic.CLAUDE_35_SONNET)

# Custom LLM configuration
custom_llm = LLM(
    model=OpenAI.GPT_4O,
    temperature=0.3,  # Lower temperature for consistent grading
    max_tokens=2048,
)
custom_grader = Grader(llm=custom_llm)

# Use in custom pipeline
pipeline = create_pipeline(grading_model=OpenAI.GPT_4O)
```

### Evaluation Workflows

Build custom evaluation workflows:

```python
import asyncio
from synkro.pipelines import create_pipeline
from synkro.quality.grader import Grader
from synkro.models.anthropic import Anthropic

# Generate dataset
pipeline = create_pipeline()
dataset = pipeline.generate(policy, traces=50)

# Record the pass rate from the original grader before re-grading
original_pass_rate = dataset.passing_rate
print(f"Original pass rate: {original_pass_rate:.1%}")

# Re-evaluate with a different grader
new_grader = Grader(model=Anthropic.CLAUDE_35_SONNET)

async def regrade(dataset, policy_text):
    # Grader.grade is awaited, so it has to run inside a coroutine
    for trace in dataset:
        trace.grade = await new_grader.grade(trace, policy_text)

asyncio.run(regrade(dataset, policy.text))

# Compare pass rates
new_pass_rate = sum(1 for t in dataset if t.grade and t.grade.passed) / len(dataset)
print(f"Re-graded pass rate: {new_pass_rate:.1%}")

# Filter by quality
high_quality = dataset.filter(passed=True)
print(f"High quality traces: {len(high_quality)}")
```

### Custom Prompts

Customize the prompts used for generation:

```python
from synkro.prompts import SystemPrompt, ScenarioPrompt

# Custom system prompt
custom_system = SystemPrompt(
    template="You are a compliance expert specializing in {domain}..."
)

# Custom scenario prompt
custom_scenario = ScenarioPrompt(
    template="Generate scenarios that test {category}..."
)

# Use in custom pipeline
# (Prompts are set on individual components)
```

### Custom Formatters

Create custom output formats:

```python
from synkro.formatters.sft import SFTFormatter

class CustomFormatter(SFTFormatter):
    def format(self, traces):
        examples = []
        for trace in traces:
            # Custom formatting logic
            example = {
                "input": trace.user_message,
                "output": trace.assistant_message,
                "metadata": {
                    "category": trace.scenario.category,
                }
            }
            examples.append(example)
        return examples

# Use custom formatter
formatter = CustomFormatter()
formatter.save(dataset.traces, "custom.jsonl")
```

### Async Usage

For async contexts:

```python
import asyncio
from synkro.pipelines import create_pipeline

async def generate_async():
    pipeline = create_pipeline()
    dataset = await pipeline.generate_async(policy, traces=50)
    return dataset

dataset = asyncio.run(generate_async())
```
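
When you drive many async calls yourself (for example, re-grading traces one by one), it is usually worth capping how many run at once to stay under provider rate limits. The sketch below shows one common pattern with `asyncio.Semaphore`. It is independent of Synkro: `run_bounded` is a hypothetical helper and `fake_grade` stands in for an LLM-backed coroutine.

```python
import asyncio

async def run_bounded(items, worker, limit=5):
    """Run worker(item) for every item with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item):
        async with semaphore:
            return await worker(item)

    # gather preserves input order in its result list
    return await asyncio.gather(*(guarded(item) for item in items))

# Toy coroutine standing in for an LLM-backed call
async def fake_grade(text):
    await asyncio.sleep(0.01)
    return len(text) > 3

results = asyncio.run(run_bounded(["ok", "long enough", "hmm!"], fake_grade, limit=2))
print(results)  # prints [False, True, True]
```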

### Filtering and Analysis

Analyze and filter your dataset:

```python
from synkro.pipelines import create_pipeline

pipeline = create_pipeline()
dataset = pipeline.generate(policy, traces=100)

# Filter by grade
passing = dataset.filter(passed=True)
failing = dataset.filter(passed=False)

# Filter by category
expense_traces = dataset.filter(category="expenses")

# Filter by length
long_responses = dataset.filter(min_length=500)

# Combine filters
high_quality = dataset.filter(passed=True, min_length=300)

# Analyze
print(f"Pass rate: {dataset.passing_rate:.1%}")
print(f"Categories: {dataset.categories}")
print(dataset.summary())

# Export filtered dataset
high_quality.save("high_quality.jsonl")
```

### LLM Configuration

Fine-tune LLM behavior:

```python
from synkro.llm.client import LLM
from synkro.generation.scenarios import ScenarioGenerator
from synkro.models.openai import OpenAI

# Custom temperature and tokens
llm = LLM(
    model=OpenAI.GPT_4O_MINI,
    temperature=0.8,      # More creative (0.0-2.0)
    max_tokens=4096,      # Longer responses
    api_key="sk-...",     # Override API key
)

# Use in custom pipeline
generator = ScenarioGenerator(llm=llm)
```

## CLI Reference

### Generate Command

Generate training data from various sources:

```bash
# From file
synkro generate policy.pdf

# From text
# Single quotes keep the shell from expanding $5 inside "$50"
synkro generate 'All expenses over $50 need approval'

# From URL
synkro generate https://example.com/policy

# With options
synkro generate policy.pdf \
  --traces 50 \
  --output training.jsonl \
  --format sft \
  --model gpt-4o-mini
```

**Options:**
- `--output, -o` - Output file path (auto-generated if not specified)
- `--traces, -n` - Number of traces to generate (default: 20)
- `--format, -f` - Output format: `sft`, `qa`, or `dpo` (default: `sft`)
- `--model, -m` - Model for generation (default: `gpt-4o-mini`)

**Examples:**
```bash
# Generate 100 traces in QA format
synkro generate handbook.pdf --traces 100 --format qa

# Use Claude for generation
synkro generate policy.txt --model claude-3-5-sonnet-20241022

# Custom output path
synkro generate "Your policy text" -o my_dataset.jsonl
```

### Demo Command

Run a quick demo with an example:

```bash
synkro demo
```

This generates 5 traces from the example expense policy and saves to `demo_output.jsonl`.

### Version Command

Check your Synkro version:

```bash
synkro version
```

### Help

Get help for any command:

```bash
synkro --help
synkro generate --help
```

## Examples

### Quick Start

Simple example to get started:

```python
from synkro.pipelines import create_pipeline
from synkro.examples import EXPENSE_POLICY

pipeline = create_pipeline()
dataset = pipeline.generate(EXPENSE_POLICY, traces=20)
dataset.save()
print(dataset.summary())
```

See `examples/quickstart.py` for the example.

### Advanced Usage

Comprehensive example demonstrating all features working together:

```python
import asyncio
from synkro.pipelines import create_pipeline
from synkro.core.policy import Policy
from synkro.core.dataset import Dataset
from synkro.types import DatasetType
from synkro.models.openai import OpenAI
from synkro.models.anthropic import Anthropic
from synkro.llm.client import LLM
from synkro.generation.planner import Planner
from synkro.generation.scenarios import ScenarioGenerator
from synkro.generation.responses import ResponseGenerator
from synkro.quality.grader import Grader
from synkro.quality.refiner import Refiner

async def main():
    # Load policy
    policy = Policy.from_file("handbook.pdf")
    
    # Part 1: Generate multiple dataset types
    sft_pipeline = create_pipeline(
        model=OpenAI.GPT_4O_MINI,
        grading_model=OpenAI.GPT_4O,
        dataset_type=DatasetType.SFT,
    )
    qa_pipeline = create_pipeline(dataset_type=DatasetType.QA)
    dpo_pipeline = create_pipeline(dataset_type=DatasetType.DPO)
    
    # Inside a coroutine, use the async variant (see Async Usage)
    dataset_sft = await sft_pipeline.generate_async(policy, traces=10)
    dataset_qa = await qa_pipeline.generate_async(policy, traces=10)
    dataset_dpo = await dpo_pipeline.generate_async(policy, traces=10)
    
    # Part 2: Custom grader selection
    claude_grader = Grader(model=Anthropic.CLAUDE_35_SONNET)
    for trace in dataset_sft:
        new_grade = await claude_grader.grade(trace, policy.text)
        trace.grade = new_grade
    
    # Part 3: Custom pipeline with individual components
    generation_llm = LLM(model=OpenAI.GPT_4O_MINI)
    grading_llm = LLM(model=OpenAI.GPT_4O)
    
    planner = Planner(llm=grading_llm)
    plan = await planner.plan(policy.text, target_traces=20)
    
    scenario_gen = ScenarioGenerator(llm=generation_llm)
    scenarios = await scenario_gen.generate(policy.text, count=20)
    
    response_gen = ResponseGenerator(llm=generation_llm)
    traces = await response_gen.generate(policy.text, scenarios)
    
    grader = Grader(llm=grading_llm)
    grades = await grader.grade_batch(traces, policy.text)
    
    refiner = Refiner(llm=generation_llm)
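    for i, (trace, grade) in enumerate(zip(traces, grades)):
        trace.grade = grade  # attach the grade so filter(passed=True) can use it
        if not grade.passed:
            refined = await refiner.refine(trace, grade, policy.text)
            refined.grade = await grader.grade(refined, policy.text)
            traces[i] = refined  # replace the failed trace with its refined version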
    
    # Part 4: Quality filtering
    dataset = Dataset(traces=traces)
    high_quality = dataset.filter(passed=True, min_length=200)
    
    # Export
    high_quality.save("training.jsonl", format="sft")

asyncio.run(main())
```

See `examples/advanced_usage.py` for a complete example that demonstrates:
- Generating multiple dataset types (SFT, QA, DPO) from the same policy
- Custom grader selection with different models
- Custom pipeline using individual components (Planner, Generator, Grader, Refiner)
- Evaluation workflow with grading and refinement
- Quality analysis and filtering by pass/fail, category, and length
- Exporting datasets in different formats

### Fine-Tuning Workflow

A typical workflow from generation to fine-tuning:

```python
from synkro.pipelines import create_pipeline
from synkro.core.policy import Policy
from synkro.models.openai import OpenAI

# Step 1: Generate training data
policy = Policy.from_file("company_handbook.pdf")
pipeline = create_pipeline(
    model=OpenAI.GPT_4O_MINI,
    grading_model=OpenAI.GPT_4O,
    max_iterations=3,
)
dataset = pipeline.generate(policy, traces=1000)

# Step 2: Save in SFT format
dataset.save("train_sft.jsonl", format="sft")

# Step 3: Filter to only passing traces
passing = dataset.filter(passed=True)
passing.save("train_high_quality.jsonl", format="sft")

# Step 4: Fine-tune with Unsloth (see examples/finetune_llama.py)
```

See `examples/finetune_llama.py` for a fine-tuning example with Unsloth.
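
Before handing the saved file to a trainer, it can help to sanity-check it. The sketch below assumes each line of the SFT export is a JSON object with a `messages` list of `{"role", "content"}` entries; that layout is an assumption based on the common chat format, so confirm it against an actual `train_sft.jsonl`. `validate_sft_lines` is a hypothetical helper, not part of Synkro.

```python
import json

# Roles taken from the Message class in the API Reference
VALID_ROLES = {"system", "user", "assistant"}

def validate_sft_lines(lines):
    """Return (line_number, problem) tuples for records that look malformed."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append((number, "missing or empty 'messages' list"))
            continue
        for message in messages:
            well_formed = (
                isinstance(message, dict)
                and message.get("role") in VALID_ROLES
                and isinstance(message.get("content"), str)
            )
            if not well_formed:
                problems.append((number, "message needs a known role and string content"))
                break  # one report per line is enough
    return problems

# Illustrative records (assumed layout); the second and third are deliberately broken
sample_lines = [
    json.dumps({"messages": [
        {"role": "user", "content": "Can I expense a $60 dinner?"},
        {"role": "assistant", "content": "Yes, with manager approval."},
    ]}),
    '{"messages": []}',
    "not json",
]

problems = validate_sft_lines(sample_lines)
for number, problem in problems:
    print(f"line {number}: {problem}")
```

With a real export, pass an open file instead of the sample list, for example `validate_sft_lines(open("train_high_quality.jsonl", encoding="utf-8"))`.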

### Domain-Specific Examples

Use example policies for different domains:

```python
from synkro.pipelines import create_pipeline
from synkro.examples import (
    EXPENSE_POLICY,
    HR_HANDBOOK,
    REFUND_POLICY,
    SUPPORT_GUIDELINES,
    SECURITY_POLICY,
)

pipeline = create_pipeline()

# Test with expense policy
dataset = pipeline.generate(EXPENSE_POLICY)

# Test with HR handbook
dataset = pipeline.generate(HR_HANDBOOK)

# Test with security policy
dataset = pipeline.generate(SECURITY_POLICY)
```

### Batch Processing

Process multiple policies:

```python
from synkro.pipelines import create_pipeline
from synkro.core.policy import Policy
from synkro.core.dataset import Dataset

policies = [
    Policy.from_file("policy1.pdf"),
    Policy.from_file("policy2.pdf"),
    Policy.from_file("policy3.pdf"),
]

pipeline = create_pipeline()
datasets = []
for policy in policies:
    dataset = pipeline.generate(policy, traces=50)
    datasets.append(dataset)

# Combine datasets
combined = Dataset(traces=[t for d in datasets for t in d.traces])
combined.save("combined.jsonl")
```
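
In a longer batch, one unreadable or too-short policy can abort the whole loop. A minimal sketch of isolating failures per policy follows. `generate_all` is a hypothetical helper that only relies on the `generate(policy, traces=...)` call shown above, and `FakePipeline` is a stand-in so the sketch runs without API calls; neither is part of Synkro.

```python
def generate_all(pipeline, policies, traces=50):
    """Run pipeline.generate for each named policy, collecting failures separately."""
    results, failures = {}, {}
    for name, policy in policies.items():
        try:
            results[name] = pipeline.generate(policy, traces=traces)
        except Exception as exc:
            # Record the error and continue with the remaining policies
            failures[name] = str(exc)
    return results, failures

class FakePipeline:
    """Stand-in for a real pipeline (assumption: same generate() signature)."""

    def generate(self, policy, traces=20):
        if len(policy.split()) < 3:
            raise ValueError("Policy too short")
        return [f"trace {i} for: {policy}" for i in range(traces)]

results, failures = generate_all(
    FakePipeline(),
    {
        "expenses": "All expenses over $50 require manager approval.",
        "empty": "TBD",
    },
    traces=2,
)
print("succeeded:", list(results))
print("failed:", failures)
```

With a real pipeline, the values in `results` would be `Dataset` objects that can be combined as shown above.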

### HuggingFace Workflow

```python
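from synkro.pipelines import create_pipeline
from synkro.core.policy import Policy

policy = Policy.from_file("handbook.pdf")
pipeline = create_pipeline()
dataset = pipeline.generate(policy, traces=100)

# Convert and push to HuggingFace (requires `pip install datasets`)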
hf_dataset = dataset.to_huggingface()
hf_dataset.push_to_hub("my-org/policy-dataset")

# Use in training
from datasets import load_dataset
train_data = load_dataset("my-org/policy-dataset", split="train")
```

## API Reference

### Main Functions

#### `create_pipeline()`

Create a pipeline for generating training datasets.

```python
from synkro.pipelines import create_pipeline

def create_pipeline(
    model: Model = OpenAI.GPT_4O_MINI,
    dataset_type: DatasetType = DatasetType.SFT,
    grading_model: Model = OpenAI.GPT_4O,
    max_iterations: int = 3,
) -> Generator
```

**Parameters:**
- `model` - Model for generation (default: GPT-4o-mini)
- `dataset_type` - `DatasetType.SFT`, `DatasetType.QA`, or `DatasetType.DPO`
- `grading_model` - Model for grading (default: GPT-4o)
- `max_iterations` - Max refinement attempts (default: 3)

**Returns:** `Generator` instance

### Core Classes

#### `Policy`

Policy document container.

```python
from synkro.core.policy import Policy

class Policy:
    text: str
    source: str | None
    
    @classmethod
    def from_file(path: str | Path) -> Policy
    
    @classmethod
    def from_url(url: str) -> Policy
    
    @property
    def word_count() -> int
    
    @property
    def char_count() -> int
```

#### `Dataset`

Collection of training traces.

```python
from synkro.core.dataset import Dataset

class Dataset:
    traces: list[Trace]
    
    def filter(
        passed: bool | None = None,
        category: str | None = None,
        min_length: int | None = None,
    ) -> Dataset
    
    def save(path: str | Path | None = None, format: str = "sft") -> Dataset
    
    def to_jsonl(format: str = "sft") -> str
    
    def to_huggingface() -> "datasets.Dataset"
    
    def to_dict() -> dict
    
    def summary() -> str
    
    @property
    def passing_rate() -> float
    
    @property
    def categories() -> list[str]
```

#### `Generator`

Main generation orchestrator.

```python
from synkro.generation.generator import Generator

class Generator:
    def __init__(
        dataset_type: DatasetType = DatasetType.SFT,
        generation_model: Model = OpenAI.GPT_4O_MINI,
        grading_model: Model = OpenAI.GPT_4O,
        max_iterations: int = 3,
    )
    
    def generate(policy: Policy | str, traces: int = 20) -> Dataset
    
    async def generate_async(policy: Policy | str, traces: int = 20) -> Dataset
```
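
Because `generate_async` is a coroutine, it can be awaited from existing async code, for example to process several policies concurrently. The sketch below is one way to do that with a concurrency cap. `generate_concurrently` and `FakePipeline` are hypothetical names for illustration, and whether a single `Generator` is safe to share across concurrent calls is an assumption to verify.

```python
import asyncio

async def generate_concurrently(pipeline, policies, traces=20, max_concurrent=2):
    """Await pipeline.generate_async for each policy, at most max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run_one(policy):
        async with semaphore:
            return await pipeline.generate_async(policy, traces=traces)

    # gather() returns results in the same order as the input policies
    return await asyncio.gather(*(run_one(policy) for policy in policies))

class FakePipeline:
    """Stand-in for a real pipeline so the sketch runs offline (not a Synkro class)."""

    async def generate_async(self, policy, traces=20):
        await asyncio.sleep(0.01)  # simulate network latency
        return [f"{policy}:{i}" for i in range(traces)]

datasets = asyncio.run(
    generate_concurrently(FakePipeline(), ["a", "b", "c"], traces=2)
)
print(datasets)
```

Keeping `max_concurrent` small matters in practice: more parallel requests make provider rate limits more likely.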

#### `Trace`

Single training example.

```python
from synkro.types import Trace

class Trace:
    messages: list[Message]
    scenario: Scenario
    grade: GradeResult | None
    
    @property
    def system_message() -> str | None
    
    @property
    def user_message() -> str
    
    @property
    def assistant_message() -> str
```

#### `Message`

Single conversation message.

```python
from synkro.types import Message

class Message:
    role: Literal["system", "user", "assistant"]
    content: str
```

#### `Scenario`

Test scenario description.

```python
from synkro.types import Scenario

class Scenario:
    description: str
    context: str
    category: str | None
```

#### `GradeResult`

Grading evaluation result.

```python
from synkro.types import GradeResult

class GradeResult:
    passed: bool
    issues: list[str]
    feedback: str
```

### LLM Classes

#### `LLM`

Type-safe LLM wrapper.

```python
from synkro.llm.client import LLM

class LLM:
    def __init__(
        model: Model = OpenAI.GPT_4O_MINI,
        temperature: float = 0.7,
        max_tokens: int = 2048,
        api_key: str | None = None,
    )
    
    async def generate(prompt: str, system: str | None = None) -> str
    
    async def generate_batch(prompts: list[str], system: str | None = None) -> list[str]
    
    async def generate_structured(
        prompt: str,
        response_model: Type[T],
        system: str | None = None,
    ) -> T
```

### Component Classes

#### `Planner`

Creates generation plans with categories.

```python
from synkro.generation.planner import Planner

class Planner:
    def __init__(llm: LLM)
    
    async def plan(policy_text: str, target_traces: int) -> Plan
```

#### `ScenarioGenerator`

Generates test scenarios.

```python
from synkro.generation.scenarios import ScenarioGenerator

class ScenarioGenerator:
    def __init__(llm: LLM)
    
    async def generate(
        policy_text: str,
        count: int,
        category: Category | None = None,
    ) -> list[Scenario]
```

#### `ResponseGenerator`

Generates expert responses.

```python
from synkro.generation.responses import ResponseGenerator

class ResponseGenerator:
    def __init__(llm: LLM)
    
    async def generate(
        policy_text: str,
        scenarios: list[Scenario],
    ) -> list[Trace]
```

#### `Grader`

Evaluates response quality.

```python
from synkro.quality.grader import Grader

class Grader:
    def __init__(llm: LLM)
    
    async def grade(trace: Trace, policy_text: str) -> GradeResult
    
    async def grade_batch(traces: list[Trace], policy_text: str) -> list[GradeResult]
```

#### `Refiner`

Improves failed responses.

```python
from synkro.quality.refiner import Refiner

class Refiner:
    def __init__(llm: LLM)
    
    async def refine(
        trace: Trace,
        grade: GradeResult,
        policy_text: str,
    ) -> Trace
```
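
The `Grader` and `Refiner` signatures above suggest a grade-then-refine loop bounded by `max_iterations`. The following is a sketch of such a loop under that assumption, not Synkro's actual implementation. The stub classes mimic only the documented method signatures, and plain strings stand in for `Trace` objects.

```python
import asyncio
from dataclasses import dataclass, field

@dataclass
class StubGrade:
    """Stand-in for GradeResult with the documented fields."""
    passed: bool
    issues: list = field(default_factory=list)
    feedback: str = ""

class StubGrader:
    """Passes any response that mentions 'manager approval'."""

    async def grade(self, trace, policy_text):
        ok = "manager approval" in trace
        return StubGrade(passed=ok, issues=[] if ok else ["missing approval rule"])

class StubRefiner:
    """Appends the missing rule; a real refiner would call an LLM."""

    async def refine(self, trace, grade, policy_text):
        return trace + " This requires manager approval."

async def grade_with_refinement(trace, policy_text, grader, refiner, max_iterations=3):
    """Grade a trace and refine it until it passes or max_iterations is reached."""
    grade = await grader.grade(trace, policy_text)
    attempts = 0
    while not grade.passed and attempts < max_iterations:
        trace = await refiner.refine(trace, grade, policy_text)
        grade = await grader.grade(trace, policy_text)
        attempts += 1
    return trace, grade, attempts

policy_text = "All expenses over $50 require manager approval."
final_trace, final_grade, attempts = asyncio.run(
    grade_with_refinement(
        "You can expense the $60 dinner.", policy_text, StubGrader(), StubRefiner()
    )
)
print(final_grade.passed, attempts)
```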

### Formatter Classes

#### `SFTFormatter`

Formats traces for supervised fine-tuning.

```python
from synkro.formatters.sft import SFTFormatter

class SFTFormatter:
    def __init__(include_metadata: bool = False)
    
    def format(traces: list[Trace]) -> list[dict]
    
    def save(traces: list[Trace], path: str | Path) -> None
    
    def to_jsonl(traces: list[Trace]) -> str
```

#### `QAFormatter`

Formats traces as question-answer pairs.

```python
from synkro.formatters.qa import QAFormatter

class QAFormatter:
    def __init__(include_context: bool = True)
    
    def format(traces: list[Trace]) -> list[dict]
    
    def save(traces: list[Trace], path: str | Path) -> None
    
    def to_jsonl(traces: list[Trace]) -> str
```

#### `DPOFormatter`

Formats traces for direct preference optimization.

```python
from synkro.formatters.dpo import DPOFormatter

class DPOFormatter:
    def format(traces: list[Trace]) -> list[dict]
    
    def save(traces: list[Trace], path: str | Path) -> None
    
    def to_jsonl(traces: list[Trace]) -> str
```

## Troubleshooting

### API Key Issues

**Error:** `API key not found or invalid`

**Solutions:**
1. Set environment variable:
   ```bash
   export OPENAI_API_KEY="sk-..."
   ```

2. Use a different provider:
   ```python
   from synkro.pipelines import create_pipeline
   from synkro.models.anthropic import Anthropic
   
   pipeline = create_pipeline(model=Anthropic.CLAUDE_35_HAIKU)
   ```

3. Use free local models:
   ```python
   from synkro.pipelines import create_pipeline
   from synkro.models.ollama import Ollama
   
   pipeline = create_pipeline(model=Ollama.LLAMA_31_8B)
   ```

### Import Errors

**Error:** `No module named 'datasets'` (for HuggingFace integration)

**Solution:**
```bash
pip install datasets
```

**Note:** PDF parsing (`marker-pdf`) is already included - no installation needed!
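
To avoid hitting this error halfway through a run, you can check for the optional package up front. This is a small standard-library sketch; `is_installed` is a hypothetical helper name.

```python
import importlib.util

def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None

if is_installed("datasets"):
    print("datasets is available: dataset.to_huggingface() should work")
else:
    print("datasets is missing: run `pip install datasets` first")
```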

### Rate Limiting

**Error:** `Rate limited by OpenAI`

**Solutions:**
1. Switch to a different provider, which has its own rate limits:
   ```python
   from synkro.pipelines import create_pipeline
   from synkro.models.groq import Groq
   
   pipeline = create_pipeline(model=Groq.LLAMA_33_70B)
   ```

2. Reduce trace count:
   ```python
   dataset = pipeline.generate(policy, traces=10)
   ```

3. Let the built-in handling do its work: Synkro automatically retries rate-limited requests with backoff.
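
If a whole run still fails after Synkro's own retries, the same idea can be applied one level up, around your call to `generate`. The sketch below shows generic exponential backoff with jitter. It is not Synkro's internal implementation, and `call_with_backoff` is a hypothetical helper.

```python
import random
import time

def call_with_backoff(func, max_attempts=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**attempt (capped, plus jitter) and retry."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:  # in real code, catch only the provider's rate-limit error
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% jitter keeps parallel clients from retrying in lockstep
            sleep(delay + random.uniform(0, delay * 0.1))

# Demo: a function that fails twice, and a fake sleep that only records the delays
calls = {"count": 0}

def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("Rate limited")
    return "ok"

delays = []
result = call_with_backoff(flaky, sleep=delays.append)
print(result, len(delays))
```

In real use, `func` could be `lambda: pipeline.generate(policy, traces=10)` with the default `sleep`.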

### File Format Issues

**Error:** `Unsupported file type`

**Supported formats:**
- `.txt`, `.md` - Native support
- `.pdf` - Included (`marker-pdf`)
- `.docx` - Included (`mammoth`)
- URLs - Included (`httpx`, `html2text`)

All file formats work out of the box - no additional installation needed!
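
When loading many files, a quick pre-check can skip unsupported ones before calling `Policy.from_file`. The sketch below uses the suffixes listed above. `is_supported` is a hypothetical helper, and treating suffixes case-insensitively is this sketch's choice, not confirmed Synkro behavior.

```python
from pathlib import Path

# File suffixes from the supported formats list above
SUPPORTED_SUFFIXES = {".txt", ".md", ".pdf", ".docx"}

def is_supported(path):
    """Return True if the file suffix is one of the listed formats."""
    return Path(path).suffix.lower() in SUPPORTED_SUFFIXES

for name in ["handbook.pdf", "Notes.MD", "slides.pptx", "README"]:
    status = "ok" if is_supported(name) else "unsupported"
    print(f"{name}: {status}")
```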

### Low Pass Rate

If your dataset has a low pass rate (< 70%):

1. **Check policy quality**: Ensure your policy has clear, specific rules
2. **Increase refinement iterations**:
   ```python
   pipeline = create_pipeline(max_iterations=5)
   ```
3. **Use a stronger grading model**:
   ```python
   from synkro.models.openai import OpenAI
   pipeline = create_pipeline(grading_model=OpenAI.GPT_4O)
   ```
4. **Filter to passing traces**:
   ```python
   passing = dataset.filter(passed=True)
   passing.save("high_quality.jsonl")
   ```
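
To find out why traces fail (step 1 above), it helps to tally the `issues` reported on failing grades. The sketch below uses a stand-in dataclass with the `passed` and `issues` fields documented for `GradeResult`. `pass_rate` and `common_issues` are hypothetical helpers, and the issue strings are made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class GradeStub:
    """Stand-in with the documented GradeResult fields used here."""
    passed: bool
    issues: list = field(default_factory=list)

def pass_rate(grades):
    """Fraction of grades that passed (0.0 for an empty list)."""
    return sum(g.passed for g in grades) / len(grades) if grades else 0.0

def common_issues(grades, top=3):
    """Most frequent issues among failing grades, as (issue, count) pairs."""
    counts = Counter(issue for g in grades if not g.passed for issue in g.issues)
    return counts.most_common(top)

grades = [
    GradeStub(passed=True),
    GradeStub(passed=False, issues=["missing receipt rule"]),
    GradeStub(passed=False, issues=["missing receipt rule", "wrong threshold"]),
    GradeStub(passed=True),
]

print(f"pass rate: {pass_rate(grades):.0%}")
print(common_issues(grades))
```

With a real dataset, the input would be something like `[t.grade for t in dataset.traces if t.grade is not None]`. An issue that recurs across many traces often points at an ambiguous rule in the policy.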

### Slow Generation

**Solutions:**
1. Use faster models:
   ```python
   from synkro.models.groq import Groq
   pipeline = create_pipeline(model=Groq.LLAMA_33_70B)
   ```

2. Use local models (no API latency):
   ```python
   from synkro.models.ollama import Ollama
   pipeline = create_pipeline(model=Ollama.LLAMA_31_8B)
   ```

3. Reduce trace count for testing:
   ```python
   dataset = pipeline.generate(policy, traces=5)  # Quick test
   ```

### Model Not Found

**Error:** `Model not found: xyz`

**Solution:** Use a supported model enum or check the model string format:
```python
# Use enum (recommended)
from synkro.models.openai import OpenAI
from synkro.pipelines import create_pipeline

pipeline = create_pipeline(model=OpenAI.GPT_4O_MINI)

# Or use exact model string
pipeline = create_pipeline(model="gpt-4o-mini")
```

### Policy Too Short

**Error:** `Policy too short`

**Solution:** Provide more content (at least 50 words recommended):
```python
from synkro.core.policy import Policy

policy = Policy(text="""
All expenses over $50 require manager approval.
Expenses over $500 require VP approval.
Receipts are required for all purchases over $25.
Travel expenses must be pre-approved.
Software purchases need IT approval.
""")
```
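
A quick length check before generation can catch this early. The sketch below counts whitespace-separated words, which may differ slightly from how `Policy.word_count` counts. `is_long_enough` is a hypothetical helper, and the 50-word default mirrors the recommendation above.

```python
def count_words(text):
    """Count whitespace-separated words."""
    return len(text.split())

def is_long_enough(text, minimum=50):
    """Return True if the text meets the recommended minimum word count."""
    return count_words(text) >= minimum

short_policy = "All expenses over $50 require manager approval."
print(count_words(short_policy), is_long_enough(short_policy))
```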

## Requirements

### Python Version

- **Python 3.10+** required

### All Dependencies Included

Everything works out of the box - no extra installation needed:

- `litellm>=1.40` - Universal LLM interface
- `instructor>=1.0` - Structured outputs
- `pydantic>=2.0` - Data validation
- `rich>=13.0` - Terminal formatting
- `typer>=0.9` - CLI framework
- `marker-pdf>=0.2` - PDF parsing
- `mammoth>=1.6` - DOCX parsing
- `httpx>=0.25` - HTTP client (URL fetching)
- `beautifulsoup4>=4.12` - HTML parsing (URL fetching)
- `html2text>=2020.1` - HTML to text conversion

### Optional Dependencies

- `datasets` - HuggingFace integration (only needed for `dataset.to_huggingface()`)

### API Keys

You need an API key for at least one provider:

- **OpenAI**: `OPENAI_API_KEY` - https://platform.openai.com/api-keys
- **Anthropic**: `ANTHROPIC_API_KEY` - https://console.anthropic.com/settings/keys
- **Google**: `GEMINI_API_KEY` - https://aistudio.google.com/app/apikey
- **Groq**: `GROQ_API_KEY` - https://console.groq.com/keys
- **Together**: `TOGETHER_API_KEY` - https://api.together.xyz/settings/api-keys
- **DeepSeek**: `DEEPSEEK_API_KEY`

**Or use Ollama for free local models** (no API key needed).
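
To see which providers are usable in the current environment, you can check for the variables listed above. This is a standard-library sketch; `configured_providers` is a hypothetical helper, and it only checks that a variable is set, not that the key is valid.

```python
import os

# Environment variable names from the API Keys list above
PROVIDER_ENV_VARS = {
    "OpenAI": "OPENAI_API_KEY",
    "Anthropic": "ANTHROPIC_API_KEY",
    "Google": "GEMINI_API_KEY",
    "Groq": "GROQ_API_KEY",
    "Together": "TOGETHER_API_KEY",
    "DeepSeek": "DEEPSEEK_API_KEY",
}

def configured_providers(environ=None):
    """Return provider names whose API key variable is set and non-empty."""
    environ = os.environ if environ is None else environ
    return [
        name
        for name, variable in PROVIDER_ENV_VARS.items()
        if environ.get(variable, "").strip()
    ]

found = configured_providers()
print("Configured providers:", found or "none (consider Ollama for local models)")
```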

## License

MIT
