Metadata-Version: 2.4
Name: rrag
Version: 0.1.0a0
Summary: rrag - retrieval-reasoning augmented generation engine
Project-URL: Repository, https://github.com/0x0064/rrag
Author-email: 0x0064 <user.frndvrgs@gmail.com>
License-Expression: MIT
License-File: LICENSE
Requires-Python: >=3.12
Requires-Dist: aiosqlite>=0.22.1
Requires-Dist: anthropic>=0.93.0
Requires-Dist: asyncpg>=0.31.0
Requires-Dist: baml-py==0.220.0
Requires-Dist: cohere>=6.0.0
Requires-Dist: fastembed>=0.8.0
Requires-Dist: lxml>=6.0.3
Requires-Dist: openai>=2.31.0
Requires-Dist: pydantic>=2.12.5
Requires-Dist: pymupdf>=1.27.2.2
Requires-Dist: qdrant-client>=1.17.1
Requires-Dist: rank-bm25>=0.2.2
Requires-Dist: scikit-learn>=1.8.0
Requires-Dist: sqlalchemy[asyncio]>=2.0.49
Requires-Dist: voyageai>=0.3.7
Provides-Extra: cli
Requires-Dist: click>=8.3.2; extra == 'cli'
Provides-Extra: dev
Requires-Dist: mypy>=1.20.0; extra == 'dev'
Requires-Dist: poethepoet>=0.44.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=1.3.0; extra == 'dev'
Requires-Dist: pytest-cov>=7.1.0; extra == 'dev'
Requires-Dist: pytest>=9.0.3; extra == 'dev'
Requires-Dist: ruff>=0.15.10; extra == 'dev'
Provides-Extra: graph
Requires-Dist: neo4j>=6.1.0; extra == 'graph'
Description-Content-Type: text/markdown

<img width="200" alt="rrag-ember(2)" src="https://github.com/user-attachments/assets/ce3262aa-7453-48d4-992d-94a6a5dedd2a" />

Retrieval / Reasoning Augmented Generation (RAG) Python SDK

## Retrieval-Augmented Generation

Modular retrieval-augmented generation with pluggable pipeline methods. No mandatory vector database — configure the retrieval paths you need: dense and sparse vector search, BM25 keyword matching, full-text document search, entity-relationship graph traversal, and LLM-powered tree navigation over hierarchical document structure. Methods run concurrently with per-method weights, results fuse via reciprocal rank fusion, and each method handles its own errors independently. Ingest through chunking, vision-analyzed drawings, or LLM entity extraction. Generate grounded answers with score gates, relevance judgment, and clarification flows.

[Documentation](src/rrag/retrieval/README.md) · [Examples](examples/retrieval)

```python
from rrag.retrieval import RagEngine, RagServerConfig, PersistenceConfig, IngestionConfig
from rrag.retrieval import QdrantVectorStore, Embeddings
from rrag import LanguageModelProvider

config = RagServerConfig(
    persistence=PersistenceConfig(
        vector_store=QdrantVectorStore(url="http://localhost:6333", collection="docs"),
    ),
    ingestion=IngestionConfig(
        embeddings=Embeddings(LanguageModelProvider(provider="openai", model="text-embedding-3-small", api_key="...")),
    ),
)

async with RagEngine(config) as rag:
    await rag.ingest("manual.pdf", knowledge_id="equipment")
    await rag.ingest("annual_report.pdf", knowledge_id="reports", tree_index=True)  # tree indexing for structured docs
    result = await rag.query("How do I replace the filter?", knowledge_id="equipment")
    print(result.answer)
```

```python
# Access individual retrieval methods for fine-grained control
async with RagEngine(config) as rag:
    await rag.ingest("manual.pdf", knowledge_id="equipment")

    # Use individual methods directly
    vector_chunks = await rag.retrieval.vector.search("pressure specs", top_k=20)
    doc_chunks = await rag.retrieval.document.search("pressure specs", top_k=10)

    # Or let the pipeline handle everything
    result = await rag.query("What are the pressure specifications?", knowledge_id="equipment")
```
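For intuition, the fusion step can be sketched as weighted reciprocal rank fusion in plain Python. This is an illustrative toy, not the SDK's internal implementation; the method names, weights, and `k=60` constant are assumptions:

```python
# Illustrative weighted reciprocal rank fusion (RRF) — not rrag's internals.
# Each retrieval method returns a ranked list of chunk ids; a chunk's fused
# score is the sum over methods of weight / (k + rank), with k=60 as the
# commonly used default.

def rrf_fuse(
    ranked_lists: dict[str, list[str]],
    weights: dict[str, float],
    k: int = 60,
) -> list[tuple[str, float]]:
    scores: dict[str, float] = {}
    for method, chunks in ranked_lists.items():
        w = weights.get(method, 1.0)
        for rank, chunk_id in enumerate(chunks, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + w / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

fused = rrf_fuse(
    {"vector": ["c3", "c1", "c7"], "bm25": ["c1", "c9"]},
    weights={"vector": 1.0, "bm25": 0.5},
)
print(fused[0][0])  # "c1" wins: it is the only chunk ranked by both methods
```

Because scores depend only on rank positions, RRF needs no score normalization across methods — which is what lets heterogeneous retrievers (vector, BM25, graph, tree) fuse cleanly.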

```bash
rrag retrieval init
rrag retrieval ingest manual.pdf -k equipment
rrag retrieval query "how to replace the filter?" -k equipment
rrag retrieval retrieve "part number 8842-A" -k equipment
```

## Reasoning-Augmented Generation

Analysis, classification, compliance, evaluation, clustering, and pipeline composition. Each service is standalone — use one or compose them through pipelines.

[Documentation](src/rrag/reasoning/README.md) · [Get Started](examples/reasoning)

```python
from rrag.reasoning import AnalysisService, AnalysisConfig, DimensionDefinition
from rrag import LanguageModelClient, LanguageModelProvider

lm = LanguageModelClient(
    provider=LanguageModelProvider(provider="anthropic", model="claude-sonnet-4-20250514", api_key="..."),
)

analyzer = AnalysisService(lm_client=lm)
result = await analyzer.analyze(
    "My order FB-12345 hasn't arrived and I need it by Friday.",
    config=AnalysisConfig(
        dimensions=[DimensionDefinition("urgency", "How time-sensitive", "0.0-1.0")],
        summarize=True,
    ),
)
print(f"{result.primary_intent} — urgency: {result.dimensions['urgency'].value}")
```

```bash
rrag reasoning init
rrag reasoning analyze "My order FB-12345 hasn't arrived and I need it by Friday"
rrag reasoning classify "I want my money back" --categories categories.json
rrag reasoning compliance "We'll give you 150% refund" --references policy.md
```

## Installation

```bash
uv add rrag                           # add rrag to your project

uv add "rrag[graph]"                  # rrag + Neo4j graph support
uv add "rrag[cli]"                    # rrag + CLI commands

uv sync --extra dev                     # install with the dev extras
uv sync --all-extras                    # install with all extras

uv run poe format                       # ruff format
uv run poe check                        # ruff lint
uv run poe check:fix                    # ruff lint + auto-fix
uv run poe typecheck                    # mypy type checking
uv run poe test                         # pytest
uv run poe baml:generate:retrieval      # regenerate retrieval BAML client
uv run poe baml:generate:reasoning      # regenerate reasoning BAML client
```

### vs. Long Context LLM

Long context windows and lexical search work for small document sets where you know what terms to look for. They break when knowledge bases grow beyond context limits, when users ask semantic questions ("how to change oil" vs. the manual's "lubricant replacement procedure"), and when you need entity relationships that no amount of text matching can surface. rrag makes vector search optional — you can run document-store-only or graph-only configs — but when you need semantic understanding across thousands of pages, concurrent multi-path retrieval with fusion outperforms any single method.
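The vocabulary-mismatch problem is easy to demonstrate concretely: keyword overlap between the user's phrasing and the manual's phrasing can be exactly zero, so any purely lexical scorer returns nothing. A toy illustration (plain Python, not rrag code; the stopword set is ad hoc):

```python
# Toy illustration of the vocabulary mismatch that defeats lexical search.
query = "how to change oil"
passage = "lubricant replacement procedure: drain, refill, inspect seals"

stopwords = {"how", "to", "the", "a"}
q_terms = {t for t in query.lower().split() if t not in stopwords}
p_terms = set(passage.lower().replace(":", "").replace(",", "").split())

overlap = q_terms & p_terms
print(overlap)  # set() — no shared terms, so BM25/keyword scoring scores this passage zero
```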

### vs. Vectorless Tree Search (PageIndex)

Tree-based retrieval proves that LLM reasoning over document structure beats vector similarity for navigating long structured documents. rrag's tree search follows the same principle — build a hierarchical index, let the LLM navigate it. The difference is that tree search is one method in the pipeline, not the only one. When the tree can't find the answer (wrong section, entity not in the TOC), vector search, BM25, document search, and graph traversal are all running in parallel. Tree search adds structural precision; the other methods add breadth and resilience.

### vs. LangChain / LlamaIndex

Framework orchestrators provide abstractions over LLM calls, prompt chains, and retrieval steps. rrag is not an orchestrator — it's a retrieval engine. It owns the full pipeline from document parsing through chunk embedding, multi-path search, reranking, and grounded generation. No chain composition, no prompt templates, no agent loops. One `async with RagEngine(config) as rag:` gives you ingestion, retrieval, and generation with quality gates. Use it inside LangChain if you want, or use it standalone — the SDK handles the retrieval problem end-to-end without external framework dependencies.

## Observability

All LLM calls go through [BAML](https://docs.boundaryml.com/) for structured output parsing, retry/fallback policies, and observability.

**Boundary Studio** — Set `boundary_api_key` on any `LanguageModelClient` to enable automatic cloud tracing with token counts, latency, and function-level tracking.

**Programmatic** — Use `baml_py.Collector` for in-process token usage tracking.

## SDK Env Variables

Read by the SDK when used as a library.

```bash
RRAG_LOG_ENABLED=false    # true / false
RRAG_LOG_LEVEL=INFO       # DEBUG, INFO, WARNING, ERROR
RRAG_BAML_LOG=warn        # info, warn, debug — BAML runtime log level
```

## CLI Env Variables

Used only by the `rrag retrieval` / `rrag reasoning` command-line tools. The SDK library never reads these directly — pass API keys explicitly via `LanguageModelProvider(api_key=...)` when using the SDK programmatically.

```bash
OPENAI_API_KEY=
ANTHROPIC_API_KEY=
COHERE_API_KEY=
VOYAGE_API_KEY=
```
