Metadata-Version: 2.4
Name: raghilda
Version: 0.1.0
Summary: RAG made simple
Author: Daniel Falbel, Tomasz Kalinowski
Author-email: Daniel Falbel <daniel@posit.co>, Tomasz Kalinowski <tomasz@posit.co>
License-Expression: MIT
Requires-Dist: duckdb>=1.3.2
Requires-Dist: openai>=1.104.2
Requires-Dist: requests>=2.32.5
Requires-Dist: commonmark>=0.9.1
Requires-Dist: markitdown>=0.1.3
Requires-Dist: tqdm>=4.67.1
Requires-Dist: chromadb>=1.0.0 ; extra == 'chromadb'
Requires-Dist: chatlas>=0.2.0 ; extra == 'examples'
Requires-Dist: python-dotenv>=1.0.0 ; extra == 'examples'
Requires-Dist: sentence-transformers>=3.0.0 ; extra == 'sentence-transformers'
Requires-Dist: pyright>=1.1.405 ; extra == 'test'
Requires-Dist: pytest>=8.4.1 ; extra == 'test'
Requires-Dist: ruff>=0.12.11 ; extra == 'test'
Requires-Dist: chonkie>=1.0.0 ; extra == 'test'
Requires-Dist: cohere>=5.0.0 ; extra == 'test'
Requires-Dist: chromadb>=1.0.0 ; extra == 'test'
Requires-Dist: sentence-transformers>=3.0.0 ; extra == 'test'
Requires-Python: >=3.11, <3.14
Project-URL: Repository, https://github.com/dfalbel/raghilda
Provides-Extra: chromadb
Provides-Extra: examples
Provides-Extra: sentence-transformers
Provides-Extra: test
Description-Content-Type: text/markdown

# raghilda <img src="assets/raghilda-logo.png" align="right" width="140" alt="raghilda hex logo" />

RAG made simple.

raghilda is a Python package for implementing Retrieval-Augmented Generation (RAG) workflows. It provides a complete solution with sensible defaults while remaining transparent—not a black box.

## Installation

```bash
pip install raghilda
```

Or install from GitHub:

```bash
pip install git+https://github.com/dfalbel/raghilda.git
```

## Key Steps

raghilda handles the complete RAG pipeline:

1. **Document Processing** — Convert documents to Markdown using MarkItDown
2. **Text Chunking** — Split text at semantic boundaries (headings, paragraphs, sentences)
3. **Embedding** — Generate vector representations via OpenAI or other providers
4. **Storage** — Store chunks and embeddings in DuckDB, ChromaDB, or OpenAI Vector Stores
5. **Retrieval** — Find relevant chunks using similarity search or BM25
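Step 2 can be pictured with a small, self-contained sketch of boundary-aware chunking. It splits at Markdown headings first. If a section exceeds a size budget, it falls back to blank-line paragraph breaks, and then packs whole sentences. The names `split_sections` and `chunk_markdown`, and the character-based `max_chars` budget, are hypothetical. This is not how `MarkdownChunker` is implemented, and it skips details such as `#` lines inside fenced code blocks.

```python
import re


def split_sections(markdown: str) -> list[str]:
    """Split Markdown into sections, starting a new one at each ATX heading (#, ##, ...)."""
    sections: list[str] = []
    current: list[str] = []
    for line in markdown.splitlines():
        # Simplification: a "#" line inside a fenced code block is also treated as a heading.
        if re.match(r"#{1,6}\s", line) and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())
    return [s for s in sections if s]


def chunk_markdown(markdown: str, max_chars: int = 200) -> list[str]:
    """Hypothetical chunker: headings first, then paragraphs, then sentences."""
    chunks: list[str] = []
    for section in split_sections(markdown):
        if len(section) <= max_chars:
            chunks.append(section)
            continue
        # Section too long: fall back to paragraph boundaries (blank lines).
        for paragraph in re.split(r"\n\s*\n", section):
            paragraph = paragraph.strip()
            if not paragraph:
                continue
            if len(paragraph) <= max_chars:
                chunks.append(paragraph)
                continue
            # Paragraph still too long: greedily pack whole sentences up to max_chars.
            buffer = ""
            for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
                candidate = f"{buffer} {sentence}".strip()
                if buffer and len(candidate) > max_chars:
                    # Budget exceeded: emit the buffer and start a new chunk.
                    # A single sentence longer than max_chars still becomes one oversized chunk.
                    chunks.append(buffer)
                    buffer = sentence
                else:
                    buffer = candidate
            if buffer:
                chunks.append(buffer)
    return chunks
```

For example, with `max_chars=40` the text `"# Intro\nRAG retrieves context. It then generates.\n\n## Streaming\nShort section."` yields three chunks. The first section is split at its sentence boundary, and the short second section is kept whole. The sketch measures size in characters for simplicity. A production chunker would more likely count tokens, and might keep the heading attached to every chunk so retrieved fragments keep their context.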

## Usage

```python
from raghilda.store import DuckDBStore
from raghilda.embedding import EmbeddingOpenAI
from raghilda.scrape import find_links
from raghilda.read import read_as_markdown
from raghilda.chunker import MarkdownChunker

# Create a store with embeddings
store = DuckDBStore.create(
    location="chatlas.db",
    embed=EmbeddingOpenAI(),
)

# Find and index pages from the chatlas documentation
links = find_links("https://posit-dev.github.io/chatlas/")
chunker = MarkdownChunker()

for link in links:
    document = read_as_markdown(link)
    chunked_document = chunker.chunk(document)
    store.upsert(chunked_document)

# Retrieve relevant chunks
chunks = store.retrieve("How do I stream a response?", top_k=5)
for chunk in chunks:
    print(chunk.text)
```
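`store.retrieve(..., top_k=5)` returns the chunks most relevant to the query. The sketch below illustrates the two retrieval strategies listed under Key Steps: top-k cosine-similarity search over precomputed embedding vectors, and Okapi BM25 keyword scoring (here with the Lucene-style IDF, which is never negative). It uses only the standard library and toy data. The function names are hypothetical, and how raghilda or its storage backends actually compute these scores is not shown here.

```python
import math
import re
from collections import Counter


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_by_similarity(query_vec: list[float], chunk_vecs: list[list[float]], k: int) -> list[int]:
    """Indices of the k chunk embeddings most similar to the query embedding."""
    scores = [cosine(query_vec, v) for v in chunk_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"\w+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of every document for the query."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and document-length normalization (b).
            denom = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores
```

For instance, `top_k_by_similarity([1.0, 0.0], [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]], k=2)` returns `[2, 1]`: the vector pointing in the same direction ranks first, regardless of its length. BM25 needs no embeddings and rewards exact term overlap. Cosine similarity can match paraphrases that share no words with the query, which is why offering both is useful.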

## Links

- [Documentation](https://dfalbel.github.io/raghilda/)
- [Source Code](https://github.com/dfalbel/raghilda)
- [Report Issues](https://github.com/dfalbel/raghilda/issues)
