Metadata-Version: 2.4
Name: consistent-rag
Version: 0.1.0
Summary: ConsistentRAG: Improving factual consistency in RAG through knowledge graph grounding and multi-agent refinement
Keywords: rag,knowledge-graph,consistency,llm,ppr,agentic
Author: Andres Caceres
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: License :: OSI Approved :: MIT License
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Intended Audience :: Science/Research
License-File: LICENSE
Requires-Dist: loguru
Requires-Dist: numpy
Requires-Dist: pydantic>=2.0.0
Requires-Dist: pydantic-ai>=0.2.0
Requires-Dist: pydantic-graph>=0.1.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: openai>=1.0.0
Requires-Dist: httpx>=0.25.0
Requires-Dist: tenacity>=8.2.0
Requires-Dist: nest-asyncio>=1.5.0
Requires-Dist: inflect>=7.0.0
Requires-Dist: networkx>=3.0
Requires-Dist: faiss-cpu>=1.7.0
Requires-Dist: gliner>=0.2.0
Requires-Dist: qdrant-client>=1.7.0
Requires-Dist: sentence-transformers>=2.2.0
Requires-Dist: pre-commit>=4.5.1
Requires-Dist: psycopg[binary]>=3.1.0 ; extra == "db"
Requires-Dist: psycopg-pool>=3.1.0 ; extra == "db"
Requires-Dist: consistent-rag[eval, finance, server, db] ; extra == "dev"
Requires-Dist: pytest ; extra == "dev"
Requires-Dist: pytest-asyncio ; extra == "dev"
Requires-Dist: ruff ; extra == "dev"
Requires-Dist: ipython ; extra == "dev"
Requires-Dist: jupyterlab ; extra == "dev"
Requires-Dist: notebook ; extra == "dev"
Requires-Dist: mkdocs ; extra == "dev"
Requires-Dist: python-dotenv ; extra == "dev"
Requires-Dist: datasets>=2.14.0 ; extra == "eval"
Requires-Dist: deepeval ; extra == "eval"
Requires-Dist: pandas ; extra == "eval"
Requires-Dist: scikit-learn ; extra == "eval"
Requires-Dist: matplotlib ; extra == "eval"
Requires-Dist: tqdm ; extra == "eval"
Requires-Dist: stable-baselines3>=2.3.0 ; extra == "finance"
Requires-Dist: gymnasium>=0.29.0 ; extra == "finance"
Requires-Dist: spacy>=3.7.0 ; extra == "finance"
Requires-Dist: fastmcp>=0.1.0 ; extra == "server"
Project-URL: Documentation, https://github.com/andrescaceres/consistent-rag/tree/main/docs
Project-URL: Repository, https://github.com/andrescaceres/consistent-rag
Provides-Extra: db
Provides-Extra: dev
Provides-Extra: eval
Provides-Extra: finance
Provides-Extra: server

# ConsistentRAG

[![Tests](https://github.com/pinkfloydsito/super-consistent-rag/actions/workflows/tests.yml/badge.svg)](https://github.com/pinkfloydsito/super-consistent-rag/actions/workflows/tests.yml)
[![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)

**ConsistentRAG: Improving Factual Consistency in RAG via Graph-Based Retrieval and Ranking**

A modular Python framework that improves factual consistency in Retrieval-Augmented Generation (RAG) by grounding queries in a knowledge graph, which is either pre-built offline or constructed on the fly from retrieved documents. A suite of graph algorithms retrieves and ranks reasoning paths from that graph. An Answer agent then drafts a response, and a Critic agent scores it and drives refinement over multiple iterations.

## Architecture

ConsistentRAG is organized into five layers:

```
+-------------------------------------------------------------+
|                     Pipeline Layer                          |
| (AdaptiveHybrid / OnlineDynamic / OfflineKG / Baseline)     |
+-------------------------------------------------------------+
|                   Orchestration Layer                       |
|    (AdaptiveRouter Orchestrator / Multi-Agent Loop)         |
+-------------------------------------------------------------+
|                      Agent Layer                            |
|   Seed Agent | Router Agent | Answer Agent | Critic         |
+-------------------------------------------------------------+
|                  Core Services Layer                        |
|   Retriever | Graph Store | Multi-Algorithm Engine | LLM    |
+-------------------------------------------------------------+
|                  Infrastructure Layer                       |
|   Qdrant | NetworkX | FAISS | DeepSeek API (Default)        |
|                             | OpenAI API (Configurable)     |
+-------------------------------------------------------------+
```

### Four Phases

1. **Dual-Mode Indexing & KG Construction** — Offline (FAISS + pre-built graph) or Online (Qdrant + on-the-fly graph)
2. **Multi-Algorithmic Graph Retrieval Engine** — PPR, Directed Random Walks, Hybrid Semantic Traversal, N-Hops BFS
3. **Adaptive Strategy Routing** — Router Agent dynamically selects the optimal graph algorithm per query
4. **Agentic Iterative Refinement** — Answer + Critic loop with context pruning, structural expansion, and instructional augmentation
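Phase 4 can be pictured as a bounded loop: draft, critique, redraft, and stop once the critic is satisfied or the iteration budget runs out. The following is a minimal sketch, not the code in `agentic/`. It assumes the Answer and Critic agents are plain callables; `refine`, `Critique`, `answer_fn`, and `critic_fn` are hypothetical names. The defaults mirror `max_improvement_iters` and `improvement_threshold` from the configuration table, and the sketch leaves out the context pruning, structural expansion, and instructional augmentation steps.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Critique:
    """Hypothetical critic verdict: a score in [0, 1] plus feedback for the next draft."""

    score: float
    feedback: str


# Hypothetical signatures: answer_fn(question, context, feedback) -> answer,
# critic_fn(question, context, answer) -> Critique.
AnswerFn = Callable[[str, str, Optional[str]], str]
CriticFn = Callable[[str, str, str], Critique]


def refine(
    question: str,
    context: str,
    answer_fn: AnswerFn,
    critic_fn: CriticFn,
    max_improvement_iters: int = 3,
    improvement_threshold: float = 0.8,
) -> tuple[str, list[float]]:
    """Run the Answer-Critic loop; return the final answer and each round's score."""
    feedback: Optional[str] = None
    answer = ""
    scores: list[float] = []
    for _ in range(max_improvement_iters):
        # Draft (or redraft) the answer, guided by the previous critique if any.
        answer = answer_fn(question, context, feedback)
        critique = critic_fn(question, context, answer)
        scores.append(critique.score)
        if critique.score >= improvement_threshold:
            break  # critic is satisfied: stop early
        feedback = critique.feedback
    return answer, scores


if __name__ == "__main__":
    drafts = iter(["Y", "Y causes X"])
    answer, scores = refine(
        "What causes X?",
        "X is caused by Y.",
        lambda q, ctx, fb: next(drafts),
        lambda q, ctx, ans: Critique(0.9 if "causes" in ans else 0.4, "State the causal link."),
    )
    print(answer, scores)  # Y causes X [0.4, 0.9]
```

The early exit is why `improvement_threshold` matters for cost. An answer that the critic accepts on the first round spends only one Answer call and one Critic call.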

## Pipeline Configurations

| Configuration | Type | Vector | KG | Graph Strategy |
|---|---|---|---|---|
| `baseline_no_tool` | Single-pass | No | No | — |
| `baseline_vector` | Single-pass | Yes | No | — |
| `baseline_kg` | Single-pass | No | Yes | PPR |
| `baseline_hybrid` | Single-pass | Yes | Yes | PPR |
| `agentic_vector_only` | Multi-agent | Yes | No | — |
| `agentic_kg_nhops` | Multi-agent | Yes | Yes | N-Hops BFS |
| `agentic_kg_ppr` | Multi-agent | Yes | Yes | PPR |
| `agentic_kg_random_walk` | Multi-agent | Yes | Yes | Random Walk |
| `agentic_kg_hybrid` | Multi-agent | Yes | Yes | Hybrid Semantic |
| `agentic_kg_adaptive` | Multi-agent | Yes | Yes | Adaptive (Full) |
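The `agentic_kg_nhops` configuration expands outward from the seed entities with a depth-limited breadth-first search. Below is a minimal NetworkX sketch of that idea. It assumes triples are stored as directed edges carrying a `relation` attribute, which is an assumption about the graph schema rather than the documented layout of `networkx_graph_store.py`. `nhops_triples` is a hypothetical helper.

```python
import networkx as nx


def nhops_triples(
    graph: nx.DiGraph, seeds: list[str], n_hops: int = 2
) -> list[tuple[str, str, str]]:
    """Return (head, relation, tail) triples among nodes within n_hops of any seed."""
    reached: set[str] = set()
    for seed in seeds:
        if seed not in graph:
            continue  # unmatched seed entity: nothing to expand
        # BFS following edge direction, truncated at n_hops.
        reached.update(nx.single_source_shortest_path_length(graph, seed, cutoff=n_hops))
    # Keep the edges of the subgraph induced by the reached nodes.
    return [
        (head, data.get("relation", "related_to"), tail)
        for head, tail, data in graph.edges(data=True)
        if head in reached and tail in reached
    ]


if __name__ == "__main__":
    kg = nx.DiGraph()
    kg.add_edge("Y", "X", relation="causes")
    kg.add_edge("X", "Z", relation="leads_to")
    kg.add_edge("Z", "W", relation="part_of")
    print(nhops_triples(kg, ["Y"], n_hops=1))  # [('Y', 'causes', 'X')]
```

Raising `n_hops` grows the context roughly with the graph's branching factor. This is one reason ranking strategies such as PPR are offered as alternatives.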

## Operational Modalities

| Modality | Description | Infrastructure |
|---|---|---|
| **1. Static Vector Baseline** | Agentic RAG without graph operations | Qdrant |
| **2. Offline Static KG** | Pre-computed FAISS index + serialized NetworkX graph | None (all local) |
| **3. Online Dynamic KG** | On-the-fly graph construction from Qdrant-retrieved docs | Qdrant |
| **4. Adaptive Hybrid (Full)** | Dynamic strategy routing over static or dynamic graphs | Qdrant (online) or None (offline) |

## Configuration

All parameters are centralized in `ConsistentRAGConfig`:

| Parameter | Default | Description |
|---|---|---|
| `operational_mode` | `online_dynamic` | `offline_static` or `online_dynamic` |
| `graph_strategy` | `adaptive` | `ppr`, `random_walk`, `hybrid`, `nhops`, or `adaptive` |
| `ppr_alpha` | `0.85` | Damping factor for PPR |
| `ppr_iterations` | `20` | Power iteration limit for PPR |
| `min_seed_score` | `0.3` | Minimum confidence for seed entities |
| `max_paths` | `5` | Maximum reasoning paths in prompt |
| `max_improvement_iters` | `3` | Maximum Answer-Critic loops |
| `improvement_threshold` | `0.8` | Minimum critic score to terminate |
| `max_triples_per_doc` | `20` | Extraction limit per document |
| `use_hybrid_seeds` | `True` | Enable semantic + fulltext seed matching |
| `prune_conflicts` | `True` | Enable context pruning on contradictions |
| `llm_model` | `deepseek-chat` | LLM model name |
| `embedding_model` | `text-embedding-ada-002` | Embedding model name, e.g. `text-embedding-3-small` (OpenAI API) or `all-MiniLM-L6-v2` (local) |
| `top_k` | `5` | Documents to retrieve |
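The sketch below shows how `ppr_alpha`, `ppr_iterations`, and `min_seed_score` interact. Personalized PageRank (PPR) is PageRank whose restart distribution is concentrated on the seed entities, and here it is computed with a fixed number of power iterations. This is a pure-Python sketch under stated assumptions, not the code in `knowledge_graph/algorithms.py`:

- The graph is given as a `{node: [successors]}` dict.
- Seed confidences come from the Seed Agent.
- `alpha` is read as the probability of following an edge, so `1 - alpha` is the restart probability. This follows the NetworkX `pagerank` convention.

`personalized_pagerank` is a hypothetical name.

```python
def personalized_pagerank(
    adjacency: dict[str, list[str]],
    seed_scores: dict[str, float],
    ppr_alpha: float = 0.85,
    ppr_iterations: int = 20,
    min_seed_score: float = 0.3,
) -> dict[str, float]:
    """Power-iteration PPR over a directed graph given as {node: [successors]}."""
    # Every node that appears as a key or as a successor.
    nodes = list(dict.fromkeys([*adjacency, *(v for vs in adjacency.values() for v in vs)]))
    node_set = set(nodes)

    # Drop low-confidence or unknown seeds, then normalize into a restart distribution.
    seeds = {n: s for n, s in seed_scores.items() if s >= min_seed_score and n in node_set}
    if not seeds:
        return {}
    total = sum(seeds.values())
    teleport = {n: s / total for n, s in seeds.items()}

    rank = {n: teleport.get(n, 0.0) for n in nodes}
    for _ in range(ppr_iterations):
        # With probability 1 - alpha, the walk restarts at a seed.
        nxt = {n: (1 - ppr_alpha) * teleport.get(n, 0.0) for n in nodes}
        for node in nodes:
            successors = adjacency.get(node, [])
            if successors:
                # With probability alpha, follow a uniformly chosen out-edge.
                share = ppr_alpha * rank[node] / len(successors)
                for succ in successors:
                    nxt[succ] += share
            else:
                # Dangling node: return its walk mass to the seeds.
                for seed, weight in teleport.items():
                    nxt[seed] += ppr_alpha * rank[node] * weight
        rank = nxt
    return rank


if __name__ == "__main__":
    graph = {"Y": ["X"], "X": ["Z"], "Z": [], "W": ["Z"]}
    scores = personalized_pagerank(graph, {"Y": 0.9, "W": 0.2})  # W falls below min_seed_score
    print(scores["W"])  # 0.0, since nothing links to W and it is not a seed
```

Because every step conserves probability mass, the scores always sum to 1. Nodes unreachable from the seeds get exactly zero, which is what keeps the ranking query-specific.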

## Environment Variables

```bash
# LLM API (priority: OPENAI_API_KEY > DEEPSEEK_API_KEY)
OPENAI_API_KEY=sk-...          # OpenAI or any compatible endpoint
OPENAI_API_BASE=https://api.openai.com/v1
DEEPSEEK_API_KEY=sk-...        # Fallback to DeepSeek

# Infrastructure (for online mode)
QDRANT_URL=http://localhost:6333

# Experiment tracking
POSTGRES_URL=postgresql://user:pass@localhost:5432/consistent_rag
```
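The key priority above can be read as a small resolution function. This is a sketch, not `llm.py`'s documented interface, and it rests on two assumptions. First, DeepSeek is reached through its OpenAI-compatible endpoint at `https://api.deepseek.com` (check the provider's docs). Second, the result feeds the `openai` SDK, which is a declared dependency. `resolve_llm_endpoint` is a hypothetical name.

```python
import os
from collections.abc import Mapping
from typing import Optional

OPENAI_DEFAULT_BASE = "https://api.openai.com/v1"
# Assumption: DeepSeek's OpenAI-compatible base URL; verify against the provider docs.
DEEPSEEK_BASE = "https://api.deepseek.com"


def resolve_llm_endpoint(env: Optional[Mapping[str, str]] = None) -> tuple[str, str]:
    """Return (api_key, base_url), preferring OPENAI_API_KEY over DEEPSEEK_API_KEY."""
    env = os.environ if env is None else env
    if env.get("OPENAI_API_KEY"):
        # OpenAI or any compatible server; OPENAI_API_BASE overrides the default URL.
        return env["OPENAI_API_KEY"], env.get("OPENAI_API_BASE", OPENAI_DEFAULT_BASE)
    if env.get("DEEPSEEK_API_KEY"):
        return env["DEEPSEEK_API_KEY"], DEEPSEEK_BASE
    raise RuntimeError("Set OPENAI_API_KEY or DEEPSEEK_API_KEY")
```

You can then pass the pair to the SDK, e.g. `OpenAI(api_key=key, base_url=base)`. Because `OPENAI_API_BASE` wins whenever `OPENAI_API_KEY` is set, pointing it at a self-hosted OpenAI-compatible server is enough to swap providers.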

## Quick Start

```bash
# Install
uv sync

# Start Qdrant (online mode only) and PostgreSQL (experiment tracking)
make docker-up
```

### Python API

```python
from consistent_rag import ConsistentRAG

# KG-only mode (no Qdrant needed; only the LLM API is called)
rag = ConsistentRAG(mode="kg", strategy="ppr")
result = rag.query("What causes X?", context="X is caused by Y...")
print(result.answer, result.confidence)

# Hybrid mode (Qdrant + KG) — requires `make docker-up`
rag = ConsistentRAG(mode="hybrid", strategy="ppr")
rag.index_documents(["Doc 1...", "Doc 2..."], build_kg=True)
result = rag.query("What is X?")

# FAISS + KG (local index and embeddings; no Qdrant server)
rag = ConsistentRAG(
    mode="faiss_kg",
    strategy="ppr",
    embedding_model="all-MiniLM-L6-v2",  # local embeddings
)
```

Available modes: `kg`, `vector`, `hybrid`, `faiss`, `faiss_kg`.
Available strategies: `ppr`, `nhops`, `random_walk`, `hybrid`.

See [`examples/`](examples/) for complete runnable scripts.

## Running Evaluations (Thesis Experiments)

The evaluation system runs any combination of the 10 pipeline configurations against the 5 benchmarks. Results are persisted to CSV (and optionally PostgreSQL) with full resume support.

```bash
# Quick smoke test: 1 config, 1 benchmark, 2 samples
uv run python -m consistent_rag.evaluate_all \
    --approaches baseline_no_tool \
    --benchmark faitheval \
    --limit 2 --verbose

# Run a specific config against all benchmarks
uv run python -m consistent_rag.evaluate_all \
    --approaches agentic_kg_ppr \
    --benchmark all \
    --limit 50 \
    --csv results/agentic_kg_ppr.csv

# Run all 10 configs against FaithEval (full thesis Table 2)
uv run python -m consistent_rag.evaluate_all \
    --approaches all \
    --benchmark faitheval \
    --limit 5000 \
    --csv results/faitheval_all.csv \
    --metrics all

# Run all configs against all benchmarks (thesis experiment matrix; raise --limit, which defaults to 5)
uv run python -m consistent_rag.evaluate_all \
    --approaches all \
    --benchmark all \
    --csv results/full_matrix.csv \
    --metrics all

# Resume an interrupted run (skips completed samples)
uv run python -m consistent_rag.evaluate_all \
    --approaches all \
    --benchmark all \
    --csv results/full_matrix.csv \
    --resume

# With PostgreSQL persistence for experiment tracking
uv run python -m consistent_rag.evaluate_all \
    --approaches all \
    --benchmark faitheval \
    --csv results/faitheval.csv \
    --postgres postgresql://consistent_rag:consistent_rag@localhost:5432/consistent_rag

# Disable triplet cache (re-extract all seeds/triples)
uv run python -m consistent_rag.evaluate_all \
    --approaches agentic_kg_ppr \
    --benchmark faitheval \
    --no-cache
```

### CLI Options

| Flag | Description |
|---|---|
| `--approaches` | Comma-separated config names or `all` |
| `--benchmark` | Comma-separated benchmark names or `all` |
| `--limit` | Max samples per benchmark (default: 5) |
| `--metrics` | `basic`, `llm`, `deepeval`, or `all` |
| `--csv` | Output CSV path (auto-generated if omitted) |
| `--resume` | Skip completed samples in existing CSV |
| `--no-cache` | Re-extract triplets instead of using cache |
| `--postgres` | PostgreSQL URL for experiment tracking |
| `--verbose` | Print per-sample progress |

### Resuming & Error Recovery

The evaluation system is designed for long-running experiments:
- **CSV append mode**: Results are flushed after each sample — if the process crashes, all completed samples are saved
- **`--resume`**: Reads the CSV and skips `(strategy, sample_id)` pairs that already have results or errors
- **Error rows**: Failed samples are logged with their error message in the CSV so you can identify them; since `--resume` also skips error rows, delete those rows before resuming to re-run them
- **Triplet cache**: Extracted seeds and triples are cached to `.triplet_cache/` to avoid re-spending LLM tokens across runs
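The append-and-resume behaviour described above can be sketched with the standard `csv` module. The sketch assumes a CSV with `strategy`, `sample_id`, `answer`, and `error` columns; the layout `evaluate_all` actually writes may differ. `completed_pairs`, `append_row`, and `run` are hypothetical helpers.

```python
import csv
import os
from typing import Callable

FIELDS = ["strategy", "sample_id", "answer", "error"]  # assumed column layout


def completed_pairs(path: str) -> set[tuple[str, str]]:
    """(strategy, sample_id) pairs already recorded, whether they succeeded or failed."""
    if not os.path.exists(path):
        return set()
    with open(path, newline="") as f:
        return {(row["strategy"], row["sample_id"]) for row in csv.DictReader(f)}


def append_row(path: str, row: dict[str, str]) -> None:
    """Append one result and force it to disk, so a crash loses at most one sample."""
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow(row)
        f.flush()
        os.fsync(f.fileno())


def run(path: str, strategy: str, samples: list[tuple[str, str]],
        answer_fn: Callable[[str], str]) -> None:
    """Evaluate samples, skipping any pair already present in the CSV."""
    done = completed_pairs(path)
    for sample_id, question in samples:
        if (strategy, sample_id) in done:
            continue  # resume: skip samples that already have a result or an error row
        row = {"strategy": strategy, "sample_id": sample_id, "answer": "", "error": ""}
        try:
            row["answer"] = answer_fn(question)
        except Exception as exc:  # keep going; record the failure as an error row
            row["error"] = str(exc)
        append_row(path, row)
```

Writing one row per sample trades some I/O overhead for crash safety. For LLM-bound workloads, the per-sample latency dwarfs the cost of the extra `fsync`.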

### Build Offline Index (Modality 2)

```bash
uv run python scripts/build_offline_index.py
```

## MCP Tool

ConsistentRAG can be used as an MCP tool by any MCP-compatible client or agent:

```bash
# Start the MCP server (requires the `server` extra, which provides fastmcp)
python -m consistent_rag.mcp_server
```

The server exposes a `query` tool with parameters:
- `question` (required): The question to answer
- `context` (optional): Pre-provided context (skips retrieval)
- `mode`: `online_dynamic` or `offline_static`
- `strategy`: `adaptive`, `ppr`, `random_walk`, `hybrid`, `nhops`
- `top_k`: Number of documents to retrieve
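MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request whose `params` carry the tool `name` and its `arguments`. Below is a sketch of the message a client would send to this server's `query` tool, assuming the parameter names listed above. `make_query_call` is a hypothetical helper; in practice an MCP client library builds and sends this message for you.

```python
import json
from typing import Any, Optional


def make_query_call(
    request_id: int,
    question: str,
    context: Optional[str] = None,
    mode: Optional[str] = None,
    strategy: Optional[str] = None,
    top_k: Optional[int] = None,
) -> str:
    """Serialize a JSON-RPC 2.0 tools/call request for the `query` tool."""
    arguments: dict[str, Any] = {"question": question}
    # Send only the optional parameters that were actually set.
    optional = (("context", context), ("mode", mode), ("strategy", strategy), ("top_k", top_k))
    for name, value in optional:
        if value is not None:
            arguments[name] = value
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "query", "arguments": arguments},
    }
    return json.dumps(message)


if __name__ == "__main__":
    print(make_query_call(1, "What causes X?", strategy="ppr", top_k=5))
```

Omitting unset parameters leaves the server free to apply its own defaults for `mode`, `strategy`, and `top_k`.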

## Benchmarks

| Benchmark | Focus | Samples |
|---|---|---|
| **FaithEval** | Factual consistency (unanswerable, inconsistent, counterfactual) | ~15K |
| **MuSiQuE** | Multi-hop reasoning across documents | ~25K |
| **TimeQA** | Temporal reasoning and evolving facts | ~20K |
| **SQuAD** | Single-hop extractive QA (control baseline) | ~100K |
| **FinanceBench** | Domain-specific financial document QA (SEC filings) | 150 |

## Development

```bash
make lint          # Format check + linting
make format        # Auto-fix formatting
make test-unit     # Run unit tests
make test          # Run all tests
make ci            # Full CI pipeline
```

## Project Structure

```
consistent_rag/
├── api.py                       # Public API (ConsistentRAG, QueryResult)
├── config.py                    # ConsistentRAGConfig (unified)
├── pipeline_factory.py          # Pipeline configuration factory
├── llm.py                       # Universal LLM client (OpenAI-compatible)
├── embeddings.py                # Embedding service (local + API)
├── retriever.py                 # Qdrant vector retriever
├── offline_retriever.py         # FAISS offline retriever
├── experiment_store.py          # PostgreSQL experiment persistence
├── experiment_timeout.py        # Per-sample timeout tracking
├── triplet_cache.py             # Triplet cache for experiment resumption
├── mcp_server.py                # MCP tool server
├── knowledge_graph/
│   ├── networkx_graph_store.py  # NetworkX graph backend
│   ├── algorithms.py            # PPR, Random Walk, Hybrid, N-Hops
│   ├── extractor.py             # LLM-based triple extraction
│   └── reasoning.py             # Reasoning path extraction
├── agentic/
│   ├── context_refinement.py    # Algorithm 2: prune/expand/augment
│   ├── basic/                   # Basic agentic orchestrator
│   ├── kg_ppr/                  # KG+PPR agentic orchestrator
│   └── common/prompts.py        # Prompt templates
├── benchmarks/                  # Dataset loaders (FaithEval, MuSiQuE, etc.)
├── evaluation/                  # Metrics (EM, F1, faithfulness)
└── domain/finance/              # Financial domain extensions
```

