Metadata-Version: 2.4
Name: ironrace
Version: 0.3.8
Classifier: Programming Language :: Rust
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Dist: pytest>=7.0 ; extra == 'dev'
Requires-Dist: numpy>=1.24 ; extra == 'dev'
Requires-Dist: orjson>=3.9 ; extra == 'dev'
Requires-Dist: httpx>=0.25 ; extra == 'dev'
Requires-Dist: tiktoken>=0.5 ; extra == 'dev'
Provides-Extra: dev
License-File: LICENSE
Summary: Rust-powered context engine for AI agent pipelines
Keywords: rust,llm,rag,vector-search,context-engine,pyo3
License: Apache-2.0
Requires-Python: >=3.11
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Homepage, https://github.com/ironrace/ironrace
Project-URL: Issues, https://github.com/ironrace/ironrace/issues
Project-URL: Repository, https://github.com/ironrace/ironrace

# IronRace

Rust-powered context engine for AI agent pipelines. Accelerates the CPU-intensive context preparation layer (vector search, JSON processing, tokenization, prompt assembly) while keeping LLM calls in Python where they belong.

## Why

In multi-agent AI systems, context preparation is the CPU bottleneck at scale. LlamaIndex's vector search over 5K embeddings takes ~51ms. IronRace's Rust runtime does it in ~0.4ms with 99% recall — verified against LlamaIndex brute-force on every benchmark run. The full pipeline runs **~14x faster** single-threaded, scaling to **~60x throughput** at 1,000 concurrent pipelines where Python's GIL becomes the bottleneck.

## 30-Second Example

```python
import json
from ironrace import VectorIndex, assemble_prompt, execute_pipeline, compile_agents_dag

# `embeddings`, `query`, and `long_text` are placeholders for your own data.
# Build a vector index (lives in Rust memory)
index = VectorIndex(embeddings)  # ~6s for 10K docs, 99% recall
hits = index.search(query, top_k=10)  # < 2ms for 10K vectors

# Assemble prompts with token budgets
result = assemble_prompt(
    "System: {role}\nContext: {docs}\nQuery: {q}",
    values={"role": "analyst", "docs": long_text, "q": "evaluate this"},
    budgets={"docs": 2000},
)
# result.prompt, result.total_tokens, result.sections_truncated

# Run a full multi-agent pipeline in ONE Rust call
dag = compile_agents_dag([
    {"id": "analyst", "template": "...", "values": {...}, "budgets": {...}},
    {"id": "engineer", "template": "...", "values": {...}, "budgets": {...}},
])
results = json.loads(execute_pipeline(dag))
```

## Benchmark Results

All numbers are **medians** across 200 iterations, taken after 10 warmup runs. The baseline is actual LlamaIndex code (SimpleVectorStore, tiktoken, PromptHelper). Recall is verified against LlamaIndex brute-force search on every run.

| Operation | LlamaIndex | IronRace (Rust) | Speedup |
|-----------|------------|-----------------|---------|
| Vector search (5K × 384d) | 51ms | 0.4ms | **~123x** |
| JSON parsing (900KB) | ~8ms | ~2ms | **~4x** |
| Token counting (vs tiktoken) | 0.23ms | 0.004ms | **~51x** |
| Prompt assembly | 0.6ms | 0.009ms | **~66x** |
| **Full pipeline** | **61ms** | **4ms** | **~14x** |

The token-counting baseline is tiktoken, which is itself Rust-based. The JSON baseline is the standard-library `json` module, which LlamaIndex uses internally.
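
The measurement protocol described above (warmup runs, then a median over timed iterations) can be sketched in a few lines. This is a minimal illustration using only the standard library, not the harness in `benchmarks/`; the helper names `median_latency_ms` and `speedup` are hypothetical.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(fn: Callable[[], object], iterations: int = 200, warmup: int = 10) -> float:
    """Return the median wall-clock latency of fn() in milliseconds."""
    # Warmup runs are executed but not recorded.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """Ratio of baseline latency to candidate latency."""
    return baseline_ms / candidate_ms
```

For example, `speedup(8.0, 2.0)` gives `4.0`, matching the JSON parsing row. The median is less sensitive than the mean to occasional slow iterations caused by garbage collection or scheduler noise.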

Vector search scaling (automatic sharding at >2,500 vectors):

| Scale | Build time | Search (mean) | Recall@10 |
|-------|-----------|---------------|-----------|
| 1K × 768d | 0.6s | 0.57ms | 99% |
| 10K × 768d | 5.7s | 1.77ms | 99% |
| 100K × 768d | 27.7s | 9.57ms | 98% |
| 1M × 384d | 146s | 37.6ms | 98% |

HNSW index build is a one-time cost amortized over all searches. Recall is measured against brute-force cosine similarity across 50-100 queries. Datasets larger than 2,500 vectors are automatically sharded into independent HNSW graphs that are built and searched in parallel via rayon, maintaining 98%+ recall up to 1M vectors.
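
The shard-then-merge pattern can be illustrated in pure Python. The sketch below is an assumption-laden stand-in, not IronRace's implementation: each shard is searched exactly by brute force instead of with an HNSW graph, a thread pool replaces rayon, and the names `make_shards`, `search_shard`, and `sharded_search` are hypothetical.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor

SHARD_SIZE = 2500  # threshold quoted above; the actual shard sizing is an assumption


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def make_shards(vectors, shard_size=SHARD_SIZE):
    """Split vectors into contiguous shards, keeping each vector's global id."""
    indexed = list(enumerate(vectors))
    return [indexed[i:i + shard_size] for i in range(0, len(indexed), shard_size)]


def search_shard(shard, query, top_k):
    """Exact top-k within one shard (stand-in for a per-shard HNSW search)."""
    scored = ((cosine(vec, query), idx) for idx, vec in shard)
    return heapq.nlargest(top_k, scored)


def sharded_search(shards, query, top_k=10):
    """Search every shard in parallel, then merge the per-shard top-k lists."""
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda s: search_shard(s, query, top_k), shards)
        merged = heapq.nlargest(top_k, (hit for part in partials for hit in part))
    return [(idx, score) for score, idx in merged]
```

Because every shard returns its own top-k, the merged list contains the global top-k whenever the per-shard searches are exact. With approximate per-shard search, any recall loss comes from the individual graphs rather than from the merge step.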

Real-world RAG test (19K chunks from a local GitHub repository, extensions: .py .md .txt):

| Operation | LlamaIndex | IronRace | Speedup |
|---|---|---|---|
| Context prep (search + assembly) | 293ms | 1.3ms | **~229x** |
| Result overlap vs LlamaIndex | — | 5/5 identical | — |
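
The recall and result-overlap figures in this section both compare an approximate result list against an exact one. A minimal sketch of that comparison follows, assuming results are lists of document ids ordered by score; the function names are hypothetical and this is not the code in `benchmarks/`.

```python
def recall_at_k(approx_ids, exact_ids, k=10):
    """Fraction of the exact top-k that also appears in the approximate top-k."""
    exact = set(exact_ids[:k])
    if not exact:
        return 0.0
    return len(exact & set(approx_ids[:k])) / len(exact)


def mean_recall(approx_runs, exact_runs, k=10):
    """Average recall@k over paired query results."""
    pairs = list(zip(approx_runs, exact_runs))
    return sum(recall_at_k(a, e, k) for a, e in pairs) / len(pairs)
```

Under this definition, a "5/5 identical" overlap corresponds to `recall_at_k(ours, baseline, k=5) == 1.0`. Order within the top-k is ignored.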

Concurrent throughput (GIL released during Rust execution):

| Concurrent Pipelines | Python | IronRace | Speedup |
|---|---|---|---|
| 10 | 31/sec | 1,521/sec | **~49x** |
| 100 | 31/sec | 1,918/sec | **~61x** |
| 1,000 | 31/sec | 1,862/sec | **~60x** |
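
Throughput of this kind can be measured by submitting many pipeline calls to a thread pool and dividing completed calls by elapsed time. The sketch below is illustrative only: `pipelines_per_second` and `fake_pipeline` are hypothetical names, and `time.sleep` stands in for a native call that releases the GIL.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def pipelines_per_second(run_pipeline, n_pipelines, max_workers=8):
    """Run n_pipelines calls on a thread pool and return completed calls per second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda _: run_pipeline(), range(n_pipelines)))
    elapsed = time.perf_counter() - start
    return len(results) / elapsed


def fake_pipeline():
    """Stand-in workload; time.sleep releases the GIL, like a native call would."""
    time.sleep(0.01)
    return "ok"
```

When the work holds the GIL, as pure-Python CPU-bound code does, adding threads does not raise throughput, which is consistent with the flat Python column above.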

## Installation

### CLI binary (standalone, no Python needed)

```bash
curl -fsSL https://raw.githubusercontent.com/ironrace/ironrace/main/install.sh | bash
```

The script downloads the binary to `~/.ironrace/bin/`, fetches the embedding model, and registers IronRace as an MCP server for Claude Code.

### Python library

```bash
pip install ironrace
```

### Development

```bash
git clone https://github.com/ironrace/ironrace.git
cd ironrace
pip install maturin
maturin develop --release
```

## Key Design Decisions

1. **Python API, Rust runtime** — Decorators, type hints, dataclasses. No Rust knowledge needed.
2. **Bridge once per pipeline** — The entire context prep DAG executes in Rust as a single call.
3. **LLM calls stay in Python** — I/O-bound, asyncio handles them fine.
4. **Vector index lives in Rust** — Built once, searched many times, zero-copy reference.
5. **Token budgeting is first-class** — Per-section budgets with sentence-boundary truncation.
6. **Accuracy by default** — ef_construction=100 gives 98%+ recall out of the box. Tunable for power users.
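
To make decision 5 concrete, here is a minimal sketch of per-section budgeting with sentence-boundary truncation. It rests on stated assumptions: whitespace splitting stands in for a real tokenizer, sentences are split on `.`, `!`, or `?` followed by whitespace, and `truncate_to_budget` and `assemble` are hypothetical helpers rather than IronRace's `assemble_prompt`.

```python
import re


def count_tokens(text):
    """Whitespace token count; a crude stand-in for a real tokenizer."""
    return len(text.split())


def truncate_to_budget(text, budget):
    """Keep whole sentences, in order, until the next one would exceed the budget.

    Returns (kept_text, was_truncated).
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept, used = [], 0
    for sentence in sentences:
        cost = count_tokens(sentence)
        if used + cost > budget:
            return " ".join(kept), True
        kept.append(sentence)
        used += cost
    return " ".join(kept), False


def assemble(template, values, budgets):
    """Fill the template, truncating only the sections that have a budget."""
    filled, truncated = {}, []
    for name, value in values.items():
        if name in budgets:
            value, was_truncated = truncate_to_budget(value, budgets[name])
            if was_truncated:
                truncated.append(name)
        filled[name] = value
    return template.format(**filled), truncated
```

Cutting at sentence boundaries keeps the retained context readable, at the cost of sometimes using fewer tokens than the budget allows.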

## Project Structure

```
ironrace/
├── rust/src/               # Rust core — PyO3 extension module
│   ├── vector.rs           # HNSW approximate nearest neighbor (hnsw_rs)
│   ├── tokenizer.rs        # Token counting + truncation
│   ├── assembler.rs        # Prompt assembly + budgeting
│   ├── json_fast.rs        # Serde JSON bridge
│   ├── pipeline.rs         # Tokio DAG executor
│   └── lib.rs              # Module entry point
├── crates/ironrace-bin/    # Standalone CLI binary
│   ├── src/main.rs         # CLI entry point (scan, bench, install, mcp)
│   ├── src/scanner.rs      # Repo scanning + semantic search
│   ├── src/embedder.rs     # ONNX embedding (MiniLM-L6-v2)
│   ├── src/chunker.rs      # Tree-sitter code chunking
│   ├── src/cache.rs        # On-disk embedding cache
│   ├── src/mcp.rs          # MCP server for Claude Code
│   └── benches/vector.rs   # Criterion benchmarks
├── python/ironrace/        # Python SDK
│   ├── decorators.py       # @agent, @context, @pipeline
│   ├── compiler.py         # DAG compilation
│   ├── types.py            # TokenBudget, Document, VectorSearch
│   └── router.py           # Async LLM caller
├── install.sh              # One-line binary installer
├── benchmarks/             # Python benchmarks (with recall verification)
├── examples/               # Runnable demo apps
├── tests/                  # Test suite (125 tests)
└── docs/                   # Documentation
```

## Documentation

- [PROBLEM.md](docs/PROBLEM.md) — The business case and benchmark data
- [ARCHITECTURE.md](docs/ARCHITECTURE.md) — Technical deep dive
- [QUICKSTART.md](docs/QUICKSTART.md) — 5-minute getting started guide
- [BENCHMARKS.md](docs/BENCHMARKS.md) — Reproducible benchmark methodology

## Running

```bash
# Tests
pytest tests/ -v

# Python benchmarks
python -m benchmarks.bench_context_prep
python -m benchmarks.bench_vector_search
python -m benchmarks.bench_at_scale

# Rust benchmarks (Criterion)
cargo bench -p ironrace-bin

# CLI benchmark
ironrace bench --scales 1000,10000,100000

# Semantic code search
ironrace scan --repo . --query "vector search"

# Examples
python examples/startup_evaluator.py
python examples/rag_chatbot.py
python examples/batch_research.py
```

## License

Apache 2.0 — see [LICENSE](LICENSE)

