Metadata-Version: 2.4
Name: ironrace
Version: 0.2.3
Classifier: Programming Language :: Rust
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Dist: pytest>=7.0 ; extra == 'dev'
Requires-Dist: numpy>=1.24 ; extra == 'dev'
Requires-Dist: orjson>=3.9 ; extra == 'dev'
Requires-Dist: httpx>=0.25 ; extra == 'dev'
Requires-Dist: tiktoken>=0.5 ; extra == 'dev'
Provides-Extra: dev
License-File: LICENSE
Summary: Rust-powered context engine for AI agent pipelines
Keywords: rust,llm,rag,vector-search,context-engine,pyo3
License: Apache-2.0
Requires-Python: >=3.11
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Homepage, https://github.com/ironrace/ironrace
Project-URL: Issues, https://github.com/ironrace/ironrace/issues
Project-URL: Repository, https://github.com/ironrace/ironrace

# IronRace

Rust-powered context engine for AI agent pipelines. Accelerates the CPU-intensive context preparation layer (vector search, JSON processing, tokenization, prompt assembly) while keeping LLM calls in Python where they belong.

## Why

In multi-agent AI systems, context preparation is the CPU bottleneck at scale. LlamaIndex's vector search over 5K embeddings takes ~51ms. IronRace's Rust runtime does it in ~0.4ms with 99% recall — verified against LlamaIndex brute-force on every benchmark run. The full pipeline runs **~14x faster** single-threaded, scaling to **~60x throughput** at 1,000 concurrent pipelines where Python's GIL becomes the bottleneck.

## 30-Second Example

```python
import json
from ironrace import VectorIndex, assemble_prompt, execute_pipeline, compile_agents_dag

# Build a vector index (lives in Rust memory).
# `embeddings`, `query`, and `long_text` (used below) are placeholders for your own data.
index = VectorIndex(embeddings)  # ~6s for 10K docs, 99% recall
results = index.search(query, top_k=10)  # < 2ms for 10K vectors

# Assemble prompts with token budgets
result = assemble_prompt(
    "System: {role}\nContext: {docs}\nQuery: {q}",
    values={"role": "analyst", "docs": long_text, "q": "evaluate this"},
    budgets={"docs": 2000},
)
# result.prompt, result.total_tokens, result.sections_truncated

# Run a full multi-agent pipeline in ONE Rust call
dag = compile_agents_dag([
    {"id": "analyst", "template": "...", "values": {...}, "budgets": {...}},
    {"id": "engineer", "template": "...", "values": {...}, "budgets": {...}},
])
results = json.loads(execute_pipeline(dag))
```

## Benchmark Results

Unless a table says otherwise, numbers are **medians** across 200 iterations after 10 warmup runs. The baseline is actual LlamaIndex code (SimpleVectorStore, tiktoken, PromptHelper). Recall is verified against LlamaIndex brute-force search on every run.
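
The measurement protocol described above (warmup runs that are discarded, then a median over timed iterations) can be sketched with the standard library alone. This is a minimal illustration, not the project's benchmark harness: `bench_median_ms` and `example_workload` are hypothetical names, and the workload is a stand-in for a real context-prep step.

```python
import statistics
import time


def bench_median_ms(fn, iterations=200, warmup=10):
    """Return the median wall-clock time of fn() in milliseconds."""
    # Warmup runs are executed but not recorded.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def example_workload():
    # Hypothetical stand-in for a context-prep step.
    return sum(i * i for i in range(1_000))


median_ms = bench_median_ms(example_workload)
print(f"median: {median_ms:.4f} ms")
```

Reporting the median rather than the mean keeps a few slow outliers (GC pauses, scheduler noise) from skewing the result.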

| Operation | LlamaIndex | IronRace (Rust) | Speedup |
|-----------|------------|-----------------|---------|
| Vector search (5K × 384d) | 51ms | 0.4ms | **~123x** |
| JSON parsing (900KB) | ~8ms | ~2ms | **~4x** |
| Token counting (vs tiktoken) | 0.23ms | 0.004ms | **~51x** |
| Prompt assembly | 0.6ms | 0.009ms | **~66x** |
| **Full pipeline** | **61ms** | **4ms** | **~14x** |

Token counting baseline is tiktoken, which is itself Rust-based. JSON baselines are stdlib json, which LlamaIndex uses internally.

Vector search scaling (automatic sharding at >2,500 vectors):

| Scale | Build time | Search (mean) | Recall@10 |
|-------|-----------|---------------|-----------|
| 1K × 768d | 0.6s | 0.57ms | 99% |
| 10K × 768d | 5.7s | 1.77ms | 99% |
| 100K × 768d | 27.7s | 9.57ms | 98% |
| 1M × 384d | 146s | 37.6ms | 98% |

HNSW index build is a one-time cost amortized over all searches. Recall is measured against brute-force cosine similarity across 50 to 100 queries. Datasets larger than 2,500 vectors are automatically sharded into independent HNSW graphs that are built and searched in parallel via rayon, maintaining 98%+ recall up to 1M vectors.
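
To make the Recall@10 column concrete, here is a small pure-Python sketch of how recall against a brute-force cosine baseline can be computed. It does not call IronRace; the helper names are hypothetical, and the "approximate" result is simulated by swapping one true neighbor for a wrong one.

```python
import math
import random


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def brute_force_top_k(vectors, query, k):
    """Exact top-k ids by cosine similarity (the ground truth)."""
    ranked = sorted(
        range(len(vectors)),
        key=lambda i: cosine(vectors[i], query),
        reverse=True,
    )
    return ranked[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k that the approximate search also returned."""
    exact = set(exact_ids)
    return len(exact & set(approx_ids)) / len(exact)


rng = random.Random(0)
vectors = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(200)]
query = [rng.gauss(0, 1) for _ in range(16)]
exact = brute_force_top_k(vectors, query, k=10)

# Simulate an approximate index that misses one true neighbor.
approx = exact[:9] + [next(i for i in range(200) if i not in exact)]
print(recall_at_k(approx, exact))  # 0.9
```

In a real run, `approx` would be the ids returned by the index under test, averaged over many queries.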

Real-world RAG test (19K chunks from a local GitHub repository, extensions: .py .md .txt):

| Operation | LlamaIndex | IronRace | Speedup |
|---|---|---|---|
| Context prep (search + assembly) | 293ms | 1.3ms | **~229x** |
| Result overlap vs LlamaIndex | — | 5/5 identical | — |

Concurrent throughput (GIL released during Rust execution):

| Concurrent Pipelines | Python | IronRace | Speedup |
|---|---|---|---|
| 10 | 31/sec | 1,521/sec | **~49x** |
| 100 | 31/sec | 1,918/sec | **~61x** |
| 1,000 | 31/sec | 1,862/sec | **~60x** |
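
The effect behind this table is that threads can run in parallel only while the GIL is released. The sketch below illustrates the idea with the standard library: `hashlib` releases the GIL while hashing large buffers, so it stands in here for a native Rust call. The `throughput` helper and the payload size are assumptions made for illustration, and the printed rate depends on the machine.

```python
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor

PAYLOAD = b"x" * (4 * 1024 * 1024)


def gil_releasing_work(_):
    # hashlib releases the GIL for large inputs, standing in for a native call.
    return hashlib.sha256(PAYLOAD).hexdigest()


def throughput(fn, n_tasks, workers):
    """Run n_tasks calls of fn on a thread pool; return (tasks/sec, results)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fn, range(n_tasks)))
    elapsed = time.perf_counter() - start
    return n_tasks / elapsed, results


rate, digests = throughput(gil_releasing_work, n_tasks=32, workers=8)
print(f"{rate:.0f} tasks/sec, {len(set(digests))} distinct digest(s)")
```

Replacing `gil_releasing_work` with a pure-Python loop of similar cost should show throughput staying roughly flat as `workers` grows, which is the pattern in the Python column above.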

## Installation

```bash
pip install ironrace
```

Development:
```bash
git clone https://github.com/ironrace/ironrace.git
cd ironrace
pip install maturin
maturin develop --release
```

## Key Design Decisions

1. **Python API, Rust runtime** — Decorators, type hints, dataclasses. No Rust knowledge needed.
2. **Bridge once per pipeline** — The entire context prep DAG executes in Rust as a single call.
3. **LLM calls stay in Python** — I/O-bound, asyncio handles them fine.
4. **Vector index lives in Rust** — Built once, searched many times, zero-copy reference.
5. **Token budgeting is first-class** — Per-section budgets with sentence-boundary truncation.
6. **Accuracy by default** — ef_construction=100 gives 98%+ recall out of the box. Tunable for power users.
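
As an illustration of decision 5, the sketch below shows one way per-section budgeting with sentence-boundary truncation can work. It is not IronRace's implementation: `count_tokens` uses whitespace-separated words as a stand-in for a real tokenizer, and `truncate_to_budget` is a hypothetical helper.

```python
import re


def count_tokens(text):
    # Assumption: whitespace-separated words stand in for real tokenizer tokens.
    return len(text.split())


def truncate_to_budget(text, budget):
    """Keep whole sentences, in order, until the next one would exceed the budget."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept, used = [], 0
    for sentence in sentences:
        cost = count_tokens(sentence)
        if used + cost > budget:
            break
        kept.append(sentence)
        used += cost
    return " ".join(kept), used


docs = (
    "Revenue grew 40% last year. Churn is under 2%. "
    "The team has shipped three major releases."
)
text, used = truncate_to_budget(docs, budget=10)
print(text)  # Revenue grew 40% last year. Churn is under 2%.
print(used)  # 9
```

Cutting at sentence boundaries leaves a little of the budget unused (here 1 of 10 tokens) in exchange for never handing the model a half-finished sentence.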

## Project Structure

```
ironrace/
├── rust/src/           # Rust core (PyO3)
│   ├── vector.rs       # HNSW approximate nearest neighbor (hnsw_rs)
│   ├── tokenizer.rs    # Token counting + truncation
│   ├── assembler.rs    # Prompt assembly + budgeting
│   ├── json_fast.rs    # Serde JSON bridge
│   ├── pipeline.rs     # Tokio DAG executor
│   └── lib.rs          # Module entry point
├── python/ironrace/    # Python SDK
│   ├── decorators.py   # @agent, @context, @pipeline
│   ├── compiler.py     # DAG compilation
│   ├── types.py        # TokenBudget, Document, VectorSearch
│   └── router.py       # Async LLM caller
├── benchmarks/         # Performance benchmarks (with recall verification)
├── examples/           # Runnable demo apps
├── tests/              # Test suite (91 tests)
└── docs/               # Documentation
```

## Documentation

- [PROBLEM.md](docs/PROBLEM.md) — The business case and benchmark data
- [ARCHITECTURE.md](docs/ARCHITECTURE.md) — Technical deep dive
- [QUICKSTART.md](docs/QUICKSTART.md) — 5-minute getting started guide
- [BENCHMARKS.md](docs/BENCHMARKS.md) — Reproducible benchmark methodology

## Running

```bash
# Tests
pytest tests/ -v

# Benchmarks
python -m benchmarks.bench_context_prep
python -m benchmarks.bench_vector_search
python -m benchmarks.bench_at_scale

# Examples
python examples/startup_evaluator.py
python examples/rag_chatbot.py
python examples/batch_research.py
```

## License

Apache 2.0 — see [LICENSE](LICENSE)

