Metadata-Version: 2.4
Name: local-mgrep
Version: 0.4.0
Summary: Free local semantic code search using Ollama
License-Expression: MIT
Project-URL: Homepage, https://github.com/danielchen26/local-mgrep
Project-URL: Repository, https://github.com/danielchen26/local-mgrep
Project-URL: Issues, https://github.com/danielchen26/local-mgrep/issues
Project-URL: Documentation, https://danielchen26.github.io/local-mgrep/
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: click>=8.0
Requires-Dist: requests>=2.31
Requires-Dist: numpy>=1.26
Requires-Dist: scikit-learn>=1.4
Requires-Dist: tree-sitter>=0.21
Requires-Dist: tree-sitter-python>=0.23
Requires-Dist: tree-sitter-javascript>=0.23
Requires-Dist: tree-sitter-typescript>=0.23
Provides-Extra: rerank
Requires-Dist: sentence-transformers>=3.0; extra == "rerank"
Dynamic: license-file

<p align="center">
  <img alt="local-mgrep — semantic code search over a local index" src="docs/assets/hero-dark.svg" width="100%">
</p>

<p align="center">
  <a href="https://pypi.org/project/local-mgrep/"><img src="https://img.shields.io/pypi/v/local-mgrep?label=pypi&color=22d3ee&labelColor=0a0d12" alt="PyPI"></a>
  <a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.9%2B-22d3ee?labelColor=0a0d12" alt="Python 3.9+"></a>
  <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-22d3ee?labelColor=0a0d12" alt="MIT License"></a>
  <a href="https://danielchen26.github.io/local-mgrep/"><img src="https://img.shields.io/badge/docs-published-22d3ee?labelColor=0a0d12" alt="Documentation"></a>
  <a href="https://github.com/danielchen26/local-mgrep/releases/latest"><img src="https://img.shields.io/github/v/release/danielchen26/local-mgrep?label=release&color=22d3ee&labelColor=0a0d12" alt="Latest release"></a>
</p>

<p align="center">
  <a href="https://danielchen26.github.io/local-mgrep/"><b>Documentation</b></a>
  &nbsp;·&nbsp;
  <a href="#quickstart"><b>Quickstart</b></a>
  &nbsp;·&nbsp;
  <a href="#architecture"><b>Architecture</b></a>
  &nbsp;·&nbsp;
  <a href="#benchmark"><b>Benchmark</b></a>
  &nbsp;·&nbsp;
  <a href="https://github.com/danielchen26/local-mgrep/releases"><b>Releases</b></a>
  &nbsp;·&nbsp;
  <a href="https://github.com/danielchen26/local-mgrep/issues"><b>Issues</b></a>
</p>

---

## Overview

`local-mgrep` is an offline semantic code-search CLI built around a four-stage
retrieval pipeline:

1. **Lexical prefilter (ripgrep).** Up to eight literal tokens are extracted
   from the natural-language query and ripgrep narrows the corpus to files
   containing any of them. Typically reduces the chunk scan ~100×.
2. **Multi-resolution cosine.** A file-level embedding (mean of chunk
   vectors, computed at index time) picks the top files; chunk-level cosine
   then runs only inside those files.
3. **Confidence-gated cascade (default).** When the cheap file-mean
   retrieval has a clear top-1, we return immediately. Only uncertain
   queries pay the LLM-driven escalation (cosine + file-rank ∪
   HyDE-rewritten cosine + file-rank, score-preserved union). Pass
   `--no-cascade` to fall back to the chunk-only path.
4. **Optional cross-encoder rerank.** On the non-cascade path, an
   `mxbai-rerank-*-v2` cross-encoder reorders the top candidate pool;
   non-canonical paths (tests, blocklists) get a multiplicative penalty.

A first-time query against a project triggers a one-time auto-index;
subsequent queries do an mtime-based incremental refresh (throttled, so
back-to-back queries don't pay it). Indexing, retrieval, optional answer
synthesis (`--answer`), and optional agentic decomposition (`--agentic`)
all run on the local host against a local Ollama server. No remote service
is required for the core workflow.

Each result carries the file path, an inclusive 1-based line range, the
detected language, the score, and the verbatim source text — rendered as
text, JSON (`--json`), or as a synthesized answer over the local Ollama
generation model.

Latest stable release notes: [v0.4.0](https://github.com/danielchen26/local-mgrep/releases/latest).

## Quickstart

```bash
pip install local-mgrep

# Ask. The first query auto-indexes; subsequent queries refresh on mtime change.
mgrep "where is the auth token refreshed?"
```

That is the entire happy path. `mgrep` derives the project root from `git
rev-parse --show-toplevel` (falling back to the current working directory),
maintains a per-project index under `~/.local-mgrep/repos/`, and runs the
confidence-gated cascade by default for retrieval. If Ollama is not yet
installed or the embedding model is missing, the CLI prints an actionable
one-line setup hint instead of failing silently.

Health-check the setup at any time:

```bash
mgrep doctor
```

Common follow-ups:

```bash
mgrep "where is the SQLite schema initialized?" --json    # machine-readable
mgrep "how does indexing remove deleted files?" --answer  # local Ollama answer
mgrep "..." --agentic --max-subqueries 3 --answer         # bounded subquery decomposition
mgrep "..." --no-cascade --rerank                         # legacy chunk + rerank path
mgrep stats                                               # current project's index info
mgrep index .                                             # explicit reindex (rarely needed)
mgrep serve &                                             # warm-reranker daemon
mgrep "..." --daemon-url http://127.0.0.1:7878            # query against the daemon
```

Full CLI reference and configuration: <https://danielchen26.github.io/local-mgrep/>.

## Architecture

<p align="center">
  <img alt="local-mgrep system architecture: three lanes for index time, storage, and query time" src="docs/assets/architecture.svg" width="100%">
</p>

The index pipeline (`mgrep index`, `mgrep watch`) and the query pipeline
(`mgrep search`) communicate only through a SQLite database stored at
`$MGREP_DB_PATH`. The two pipelines share no in-process state and can run
on different hosts as long as they point at the same database file. The
index pipeline additionally populates a `files` table at index time
(L2-normalised mean of chunk vectors per file) so the query path can do
file-level retrieval without re-scanning chunks.

Three retrieval tiers are exposed:

| Tier | Command | Recall (warp 16) | Avg s/q (Mac CPU) | Notes |
| --- | --- | :-: | :-: | --- |
| cascade (default) | `mgrep "<query>"` | 14/16 | 1.49 | confidence-gated; cheap path on ~81% of queries, HyDE-union escalation on the rest |
| chunk + rerank | `mgrep "<query>" --no-cascade --rerank` | 11/16 | 9.5 | adds `mxbai-rerank-base-v2` cross-encoder |
| chunk-only | `mgrep "<query>" --no-cascade --no-rerank` | 9/16 | 0.52 | rg prefilter + cosine + file-rank only |

The full set of internal modules is documented at
<https://danielchen26.github.io/local-mgrep/#architecture>.

## Capability matrix

| Capability | Status | Since | Notes |
| --- | --- | --- | --- |
| Semantic code search via local Ollama | implemented | 0.1.0 | No remote service required. |
| Tree-sitter chunking | implemented | 0.2.0 | Languages with installed grammars; line-window fallback otherwise. |
| `.gitignore` / `.mgrepignore` hygiene | implemented | 0.2.0 | Plus a built-in skip-set for common build/cache directories. |
| Incremental indexing | implemented | 0.2.0 | mtime-based; reindexes new and changed files. |
| Stale row cleanup | implemented | 0.2.0 | Removes rows for files no longer present beneath the indexed root. |
| Watch mode | implemented | 0.2.0 | Polling loop; default interval 5 seconds. |
| Hybrid lexical + semantic ranking | implemented | 0.2.0 | Disable with `--semantic-only` for pure cosine. |
| Stable JSON output | implemented | 0.2.0 | See [JSON schema](https://danielchen26.github.io/local-mgrep/#json-schema). |
| Local answer mode | implemented | 0.2.0 | Uses local Ollama generation model. |
| Local agentic decomposition | implemented | 0.2.0 | Bounded subquery expansion via Ollama. |
| Cross-encoder rerank | implemented | 0.3.0 | `--rerank` (default on); models: `mxbai-rerank-base-v2` (default), `mxbai-rerank-large-v2` (`MGREP_RERANK_MODEL`). |
| Asymmetric query/document prefixes | implemented | 0.3.0 | Auto-applied for `nomic-embed-text` and similar. |
| HyDE query rewriting | implemented | 0.3.0 | `--hyde`; deterministic seed, single Ollama generation. |
| Multi-resolution retrieval | implemented | 0.3.0 | File-level cosine top-N → chunk-level inside those files (default on). |
| Lexical prefilter (ripgrep first stage) | implemented | 0.3.0 | `--lexical-prefilter` (default on); `--lexical-root`, `--lexical-min-candidates`. |
| File-rank (one chunk per file) | implemented | 0.3.0 | `--rank-by file`; collapses results so small canonical files compete fairly. |
| Daemon mode (warm reranker) | implemented | 0.3.0 | `mgrep serve` + `--daemon-url`; eliminates ~5–10 s cold-load per call. |
| Quantisation / device knobs | implemented | 0.3.0 | `MGREP_RERANK_QUANTIZE=int8`, `MGREP_RERANK_DEVICE=auto/mps/cpu`. |
| Confidence-gated cascade | implemented | 0.3.0 (default in 0.4.0) | Default on; gates expensive escalation on a top-1 / top-2 score gap. `--no-cascade` to disable. See [Benchmark](#benchmark). |
| Bare-form invocation (`mgrep "<q>"`) | implemented | 0.4.0 | Routes to `search` automatically; subcommand names still win for `index/watch/serve/stats/doctor`. |
| Per-project auto-index | implemented | 0.4.0 | First query in a project triggers one-time index; subsequent queries do mtime-based incremental refresh (30 s throttle). Set `MGREP_DB_PATH` to opt out. |
| Bootstrap probes (`mgrep doctor`) | implemented | 0.4.0 | Health check of Ollama runtime, embed/LLM model presence, project index state, optional reranker. |
| Hosted account / cloud index | out of scope | — | Not planned. |
| Paid web search | out of scope | — | Not planned. |

## Benchmark

Two reproducible benchmarks ship in this repository.

### 1. warp 16-task cross-repo benchmark (the headline)

Measured on Mac CPU, no daemon, cold reranker amortised over 16 tasks. The
test set is a held-out collection of natural-language queries against the
`warp-terminal/warp` Rust workspace; the canonical answers are subdirectory
paths (e.g. `crates/voice_input/`). Recall counts a query as a hit when at
least one returned chunk lives under the canonical subdirectory.

| Tier | Command | Recall | Avg s/q |
| --- | --- | :-: | :-: |
| ultra-fast | `mgrep "..." --cascade-tau 0.0` | 11/16 | 0.10 |
| chunk-only | `mgrep "..." --no-cascade --no-rerank` | 9/16 | 0.52 |
| chunk + rerank | `mgrep "..." --no-cascade --rerank` | 11/16 | 9.5 |
| cascade (default) | `mgrep "..."` | 14/16 | 1.49 |
| ripgrep raw recall | `rg -il -F` token-OR | 16/16 | 0.1 |

At the default τ=0.015 the cascade's cheap file-mean path handles ~81% of
queries; the remaining ~19% pay the HyDE-union escalation. Two queries
(`crates/ai/`, `app/src/billing/`) currently miss across every tested
configuration — hard semantic-disambiguation cases discussed in
[`docs/roadmap.md`](docs/roadmap.md). Full τ sweep, methodology, and the
ablation tables (including the abandoned LLM-arbitration, code-graph, and
multi-HyDE experiments) live in
[`docs/parity-benchmarks.md`](docs/parity-benchmarks.md).

### 2. local-mgrep self-test (regression guard)

<p align="center">
  <img alt="Bar chart: token reduction and recall vs grep-agent across top-k 5, 10, 20, 50" src="docs/assets/benchmark.svg" width="100%">
</p>

Deterministic local benchmark over 30 repository-navigation tasks. Compares
a single `mgrep search` call against a grep-agent simulation. Token volumes
are estimated as `chars / 4`.

| top-k | recall (mgrep) | recall (grep) | total-token reduction | context-token reduction |
| ----: | :------------- | :------------ | :-------------------- | :---------------------- |
| 5     | 28 / 30        | 30 / 30       | 2.66×                 | 5.53×                   |
| **10** | **30 / 30**    | **30 / 30**   | **2.00×**             | **2.90×**               |
| 20    | 30 / 30        | 30 / 30       | 1.36×                 | 1.53×                   |
| 50    | 30 / 30        | 30 / 30       | 0.67×                 | 0.60×                   |

This is the regression guard — every release is verified to keep 30/30 at
top-k 10. Full methodology and limitations are in
[`docs/token-benchmarking.md`](docs/token-benchmarking.md).

## CLI reference

```bash
mgrep        QUERY  [OPTIONS]                            # bare-form search (default)
mgrep search QUERY  [OPTIONS]                            # explicit search
mgrep doctor                                             # runtime + model + index health check
mgrep stats                                              # print chunk and file counts
mgrep index  [PATH] [--reset] [--incremental/--full]    # explicit reindex (auto on first query)
mgrep watch  [PATH] --interval N                        # poll for changes (default 5s)
mgrep serve  [--host H] [--port P]                      # run daemon (keeps reranker warm)
```

<details>
<summary><b><code>mgrep search</code> options</b></summary>

<br>

| Option | Default | Effect |
| --- | --- | --- |
| `-m`, `-n`, `--top` | 5 | Number of final results. |
| `--json` | off | Emit a JSON array; suppresses human formatting. |
| `--answer` | off | Synthesize an answer from retrieved snippets via Ollama. |
| `--content / --no-content` | on | Show or hide snippet bodies in human output. |
| `--language` | — | Restrict to one or more language keys; repeatable. |
| `--include` | — | Glob; only paths matching at least one pattern are kept; repeatable. |
| `--exclude` | — | Glob; paths matching any pattern are dropped; repeatable. |
| `--semantic-only` | off | Skip lexical reranking; rank by cosine alone. |
| `--rerank / --no-rerank` | on | Apply cross-encoder rerank when `sentence-transformers` is installed. |
| `--rerank-pool` | 50 | Candidate pool size before reranking (env `MGREP_RERANK_POOL`). |
| `--rerank-model` | env or default | HuggingFace cross-encoder id (env `MGREP_RERANK_MODEL`). |
| `--hyde / --no-hyde` | off | Generate a hypothetical-document via local Ollama LLM and embed both the query and the doc. |
| `--multi-resolution / --no-multi-resolution` | on | Two-stage retrieval: file-level cosine top-N, then chunk-level inside those. |
| `--file-top` | 30 | Number of files surfaced by file-level retrieval before chunk-level scoring. |
| `--lexical-prefilter / --no-lexical-prefilter` | on | Use ripgrep to narrow the candidate file set before cosine + rerank. |
| `--lexical-root` | cwd | Root directory ripgrep scans for the lexical prefilter. |
| `--lexical-min-candidates` | 2 | Fall back to corpus-wide cosine when ripgrep returns fewer files. |
| `--rank-by` | `chunk` | `chunk` (per-file diversity cap) or `file` (one best chunk per file, sorted by score). |
| `--cascade / --no-cascade` | on | Confidence-gated retrieval (cheap path + HyDE-union escalation). On by default since 0.4.0. |
| `--cascade-tau` | 0.015 | Confidence threshold (top-1 minus top-2 file-mean cosine). |
| `--auto-index / --no-auto-index` | on | Auto-build the index on first query and refresh on mtime change. Off when `MGREP_DB_PATH` is set externally. |
| `--daemon-url` | — | Send the search to a running `mgrep serve` daemon (warm reranker). |
| `--agentic` | off | Decompose the query into subqueries via Ollama before search. |
| `--max-subqueries` | 3 | Upper bound on agentic subqueries. |

</details>

## Configuration

| Variable | Default | Effect |
| --- | --- | --- |
| `OLLAMA_URL` | `http://localhost:11434` | Base URL of the Ollama server. |
| `OLLAMA_EMBED_MODEL` | `mxbai-embed-large` | Embedding model used at index time and query time. `nomic-embed-text` is the recommended alternative for code. |
| `OLLAMA_LLM_MODEL` | `qwen2.5:3b` | Generation model used by `--answer`, `--agentic`, `--hyde`, and the cascade escalation. |
| `MGREP_DB_PATH` | `~/.local-mgrep/index.db` | SQLite index location. |
| `MGREP_RERANK_MODEL` | `mixedbread-ai/mxbai-rerank-base-v2` | Cross-encoder model id. Try `mxbai-rerank-large-v2` for +1 recall at ~2× latency. |
| `MGREP_RERANK_POOL` | 50 | Candidate pool size before reranking. |
| `MGREP_RERANK_QUANTIZE` | — | Set `int8` for x86_64 CPUs with VNNI; no benefit on Apple Silicon. |
| `MGREP_RERANK_DEVICE` | `auto` | `auto`, `mps`, or `cpu`. |

> **Note.** Switching `OLLAMA_EMBED_MODEL` after indexing produces a
> dimension or semantic mismatch. Reindex with `--reset` after changing the
> embedding model.

## Releases

Every released version has a comprehensive entry on the
[GitHub Releases page](https://github.com/danielchen26/local-mgrep/releases)
covering architecture changes, benchmark deltas, and compatibility notes.
Each release is also published to PyPI.

## Development

```bash
git clone https://github.com/danielchen26/local-mgrep.git
cd local-mgrep
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[rerank]"

.venv/bin/pytest -q tests/
.venv/bin/python benchmarks/agent_context_benchmark.py --top-k 10 --summary-only
```

To reproduce a warp benchmark row, follow the indexing/run instructions
in [`docs/parity-benchmarks.md`](docs/parity-benchmarks.md).

## License

MIT — see [`LICENSE`](LICENSE).

## Acknowledgments

- [Ollama](https://ollama.com/) for the local embedding and generation runtime.
- [tree-sitter](https://tree-sitter.github.io/tree-sitter/) for syntax-aware parsing.
- [ripgrep](https://github.com/BurntSushi/ripgrep) for the lexical prefilter stage.
- [Mixedbread](https://www.mixedbread.com/) for the open-source `mxbai-rerank-*-v2`
  cross-encoder family used by `--rerank`.
- Click, NumPy, and SQLite for the core runtime dependencies.
