Metadata-Version: 2.4
Name: codetool-search
Version: 0.2.0
Summary: Fast, dependency-free workspace content and file search for coding-agent tools with Rust backend
Project-URL: Homepage, https://github.com/pbi-agent/codetool-search
Project-URL: Repository, https://github.com/pbi-agent/codetool-search
Project-URL: Issues, https://github.com/pbi-agent/codetool-search/issues
Project-URL: Changelog, https://github.com/pbi-agent/codetool-search/releases
Author-email: drod <naceur.bs@gmail.com>
Maintainer-email: drod <naceur.bs@gmail.com>
License-Expression: MIT
License-File: LICENSE
Keywords: agent,code-search,developer-tools,file-search,filesystem,rust,search,text-search
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: MacOS
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Rust
Classifier: Topic :: Software Development
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Indexing
Classifier: Typing :: Typed
Requires-Python: >=3.12
Description-Content-Type: text/markdown

# codetool-search

`codetool-search` is a workspace search library built for coding-agent harnesses: fast content search, fast filename/path discovery, compact structured results, and predictable token usage.

- **Agent-first API**: one public `search()` call with `target="content"`, `"path"`, or `"both"`.
- **Performance-oriented**: dependency-free Python fallback plus optional Rust CLI acceleration for literal and regex content/path search.
- **Token-compressed output**: compact result keys by default, `result_format="text"` for raw RTK-style text, and `result_format="full"` for the uncompressed backend shape.

```python
from codetool_search import search

content = search("UserService", root=".", mode="files")
paths = search("service", root=".", target="path", glob="*.py")
both = search("UserService", root=".", target="both")
```

Patterns are regexes by default, so alternation works without extra flags:

```python
search("Maximum number of results|Text or regex pattern", root="tests")
```

Pass `regex=False` for exact literal matching.

For maximum token compression, request raw text:

```python
print(search("UserService", root=".", regex=False, result_format="text"))
```

Raw text omits backend/totals metadata, groups repeated path prefixes in a small
tree, crops long snippets/context aggressively, and prints `No Match` for empty
results. It includes a compact pagination header only when another page exists:

```text
-- more: cursor=50
src/
 a.py
```
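A harness that pages through raw text has to separate that header from the body. The following is a minimal sketch, assuming the header is always the first line, has exactly the `-- more: cursor=N` shape shown above, and carries an integer cursor. `split_raw_page` and `iter_raw_pages` are hypothetical helpers, not part of the package, and `search_fn` stands in for `codetool_search.search`:

```python
import re
from collections.abc import Callable, Iterator
from typing import Optional

# Assumption: the header looks exactly like the example above ("-- more: cursor=50").
_MORE_HEADER = re.compile(r"^-- more: cursor=(\d+)$")


def split_raw_page(text: str) -> tuple[Optional[int], str]:
    """Return (next_cursor, body) for one page of raw text output."""
    first, _, rest = text.partition("\n")
    match = _MORE_HEADER.match(first.strip())
    if match is None:
        return None, text  # no header: this is the last page
    return int(match.group(1)), rest


def iter_raw_pages(
    search_fn: Callable[..., str], pattern: str, **kwargs
) -> Iterator[str]:
    """Yield page bodies until a page arrives without a pagination header."""
    cursor = None
    while True:
        page = search_fn(pattern, cursor=cursor, result_format="text", **kwargs)
        cursor, body = split_raw_page(page)
        yield body
        if cursor is None:
            break
```

With the package installed, `for body in iter_raw_pages(search, "UserService", root="."): ...` would walk every page. Whether `cursor` accepts the parsed integer unchanged is an assumption worth checking against the package.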

Raw mode grammar:

- `mode="files"`: matching filenames only.
- `mode="count"`: `path xN`, where `N` is the per-file count.
- `mode="snippets"`: `path:line:text` without context, or tree-grouped files
  where `line:text` marks a match and `~text` marks surrounding context.

## API

```python
search(
    pattern,
    root=".",
    target="content",       # "content", "path", or "both"
    regex=True,             # set False for literal search
    path_scope="path",      # "path" or "basename" for path matching
    glob=None,
    exclude=None,
    case="smart",
    mode="files",           # "files", "snippets", or "count"
    context_lines=0,
    limit=50,
    cursor=None,
    backend="auto",         # "auto", "python", "rust"/"native"
    result_format="compressed",  # "compressed", "text"/"raw", or "full"
)
```

`target="content"` searches file contents. `target="path"` searches relative
file paths without opening file contents. `target="both"` returns files matching
either target and marks each row with its match kind.

`backend="auto"` uses the Rust helper when it is present and otherwise falls back to pure Python. Regex searches run in Rust when its regex engine supports the pattern and fall back to Python for compatibility, for example so that patterns that can match empty spans are counted the way Python's `re.finditer` counts them.
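To see why empty-span patterns need care, the sketch below counts matches the way Python's `re.finditer` reports them. It uses only the standard library and says nothing about how the Rust helper counts; `count_matches` is a hypothetical helper for illustration:

```python
import re


def count_matches(pattern: str, text: str) -> int:
    """Count matches as re.finditer reports them, empty spans included."""
    return sum(1 for _ in re.finditer(pattern, text))


# "a*" can match the empty string, so it also matches where there is no "a".
spans = [m.span() for m in re.finditer(r"a*", "baaa")]
print(spans)                          # [(0, 0), (1, 4), (4, 4)]
print(count_matches(r"a*", "baaa"))   # 3
print(count_matches(r"a+", "baaa"))   # 1
```

A regex engine with different rules for empty matches could report a different count for the first pattern.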

`root` may be a workspace directory or a single file. File roots search only that file and report paths relative to the file's parent directory.
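Because reported paths are relative, a caller that wants to open a hit has to pick the right base directory. Here is a minimal sketch of that rule. `resolve_hit` is a hypothetical helper, and it assumes that directory roots report paths relative to the root itself (the text above only spells out the file-root case):

```python
from pathlib import Path


def resolve_hit(root: str, reported: str) -> Path:
    """Turn a reported relative path back into a filesystem path."""
    root_path = Path(root)
    # File roots report paths relative to the file's parent directory.
    base = root_path.parent if root_path.is_file() else root_path
    return base / reported
```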

Controlled failures raise `SearchError` subclasses:

- `SearchArgumentError` for invalid arguments.
- `SearchPatternError` for invalid/unsupported patterns.
- `SearchRootError` for missing or unsearchable roots.
- `SearchBackendError` for backend runtime failures.
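One way an agent harness might use these classes is to retry a rejected regex as a literal. The sketch below rests on assumptions: `search_with_literal_fallback` is a hypothetical helper, and `search_fn` and `pattern_error` are passed in as arguments because this README does not state where the exception classes are imported from.

```python
from collections.abc import Callable
from typing import Any


def search_with_literal_fallback(
    search_fn: Callable[..., Any],
    pattern_error: type[Exception],
    pattern: str,
    **kwargs: Any,
) -> Any:
    """Try a regex search first; if the pattern is rejected, retry as a literal."""
    try:
        return search_fn(pattern, regex=True, **kwargs)
    except pattern_error:
        # Model-written patterns such as "foo(" are often meant literally.
        return search_fn(pattern, regex=False, **kwargs)
```

Called as `search_with_literal_fallback(search, SearchPatternError, "foo(", root=".")`, only the pattern error triggers the retry; any other `SearchError` subclass still propagates to the caller.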

## CLI

```bash
codetool-search "UserService" . --literal --format text
codetool-search "service" . --target path --literal
codetool-search "User(Service|Repository)" --root src --mode snippets --raw
```

The CLI prints compact JSON by default. Use `--format text` or `--raw` for raw
text, in which case an empty result prints `No Match`.

## Install

```bash
uv add codetool-search        # or: pip install codetool-search
```

Wheels can include a platform-specific Rust helper. Without it, the package still works through the Python stdlib backend.

## Benchmarks

Reproduce and refresh the generated README data:

```bash
cargo build --release --manifest-path rust/Cargo.toml
uv run python benchmarks/benchmark_search.py \
  --output reports/search_benchmark.json \
  --update-readme
uv run python benchmarks/benchmark_output_lengths.py \
  --output reports/rtk_vs_codetool_output_lengths.json
uv run python scripts/update_readme_benchmarks.py \
  --performance reports/search_benchmark.json \
  --tokens reports/rtk_vs_codetool_output_lengths.json
```

<!-- benchmark-results:start -->

<!-- Generated by scripts/update_readme_benchmarks.py; do not edit by hand. -->

### Execution performance

Mean of median wall-clock timings across 5 corpora × 7 scenarios, 5 measured rounds after 1 warmup.

| Tool | Mean median time | Chart |
| --- | ---: | --- |
| `codetool-search` | 127.0 ms | ███████████░░░░░░░ |
| `rg` | 138.2 ms | ████████████░░░░░░ |
| `rtk` | 199.7 ms | ██████████████████ |

`codetool-search` is the fastest tool in this run.

Source: `reports/search_benchmark.json`.

### Token compression

Token counts use `tiktoken` when available. The table compares output across 7 RTK-corpus scenarios.

| Output | Tokens | Bytes | Chart |
| --- | ---: | ---: | --- |
| `search(..., result_format="text")` | 11,008 | 34.3 KB | ██░░░░░░░░░░░░░░░░ |
| `rtk grep` stdout | 19,646 | 60.1 KB | ███░░░░░░░░░░░░░░░ |
| default `search(...)` | 38,393 | 125.3 KB | █████░░░░░░░░░░░░░ |
| `search(..., result_format="full")` | 39,027 | 134.7 KB | █████░░░░░░░░░░░░░ |
| `rg` stdout | 129,775 | 402.4 KB | ██████████████████ |

Default structured output is 7.03% smaller than the full structured shape. Raw text omits backend/totals metadata, includes only a cursor hint when truncated, and prints `No Match` for empty pages. Raw text is 0.56× the `rtk grep` token count in this run.

Source: `reports/rtk_vs_codetool_output_lengths.json`.

<!-- benchmark-results:end -->

## Development

```bash
uv run pytest
uv run python scripts/package_rust_binary.py
uv build --wheel
```

Release wheels are built in CI with the staged Rust helper for each target platform.
