Metadata-Version: 2.4
Name: codetool-search
Version: 0.4.0
Summary: Fast, dependency-free workspace content and file search for coding-agent tools with Rust backend
Project-URL: Homepage, https://github.com/pbi-agent/codetool-search
Project-URL: Repository, https://github.com/pbi-agent/codetool-search
Project-URL: Issues, https://github.com/pbi-agent/codetool-search/issues
Project-URL: Changelog, https://github.com/pbi-agent/codetool-search/releases
Author-email: drod <naceur.bs@gmail.com>
Maintainer-email: drod <naceur.bs@gmail.com>
License-Expression: MIT
License-File: LICENSE
Keywords: agent,code-search,developer-tools,file-search,filesystem,rust,search,text-search
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: MacOS
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Rust
Classifier: Topic :: Software Development
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Indexing
Classifier: Typing :: Typed
Requires-Python: >=3.12
Description-Content-Type: text/markdown

# codetool-search

`codetool-search` is a workspace search library built for coding-agent harnesses: fast content search, fast filename/path discovery, compact structured results, and predictable token usage.

- **Agent-first API**: one public `search()` call with `target="content"`, `"path"`, or `"both"`.
- **Performance-oriented**: dependency-free Python fallback plus optional Rust CLI acceleration for literal and regex content/path search.
- **Token-compressed output**: compact result keys by default, `result_format="text"` for raw RTK-style text, and `result_format="full"` for the uncompressed backend shape.

```python
from codetool_search import search

content = search("UserService", root=".", mode="files")
paths = search("service", root=".", target="path", glob="*.py")
both = search("UserService", root=".", target="both")
scoped = search("search_workspace", root=["src", "webapp", "tests"], regex=False)
```

Patterns are regexes by default, so alternation works without extra flags:

```python
search("Maximum number of results|Text or regex pattern", root="tests")
```

Pass `regex=False` for exact literal matching.
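
To make the difference concrete, here is a minimal stdlib sketch of the two matching modes. It does not call `codetool-search`: `line_matches` is a hypothetical helper written for illustration, and it ignores `case="smart"` handling.

```python
import re


def line_matches(pattern: str, text: str, regex: bool = True) -> bool:
    """Hypothetical helper: regex search by default, substring test when regex=False."""
    if regex:
        return re.search(pattern, text) is not None
    return pattern in text


text = "Text or regex pattern to search for"
pattern = "Maximum number of results|Text or regex pattern"

# As a regex, "|" is alternation, so either side may match.
as_regex = line_matches(pattern, text)
# As a literal, "|" is an ordinary character and the whole string must appear.
as_literal = line_matches(pattern, text, regex=False)
# Escaping the pattern makes the regex path behave like the literal path.
escaped = line_matches(re.escape(pattern), text)
```

In this sketch `as_regex` is true, while `as_literal` and `escaped` are both false, because the text never contains the full pattern string including the `|`.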

For maximum token compression, request raw text:

```python
print(search("UserService", root=".", regex=False, result_format="text"))
```

Raw text omits backend/totals metadata, groups repeated path prefixes in a small
tree, crops long snippets/context aggressively, and prints `No Match` for empty
results. It includes a compact pagination header only when another page exists:

```text
-- more: cursor=50
src/
 a.py
```
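
The prefix grouping shown above can be sketched in a few lines of plain Python. This is an illustration only, not the package's renderer: `render_path_tree` is a hypothetical name, and the one-space indent per directory level is an assumption read off the sample output.

```python
def render_path_tree(paths, indent=" "):
    """Hypothetical sketch: print each shared directory once, with files indented below it."""
    lines = []
    previous_dirs = []
    for path in sorted(paths):
        *dirs, name = path.split("/")
        # Count the leading directories shared with the previously rendered path.
        shared = 0
        while (
            shared < len(dirs)
            and shared < len(previous_dirs)
            and dirs[shared] == previous_dirs[shared]
        ):
            shared += 1
        # Emit only the directories that have not been printed yet.
        for depth in range(shared, len(dirs)):
            lines.append(f"{indent * depth}{dirs[depth]}/")
        lines.append(f"{indent * len(dirs)}{name}")
        previous_dirs = dirs
    return "\n".join(lines)


tree = render_path_tree(["src/a.py", "src/b.py", "tests/test_a.py"])
print(tree)
```

Each directory name is written once instead of once per file, which is where the token savings come from when many matches share a prefix.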

Raw mode grammar:

- `mode="files"`: matching filenames only.
- `mode="count"`: `path xN`, where `N` is the per-file count.
- `mode="snippets"`: `path:line:text` without context, or tree-grouped files
  where `line:text` marks a match and other indented text is surrounding context.
  With `target="both"`, path-only matches are returned as filename rows.
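
A consumer that needs structured data back from raw text could parse the flat forms of this grammar roughly as follows. This is a sketch under stated assumptions: `parse_raw_text` and the three regexes are hypothetical, inferred from the grammar above, and tree-grouped output (indented rows) is not handled.

```python
import re

# Patterns inferred from the documented grammar; real output may differ in detail.
CURSOR_HEADER = re.compile(r"^-- more: cursor=(?P<cursor>\S+)$")
COUNT_ROW = re.compile(r"^(?P<path>.+) x(?P<count>\d+)$")
SNIPPET_ROW = re.compile(r"^(?P<path>[^:]+):(?P<line>\d+):(?P<text>.*)$")


def parse_raw_text(text: str, mode: str):
    """Hypothetical parser: return (cursor, rows) for flat raw-text output."""
    cursor = None
    rows = []
    for line in text.splitlines():
        if not line.strip() or line == "No Match":
            continue
        header = CURSOR_HEADER.match(line)
        if header:
            cursor = header["cursor"]  # kept as a string; its type is not assumed
            continue
        if mode == "count" and (m := COUNT_ROW.match(line)):
            rows.append((m["path"], int(m["count"])))
        elif mode == "snippets" and (m := SNIPPET_ROW.match(line)):
            rows.append((m["path"], int(m["line"]), m["text"]))
        else:
            rows.append((line,))  # filename row: mode="files" or a path-only match
    return cursor, rows


counts = parse_raw_text("-- more: cursor=50\nsrc/a.py x3\nsrc/b.py x1", mode="count")
snippets = parse_raw_text("src/a.py:12:class UserService:", mode="snippets")
empty = parse_raw_text("No Match", mode="files")
```

Passing `mode` explicitly avoids guessing the row type from its shape, since a filename could itself end in something like ` x3`.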

## API

```python
search(
    pattern,
    root=".",               # path, file, or non-empty list/tuple of paths
    target="content",       # "content", "path", or "both"
    regex=True,             # set False for literal search
    path_scope="path",      # "path" or "basename" for path matching
    glob=None,
    exclude=None,
    case="smart",
    mode="files",           # "files", "snippets", or "count"
    context_lines=0,
    limit=50,
    cursor=None,
    backend="auto",         # "auto", "python", "rust"/"native"
    result_format="compressed",  # "compressed", "text"/"raw", or "full"
)
```

`target="content"` searches file contents. `target="path"` searches relative
file paths without opening file contents. `target="both"` returns files matching
either target and marks each row with its match kind. `mode="snippets"` supports
`target="content"` and `target="both"`; path-only rows under `target="both"`
are returned without line/snippet fields.

`backend="auto"` uses the Rust helper when it is present and otherwise falls back to pure Python. Regex searches run in Rust when its regex engine supports the pattern and fall back to Python otherwise, so results stay compatible with Python, including `re.finditer`-style counts for patterns that can match empty spans.
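
The empty-span case is easy to see with the standard library alone. The snippet below only demonstrates Python's own `re.finditer` behaviour; `count_matches` is a hypothetical helper and says nothing about how either backend is implemented.

```python
import re


def count_matches(pattern: str, text: str) -> int:
    """Count matches the way re.finditer reports them, empty matches included."""
    return sum(1 for _ in re.finditer(pattern, text))


# "x*" can match the empty string, so it matches once at every position
# of "abc", including the end of the string.
empty_span_count = count_matches(r"x*", "abc")

# A pattern that cannot match an empty span counts only real occurrences.
digit_run_count = count_matches(r"\d+", "a1b22")
```

Here `empty_span_count` is 4 and `digit_run_count` is 2. Regex engines differ in how they step past empty matches, which is why counts for such patterns are a compatibility concern.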

`root` accepts `str | os.PathLike | Sequence[str | os.PathLike]`. It may be a
workspace directory, a single file, or a non-empty list/tuple of directories and
files:

```python
search("search_workspace", root=["src", "webapp", "tests"], regex=False)
```

When calling through JSON/tool schemas, pass multi-root values as a JSON array,
for example `"root": ["src", "webapp", "tests"]`. For resilience with coding
agents, a space-delimited string such as `"root": "src webapp tests"` is also
treated as multiple roots when that exact path does not exist and every split
token is an existing file or directory. Existing paths with spaces still take
priority; quote individual spaced paths if combining them in one string.
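
The fallback rule described above can be sketched as follows. This is not the package's code: `split_root_string` is a hypothetical helper, and using `shlex.split` to honour quoted paths is an assumption about how quoting might be handled.

```python
import shlex
import tempfile
from pathlib import Path


def split_root_string(root: str, base: Path) -> list[str]:
    """Hypothetical helper mirroring the documented space-delimited fallback."""
    if (base / root).exists():
        return [root]  # an existing path, even one containing spaces, wins
    try:
        tokens = shlex.split(root)  # honours quotes around individual spaced paths
    except ValueError:
        return [root]  # unbalanced quotes: leave the value untouched
    if len(tokens) > 1 and all((base / token).exists() for token in tokens):
        return tokens
    return [root]


with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp)
    for name in ("src", "tests", "my docs"):
        (workspace / name).mkdir()

    multi = split_root_string("src tests", workspace)       # ['src', 'tests']
    spaced = split_root_string("my docs", workspace)        # ['my docs']
    quoted = split_root_string('src "my docs"', workspace)  # ['src', 'my docs']
    missing = split_root_string("src nowhere", workspace)   # ['src nowhere']
```

The string is split only when every token resolves, so a typo in one root leaves the original value intact and surfaces as a single missing-root failure instead of a silent partial search.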

File roots search only that file and report paths relative to the file's parent
directory. Multi-root searches report paths relative to the roots' common base,
so sibling roots keep prefixes such as `src/...` and `tests/...`; this also lets
`exclude=["src/generated/**"]` target one root.
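
As a rough illustration of the common-base rule, the sketch below derives reported paths with `posixpath` and checks an exclude pattern with `fnmatchcase`. The helper name, the `/work/repo` paths, and the use of `fnmatch` semantics for `exclude` are all assumptions made for the example; the package's own glob handling may differ.

```python
import posixpath
from fnmatch import fnmatchcase


def reported_path(file_path: str, roots: list[str]) -> str:
    """Hypothetical helper: express file_path relative to the roots' common base."""
    base = posixpath.commonpath(roots)
    return posixpath.relpath(file_path, base)


# Hypothetical absolute roots that share /work/repo as their common base.
roots = ["/work/repo/src", "/work/repo/tests"]
generated = reported_path("/work/repo/src/generated/api.py", roots)
test_file = reported_path("/work/repo/tests/generated/test_api.py", roots)

# Because the root prefix is kept, an exclude pattern can single out one root.
generated_excluded = fnmatchcase(generated, "src/generated/**")
test_excluded = fnmatchcase(test_file, "src/generated/**")
```

With these inputs `generated` is `src/generated/api.py` and `test_file` is `tests/generated/test_api.py`, so only the first one matches the exclude pattern.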

Controlled failures raise `SearchError` subclasses:

- `SearchArgumentError` for invalid arguments.
- `SearchPatternError` for invalid/unsupported patterns.
- `SearchRootError` for missing or unsearchable roots.
- `SearchBackendError` for backend runtime failures.
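
Since all four share a base class, a harness can catch `SearchError` once and branch on the subclass. The sketch below uses local stand-in classes that mirror the documented names so it runs without the package installed; in real code they would come from the package (the import path is not stated here), and the hint strings and helper names are hypothetical.

```python
class SearchError(Exception):
    """Local stand-in for the documented base class."""


class SearchArgumentError(SearchError):
    """Stand-in: invalid arguments."""


class SearchPatternError(SearchError):
    """Stand-in: invalid or unsupported pattern."""


class SearchRootError(SearchError):
    """Stand-in: missing or unsearchable root."""


class SearchBackendError(SearchError):
    """Stand-in: backend runtime failure."""


# Hypothetical hints a harness might hand back to an agent.
HINTS = {
    SearchArgumentError: "fix the call arguments",
    SearchPatternError: "rewrite the pattern or pass regex=False",
    SearchRootError: "check that the root exists",
    SearchBackendError: "retry with a different backend",
}


def describe_failure(exc: SearchError) -> str:
    """Map a controlled failure to a short, agent-readable hint."""
    for error_type, hint in HINTS.items():
        if isinstance(exc, error_type):
            return f"{hint}: {exc}"
    return f"search failed: {exc}"


def run_search(do_search):
    """Run a search callable and turn controlled failures into a hint string."""
    try:
        return do_search()
    except SearchError as exc:  # one handler covers every subclass
        return describe_failure(exc)


def failing_search():
    raise SearchRootError("no such directory: srcc")


message = run_search(failing_search)
```

Errors that are not `SearchError` subclasses still propagate, which keeps genuine bugs visible instead of folding them into a hint.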

## CLI

```bash
codetool-search "UserService" . --literal --format text
codetool-search "service" . --target path --literal
codetool-search "User(Service|Repository)" --root src --mode snippets --raw
codetool-search "search_workspace" --root src --root webapp --root tests --literal
```

The CLI defaults to compact JSON. Use `--format text` or `--raw` for raw text;
no matches print `No Match`. Repeat `--root` for multiple roots; a single
quoted space-delimited `--root` is accepted as a compatibility fallback when it
can be split into existing roots.

## Install

```bash
uv pip install codetool-search
```

Wheels can include a platform-specific Rust helper. Without it, the package still works through the Python stdlib backend.

## Benchmarks

Reproduce and refresh the generated README data:

```bash
cargo build --release --manifest-path rust/Cargo.toml
uv run python benchmarks/benchmark_search.py \
  --output reports/search_benchmark.json \
  --update-readme
uv run python benchmarks/benchmark_output_lengths.py \
  --output reports/rtk_vs_codetool_output_lengths.json
uv run python scripts/update_readme_benchmarks.py \
  --performance reports/search_benchmark.json \
  --tokens reports/rtk_vs_codetool_output_lengths.json
```

<!-- benchmark-results:start -->

<!-- Generated by scripts/update_readme_benchmarks.py; do not edit by hand. -->

### Execution performance

Mean of median wall-clock timings across 5 corpora × 7 scenarios, 5 measured rounds after 1 warmup.

| Tool | Mean median time | Chart |
| --- | ---: | --- |
| `codetool-search` | 127.0 ms | ███████████░░░░░░░ |
| `rg` | 138.2 ms | ████████████░░░░░░ |
| `rtk` | 199.7 ms | ██████████████████ |

`codetool-search` is the fastest tool in this run.

Source: `reports/search_benchmark.json`.

### Token compression

Token counts use `tiktoken` when available. The table compares output across 7 RTK-corpus scenarios.

| Output | Tokens | Bytes | Chart |
| --- | ---: | ---: | --- |
| `search(..., result_format="text")` | 11,008 | 34.3 KB | ██░░░░░░░░░░░░░░░░ |
| `rtk grep` stdout | 19,646 | 60.1 KB | ███░░░░░░░░░░░░░░░ |
| default `search(...)` | 38,393 | 125.3 KB | █████░░░░░░░░░░░░░ |
| `search(..., result_format="full")` | 39,027 | 134.7 KB | █████░░░░░░░░░░░░░ |
| `rg` stdout | 129,775 | 402.4 KB | ██████████████████ |

Default structured output is 7.03% smaller than the full structured shape. Raw text omits backend/totals metadata, includes only a cursor hint when truncated, and prints `No Match` for empty pages. Raw text is 0.56× the `rtk grep` token count in this run.

Source: `reports/rtk_vs_codetool_output_lengths.json`.

<!-- benchmark-results:end -->

## Development

```bash
uv run pytest
uv run python scripts/package_rust_binary.py
uv build --wheel
```

Release wheels are built in CI with the staged Rust helper for each target platform.
