Metadata-Version: 2.4
Name: jcodemunch-mcp
Version: 0.2.1
Summary: Token-efficient MCP server for source code exploration via tree-sitter AST parsing
Requires-Python: >=3.10
Requires-Dist: anthropic>=0.40.0
Requires-Dist: httpx>=0.27.0
Requires-Dist: mcp>=1.0.0
Requires-Dist: pathspec>=0.12.0
Requires-Dist: tree-sitter-language-pack>=0.7.0
Description-Content-Type: text/markdown

## Cut code-reading token costs by up to **99%**

Most AI agents explore repositories the expensive way:
open entire files → skim thousands of irrelevant lines → repeat.

**jCodeMunch indexes a codebase once and lets agents retrieve only the exact symbols they need** — functions, classes, methods, constants — with byte-level precision.

| Task                   | Traditional approach | With jCodeMunch |
| ---------------------- | -------------------- | --------------- |
| Find a function        | ~40,000 tokens       | ~200 tokens     |
| Understand module API  | ~15,000 tokens       | ~800 tokens     |
| Explore repo structure | ~200,000 tokens      | ~2k tokens      |

Index once. Query cheaply forever.
Precision context beats brute-force context.

---

# jCodeMunch MCP

### Make AI agents cheaper and faster on real codebases

![License](https://img.shields.io/badge/license-MIT-blue)
![MCP](https://img.shields.io/badge/MCP-compatible-purple)
![Local-first](https://img.shields.io/badge/local--first-yes-brightgreen)
![Polyglot](https://img.shields.io/badge/parsing-tree--sitter-9cf)

**Stop dumping files into context windows. Start retrieving exactly what the agent needs.**

jCodeMunch indexes a codebase once using tree-sitter AST parsing, then lets MCP-compatible agents (Claude Desktop, VS Code, etc.) **discover and retrieve code by symbol** instead of brute-reading files. Every symbol stores its signature plus a one-line summary, with full source retrievable on demand via O(1) byte-offset seeking.

---

## Proof first: Token savings in the wild

**Repo:** `geekcomputers/Python`
**Size:** 338 files, 1,422 symbols indexed
**Task:** Locate calculator / math implementations

| Approach          | Tokens | What the agent had to do              |
| ----------------- | -----: | ------------------------------------- |
| Raw file approach | ~7,500 | Open multiple files and scan manually |
| jCodeMunch MCP    | ~1,449 | `search_symbols()` → `get_symbol()`   |

### Result: **~80% fewer tokens** (~5× more efficient)

> Cost scales with tokens. Latency scales with how much irrelevant code the model must read.
> jCodeMunch reduces both by turning *search* into *navigation*.

---

## Why agents need this

Agents waste money when they:

* Open entire files to find one function
* Re-read the same code repeatedly
* Consume imports, boilerplate, and unrelated helpers

jCodeMunch gives agents **precision context access**:

* Search symbols by name, kind, or language
* Outline files without loading full contents
* Retrieve only the exact implementation of a symbol
* Fall back to full-text search when symbol lookup misses

Agents do not need larger context windows. They need **structured retrieval**.

---

## How it works

1. **Discovery** — files located via GitHub API or local directory walk
2. **Security filtering** — path traversal, secrets, binary detection, `.gitignore`
3. **Parsing** — tree-sitter AST extraction across supported languages
4. **Storage** — JSON index + raw files stored locally (`~/.code-index/`)
5. **Retrieval** — O(1) byte-offset seeking via stable symbol IDs

### Stable Symbol IDs

```
{file_path}::{qualified_name}#{kind}
```

Examples:

* `src/main.py::UserService.login#method`
* `src/utils.py::authenticate#function`

IDs remain stable across re-indexing when path, qualified name, and kind are unchanged.

---

## Installation

### Prerequisites

* Python 3.10+
* pip (or equivalent)

### Install

```bash
pip install git+https://github.com/jgravelle/jcodemunch-mcp.git
```

Verify installation:

```bash
jcodemunch-mcp --help
```

---

## Configure MCP Client

### Claude Desktop / Claude Code

macOS / Linux
`~/.config/claude/claude_desktop_config.json`

Windows
`%APPDATA%\Claude\claude_desktop_config.json`

```json
{
  "mcpServers": {
    "jcodemunch": {
      "command": "jcodemunch-mcp",
      "env": {
        "GITHUB_TOKEN": "ghp_...",
        "ANTHROPIC_API_KEY": "sk-ant-..."
      }
    }
  }
}
```

Environment variables are optional:

| Variable            | Purpose                                                              |
| ------------------- | -------------------------------------------------------------------- |
| `GITHUB_TOKEN`      | Higher GitHub API limits and private repository access               |
| `ANTHROPIC_API_KEY` | AI-generated symbol summaries (otherwise docstrings/signatures used) |

---

## Usage Examples

```
index_folder: { "path": "/path/to/project" }
index_repo:   { "url": "owner/repo" }

get_repo_outline: { "repo": "owner/repo" }
get_file_outline: { "repo": "owner/repo", "file_path": "src/main.py" }
search_symbols:   { "repo": "owner/repo", "query": "authenticate" }
get_symbol:       { "repo": "owner/repo", "symbol_id": "src/main.py::MyClass.login#method" }
search_text:      { "repo": "owner/repo", "query": "TODO" }
```

---

## Tools (11)

| Tool               | Purpose                     |
| ------------------ | --------------------------- |
| `index_repo`       | Index a GitHub repository   |
| `index_folder`     | Index a local folder        |
| `list_repos`       | List indexed repositories   |
| `get_file_tree`    | Repository file structure   |
| `get_file_outline` | Symbol hierarchy for a file |
| `get_symbol`       | Retrieve full symbol source |
| `get_symbols`      | Batch retrieve symbols      |
| `search_symbols`   | Search symbols with filters |
| `search_text`      | Full-text search            |
| `get_repo_outline` | High-level repo overview    |
| `invalidate_cache` | Remove cached index         |

All tool responses include a `_meta` envelope with timing and metadata.

---

## Supported Languages

| Language   | Extensions    | Symbol Types                            |
| ---------- | ------------- | --------------------------------------- |
| Python     | `.py`         | function, class, method, constant, type |
| JavaScript | `.js`, `.jsx` | function, class, method, constant       |
| TypeScript | `.ts`, `.tsx` | function, class, method, constant, type |
| Go         | `.go`         | function, method, type, constant        |
| Rust       | `.rs`         | function, type, impl, constant          |
| Java       | `.java`       | method, class, type, constant           |
| PHP        | `.php`        | function, class, method, type, constant |

See **LANGUAGE_SUPPORT.md** for full semantics.

---

## Security

Built-in indexing protections:

* Path traversal prevention
* Symlink escape protection
* Secret file exclusion (`.env`, `*.pem`, etc.)
* Binary detection
* Configurable file size limits

See **SECURITY.md** for details.

---

## Best Use Cases

* Large multi-module repositories
* Agent-driven refactors
* Architecture exploration
* Faster onboarding to unfamiliar codebases
* Token-efficient multi-agent workflows

---

## Not Intended For

* Language-server features (LSP diagnostics or completions)
* Editing workflows
* Real-time file watching
* Cross-repository global indexing
* Semantic program analysis (parsing is syntactic via AST)

---

## Environment Variables

| Variable            | Purpose                   | Required |
| ------------------- | ------------------------- | -------- |
| `GITHUB_TOKEN`      | GitHub API auth           | No       |
| `ANTHROPIC_API_KEY` | Symbol summary generation | No       |
| `CODE_INDEX_PATH`   | Custom cache path         | No       |

---

## Documentation

* USER_GUIDE.md — workflows and examples
* ARCHITECTURE.md — design and data flow
* SPEC.md — tool and algorithm specifications
* SECURITY.md — security policies
* SYMBOL_SPEC.md — symbol schema
* CACHE_SPEC.md — cache format and invalidation
* LANGUAGE_SUPPORT.md — parser details

---

## License

MIT
