Metadata-Version: 2.4
Name: glyphh-code
Version: 0.6.6
Summary: Codebase intelligence for Claude Code — HDC-powered file search
Author-email: Glyphh AI <support@glyphh.ai>
License: AGPL-3.0
Project-URL: Homepage, https://glyphh.ai
Project-URL: Repository, https://github.com/glyphh-ai/glyphh-code
Project-URL: Documentation, https://docs.glyphh.ai
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU Affero General Public License v3
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: glyphh[runtime]>=1.3.9
Requires-Dist: requests>=2.28.0
Requires-Dist: tree-sitter>=0.22.0
Requires-Dist: tree-sitter-python>=0.22.0
Requires-Dist: tree-sitter-javascript>=0.22.0
Requires-Dist: tree-sitter-typescript>=0.22.0
Requires-Dist: tree-sitter-go>=0.22.0
Requires-Dist: tree-sitter-rust>=0.22.0
Requires-Dist: tree-sitter-java>=0.22.0
Requires-Dist: tree-sitter-c>=0.22.0
Requires-Dist: tree-sitter-cpp>=0.22.0
Provides-Extra: all-grammars
Requires-Dist: tree-sitter-ruby>=0.22.0; extra == "all-grammars"
Requires-Dist: tree-sitter-c-sharp>=0.22.0; extra == "all-grammars"
Requires-Dist: tree-sitter-swift>=0.22.0; extra == "all-grammars"
Provides-Extra: dev
Requires-Dist: pytest>=7.4.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Dynamic: license-file

# Glyphh Code

File-level codebase intelligence for Claude Code. Encodes every source file in
your repo as an HDC vector. Claude Code queries the index instead of scanning
files.

Same architecture as glyphh-pipedream (3,146 apps) and glyphh-bfcl (#1 on
BFCL V4). No LLM at build time. No LLM at search time. Pure HDC encoding and
cosine search.

Built on [**Glyphh Ada 1.1**](https://www.glyphh.ai/products/runtime) · **[Docs →](https://glyphh.ai/docs)** · **[Glyphh Hub →](https://glyphh.ai/hub)** · **License: [AGPL-3.0](LICENSE)**

---

> **WORK IN PROGRESS:** This model is under active development. Benchmarks
> show Glyphh uses **20% fewer tokens and 22% fewer turns** than bare Claude
> Code, with equal search accuracy (13/15). Overall accuracy is lower (76% vs
> 84% for bare Claude Code) because MCP startup latency causes timeouts, not
> because of the HDC search itself. See
> [benchmark/BENCHMARK.md](benchmark/BENCHMARK.md) for full results and
> analysis.

## Quick Start

One install, one command. No Docker, no PostgreSQL, no auth required.

```bash
pip install glyphh-code
```

This installs the Glyphh runtime, CLI, and the Code model as a single package.

Then from your project root:

```bash
glyphh            # enter the Glyphh shell
code init .       # deploy model, compile codebase, configure Claude Code
```

That's it. `code init` handles everything:

1. **Starts a local dev server** (SQLite, no Docker needed)
2. **Deploys the Code model** to the running runtime
3. **Compiles your codebase** into an HDC vector index
4. **Configures Claude Code** — adds MCP server, CLAUDE.md, hooks, permissions

Restart Claude Code to activate. In VS Code: open the Command Palette
(`Cmd+Shift+P`, or `Ctrl+Shift+P` on Windows/Linux) → "Claude Code: Restart".
In the CLI: exit and re-enter the session.

Verify the connection with `/mcp` — you should see `glyphh_search`,
`glyphh_related`, and `glyphh_stats` listed as available tools.


## Using PostgreSQL + pgvector (optional)

SQLite works out of the box for local development. For larger codebases or
production use, you can use PostgreSQL with pgvector for faster similarity
search via HNSW indexing.

```bash
# Set DATABASE_URL before starting the shell
export DATABASE_URL=postgresql+asyncpg://user:pass@localhost:5432/glyphh

glyphh
code init .
```

Or use the built-in Docker setup:

```bash
glyphh
docker init       # generates docker-compose.yml + init.sql
exit

docker compose up -d --wait
glyphh
code init .
```

The runtime auto-detects the backend from `DATABASE_URL` — no configuration
changes needed. SQLite uses Python cosine similarity; PostgreSQL uses native
pgvector `<=>` with HNSW indexing.
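
As a rough illustration of backend auto-detection, the sketch below picks a backend from the URL scheme. `detect_backend` is a hypothetical helper written for this README, not the runtime's actual code, and the set of accepted schemes is an assumption.

```python
from typing import Optional
from urllib.parse import urlsplit


def detect_backend(database_url: Optional[str]) -> str:
    """Return 'sqlite' or 'postgresql' from a DATABASE_URL (hypothetical helper)."""
    if not database_url:
        # No DATABASE_URL set: fall back to the SQLite default described above.
        return "sqlite"
    scheme = urlsplit(database_url).scheme      # e.g. "postgresql+asyncpg"
    dialect = scheme.split("+", 1)[0].lower()   # drop the driver suffix
    if dialect in ("postgresql", "postgres"):
        return "postgresql"
    if dialect == "sqlite":
        return "sqlite"
    raise ValueError(f"unsupported database scheme: {scheme!r}")


backend = detect_backend("postgresql+asyncpg://user:pass@localhost:5432/glyphh")
```

With the example URL from this section, `backend` is `"postgresql"`; with no URL it is `"sqlite"`.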


## Shell Commands

After `pip install glyphh-code`, the `code` command is available inside the
Glyphh shell:

| Command | Description |
|---------|-------------|
| `code init [path]` | Full setup: start server, deploy, compile, configure Claude Code |
| `code compile [path]` | Recompile the index (full or incremental) |
| `code status` | Show current status (server, files indexed, MCP URL) |
| `code stop` | Stop the dev server |

The shell also has all standard runtime commands (`dev`, `model`, `auth`,
`config`, etc.). Type `help` for the full list.


---

## What it does

Compiles your codebase into a vector index. Exposes it to Claude Code via MCP.

**Without Glyphh:**
Claude reads the project structure, scans likely files, reads the module, then
reads its tests. ~6,000 tokens before first useful output.

**With Glyphh:**
Claude calls `glyphh_search("auth token validation")`.
Returns: file path, confidence, top concepts, imports, related files.
Claude reads one file and acts.
~400 tokens before first useful output.

The index stores not just the vector but the token vocabulary of every file.
Search results return enough context that Claude often does not need to read
the file at all. When it does read, it already knows exactly what to look for.


## Architecture

Same paradigm as all Glyphh models. The file is the exemplar.

```
Build time:  read file → tokenize path + identifiers + imports
             → encode into HDC vector → store vector + metadata
             → supports pgvector (HNSW) or SQLite (Python cosine)

Runtime:     NL query → encode with same pipeline
             → cosine search against index
             → return file path + top tokens + imports
             → Claude reads one file, acts
```

No LLM at build time. No LLM at runtime for search.
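
The pipeline above can be sketched in pure Python. This is a generic bag-of-words HDC illustration under stated assumptions, not the Glyphh encoder: random bipolar token vectors seeded from a hash, bundling by element-wise sum, and a brute-force cosine scan. All function names and the two-file index are hypothetical.

```python
import hashlib
import math
import random

DIM = 2000  # dimension quoted in the Encoder section


def token_vector(token: str, dim: int = DIM) -> list:
    """Deterministic pseudo-random bipolar (+1/-1) vector for one token."""
    seed = int.from_bytes(hashlib.sha256(token.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    return [1 if rng.random() < 0.5 else -1 for _ in range(dim)]


def encode(tokens: list, dim: int = DIM) -> list:
    """Bundle token vectors by element-wise sum (bag of words)."""
    total = [0] * dim
    for tok in tokens:
        for i, v in enumerate(token_vector(tok, dim)):
            total[i] += v
    return total


def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def search(query: str, index: dict, top_k: int = 5) -> list:
    """Encode the query with the same pipeline, then rank files by cosine."""
    q = encode(query.lower().split())
    scored = [(cosine(q, vec), path) for path, vec in index.items()]
    return sorted(scored, reverse=True)[:top_k]


# Toy index: file path -> encoded vector (token lists are made up).
index = {
    "src/services/auth.py": encode(["auth", "token", "validate", "middleware"]),
    "src/services/billing.py": encode(["invoice", "payment", "stripe", "charge"]),
}
results = search("auth token validation", index)
```

Because the query shares two tokens with the first file and none with the second, `src/services/auth.py` ranks first. Unrelated random vectors at this dimension are close to orthogonal, which is what makes the shared-token signal stand out.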


## Encoder

Three-layer HDC encoder at 2,000 dimensions:

| Layer | Weight | Signal |
|-------|--------|--------|
| **path** | 0.30 | File path tokens (BoW): `src/services/user_service.py` → `src services user service py` |
| **symbols** | 0.50 | AST-extracted definitions (class/function names via tree-sitter) |
| **content** | 0.20 | Identifiers (1.0) + imports (0.8) as BoW |

The symbols layer encodes what a file **defines**, not what it uses. This
naturally separates source files from their tests — `auth.py` defines
`AuthMiddleware` while `test_auth.py` defines `test_check_auth`.
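
A minimal sketch of the path tokenization and layer weighting from the table above. The weights come from the table; normalizing each layer before the weighted sum, and the helper names, are assumptions made for illustration.

```python
import math
import re

LAYER_WEIGHTS = {"path": 0.30, "symbols": 0.50, "content": 0.20}  # from the table


def path_tokens(path: str) -> list:
    """Split a file path into bag-of-words tokens on non-alphanumeric characters."""
    return [t for t in re.split(r"[^A-Za-z0-9]+", path.lower()) if t]


def unit(vec: list) -> list:
    """Scale a vector to unit length (assumed here so weights are comparable)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else [float(x) for x in vec]


def combine_layers(layers: dict) -> list:
    """Weighted sum of per-layer vectors keyed by layer name."""
    dim = len(next(iter(layers.values())))
    out = [0.0] * dim
    for name, vec in layers.items():
        weight = LAYER_WEIGHTS[name]
        for i, x in enumerate(unit(vec)):
            out[i] += weight * x
    return out


tokens = path_tokens("src/services/user_service.py")
combined = combine_layers({"path": [1, 0, 0], "symbols": [0, 1, 0], "content": [0, 0, 1]})
```

Here `tokens` reproduces the example in the table (`src services user service py`), and the three toy one-hot layers combine into a vector whose components equal the layer weights.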

Metadata stored per file (not encoded, returned at search time):
- `top_tokens`: 20 most frequent meaningful tokens
- `imports`: list of imported module/package names
- `extension`: file type
- `file_size`: bytes
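
The per-file metadata listed above could be assembled roughly as follows. The stopword list, the minimum token length, and the `file_metadata` helper are illustrative assumptions; the README does not define what counts as a "meaningful" token.

```python
import os
import re
from collections import Counter

# Illustrative stopwords only; the real filter is not documented here.
STOPWORDS = {"self", "return", "def", "import", "from", "the", "and", "for"}


def file_metadata(path: str, source: str, imports: list) -> dict:
    """Build the metadata record returned alongside a search hit (sketch)."""
    words = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source)
    counts = Counter(
        w.lower() for w in words if len(w) > 2 and w.lower() not in STOPWORDS
    )
    return {
        "top_tokens": [tok for tok, _ in counts.most_common(20)],
        "imports": imports,
        "extension": os.path.splitext(path)[1],
        "file_size": len(source.encode("utf-8")),
    }


meta = file_metadata(
    "src/auth.py",
    "def check_token(token):\n    return token.valid\n",
    ["jwt"],
)
```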


## MCP Tools

Exposed through the runtime's model-specific MCP tool system:

### glyphh_search

Find files by natural language query. Returns ranked matches with confidence
scores, top tokens, and import lists.

```json
{"tool": "glyphh_search", "arguments": {"query": "auth token validation", "top_k": 5}}
```

Confidence gate: when the best match scores below the confidence threshold,
the tool returns ASK with a list of candidates instead of silently routing to
the wrong file.
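
The gating behavior can be sketched as below. The threshold value, the `DONE` label, and the response shape are assumptions for illustration; only the `ASK` state comes from this README.

```python
ASK_THRESHOLD = 0.35  # hypothetical value; the real threshold is not stated here


def gate(matches: list, threshold: float = ASK_THRESHOLD) -> dict:
    """Return the top match if it clears the threshold, otherwise ASK with candidates.

    `matches` is a list of (confidence, file_path) tuples.
    """
    ranked = sorted(matches, reverse=True)
    if not ranked or ranked[0][0] < threshold:
        # Below threshold: surface the candidates rather than guess.
        return {"state": "ASK", "candidates": [path for _, path in ranked]}
    return {"state": "DONE", "file": ranked[0][1], "confidence": ranked[0][0]}


confident = gate([(0.82, "src/services/auth.py"), (0.31, "src/services/user.py")])
uncertain = gate([(0.21, "src/a.py"), (0.19, "src/b.py")])
```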

### glyphh_related

Find files semantically related to a given file. Use before editing to
understand blast radius.

```json
{"tool": "glyphh_related", "arguments": {"file_path": "src/services/auth.py", "top_k": 5}}
```

### glyphh_stats

Index statistics: total files, extension breakdown.


## Incremental compile

The index is updated automatically after every `git commit` via the Claude Code
PostToolUse hook configured by `code init`.

For manual recompilation:

```bash
glyphh
code compile .                    # full recompile
code compile /path/to/repo        # compile a different repo
```
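
This README does not describe how incremental compilation decides what to re-encode. One common approach is content hashing, sketched below as an assumption rather than a description of Glyphh's implementation; `plan_incremental` and its arguments are hypothetical.

```python
import hashlib


def plan_incremental(previous: dict, current_files: dict) -> tuple:
    """Compare stored hashes with current file contents.

    previous:      path -> sha256 hex digest from the last compile
    current_files: path -> file contents as bytes
    Returns (changed_or_new_paths, removed_paths, new_hash_table).
    """
    current = {p: hashlib.sha256(data).hexdigest() for p, data in current_files.items()}
    changed = sorted(p for p, digest in current.items() if previous.get(p) != digest)
    removed = sorted(p for p in previous if p not in current)
    return changed, removed, current


old = {
    "a.py": hashlib.sha256(b"print(1)").hexdigest(),
    "b.py": hashlib.sha256(b"print(2)").hexdigest(),
}
changed, removed, new_hashes = plan_incremental(
    old, {"a.py": b"print(1)", "c.py": b"print(3)"}
)
```

Only `c.py` would be re-encoded in this example, and the entry for `b.py` would be dropped from the index.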


## File support

Indexes: `.py`, `.ts`, `.tsx`, `.js`, `.jsx`, `.java`, `.cpp`, `.c`, `.h`,
`.go`, `.rs`, `.rb`, `.cs`, `.swift`, `.sql`, `.graphql`, `.yaml`, `.json`,
`.sh`, `.css`, `.html`, `.svelte`, `.vue`, `.md`, `.proto`, `.tf`, and more.

Skips: `.git`, `node_modules`, `__pycache__`, `dist`, `build`, `vendor`,
`target`, and other build/cache directories.

Max file size: 500 KB. Binary files auto-skipped.
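
The selection rules above can be sketched as a directory walk. The extension and skip sets are subsets of the lists in this section, reading "500 KB" as 500 × 1024 bytes is an assumption, and the null-byte check is one common binary heuristic rather than the documented one.

```python
import os

SKIP_DIRS = {".git", "node_modules", "__pycache__", "dist", "build", "vendor", "target"}
INDEXED_EXTS = {".py", ".ts", ".tsx", ".js", ".jsx", ".java", ".go", ".rs", ".md"}  # subset
MAX_BYTES = 500 * 1024  # assumes "500 KB" means 500 KiB


def looks_binary(path: str, sniff: int = 8192) -> bool:
    """Heuristic: treat a file as binary if its first bytes contain a null byte."""
    with open(path, "rb") as fh:
        return b"\0" in fh.read(sniff)


def iter_source_files(root: str):
    """Yield indexable file paths relative to root."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk never descends into them.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            if os.path.splitext(name)[1].lower() not in INDEXED_EXTS:
                continue
            if os.path.getsize(full) > MAX_BYTES:
                continue
            if looks_binary(full):
                continue
            yield os.path.relpath(full, root)
```

Calling `list(iter_source_files("."))` from a project root returns the files that would be candidates for indexing under these assumed rules.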


## What `code init` configures

Running `code init .` in the Glyphh shell sets up the following in your
project:

- **MCP server** — `claude mcp add --transport http glyphh <url>`
- **CLAUDE.md** — instructs Claude to use `glyphh_search` before reading files
- **`.claude/settings.json`** — hooks and permissions:
  - `mcp__glyphh__*` permission (no MCP prompts)
  - PreToolUse hook: blocks Grep/Glob, redirects to `glyphh_search`
  - PostToolUse hook: runs incremental compile after `git commit`
- **`.gitignore`** — adds `.glyphh/` entry


## Environment variables

| Variable | Default | Description |
|----------|---------|-------------|
| `DATABASE_URL` | SQLite (`~/.glyphh/local.db`) | Database connection string |
| `GLYPHH_RUNTIME_URL` | `http://localhost:8002` | Runtime endpoint |
| `GLYPHH_TOKEN` | Auto-resolved from CLI session | Auth token |
| `GLYPHH_ORG_ID` | Auto-resolved from CLI session | Org ID |
| `GLYPHH_HOOK_DISABLE` | — | Set to `1` to temporarily disable hooks |


## Tests

```bash
pip install "glyphh-code[dev]"   # quotes keep shells like zsh from globbing the brackets
pytest tests/ -v
```
