Metadata-Version: 2.4
Name: code-knowledge-graph-tool
Version: 0.1.0
Summary: Code Structure with Graph — graph-powered code intelligence
License-Expression: MIT
Requires-Python: >=3.11
Requires-Dist: click>=8.0
Requires-Dist: einops>=0.8.2
Requires-Dist: faiss-cpu>=1.9
Requires-Dist: fastapi>=0.135.2
Requires-Dist: jsonpatch>=1.33
Requires-Dist: kreuzberg>=4.7
Requires-Dist: kuzu>=0.8
Requires-Dist: langchain-community>=0.3
Requires-Dist: langchain-core>=0.3
Requires-Dist: langchain-huggingface>=0.1
Requires-Dist: langchain-mcp-adapters>=0.2.2
Requires-Dist: langchain-openai>=1.1.11
Requires-Dist: langgraph-checkpoint>=4.0.1
Requires-Dist: langgraph>=0.2
Requires-Dist: langmem>=0.0.30
Requires-Dist: leidenalg>=0.10
Requires-Dist: mcp>=1.0
Requires-Dist: numpy>=1.26
Requires-Dist: pathspec>=0.12
Requires-Dist: pydantic>=2.0
Requires-Dist: pytest>=9.0.2
Requires-Dist: python-docx>=1.2.0
Requires-Dist: python-dotenv>=1.2.2
Requires-Dist: python-igraph>=0.11
Requires-Dist: python-multipart>=0.0.22
Requires-Dist: rich>=13.0
Requires-Dist: sentence-transformers>=3.0
Requires-Dist: starlette>=0.36
Requires-Dist: tiktoken>=0.12.0
Requires-Dist: tree-sitter-ada>=0.1.0
Requires-Dist: tree-sitter-c-sharp>=0.23
Requires-Dist: tree-sitter-c>=0.24
Requires-Dist: tree-sitter-commonlisp>=0.4.1
Requires-Dist: tree-sitter-cpp>=0.23
Requires-Dist: tree-sitter-delphi>=0.1.0
Requires-Dist: tree-sitter-fortran>=0.5.1
Requires-Dist: tree-sitter-go>=0.24
Requires-Dist: tree-sitter-java>=0.23
Requires-Dist: tree-sitter-javascript>=0.24
Requires-Dist: tree-sitter-kotlin>=1.0
Requires-Dist: tree-sitter-matlab>=1.3.0
Requires-Dist: tree-sitter-objc>=3.0.2
Requires-Dist: tree-sitter-php>=0.24
Requires-Dist: tree-sitter-python>=0.24
Requires-Dist: tree-sitter-ruby>=0.23
Requires-Dist: tree-sitter-rust>=0.24
Requires-Dist: tree-sitter-scala>=0.24.0
Requires-Dist: tree-sitter-solidity>=1.2.13
Requires-Dist: tree-sitter-swift>=0.0.1
Requires-Dist: tree-sitter-typescript>=0.23
Requires-Dist: tree-sitter>=0.24
Requires-Dist: uvicorn[standard]>=0.42.0
Requires-Dist: watchfiles>=1.1
Provides-Extra: dev
Requires-Dist: httpx>=0.27; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.24; extra == 'dev'
Requires-Dist: pytest-cov>=5.0; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.5; extra == 'dev'
Provides-Extra: gpu
Requires-Dist: torch>=2.0; extra == 'gpu'
Provides-Extra: ocr-gemini
Requires-Dist: google-genai>=1.0; extra == 'ocr-gemini'
Provides-Extra: ocr-paddle
Requires-Dist: kreuzberg[paddle]; extra == 'ocr-paddle'
Description-Content-Type: text/markdown

# CSG - Code Structure with Graph

CSG is a graph-powered code intelligence platform that builds a knowledge graph from source code using deterministic AST parsing. It is designed to give AI agents a deep, structured understanding of codebases through an MCP (Model Context Protocol) server.

No LLMs are used during indexing - the graph is built entirely from tree-sitter AST parsing, language-specific import/call resolution, and community detection algorithms.

## Key Features

- **Support for 13 Languages** - Python, TypeScript, JavaScript, Java, C, C++, C#, Go, Rust, Ruby, PHP, Kotlin, Swift
- **Knowledge Graph** - Symbols, call relationships, inheritance, imports, and execution flows stored in KuzuDB
- **Community Detection** - Leiden algorithm automatically clusters related code into functional areas
- **Execution Flow Tracing** - Traces processes from entry points through call chains
- **MCP Server** - 7 tools + 5 resources for AI agent integration (Claude Desktop, VS Code, etc.)
- **Blast Radius Analysis** - Know what breaks before you change a symbol
- **Hybrid Search** - BM25 full-text + semantic embeddings with reciprocal rank fusion
- **Git-Aware Change Detection** - Maps `git diff` hunks to affected symbols and processes
- **Graph-Aware Rename** - Multi-file coordinated renames using the knowledge graph
- **Web UI** - Upload, analyze, and visualize code graphs in browser
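
The hybrid search feature above combines a BM25 ranking with a semantic ranking through reciprocal rank fusion. The following is a minimal sketch of that fusion step, not CSG's actual implementation: the function name, the symbol names, and the constant `k=60` (a commonly used default) are assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into a single ranking.

    Each item scores sum(1 / (k + rank)) over the lists that contain it,
    with rank starting at 1. The value k=60 is an assumption.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest score first; ties are broken by name so the output is stable.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical result lists from a full-text and a semantic search.
bm25_hits = ["parse_file", "load_config", "build_graph"]
semantic_hits = ["build_graph", "parse_file", "resolve_imports"]

fused = reciprocal_rank_fusion([bm25_hits, semantic_hits])
print(fused)
```

Symbols that appear in both lists rise to the top because their reciprocal-rank contributions add up, which is why the fusion needs no score normalization between the two retrievers.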

## Architecture

CSG now follows a four-layer clean architecture:

- `csg/domain`: pure business concepts and errors
- `csg/application`: ports and use cases
- `csg/presentation`: CLI, web, and MCP transports
- `csg/infrastructure`: config, DI, logging, graph/search/parsing/agent runtime support

Presentation paths call application use cases only. Runtime engines live under infrastructure-owned `internal/` packages. The legacy `kernel` and `adapters` layers have been removed.
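
As a rough illustration of how the four layers relate, the sketch below wires a port, a use case, and an adapter together. Every class and method name here is hypothetical and chosen for the example; none of them is taken from the CSG codebase.

```python
from dataclasses import dataclass
from typing import Protocol


# domain: a pure business concept (hypothetical).
@dataclass(frozen=True)
class Symbol:
    name: str
    file: str


# application/ports: the interface the use case depends on (hypothetical).
class SymbolStore(Protocol):
    def find_callers(self, name: str) -> list[Symbol]: ...


# application/use_cases: orchestration with no knowledge of the storage engine.
class FindCallers:
    def __init__(self, store: SymbolStore) -> None:
        self._store = store

    def execute(self, name: str) -> list[str]:
        return sorted(s.name for s in self._store.find_callers(name))


# infrastructure: an in-memory stand-in for a graph-backed implementation.
class InMemorySymbolStore:
    def __init__(self, callers: dict[str, list[Symbol]]) -> None:
        self._callers = callers

    def find_callers(self, name: str) -> list[Symbol]:
        return self._callers.get(name, [])


# presentation: a transport builds the use case and calls it, nothing more.
store = InMemorySymbolStore(
    {"parse": [Symbol("main", "cli.py"), Symbol("analyze", "run.py")]}
)
callers = FindCallers(store).execute("parse")
print(callers)
```

Because the use case depends only on the port, the in-memory store can be swapped for a database-backed one without touching presentation code.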

## Setup

### Prerequisites

- **Python 3.11+**
- **Node.js 20+** (for frontend, optional)
- **Git** (for change detection features)

### Install with uv (recommended)

```bash
git clone https://gitlab.ops-ai.dev/fhn-ngt/code-knowledge-graph-tool/code-knowledge-graph.git
cd code-knowledge-graph
uv sync
```

### Install with pip

```bash
git clone https://gitlab.ops-ai.dev/fhn-ngt/code-knowledge-graph-tool/code-knowledge-graph.git
cd code-knowledge-graph
pip install -e .
```

### Install with Docker

```bash
git clone https://gitlab.ops-ai.dev/fhn-ngt/code-knowledge-graph-tool/code-knowledge-graph.git
cd code-knowledge-graph
cp .env.example .env    # Edit as needed
docker-compose up
```

This starts the backend on port `8080` and the frontend on port `3000`.

### Install Without Cloning

CSG can be distributed through the GitLab private PyPI Package Registry. This
lets users install the `csg` CLI without cloning the private source repository.

The package publish job runs in GitLab CI and uploads to:

```text
${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/packages/pypi
```

Ask a project Maintainer to create a Deploy Token with the
`read_package_registry` scope, then install with:

```bash
uv tool install code-knowledge-graph-tool \
  --index-url https://<deploy-token-user>:<deploy-token>@gitlab.example.com/api/v4/projects/<project-id>/packages/pypi/simple
```

If your shell history is shared, configure registry credentials through your
standard credential manager or `.netrc` instead of putting the token directly in
the command.

After installation, run CSG from the target repository:

```bash
cd /path/to/your/repo
csg analyze .
csg setup .
```

### Frontend Setup (optional)

```bash
cd frontend
npm install
npm run dev       # Dev server at http://localhost:5173
npm run build     # Production build
```

### Optional: GPU Support

For faster embedding generation:

```bash
pip install -e ".[gpu]"
```

### Optional: Development Dependencies

```bash
pip install -e ".[dev]"
```

## Quick Start

### 1. Analyze a Repository

```bash
# Analyze the current directory
csg analyze

# Analyze a specific path
csg analyze /path/to/your/repo

# Force re-analysis with embeddings
csg analyze /path/to/repo --force --embeddings
```

### 2. Check Index Status

```bash
# List all indexed repos
csg list

# Show status for current repo
csg status
```

### 3. Install Agent Integration Files

`csg analyze` installs or updates the Codex and Claude Code integration files
in the analyzed repository. To run the full setup on a machine that already has
`csg` installed, use:

```bash
csg setup /path/to/your/repo
```

This installs repo-local files and registers the `csg` MCP server with detected
Claude Code and Codex CLIs. It uses:

```bash
claude mcp add --transport stdio --scope user csg -- csg mcp
codex mcp add csg -- csg mcp
```

To install only repo-local agent files without touching global MCP client
configuration:

```bash
csg setup-agents /path/to/your/repo
```

This creates or updates:

- `AGENTS.md` with CSG MCP rules for Codex
- `CLAUDE.md` with CSG MCP rules for Claude Code
- `.codex/skills/csg/*/SKILL.md`
- `.claude/skills/csg/*/SKILL.md`
- `.codex/hooks.json`
- `.claude/settings.json` and `.claude/hooks/csg-hook.py`

Existing `AGENTS.md`, `CLAUDE.md`, `.codex/hooks.json`, and
`.claude/settings.json` content is preserved; CSG writes marker-bounded blocks
or merges hook entries.
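
A marker-bounded block can be updated idempotently with a small helper like the one below. This is a sketch under stated assumptions: the marker strings and the function name are hypothetical, and the real markers CSG writes may differ.

```python
BEGIN = "<!-- csg:begin -->"  # hypothetical marker, not CSG's actual one
END = "<!-- csg:end -->"      # hypothetical marker, not CSG's actual one


def upsert_block(text: str, block: str) -> str:
    """Insert or replace the managed block, leaving other content untouched."""
    managed = f"{BEGIN}\n{block.strip()}\n{END}"
    start = text.find(BEGIN)
    end = text.find(END, start + len(BEGIN)) if start != -1 else -1
    if start != -1 and end != -1:
        # A managed block already exists: replace only what is between markers.
        return text[:start] + managed + text[end + len(END):]
    if not text:
        return managed + "\n"
    # No managed block yet: append it after the existing content.
    return text.rstrip("\n") + "\n\n" + managed + "\n"


original = "# My rules\n\nBe nice.\n"
once = upsert_block(original, "Use CSG tools.")
twice = upsert_block(once, "Use CSG tools v2.")
print(twice)
```

Running the helper repeatedly with the same block produces the same file, so user-written content outside the markers is never duplicated or overwritten.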

### 4. Start the MCP Server

For AI agent integration (Claude Desktop, VS Code, etc.):

```bash
# Stdio transport (default - for Claude Desktop)
csg mcp

# With a specific database path
csg mcp --db-path /path/to/.csg/lbug

# HTTP transport (for remote clients)
csg mcp --transport http --port 8080
```

### 5. Start the Web UI

```bash
csg web
# Open http://127.0.0.1:8080
```

## MCP Integration

### Claude Desktop / Claude Code

Add to your MCP config (`.mcp.json` or Claude Desktop settings):

```json
{
  "mcpServers": {
    "csg": {
      "command": "/path/to/your/.venv/bin/csg",
      "args": ["mcp", "--db-path", "/path/to/.csg/lbug"]
    }
  }
}
```
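
If the config file already lists other servers, the entry can be merged in programmatically. The helper below is an illustrative sketch: the function name is hypothetical, and the paths are the same placeholders used in the JSON above.

```python
import json


def add_csg_server(config: dict, command: str, db_path: str) -> dict:
    """Return a copy of an MCP config with the `csg` server entry added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["csg"] = {"command": command, "args": ["mcp", "--db-path", db_path]}
    merged["mcpServers"] = servers
    return merged


# A hypothetical existing config with one unrelated server.
existing = {"mcpServers": {"other": {"command": "other-server", "args": []}}}
updated = add_csg_server(
    existing, "/path/to/your/.venv/bin/csg", "/path/to/.csg/lbug"
)
print(json.dumps(updated, indent=2))
```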

### Available Tools

| Tool | Description |
|------|-------------|
| `csg_explore` | 6-layer knowledge graph retrieval. Use `layer=` to target: topology, relevance, context, crosscut, contract, implementation. Omit for auto mode. |
| `csg_context` | 360-degree symbol view: callers, callees, processes, signature, heritage, overrides, source code. |
| `csg_impact` | Blast radius analysis. Shows what would break if a symbol changes. |
| `csg_detect_changes` | Maps `git diff` to affected symbols and execution flows. |
| `csg_rename` | Graph-aware multi-file rename with dry-run preview. |
| `csg_analyze` | Index a new repository (accepts directory or .zip). |
| `csg_list_repos` | List all indexed repositories. |
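
To make the idea behind `csg_detect_changes` concrete, the sketch below maps unified-diff hunk headers to symbols by line-range overlap. It is not the tool's implementation: the `SymbolSpan` type, the function names, and the sample data are assumptions, and a real index would supply the symbol line ranges.

```python
import re
from dataclasses import dataclass

# Matches hunk headers such as "@@ -10,2 +10,3 @@" or "@@ -40 +41 @@".
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@")


@dataclass(frozen=True)
class SymbolSpan:  # hypothetical record for an indexed symbol
    name: str
    start_line: int
    end_line: int  # inclusive


def changed_ranges(diff_text):
    """Return (first, last) line ranges on the new side of each hunk."""
    ranges = []
    for line in diff_text.splitlines():
        match = HUNK_HEADER.match(line)
        if match:
            start = int(match.group(1))
            count = int(match.group(2)) if match.group(2) is not None else 1
            # A pure deletion has count 0; treat it as touching line `start`.
            ranges.append((start, start + max(count, 1) - 1))
    return ranges


def affected_symbols(diff_text, symbols):
    """Names of symbols whose span overlaps any changed range."""
    ranges = changed_ranges(diff_text)
    return [
        sym.name
        for sym in symbols
        if any(lo <= sym.end_line and sym.start_line <= hi for lo, hi in ranges)
    ]


diff = "@@ -10,2 +10,3 @@ def parse():\n+    x = 1\n@@ -40 +41 @@\n-old\n+new\n"
symbols = [
    SymbolSpan("parse", 8, 20),
    SymbolSpan("render", 25, 35),
    SymbolSpan("main", 38, 45),
]
touched = affected_symbols(diff, symbols)
print(touched)
```

Hunk ranges include context lines, so this over-approximates slightly; running `git diff -U0` removes the context and tightens the match.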

### Available Resources

| URI | Description |
|-----|-------------|
| `csg://repo/{name}/context` | Repository overview and stats |
| `csg://repo/{name}/clusters` | All communities with members and cohesion |
| `csg://repo/{name}/processes` | All execution flows |
| `csg://repo/{name}/process/{name}` | Step-by-step trace of a single process (the first `{name}` is the repository, the second is the process) |
| `csg://repo/{name}/schema` | Graph schema and example Cypher queries |

### Recommended Workflow

```
1. csg_explore(layer="topology")                    Project overview — communities, sizes
2. csg_explore(layer="relevance", query="...")       Search the codebase
3. csg_explore(layer="context", scope="community:X") Execution flows + symbols in an area
4. csg_explore(layer="contract", query="sym1,sym2")  Signatures, callers, callees
5. csg_explore(layer="implementation", query="sym")  Source code
6. csg_impact(target="...")                          Check blast radius before changes
7. csg_detect_changes()                              Pre-commit review
```
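
Step 6 checks the blast radius of a change. Conceptually this is a reverse traversal of the call graph, which the sketch below illustrates with a breadth-first search. The function name, the `max_depth` parameter, and the sample graph are assumptions and do not describe `csg_impact` internals.

```python
from collections import deque


def blast_radius(calls, target, max_depth=3):
    """Return {symbol: depth} for every symbol that transitively calls target.

    `calls` maps each caller to the symbols it calls.
    """
    # Invert caller -> callees into callee -> callers.
    callers = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    depths = {}
    queue = deque([(target, 0)])
    while queue:
        symbol, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for caller in sorted(callers.get(symbol, ())):
            if caller != target and caller not in depths:
                depths[caller] = depth + 1
                queue.append((caller, depth + 1))
    return depths


# Hypothetical call graph.
calls = {
    "main": ["analyze"],
    "web_handler": ["analyze"],
    "analyze": ["parse_file", "build_graph"],
    "build_graph": ["parse_file"],
}
impact = blast_radius(calls, "parse_file")
print(impact)
```

Depth 1 entries are direct callers and the most likely to break; deeper entries are affected only indirectly.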

## CLI Reference

```
csg analyze [PATH] [--force] [--embeddings]
    Analyze a repository and build the knowledge graph.
    PATH defaults to current directory.
    --force       Re-analyze even if an index exists.
    --embeddings  Generate semantic embeddings (slow for large repos).

csg mcp [--db-path PATH] [--transport stdio|http] [--port PORT]
    Start the MCP server.
    --db-path     Path to the graph database (for example /path/to/.csg/lbug).
    --transport   stdio (default) or http.
    --port        HTTP port (default: 8080).

csg list
    List all indexed repositories.

csg status
    Show index status for the current repository.

csg web [--host HOST] [--port PORT]
    Start the web UI.
    --host  Bind address (default: 127.0.0.1).
    --port  Bind port (default: 8080).

csg clean
    Delete the index for the current repository.

csg setup PATH
    Install repo-local agent files and register the `csg` MCP server with
    detected Claude Code and Codex CLIs.

csg setup-agents PATH
    Install only the repo-local agent files; global MCP client configuration
    is left untouched.

csg watch [--debounce SECONDS] [--interval SECONDS]
    Watch indexed repositories for file changes and run incremental re-ingestion.
    --debounce  Wait for filesystem events to settle before re-ingesting.
    --interval  Refresh the indexed repository list periodically.
```
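
The `--debounce` option waits for filesystem events to settle before re-ingesting. The sketch below shows one way to group event timestamps into settled batches; it is a simplified model with a hypothetical function name, not the logic `csg watch` actually uses.

```python
def settle_batches(event_times, debounce):
    """Group event timestamps (in seconds) into batches.

    A batch is closed once the gap to the next event exceeds `debounce`,
    so a burst of saves triggers a single re-ingestion.
    """
    batches = []
    current = []
    for timestamp in sorted(event_times):
        if current and timestamp - current[-1] > debounce:
            batches.append(current)
            current = []
        current.append(timestamp)
    if current:
        batches.append(current)
    return batches


# Six hypothetical file events collapse into three re-ingestion runs.
batches = settle_batches([0.0, 0.2, 0.5, 3.0, 3.1, 9.0], debounce=1.0)
print(batches)
```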

## Supported Languages

| Language | Extensions |
|----------|-----------|
| Python | `.py`, `.pyi` |
| TypeScript | `.ts`, `.tsx`, `.mts`, `.cts` |
| JavaScript | `.js`, `.jsx`, `.mjs`, `.cjs` |
| Java | `.java` |
| C | `.c`, `.h` |
| C++ | `.cpp`, `.cxx`, `.cc`, `.hpp`, `.hxx`, `.hh` |
| C# | `.cs` |
| Go | `.go` |
| Rust | `.rs` |
| Ruby | `.rb` |
| PHP | `.php` |
| Kotlin | `.kt`, `.kts` |
| Swift | `.swift` |
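
The table above translates directly into a lookup keyed by file extension. The sketch below is illustrative only: the dictionary and function names are hypothetical, and CSG's own language detection may apply additional rules (for example, for `.h` headers used in C++ projects).

```python
from pathlib import Path

# Extensions copied from the table above.
EXTENSIONS = {
    "Python": (".py", ".pyi"),
    "TypeScript": (".ts", ".tsx", ".mts", ".cts"),
    "JavaScript": (".js", ".jsx", ".mjs", ".cjs"),
    "Java": (".java",),
    "C": (".c", ".h"),
    "C++": (".cpp", ".cxx", ".cc", ".hpp", ".hxx", ".hh"),
    "C#": (".cs",),
    "Go": (".go",),
    "Rust": (".rs",),
    "Ruby": (".rb",),
    "PHP": (".php",),
    "Kotlin": (".kt", ".kts"),
    "Swift": (".swift",),
}

# Invert to extension -> language for constant-time lookups.
LANGUAGE_BY_EXTENSION = {
    ext: language for language, exts in EXTENSIONS.items() for ext in exts
}


def detect_language(path):
    """Return the language for a file path, or None if it is unsupported."""
    return LANGUAGE_BY_EXTENSION.get(Path(path).suffix.lower())


print(detect_language("src/app.tsx"))
print(detect_language("README.md"))
```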

## Testing

```bash
# Run unit tests only (~2s)
uv run python -m pytest -m unit

# Run integration tests (~90s)
uv run python -m pytest -m integration

# Run everything except slow tests
uv run python -m pytest -m "not slow"

# Full suite with coverage report
uv run python -m pytest --cov=csg --cov-report=term-missing
```

Test markers: `unit`, `integration`, `slow`, `requires_db`.

## CI/CD

The GitLab CI pipeline runs on every push:

| Stage | Job | Description |
|-------|-----|-------------|
| lint | `lint` | ruff check + format |
| test | `test:unit` | Unit tests |
| test | `test:integration` | Integration tests (excludes slow) |
| test | `test:coverage` | Full coverage report (Cobertura) |
| build | `build:docker` | Docker images (main/tags only) |
| secret-detection | `secret_detection` | GitLab secret scanning |

## Environment Variables

Copy `.env.example` to `.env` and configure as needed:

```bash
# LLM (required for agent and SRS generation)
OPENAI_API_KEY=sk-your-openai-api-key

# Data directory (optional)
# CSG_DATA_DIR=/path/to/data

# Embedding configuration (optional - defaults to OpenAI with local fallback)
# CSG_EMBEDDING_PROVIDER=openai
# CSG_EMBEDDING_MODEL=text-embedding-3-small

# Remote embedding (optional - any OpenAI-compatible API)
# CSG_EMBEDDING_PROVIDER=openai
# CSG_EMBEDDING_URL=http://your-server:8080/v1
# CSG_EMBEDDING_MODEL=BAAI/bge-large-en-v1.5
# CSG_EMBEDDING_API_KEY=your-key
```

## Project Structure

```
csg/
  domain/                 Pure business concepts and errors
  application/
    ports/
    use_cases/
  presentation/           CLI, web, and MCP transports
  infrastructure/         Graph, search, parsing, and agent runtime support
    config/
    di/
    logging/
    runtime/
tests/
  conftest.py             Shared fixtures
  unit/                   Pure-function unit tests
  integration/            DB and filesystem tests
frontend/
  src/                    React + Vite + TypeScript
  Dockerfile              Frontend container
docker-compose.yml        Backend + frontend services
pyproject.toml            Dependencies and metadata
CODEOWNERS                Code ownership
```

## Tech Stack

| Component | Technology |
|-----------|-----------|
| Graph Database | KuzuDB (embedded, on-disk) |
| AST Parsing | tree-sitter |
| Pipeline Orchestration | LangGraph |
| Community Detection | python-igraph + leidenalg |
| Embeddings | sentence-transformers + FAISS |
| CLI | Click + Rich |
| Web Backend | FastAPI + uvicorn |
| MCP Protocol | mcp >= 1.0 |
| Frontend | React 18 + Vite + Tailwind CSS |
| Graph Visualization | vis-network |
| Package Manager | uv (Python), npm (Node) |

## Contributing

See `CODEOWNERS` for code ownership. Run `uv run python -m pytest -m "not slow"` before submitting merge requests.

## License

MIT
