Metadata-Version: 2.4
Name: source-kb
Version: 0.2.6
Summary: Auto-generate structured knowledge base documents from source code. Supports AI agent mode (skill-based) and standalone CLI.
License-Expression: MIT
Keywords: knowledge-base,documentation,code-analysis,llm,rag
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Documentation
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.28.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: filelock>=3.12.0
Provides-Extra: skeleton
Requires-Dist: tree-sitter<0.22.0,>=0.21.0; extra == "skeleton"
Requires-Dist: tree-sitter-languages>=1.10.0; extra == "skeleton"
Provides-Extra: rag
Requires-Dist: chromadb>=0.4.0; extra == "rag"
Provides-Extra: full
Requires-Dist: chromadb>=0.4.0; extra == "full"
Requires-Dist: tree-sitter<0.22.0,>=0.21.0; extra == "full"
Requires-Dist: tree-sitter-languages>=1.10.0; extra == "full"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: ruff>=0.4.0; extra == "dev"
Dynamic: license-file

# source-kb

English | [中文](README.md)

Auto-generate structured knowledge base documents from source code, build vector indexes, and support RAG retrieval.

## Features

- **CLI toolchain**: skeleton extraction, prompt rendering, intelligent splitting, quality validation, vector indexing — all standalone Python scripts
- **Platform-agnostic**: works in Kiro / Cursor / Claude Code / Windsurf / any AI Agent
- **LLM-agnostic**: without an Agent, use the engine with any OpenAI-compatible API (Anthropic / OpenAI / DeepSeek / Ollama / vLLM, etc.)
- **Intelligent subagent splitting**: semantic grouping by business domain (LLM-based) or package-aware greedy algorithm (code-based)
- **Two-phase generation**: outline first, then expand — eliminates information isolation between shards
- **Method-level source injection**: prioritized by complexity (high=full body / medium=first 20 lines / low=signature only)
- **Real-time quality validation**: each subagent output is verified immediately, with automatic retry on failure
- **Document deduplication**: LLM-powered post-merge dedup (internal redundancy + cross-doc ownership)
- **Module doc distillation**: summarize multiple microservice docs into domain-level documentation
- Support for any language through the preset system (`java-spring` is built in)
- Source code access through git remotes (multi-repo and monorepo)
- Lightweight self-built RAG engine (ChromaDB + pluggable embedding)
- Auto-publish docs to source repo knowledge branch
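
The method-level source injection rule above (high = full body, medium = first 20 lines, low = signature only) can be sketched roughly as follows. This is a minimal illustration, not the package's actual implementation: the function name `select_source`, the string tier labels, and the assumption that the signature sits on the first line of the method text are all hypothetical.

```python
def select_source(method_source: str, complexity: str, medium_lines: int = 20) -> str:
    """Return the part of a method's source to inject for a given complexity tier.

    Hypothetical helper: the tier names mirror the README, everything else
    is an assumption made for illustration.
    """
    lines = method_source.splitlines()
    if complexity == "high":
        # High complexity: inject the full method body.
        return method_source
    if complexity == "medium":
        # Medium complexity: inject only the first `medium_lines` lines.
        return "\n".join(lines[:medium_lines])
    if complexity == "low":
        # Low complexity: signature only (assumed to be the first line).
        return lines[0] if lines else ""
    raise ValueError(f"unknown complexity tier: {complexity!r}")


sample = "public void pay(Order order) {\n    check(order);\n    charge(order);\n}"
print(select_source(sample, "low"))  # prints: public void pay(Order order) {
```

A real extractor would likely take the signature from the parsed syntax tree rather than the first line, since annotations or multi-line signatures break the first-line assumption.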

## Quick Start

### Option 1: Skill Mode (recommended)

Install the CLI toolchain:

```bash
pip install "source-kb[full]"
```

Clone the presets repo into your project:

```bash
git clone https://github.com/anthropics/source-kb-presets.git
cp -r source-kb-presets/skills ./skills
cp -r source-kb-presets/presets ./presets
cp -r source-kb-presets/examples ./examples
```

Then talk to your AI Agent (Kiro / Claude Code / Cursor / Windsurf):

```
Please read skills/kb-init/SKILL.md and follow the steps to initialize the knowledge base.
```

Or add the guide to your project rules file (`.kiro/steering/`, `.cursorrules`, `.claude/CLAUDE.md`, etc.). See [Getting Started](docs/getting-started.md).

**Non-Java project?** Works the same way. Configure `kb-project.yaml` with the `generic` preset:

```bash
# Use generic preset (works for any language)
source-kb extract --repo .source-cache/my-app --preset generic --output knowledge/my-app
```

> The generic preset classifies files with path-based rules and a regex parser, so tree-sitter is not required. For higher accuracy, create a custom preset, using `presets/java-spring/` as a reference.
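
To give a feel for what path-based classification with regular expressions might look like, here is a small sketch. The rule table, the category names, and the `classify` function are hypothetical and are not taken from the `generic` preset itself; they only illustrate the first-match-wins idea.

```python
import re

# Hypothetical rules: (regex on the repo-relative path, category).
# The first matching rule wins, so more specific rules come first.
RULES = [
    (re.compile(r"(^|/)tests?/|_test\.\w+$|\.test\.\w+$"), "test"),
    (re.compile(r"(^|/)(controllers?|routes?|handlers?)/"), "api"),
    (re.compile(r"(^|/)services?/"), "service"),
    (re.compile(r"(^|/)(models?|entities)/"), "model"),
    (re.compile(r"\.(ya?ml|toml|json|ini)$"), "config"),
]


def classify(path: str) -> str:
    """Map a file path to a category using the rule table above."""
    normalized = path.replace("\\", "/")  # tolerate Windows-style separators
    for pattern, category in RULES:
        if pattern.search(normalized):
            return category
    return "other"


for p in ["src/services/billing.py", "tests/test_api.py", "config/app.yaml", "README.md"]:
    print(p, "->", classify(p))
```

Rules like these need no parser, which is why this approach works for any language, at the cost of missing files that do not follow common directory conventions.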

### Option 2: CLI Mode (no Agent, requires LLM API config)

```bash
pip install "source-kb[full]"

export LLM_BASE_URL="https://api.anthropic.com"
export LLM_MODEL="claude-sonnet-4-6"
export LLM_API_KEY="sk-xxx"
source-kb extract --repo .source-cache/my-app --preset java-spring --summary --output knowledge/my-app
source-kb index --kb my-app
```

> Full guide: [Getting Started](docs/getting-started.md).
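
Before a long extraction run, it can help to confirm that the three environment variables shown above are set. The following is a small pre-flight sketch, assuming all three are required; the helper name `missing_llm_settings` is hypothetical and is not part of the `source-kb` CLI.

```python
import os

# Variable names taken from the CLI Mode example above.
REQUIRED_VARS = ("LLM_BASE_URL", "LLM_MODEL", "LLM_API_KEY")


def missing_llm_settings(env=None):
    """Return the names of required LLM variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


# Example with an explicit mapping instead of the real environment:
print(missing_llm_settings({"LLM_MODEL": "claude-sonnet-4-6"}))
# prints: ['LLM_BASE_URL', 'LLM_API_KEY']
```

Calling `missing_llm_settings()` with no argument checks the current process environment.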

## CLI Quick Reference

```bash
# Skeleton extraction
source-kb extract --repo .source-cache/xxx --preset java-spring --summary --output knowledge/xxx

# Prompt rendering
source-kb render --template presets/java-spring/templates/subagent-business.md --module xxx --kb yyy --doc-type business-logic --mode readwrite

# File list extraction
source-kb file-list --skeleton knowledge/xxx/.meta/skeleton/skeleton.json --preset java-spring --doc-type business-logic --output knowledge/xxx/.meta/file-lists/business-logic.txt

# Coverage validation
source-kb validate coverage check --skeleton knowledge/xxx/.meta/skeleton/skeleton.json --docs-dir knowledge/xxx --type service

# Index & search
source-kb index --kb my-project
source-kb search --kb my-project "query"
```

## Skills

| Skill | Purpose |
|-------|---------|
| kb-init | Generate all docs from source + build index |
| kb-audit | Compare docs vs source, fix inconsistencies |
| kb-sync | Detect git changes, incrementally update docs and index |
| kb-search | Vector retrieval + contextual answers |

Skill files are operation guides for the Agent and contain no platform-specific instructions.

## Requirements

- Python 3.10 to 3.12 (⚠️ 3.13 is not supported: tree-sitter-languages has no prebuilt wheel for it)
- Git
- Embedding backend (Ollama / OpenAI-compatible / DashScope / ChromaDB built-in)
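
The Python version range above can be checked from a script before installing. This is a minimal sketch; the function name `is_supported_python` and the constants are hypothetical, and the bounds simply restate the 3.10 to 3.12 range listed here.

```python
import sys

MIN_SUPPORTED = (3, 10)  # lower bound from the requirements list
MAX_SUPPORTED = (3, 12)  # 3.13 is excluded per the note above


def is_supported_python(version=None) -> bool:
    """Return True if the (major, minor) version falls within the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


print(is_supported_python((3, 11, 4)))  # prints: True
print(is_supported_python((3, 13, 0)))  # prints: False
```

Calling `is_supported_python()` with no argument checks the running interpreter.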

## Installation

```bash
# Basic install (CLI toolchain)
pip install source-kb

# Full install (skeleton parsing + vector index)
pip install "source-kb[full]"

# Skeleton parsing only (tree-sitter)
pip install "source-kb[skeleton]"

# Vector index only (ChromaDB)
pip install "source-kb[rag]"
```

Verify installation:

```bash
source-kb --version
source-kb --help
```

## Documentation

- [Getting Started](docs/getting-started.md) — Installation, configuration, first run
- [Configuration Reference](docs/configuration.md) — Full kb-project.yaml field reference
- [Custom Presets](docs/custom-presets.md) — Create and customize language presets
- [Design Document](docs/design-v5.md) — Architecture design and technical decisions
- [CLI vs Agent Mode](docs/cli-vs-agent-mode-analysis.md) — Two modes compared
