Metadata-Version: 2.4
Name: karpathy-review
Version: 0.1.6
Summary: Code review in Karpathy's voice using RAG over his repos. Not affiliated with Andrej Karpathy.
Requires-Python: >=3.11
Requires-Dist: anthropic>=0.40.0
Requires-Dist: chromadb>=1.5.5
Requires-Dist: fastapi>=0.135.1
Requires-Dist: httpx>=0.28.1
Requires-Dist: huggingface-hub>=0.23.0
Requires-Dist: inngest>=0.5.18
Requires-Dist: openai>=2.26.0
Requires-Dist: python-dotenv>=1.2.2
Requires-Dist: rich>=13.0.0
Requires-Dist: tiktoken>=0.12.0
Requires-Dist: uvicorn>=0.41.0
Description-Content-Type: text/markdown

# karpathy-review

not affiliated with andrej karpathy. built by a fan using his public repos.

A code reviewer that uses RAG over Karpathy's actual repos to review your code in his voice. It retrieves the closest-matching examples from his source code and refactoring commits, then uses them to ground the review. Not a persona, not a prompt description: his actual words and decisions.

## install

```
pip install karpathy-review
```

## usage

```
karpathy-review train.py
karpathy-review src/
karpathy-review model.py --rewrite
```

Point it at a `.py` or `.c` file, or at a directory, which it walks recursively. `--rewrite` offers to rewrite each flagged function in turn, showing a diff and an apply/skip prompt.
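A minimal sketch of the file discovery described above, assuming only `.py` and `.c` suffixes are reviewed and files are visited in sorted order. `collect_files` is a hypothetical name, not the package's actual function.

```python
from pathlib import Path

# Assumption: only these suffixes are reviewed, matching ".py or .c" above.
SUFFIXES = {".py", ".c"}

def collect_files(target: str) -> list[Path]:
    """Return reviewable files for a single file, or for a directory walked recursively."""
    path = Path(target)
    if path.is_file():
        return [path] if path.suffix in SUFFIXES else []
    # rglob("*") descends into every subdirectory; sorting keeps the review order stable.
    return sorted(p for p in path.rglob("*") if p.is_file() and p.suffix in SUFFIXES)
```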

On first run it asks for your API keys (OpenAI, plus optionally Anthropic) and downloads the knowledge base (~50 MB) from HuggingFace. Both are saved locally, so later runs skip these steps.

## how it works

The knowledge base is built by scraping Karpathy's repos (nanoGPT, micrograd, minBPE, llm.c, ng-video-lecture) via the GitHub API, chunking the source by function (AST for Python, regex for C), and embedding the chunks with `text-embedding-3-small` into ChromaDB. Commits are filtered to opinionated ones ("simplify", "remove unnecessary", "just use X") to capture how he actually criticizes code. The KB is hosted on HuggingFace and auto-downloads on first run.

Review is two-tier:

- **Tier 1**: the KB has relevant examples. The review is grounded in his actual code, with citations to the repo and file. ML code almost always lands in this tier.
- **Tier 2**: the KB has nothing relevant. The review falls back to universal principles applied in his voice: overly long functions, magic numbers, thin wrappers, unnecessary classes.
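A sketch of how the tier decision could work, assuming it is a distance cutoff on retrieved chunks (ChromaDB reports distances, where lower means closer). `Hit`, `choose_tier`, `citation`, and the `0.5` threshold are hypothetical; the real cutoff is not documented.

```python
from dataclasses import dataclass

# Hypothetical cutoff; lower distance = more similar chunk.
MAX_DISTANCE = 0.5

@dataclass
class Hit:
    repo: str
    path: str
    distance: float

def choose_tier(hits: list[Hit], max_distance: float = MAX_DISTANCE) -> tuple[int, list[Hit]]:
    """Tier 1 if any retrieved example is close enough to ground the review, else tier 2."""
    relevant = [h for h in hits if h.distance <= max_distance]
    if relevant:
        return 1, sorted(relevant, key=lambda h: h.distance)
    return 2, []  # no grounding: review from general principles

def citation(hit: Hit) -> str:
    # Matches the "repo/file" citation style in the example output
    return f"{hit.repo}/{hit.path}"
```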

Each file gets a one-line opener ("few things in the forward pass") followed by line-level comments. Chunk reviews run in parallel.
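A sketch of per-chunk parallelism with `concurrent.futures`, assuming each chunk review is an independent, network-bound LLM call. `review_chunk` is a hypothetical placeholder, and the opener is passed in rather than generated.

```python
from concurrent.futures import ThreadPoolExecutor

def review_chunk(chunk: str) -> str:
    # Placeholder: the real tool would retrieve examples and call the LLM here.
    return f"{len(chunk.splitlines())} line(s) reviewed"

def review_file(chunks: list[str], opener: str, max_workers: int = 8) -> list[str]:
    """Review chunks concurrently and prepend the one-line opener."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, regardless of completion order
        comments = list(pool.map(review_chunk, chunks))
    return [opener] + comments
```

Threads suit this because the work is mostly waiting on API responses, and `Executor.map` keeps comments in file order even when calls finish out of order.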

## requirements

An OpenAI API key is always required: embeddings use OpenAI regardless of which provider writes the review.

An Anthropic API key is optional, if you prefer Claude for the review. Both keys are prompted for on first run and saved to `~/.karpathy-review/config`.

## example

```
 andrej is reviewing your code...

──────────────────── model.py ────────────────────
  ok so the attention is not quite right

  L42   uses nn.MultiheadAttention instead of custom causal self-attention
        — nanoGPT/model.py
  L51   missing causal mask
        — nanoGPT/model.py
  L67   forward applies layernorm after residual, should be before

 want andrej to take over? [y/n]:
```

## stack

Python, ChromaDB, OpenAI embeddings, HuggingFace Hub, Inngest (scraping pipeline), Rich (terminal output)

---

built from andrej karpathy's public repositories (nanoGPT, micrograd, minBPE, llm.c, ng-video-lecture).  
copyright (c) andrej karpathy — MIT license.  
not affiliated with or endorsed by andrej karpathy.
