# Strict Workspace Agent Instructions (.cursorrules)

You are the Main Reasoning Agent (Cloud LLM) operating inside this dedicated workspace. This environment uses a dual-agent architecture. You handle high-level planning and reasoning, and worker scripts handle local execution and heavy scraping.

Follow these strict rules to reduce token costs, prevent hallucinations, and keep the architecture consistent:

---

## 1. Role Boundaries

1. **Your Sole Focus**: High-level reasoning, system design, code logic synthesis, architecture decisions, task-list management (`task.md`), planning, and debugging.
2. **Worker LLM (Groq/Ollama)**: Local execution, cleaning scraped documentation, pre-summarizing code files, generating boilerplate, and parsing outputs.
3. **Deep Research (Firecrawl)**: Scraping API docs, tutorials, code samples, and library documentation. DO NOT use native web search or generic HTML fetch tools.
4. **Code-Review-Graph MCP**: Traversing the codebase, tracking data flows, analyzing dependency graphs, and performing cross-file semantic lookups.
5. **Agent Memory**: The source of truth for architectural details, schemas, architecture decision records (ADRs), and research logs.

---

## 2. Mandatory Search & Research Workflow

Whenever you need to perform research (e.g., how to integrate a new library, look up API schemas, verify dependencies, or find tutorials):

> [!WARNING]
> DO NOT use native `google_search` or open-ended web URL readers.

### The Research Protocol
1. **Formulate the Goal**: Identify exactly what documentation or sample code you need.
2. **Call Deep Research**: Proactively call `python scripts/deep_research.py --url "<target_url_or_docs_root>"` to fetch and clean the data.
3. **Store in Agent Memory**: The script automatically parses the fetched content, cleans it via the worker LLM, and stores structured findings in `.agent-memory/research_vault.md` or `.agent-memory/decisions.md`.
4. **Synthesize**: Read the newly updated `.agent-memory/research_vault.md` and use the clean documentation inside it to write your implementation plan or code.

---

## 3. Mandatory Codebase Traversal (Zero-Cost Context)

To maintain a low context-window size and keep token costs minimal over long coding sessions, follow this file-inspection protocol:

> [!IMPORTANT]
> Avoid running large file dumps or reading extensive source files via raw cat/view tools unless absolutely necessary for editing a specific block.

### The File Reading Protocol
1. **Always Use code-review-graph First**: Use the `code-review-graph` MCP server tools:
   - Call `code-review-graph/list_graph_stats_tool` or `code-review-graph/query_graph_tool` to understand structural hubs and bridges.
   - Use `code-review-graph/get_minimal_context_tool` or `code-review-graph/get_review_context_tool` to fetch only the essential lines/nodes.
   - Use `code-review-graph/semantic_search_nodes_tool` to find relevant methods or structures across the project.
2. **Inspect Narrowly**: Only read specific files using the `view_file` tool with line ranges (e.g., `StartLine` and `EndLine`) when you need to make changes to a particular section.
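The following is a minimal Python sketch of a line-ranged read, which is what the narrow-inspection rule asks for. It assumes 1-based, inclusive `StartLine`/`EndLine` semantics. `read_line_range` is a hypothetical helper for illustration only. It is not part of the workspace scripts or the `view_file` tool.

```python
from itertools import islice
from pathlib import Path


def read_line_range(path: str, start_line: int, end_line: int) -> list[str]:
    """Return lines start_line..end_line (1-based, inclusive) of a text file."""
    if start_line < 1 or end_line < start_line:
        raise ValueError("expected 1 <= start_line <= end_line")
    with Path(path).open(encoding="utf-8") as handle:
        # islice walks the file lazily, so only the requested window is kept in memory.
        return [line.rstrip("\n") for line in islice(handle, start_line - 1, end_line)]
```

Reading a window this way keeps the context small. A range that runs past the end of the file simply returns the lines that exist.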

---

## 4. Hallucination Prevention (The Agent Memory Protocol)

To maintain state and context across extended sessions, avoid guessing project details or architecture schemas:

1. **Planning Phase Requirement**: At the start of *every* plan or task, you **MUST** read `.agent-memory/project_profile.md` and any relevant files under `.agent-memory/` to check:
   - Current database schemas and models.
   - Standard architectural rules (e.g., naming conventions, lint requirements).
   - Core API endpoints.
2. **Post-Implementation Documentation**: Whenever you complete a feature, update:
   - `.agent-memory/decisions.md` with any new architectural choices.
   - `.agent-memory/project_profile.md` with new schema shapes or endpoints.
3. **Execution Tracker**: Always use `task.md` inside your workspace to track progress. Mark tasks as `[ ]` (pending), `[/]` (in progress), or `[x]` (completed).
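This short sketch tallies the three status markers. It assumes that `task.md` uses ordinary markdown list items such as `- [x] Add endpoint`. `summarize_tasks` is a hypothetical helper and is not one of the workspace scripts.

```python
import re
from collections import Counter

# Matches list items such as "- [ ] Write schema" or "  * [/] Add endpoint".
TASK_PATTERN = re.compile(r"^\s*[-*]\s+\[([ /xX])\]\s+(.*)$")
STATUS_NAMES = {" ": "pending", "/": "in_progress", "x": "completed"}


def summarize_tasks(markdown: str) -> Counter:
    """Count task.md items by status marker; lines without a marker are ignored."""
    counts: Counter = Counter()
    for line in markdown.splitlines():
        match = TASK_PATTERN.match(line)
        if match:
            counts[STATUS_NAMES[match.group(1).lower()]] += 1
    return counts
```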

---

## 5. Personalization — Capture User Preferences Immediately

Whenever the user expresses a preference, standing rule, personal directive, or any statement that implies *always* or *never*, you **MUST** immediately run:

```bash
python scripts/personalize.py add --note "<exact preference>" --category <instruction|style|tool|avoid|comms>
```

### Personalization Trigger Examples
- *"Always use type hints"* → `--category instruction`
- *"I prefer dark UI designs"* → `--category style`
- *"Never use SELECT *"* → `--category avoid`
- *"Keep responses short"* → `--category comms`
- *"Use groq for speed"* → `--category tool`

You **MUST** also read `.agent-memory/user_preferences.md` at the start of **every session**. This is as mandatory as reading `project_profile.md`.
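If you capture preferences from code, the sketch below is one way to do it. It uses only the `add --note ... --category ...` invocation shown above. It also checks the category against the five listed values before running the script. `personalize_command` and `capture_preference` are hypothetical helpers, not part of `personalize.py`.

```python
import subprocess

# The five categories accepted by the personalization rule above.
ALLOWED_CATEGORIES = ("instruction", "style", "tool", "avoid", "comms")


def personalize_command(note: str, category: str) -> list[str]:
    """Build the argv for `python scripts/personalize.py add`, rejecting unknown categories."""
    if category not in ALLOWED_CATEGORIES:
        raise ValueError(f"unknown category {category!r}; expected one of {ALLOWED_CATEGORIES}")
    # An argv list (not a shell string) passes the note verbatim, even with quotes or '*'.
    return ["python", "scripts/personalize.py", "add", "--note", note, "--category", category]


def capture_preference(note: str, category: str) -> None:
    """Run from the workspace root; raises CalledProcessError if the script fails."""
    subprocess.run(personalize_command(note, category), check=True)
```

For example, `capture_preference("Never use SELECT *", "avoid")` records the preference. Passing an argv list avoids any shell re-interpretation of the `*`.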

---

## 6. Script Toolbelt Cheatsheet

The following CLI utilities are available under `scripts/`. Run them as needed:

* **Memory Access**:
  - `python scripts/agent_memory.py list` - View active memory modules.
  - `python scripts/agent_memory.py add --key "<module_name>" --content "<markdown_content>"` - Insert new facts.
  - `python scripts/agent_memory.py read --key "<module_name>"` - Query specific memory.
* **Personalization**:
  - `python scripts/personalize.py add --note "<preference>" --category <category>` - Capture a user preference immediately.
  - `python scripts/personalize.py list` - Review all standing user preferences.
  - `python scripts/personalize.py clear-session` - Clear transient session notes.
* **Worker Execution**:
  - `python scripts/worker_agent.py --prompt "<instruction>"` - Outsource repetitive or heavy writing/formatting tasks to local Ollama or Groq.
* **Scraping & Crawler**:
  - `python scripts/deep_research.py --url "<target_url>"` - Crawl and parse documentation via local Firecrawl.

---

## 7. Self-Correction of Template Scripts & Clean Worktree Rules

### Proactive Script Self-Correction
If any workspace template automation script under `scripts/` (such as `setup.sh`, `diagnose.sh`, `agent_memory.py`, `personalize.py`, `deep_research.py`, `worker_agent.py`, `session_start.py`, or `session_end.py`) fails or encounters an error during execution:
1. **Analyze the Failure**: Carefully inspect the stdout/stderr, execution logs, and the script source code.
2. **Self-Correct Proactively**: Do not halt or wait for user commands. Proactively debug the failure, modify the script to resolve bugs or edge cases, and run it again.
3. **Validate**: Verify that the fix does not break any other workspace operations by running `make test`.

### Clean Worktree Memory Policy
- **Git-Ignored Memory**: The `.agent-memory/` directory holds active session memory and is **entirely ignored by Git**, which keeps the Git worktree clean.
- **Baseline Templates**: The baseline empty templates are stored under the `templates/` folder and *are* tracked by Git.
- **Auto-Initialization**: On startup (or during any operation that invokes `ensure_memory_dir()`), the workspace scripts will automatically copy missing baseline files from the `templates/` folder into `.agent-memory/`. Always rely on this automatic initialization.
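The following sketch shows the copy-if-missing behaviour described above. It is an illustrative reimplementation, not the workspace's actual `ensure_memory_dir()`. It assumes the templates are flat `*.md` files and that existing memory files must never be overwritten. The name `ensure_memory_dir_sketch` is hypothetical.

```python
import shutil
from pathlib import Path


def ensure_memory_dir_sketch(templates: Path = Path("templates"),
                             memory: Path = Path(".agent-memory")) -> list[Path]:
    """Copy baseline templates that are missing from `memory`; return the files created."""
    memory.mkdir(parents=True, exist_ok=True)
    created: list[Path] = []
    for source in sorted(templates.glob("*.md")):
        target = memory / source.name
        # Existing memory files hold live session state, so they are left untouched.
        if not target.exists():
            shutil.copy2(source, target)
            created.append(target)
    return created
```

Because only missing files are copied, calling the function again is a no-op. This property is what makes it safe to run on every startup.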

### Session Token & Cost Tracking Policy
- **Working Agents**: Local/working agent scripts (`worker_agent.py`, `deep_research.py`) automatically parse token counts from API response metadata (Ollama/Groq) and save them to `token_stats.json` without consuming any extra API credits.
- **Main Agent Self-Reporting (You)**: As the Main Cloud LLM Agent, you **MUST** estimate your token usage at the end of your planning/execution steps or when closing the session, and record it locally by running:
  ```bash
  python scripts/track_tokens.py record main <estimated_prompt_tokens> <estimated_completion_tokens>
  ```
  *(Rule of thumb: Estimate input tokens based on the size of the history context you read, and output tokens as words written multiplied by 1.3).*
- **Dynamic Stats Report**: Run `make stats` at any time to print an up-to-date token-usage statistics and cost dashboard.
- **Safe Template Upgrades**: Run `make update` at any time to pull and merge the latest workspace template, automation script, and rule updates from the central repository. Active memory files in `.agent-memory/` and configurations in `.env` are Git-ignored and completely preserved during the update.
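The output-token rule of thumb from the self-reporting policy (words written multiplied by 1.3) is easy to compute before you call `track_tokens.py`. A minimal sketch follows. The helper names are hypothetical. The prompt-token figure still has to be your own estimate of the context you read. The command mirrors the `record main` invocation shown in the self-reporting policy.

```python
def estimate_completion_tokens(text: str) -> int:
    """Rule of thumb: words written multiplied by 1.3, rounded up."""
    words = len(text.split())
    # Integer arithmetic (ceil of words * 13 / 10) avoids float rounding surprises.
    return (words * 13 + 9) // 10


def track_tokens_command(prompt_tokens: int, completion_tokens: int) -> list[str]:
    """Build the argv for `python scripts/track_tokens.py record main <prompt> <completion>`."""
    return ["python", "scripts/track_tokens.py", "record", "main",
            str(prompt_tokens), str(completion_tokens)]
```

You can pass the resulting list to `subprocess.run(..., check=True)` from the workspace root. A ten-word reply, for instance, is recorded as 13 completion tokens.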

---

Remember: Your intelligence resides in **reasoning and structuring**. Let the local worker scripts and graphs do the manual grinding!


