Metadata-Version: 2.4
Name: caesar-agent
Version: 0.3.12
Summary: Caesar: autonomous AI research agent with graph-based deep web exploration and adversarial answer synthesis.
Author: Elliot Meyerson, Risto Miikkulainen
Author-email: Jason Liang <jason.liang@cognizant.com>
License: Apache-2.0
Project-URL: Homepage, https://github.com/cognizant-ai-lab/caesar-agent
Project-URL: Repository, https://github.com/cognizant-ai-lab/caesar-agent
Project-URL: Issues, https://github.com/cognizant-ai-lab/caesar-agent/issues
Project-URL: Paper, https://arxiv.org/abs/2604.20855
Project-URL: Changelog, https://github.com/cognizant-ai-lab/caesar-agent/releases
Keywords: ai-agents,llm,rag,agentic-rag,agentic-ai,autonomous-agents,deep-research,research-agent,knowledge-graph,retrieval-augmented-generation,scientific-discovery,web-exploration,generator-verifier,llm-agents,openai,anthropic,chromadb,llama-index,ai-research
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Internet :: WWW/HTTP :: Indexing/Search
Requires-Python: <3.14,>=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: NOTICE
Requires-Dist: colorama
Requires-Dist: litellm
Requires-Dist: networkx
Requires-Dist: numpy
Requires-Dist: openai
Requires-Dist: portalocker
Requires-Dist: psutil
Requires-Dist: pytest
Requires-Dist: scipy
Requires-Dist: tinydb
Requires-Dist: chromadb==1.5.2
Requires-Dist: llama-index
Requires-Dist: llama-index-vector-stores-chroma
Requires-Dist: llama-index-embeddings-openai
Requires-Dist: llama-index-llms-openai
Requires-Dist: mem0ai
Requires-Dist: anthropic
Requires-Dist: beautifulsoup4
Requires-Dist: curl_cffi
Requires-Dist: ddgs
Requires-Dist: google-genai
Requires-Dist: pygraphviz
Requires-Dist: pypdf
Requires-Dist: tabulate
Requires-Dist: tenacity
Requires-Dist: trafilatura
Requires-Dist: evalplus
Requires-Dist: graphviz
Requires-Dist: llmlingua
Requires-Dist: tiktoken
Requires-Dist: xxhash
Provides-Extra: dev
Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: mypy>=1.7.0; extra == "dev"
Requires-Dist: ipython>=8.0.0; extra == "dev"
Provides-Extra: test
Requires-Dist: pytest>=7.4.0; extra == "test"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "test"
Requires-Dist: pytest-xdist>=3.5.0; extra == "test"
Dynamic: license-file

<p align="center">
  <img src="https://jasonzliang.github.io/caesar-agent/caesar.webp" alt="Caesar autonomous research agent architecture: Perceive-Think-Act exploration loop and Generator-Verifier adversarial synthesis" width="720"/>
</p>

<h1 align="center">Caesar: Autonomous AI Research Agent</h1>

<p align="center">
  <strong>Deep web exploration and creative answer synthesis. The open-source alternative to ChatGPT Deep Research and Perplexity.</strong>
</p>

<p align="center">
  <a href="#quickstart"><img alt="Python 3.10+" src="https://img.shields.io/badge/python-3.10%2B-blue"></a>
  <a href="https://arxiv.org/abs/2604.20855"><img alt="Paper" src="https://img.shields.io/badge/paper-arXiv%202604.20855-b31b1b"></a>
  <a href="https://www.researchgate.net/publication/402554537_Caesar_Deep_Agentic_Web_Exploration_for_Creative_Answer_Synthesis"><img alt="ResearchGate" src="https://img.shields.io/badge/ResearchGate-Caesar-00CCBB?logo=researchgate&logoColor=white"></a>
  <a href="LICENSE"><img alt="License" src="https://img.shields.io/badge/license-Apache_2.0-blue"></a>
  <a href="https://github.com/cognizant-ai-lab/caesar-agent/commits/main"><img alt="Last commit" src="https://img.shields.io/github/last-commit/cognizant-ai-lab/caesar-agent"></a>
</p>

**Caesar** is an autonomous AI research agent that navigates the web, reasons over a dynamic knowledge graph, and synthesizes novel, grounded answers. In blinded LLM-as-a-Judge creativity evaluations, Caesar scored **25.29 / 30**, beating the runner-up (Gemini 3 Deep Research, 22.27) by **3.02 points** and GPT-5.2 Deep Research (15.40) by nearly **10 points**. The differences are statistically significant at **p < 0.001**.

If you're looking for an **agentic RAG system that goes beyond retrieval** (graph-based exploration, adversarial verification, and multi-draft synthesis), this is it.

> 📄 **Read the paper:** [*Caesar: Deep Agentic Web Exploration for Creative Answer Synthesis*](https://arxiv.org/abs/2604.20855) (Liang, Meyerson, Miikkulainen, 2026) · [DOI: 10.48550/arXiv.2604.20855](https://doi.org/10.48550/arXiv.2604.20855) · [PDF](https://arxiv.org/pdf/2604.20855)

## Quickstart

**From PyPI:**

```bash
pip install caesar-agent
export OPENAI_API_KEY=your_key
caesar regular -q "your research question"
```

**From source:**

```bash
git clone https://github.com/cognizant-ai-lab/caesar-agent.git
cd caesar-agent && pip install -e .
export OPENAI_API_KEY=your_key
python caesar/run_agent.py regular -q "your research question"
```

**Setup notes:**

- **Presets** — `nano` (fast, ~$0.30), `mini` (balanced, ~$1), `regular` (deep, ~$3)
- **Output** — `~/.caesar/result/` when installed from PyPI; `caesar/result/` when run from a source checkout. Override either with `CAESAR_RESULT_DIR`.
- **Optional API keys** — `BRAVE_API_KEY` (higher-quality web search), `ANTHROPIC_API_KEY` (Claude), `GOOGLE_API_KEY` (Gemini)
- **More** — see [`caesar/README.md`](caesar/README.md) for the full env-var list, custom configs in `~/.caesar/configs/`, and the `pygraphviz` / system graphviz dependency.
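
The output-location rule above can be sketched in a few lines of Python. This is an illustration of the documented behavior, not Caesar's actual code: the function name `resolve_result_dir` and the `from_source` flag are hypothetical, and the sketch assumes `CAESAR_RESULT_DIR` wins whenever it is set to a non-empty value.

```python
import os
from pathlib import Path


def resolve_result_dir(env=None, from_source=False):
    """Return the directory where run results would be written.

    Hypothetical helper that mirrors the rule described above.
    """
    env = os.environ if env is None else env
    override = env.get("CAESAR_RESULT_DIR")
    if override:
        # Explicit override applies to both install modes (assumed precedence).
        return Path(override).expanduser()
    if from_source:
        # Source checkout: results stay inside the repository tree.
        return Path("caesar") / "result"
    # PyPI install: results go under the user's home directory.
    return Path.home() / ".caesar" / "result"


print(resolve_result_dir({"CAESAR_RESULT_DIR": "/tmp/caesar-runs"}))
print(resolve_result_dir({}, from_source=True))
```

Passing a plain dict as `env` keeps the example independent of whatever is exported in the current shell.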

For detailed configuration, exploration modes, synthesis options, and advanced usage, see the **[Caesar module docs](caesar/README.md)**.

Prefer a browser? The **[Caesar Web Server](web_server/README.md)** ships a FastAPI + Next.js GUI that submits runs, streams progress, and renders the live knowledge graph and final artifact — `cd web_server && ./launch.sh` and open `http://localhost:3000`.

## What It's Good For

Caesar shines on **open-ended, creative, cross-disciplinary** research, where retrieval alone won't work:

- **Hypothesis generation**: novel cross-domain connections (e.g., bridging materials science and biology)
- **Literature synthesis**: graph-grounded review that spots tensions and gaps between papers
- **Competitive intelligence**: deep mapping of a technical or market landscape
- **Counterfactual & meta-creative reasoning**: "what if X were different?" style inquiry
- **Novel solution ideation**: e.g., ARC-AGI–style problem exploration

It's **not** the right tool for quick factual lookups or latency-sensitive apps. Caesar is designed for depth, not speed.

## Why Caesar?

Current deep-research agents (ChatGPT Deep Research, Perplexity, GPT Researcher, Gemini Deep Research) optimize for **retrieval precision over a flat sequence of documents**. They produce competent summaries but suffer from **navigational amnesia**, fall into local minima, and generate derivative, consensus-driven outputs.

Caesar is different:

| Capability | Caesar | ChatGPT Deep Research | Perplexity | GPT Researcher |
|---|:-:|:-:|:-:|:-:|
| Open source | ✅ | ❌ | ❌ | ✅ |
| Dynamic knowledge graph | ✅ | ❌ | ❌ | ❌ |
| Adversarial Generator–Verifier loop | ✅ | ❌ | ❌ | ❌ |
| Multi-draft synthesis with merge | ✅ | ❌ | ❌ | Partial¹ |
| Episodic memory + backtracking | ✅ | ❌ | ❌ | ❌ |
| Pluggable LLM backend (OpenAI / Anthropic / local) | ✅ | ❌ | ❌ | ✅ |
| Reproducible experiment JSON | ✅ | ❌ | ❌ | Partial² |

<sub>¹ GPT Researcher supports multi-draft generation but not adversarial self-critique or merge. ² GPT Researcher logs cost per run but not the full reproducibility bundle (wall-time, page-level sources, draft provenance).</sub>

## How It Works

Caesar operates in two cognitive phases:

### 1. Deep Web Exploration: stateful graph traversal

A recursive **Perceive–Think–Act** loop performs topological traversal of information spaces. Rather than summarizing each page in isolation, Caesar generates context-aware insights conditioned on the **local structure of the exploration graph**, analyzing how new content builds upon or contradicts neighboring nodes. A dynamic policy, informed by a vector knowledge base and episodic memory, autonomously switches between depth-first expansion, strategic backtracking, and targeted web search.
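
To make the traversal pattern concrete, here is a toy sketch of a perceive-think-act loop over an in-memory link map. It is not Caesar's implementation: `TOY_WEB`, `explore`, and the budget parameter are hypothetical, and the sketch leaves out LLM-generated insights, the vector knowledge base, and web search. It only shows depth-first expansion with backtracking while an exploration graph is recorded.

```python
from collections import defaultdict

# Toy "web": each page maps to the pages it links to (hypothetical data).
TOY_WEB = {
    "seed": ["a", "b"],
    "a": ["c"],
    "b": [],
    "c": ["a"],  # links back to an already-visited page
}


def explore(start, budget):
    """Depth-first traversal that backtracks when a page offers nothing new."""
    graph = defaultdict(list)  # exploration graph: parent -> children
    visited = [start]          # visit order, a crude stand-in for episodic memory
    stack = [start]            # current path; popping it means backtracking
    while stack and len(visited) < budget:
        page = stack[-1]
        # Perceive: read the page's outgoing links.
        links = TOY_WEB.get(page, [])
        # Think: keep only links that lead somewhere new.
        fresh = [link for link in links if link not in visited]
        if not fresh:
            stack.pop()        # Act: dead end, backtrack to the parent
            continue
        nxt = fresh[0]         # Act: expand the first unexplored link
        graph[page].append(nxt)
        visited.append(nxt)
        stack.append(nxt)
    return visited, dict(graph)


order, graph = explore("seed", budget=10)
print(order)
print(graph)
```

With the toy data, the walk reaches `c` through `a`, finds only an already-visited link there, and backtracks to `seed` to pick up `b`.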

### 2. Adversarial Artifact Synthesis: Generator–Verifier loop

Rather than producing a single-pass summary, Caesar treats synthesis as a recursive self-correction loop. An independent adversarial module formulates **orthogonal queries** targeting logical weaknesses, missing citations, and contradictions in the current belief state. Multiple drafts are produced iteratively and merged, forcing the agent out of the consensus basin that traps single-pass LLMs.
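
The control flow of such a loop can be sketched with stubbed components. Everything below is hypothetical: `synthesize` and the four callables (`generate`, `critique`, `retrieve`, `merge`) are placeholders standing in for LLM and search calls, and the longest-draft merge is a deliberately naive stand-in for whatever merge strategy Caesar actually uses.

```python
def synthesize(question, generate, critique, retrieve, merge,
               num_drafts=2, rounds=2):
    """Refine each draft against adversarial queries, then merge the drafts."""
    drafts = []
    for _ in range(num_drafts):
        evidence = list(retrieve(question))
        draft = generate(question, evidence)
        for _ in range(rounds):
            queries = critique(draft)      # verifier: what is weak or missing?
            if not queries:
                break                      # nothing left to challenge
            for query in queries:
                evidence.extend(retrieve(query))
            draft = generate(question, evidence)  # generator: revise
        drafts.append(draft)
    return merge(drafts)


# Deterministic stubs so the sketch runs without any API access.
def retrieve(query):
    return [f"fact about {query}"]


def generate(question, evidence):
    return f"{question}: " + "; ".join(sorted(set(evidence)))


def critique(draft):
    # Demand counterexamples until the draft mentions them.
    return [] if "counterexamples" in draft else ["counterexamples"]


def merge(drafts):
    return max(drafts, key=len)  # naive merge: keep the longest draft


result = synthesize("Q", generate, critique, retrieve, merge)
print(result)
```

Because the stubs are deterministic, both drafts come out identical here; with real models each draft would differ, which is what makes the merge step meaningful.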

## Architectural Innovations

- **Domain-Specific Role Adaptation**: the agent rewrites its own system prompt per task, overriding the safety-biased generic responses typical of RLHF models.
- **Graph-Augmented Insight Generation**: insights are conditioned on the exploration graph neighborhood, enabling online associative reasoning.
- **Knowledge-Guided Exploration Policy**: detects navigational stagnation via episodic memory and forces backtracking.
- **Adversarial Query Refinement**: orthogonal queries push the agent out of generic LLM consensus toward novel, grounded facts.
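
As one illustration of the stagnation check in the list above, the sketch below flags stagnation when recently visited pages stop contributing new terms. It is a simplified assumption about how such a check could work, not Caesar's policy: the `StagnationDetector` class, the term-overlap novelty score, and the window and threshold values are all hypothetical.

```python
from collections import deque


class StagnationDetector:
    """Signal a backtrack when recent pages add little that is new."""

    def __init__(self, window=3, min_novelty=0.2):
        self.seen = set()                    # terms remembered so far
        self.recent = deque(maxlen=window)   # novelty of the last few pages
        self.min_novelty = min_novelty

    def observe(self, text):
        """Record a page and return the fraction of its terms that were new."""
        terms = set(text.lower().split())
        new_terms = terms - self.seen
        novelty = len(new_terms) / len(terms) if terms else 0.0
        self.seen |= terms
        self.recent.append(novelty)
        return novelty

    def should_backtrack(self):
        """True once the window is full and average novelty is below threshold."""
        full = len(self.recent) == self.recent.maxlen
        return full and sum(self.recent) / len(self.recent) < self.min_novelty


detector = StagnationDetector(window=2, min_novelty=0.5)
for page in ["graph search agents", "graph search agents", "agents search graph"]:
    detector.observe(page)
    print(detector.should_backtrack())
```

A real system would more plausibly compare embeddings than raw terms; the set difference is used here only to keep the example dependency-free.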

## Benchmark Results

Evaluated with a blinded **3-model LLM-as-a-Judge panel** (Claude Sonnet 4.5, GPT-5.2, Gemini 3 Pro) across three creativity dimensions (**New**, **Useful**, **Surprising**), scored 0–10 each:

| Agent | New | Useful | Surprising | **Total** |
|---|:-:|:-:|:-:|:-:|
| **Caesar** | **8.64** | **8.38** | **8.27** | **25.29** |
| Gemini 3 Deep Research | 7.69 | 7.09 | 7.49 | 22.27 |
| Sonnet 4.5 Deep Research | 6.96 | 7.20 | 6.73 | 20.89 |
| GPT-5.2 Deep Research | 5.02 | 6.02 | 4.36 | 15.40 |

Differences are significant under a Mann–Whitney U test (**p < 0.001** across all settings). Ablations confirm that both graph exploration and the adversarial verifier loop are independently necessary. See the [paper](https://arxiv.org/abs/2604.20855) for full methodology, exploration-budget ablation, and judge bias analysis.
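
The totals and margins in the table can be rechecked with simple arithmetic. The snippet below uses only the per-dimension means from the table; it does not reproduce the significance test, which requires the per-sample judge scores reported in the paper.

```python
# Per-dimension mean scores (New, Useful, Surprising) copied from the table.
scores = {
    "Caesar": (8.64, 8.38, 8.27),
    "Gemini 3 Deep Research": (7.69, 7.09, 7.49),
    "Sonnet 4.5 Deep Research": (6.96, 7.20, 6.73),
    "GPT-5.2 Deep Research": (5.02, 6.02, 4.36),
}

# Each total is the sum of three 0-10 dimensions, so the maximum is 30.
totals = {agent: round(sum(dims), 2) for agent, dims in scores.items()}

# Margin of the top-scoring agent over every other agent.
best = max(totals, key=totals.get)
margins = {
    agent: round(totals[best] - total, 2)
    for agent, total in totals.items()
    if agent != best
}

for agent, total in totals.items():
    print(f"{agent}: {total:.2f} / 30")
for agent, margin in margins.items():
    print(f"{best} leads {agent} by {margin:.2f}")
```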

## Example Output

After a run, Caesar writes a full artifact (abstract + body with citations) plus a structured summary:

```json
{
  "wall_time": 591.31,
  "tokens_used": 109873,
  "token_cost": 0.29,
  "api_calls": 20,
  "webpages_visited": 4,
  "iterations_elapsed": 5,
  "artifact_dir": "result/.../agent_CaesarExplorer.synthesis.04161850",
  "num_drafts": 2,
  "config_summary": { "..." : "..." }
}
```

The `artifact_dir` contains one `.txt` per synthesis draft, a final merged artifact, and a metadata file tracking sources cited in each draft. Knowledge graphs are saved as compressed JSON checkpoints for reproducibility or post-hoc analysis.
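
A summary like the one above is straightforward to post-process. The sketch below parses the example JSON from a string, since the on-disk file name is not stated here, and derives two convenience figures. It assumes `wall_time` is in seconds and `token_cost` is in US dollars, which matches the cost and runtime figures quoted elsewhere in this README but is not confirmed by it.

```python
import json

# The example summary from above, embedded as a string for a self-contained demo.
summary_text = """
{
  "wall_time": 591.31,
  "tokens_used": 109873,
  "token_cost": 0.29,
  "api_calls": 20,
  "webpages_visited": 4,
  "iterations_elapsed": 5,
  "artifact_dir": "result/.../agent_CaesarExplorer.synthesis.04161850",
  "num_drafts": 2,
  "config_summary": { "..." : "..." }
}
"""

summary = json.loads(summary_text)

# Assumed units: seconds for wall_time, US dollars for token_cost.
minutes = summary["wall_time"] / 60
cost_per_1k_tokens = summary["token_cost"] / summary["tokens_used"] * 1000
tokens_per_call = summary["tokens_used"] / summary["api_calls"]

print(f"runtime: {minutes:.1f} min")
print(f"cost per 1k tokens: ${cost_per_1k_tokens:.4f}")
print(f"tokens per API call: {tokens_per_call:.0f}")
```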

## Built on Rome

Caesar is built on **Rome**, a Finite State Machine framework for stateful AI agents that provides Generator–Verifier–Reviser topologies, episodic memory, dynamic policy routing, and verifiable code execution. See the [Rome framework docs](rome/README.md) if you want to build your own agent on top.

## Project Layout

```
caesar-agent/
├── caesar/          # Caesar agent (see caesar/README.md for full usage)
│   ├── caesar_agent.py
│   ├── artifact_synthesis.py
│   ├── run_agent.py
│   ├── config/      # YAML configs and creativity benchmarks
│   └── paper/       # Caesar paper (PDF)
├── rome/            # Rome framework (see rome/README.md): FSM, memory, LLM handlers, KB client
└── web_server/      # FastAPI + Next.js web GUI (see web_server/README.md)
```

## FAQ

**How is this different from LangGraph / CrewAI / AutoGen?**
Those are orchestration frameworks: they help you wire up agents. Rome is an opinionated runtime for *how* agents should reason (graph-structured exploration, adversarial verification, episodic memory). Caesar is a concrete research agent built on it.

**Do I need GPUs?**
No. Caesar uses hosted LLM APIs (OpenAI, Anthropic), and a local ChromaDB instance handles the vector store, so it runs on a laptop.

**Which models are supported?**
OpenAI (GPT-5 family, o-series reasoning models), Anthropic (Claude 4.5 / 4.6 / 4.7), and any OpenAI-compatible endpoint. Model selection is per-subsystem (exploration, synthesis, judging) via YAML config.

**How much does a typical run cost?**
A 5-iteration exploration with Claude Haiku 4.5 costs roughly $0.30 and takes about 10 minutes. A 250-iteration deep run with GPT-5.4-mini typically costs $5–$10.

**Can I reproduce the benchmarks?**
Yes. Configs, judge rubrics, and evaluation scripts are in `caesar/config/` and `caesar/analysis/`.

## Contributing & Community

- ⭐ **Star the repo** if Caesar is useful for your research
- 🍴 **[Why fork Caesar?](https://github.com/cognizant-ai-lab/caesar-agent/issues/4)** — concrete starting points and good-first-issues for contributors
- 💬 **[Open a Discussion](https://github.com/cognizant-ai-lab/caesar-agent/discussions)** for ideas, questions, or use cases
- 🐛 **[File an Issue](https://github.com/cognizant-ai-lab/caesar-agent/issues)** for bugs or feature requests
- 🔧 **PRs welcome**, especially new exploration policies, synthesis strategies, and benchmark domains

### Good places to start a fork

| Goal | Issue | Effort |
|---|---|:---:|
| Adapt Caesar for your research domain (legal, biology, finance, …) | [#3 — Domain config preset](https://github.com/cognizant-ai-lab/caesar-agent/issues/3) | 1–2 hrs |
| Add a new web-search backend (Tavily, Exa, Serper, …) | [#1 — New search backend](https://github.com/cognizant-ai-lab/caesar-agent/issues/1) | 2–3 hrs |
| Experiment with multi-agent synthesis (ring or debate merge) | [#2 — Ring-merge strategy](https://github.com/cognizant-ai-lab/caesar-agent/issues/2) | 4–6 hrs |

If you fork Caesar for your own work, drop a comment on [issue #4](https://github.com/cognizant-ai-lab/caesar-agent/issues/4) — we'd love to see what you build.

## Citation

If you use Caesar in your research, please cite:

```bibtex
@misc{liang26caesar,
  title={Caesar: Deep Agentic Web Exploration for Creative Answer Synthesis},
  author={Jason Liang and Elliot Meyerson and Risto Miikkulainen},
  year={2026},
  eprint={2604.20855},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2604.20855},
}
```

## License

Apache License 2.0. See [LICENSE](LICENSE) and [NOTICE](NOTICE).
