Metadata-Version: 2.4
Name: harbor-vm
Version: 0.1.2
Summary: A Harbor distribution focused on VM-backed agent evaluation for storage, distributed filesystem, and network systems.
Author: Alex Shaw
Author-email: Alex Shaw <alexgshaw64@gmail.com>
License-Expression: Apache-2.0
License-File: LICENSE
Requires-Dist: pydantic>=2.11.7
Requires-Dist: shortuuid>=1.0.13
Requires-Dist: typer>=0.16.0
Requires-Dist: requests>=2.32.4
Requires-Dist: pyyaml>=6.0.2
Requires-Dist: rich>=14.1.0
Requires-Dist: toml>=0.10.2
Requires-Dist: tenacity>=9.1.2
Requires-Dist: python-dotenv>=1.1.1
Requires-Dist: litellm>=1.80.8
Requires-Dist: jinja2>=3.1.6
Requires-Dist: dirhash>=0.5.0
Requires-Dist: dockerfile-parse>=2.0.1
Requires-Dist: e2b>=2.4.2
Requires-Dist: datasets>=4.4.1
Requires-Dist: runloop-api-client>=1.2.0
Requires-Dist: daytona>=0.121.0
Requires-Dist: kubernetes>=32.0.0
Requires-Dist: claude-agent-sdk>=0.1.17
Requires-Dist: packaging>=25.0
Requires-Dist: fastapi>=0.128.0
Requires-Dist: uvicorn>=0.38.0
Requires-Dist: modal>=1.4.0
Requires-Dist: ruff>=0.13.0
Requires-Dist: pathspec>=1.0.3
Requires-Dist: supabase>=2.28.2
Requires-Dist: libvirt-python>=10.0.0
Requires-Dist: paramiko>=3.4.0
Requires-Dist: tinker>=0.14.0 ; extra == 'tinker'
Requires-Dist: tinker-cookbook>=0.1.0 ; extra == 'tinker'
Requires-Python: >=3.12
Provides-Extra: tinker
Description-Content-Type: text/markdown

# Harbor VM

[![](https://dcbadge.limes.pink/api/server/https://discord.gg/6xWPKhGDbA)](https://discord.gg/6xWPKhGDbA)
[![Docs](https://img.shields.io/badge/Docs-000000?style=for-the-badge&logo=mdbook&color=105864)](https://harborframework.com/docs)
[![Cookbook](https://img.shields.io/badge/Cookbook-000000?style=for-the-badge&logo=mdbook&color=105864)](https://github.com/harbor-framework/harbor-cookbook)

Harbor VM is a development-focused Harbor branch for workloads that need real virtual machines instead of containers.

This distribution is intended for agent evaluation in systems domains where Docker is not enough, especially:

- storage systems
- distributed filesystems
- block-device and kernel-module workflows
- multi-node networked systems
- cluster recovery and operations tasks that need libvirt/KVM

The goal of this branch is to make VM-backed evaluation the default experience for teams building and debugging benchmarks built around Ceph, kernel and block-device work, and multi-node infrastructure.

Harbor VM keeps the Harbor CLI and task model, but emphasizes:

- `vm` and `vm-cluster` environments
- libvirt/KVM-based local execution
- tasks that need kernel-level access or multiple virtual machines
- development workflows for authoring and debugging VM-backed benchmarks

Check out the [Harbor Cookbook](https://github.com/harbor-framework/harbor-cookbook) for end-to-end examples and guides.

## Installation

```bash tab="uv"
uv tool install harbor-vm
```
or
```bash tab="pip"
pip install harbor-vm
```

Harbor VM should be treated as a Linux-only distribution in practice. It is meant for hosts with libvirt/KVM available.

The published package lists `libvirt-python` as a core dependency, so the installer typically has to build it against the host's libvirt development headers. Installation is therefore expected to work only on Linux hosts that have those headers and `pkg-config` available. macOS and Windows are not supported installation targets for `harbor-vm`.

On Debian or Ubuntu hosts, prerequisites typically include:

```bash
sudo apt install qemu-kvm libvirt-daemon-system libvirt-clients libvirt-dev libguestfs-tools pkg-config
```

You also need:

- `/dev/kvm`
- `virsh`
- an ISO creation tool such as `genisoimage` or `mkisofs`
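
The three requirements above can be checked before the first run. The following is a minimal preflight sketch, not something `harbor-vm` ships: the function names `have_any` and `preflight` are hypothetical, and the script only tests for presence, not for permissions or a working hypervisor.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helper; harbor-vm does not ship this script.

# Succeed if at least one of the given commands is on PATH.
have_any() {
    local cmd
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            return 0
        fi
    done
    return 1
}

# Print one status line per prerequisite; return non-zero if any is missing.
preflight() {
    local missing=0

    if [ -e /dev/kvm ]; then
        echo "ok:      /dev/kvm"
    else
        echo "missing: /dev/kvm (is KVM enabled on this host?)"
        missing=1
    fi

    if have_any virsh; then
        echo "ok:      virsh"
    else
        echo "missing: virsh"
        missing=1
    fi

    if have_any genisoimage mkisofs; then
        echo "ok:      ISO creation tool"
    else
        echo "missing: genisoimage or mkisofs"
        missing=1
    fi

    return "$missing"
}

preflight || echo "One or more Harbor VM host prerequisites are missing." >&2
```

A present `/dev/kvm` does not guarantee that the current user can open it. On many Debian-style hosts, membership in the `kvm` or `libvirt` group is an additional requirement.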

After installation, the CLI entrypoints are:

```bash
harbor-vm --help
hbvm --help
```

## Example: Running Terminal-Bench-2.0

Harbor VM can still run standard Harbor workloads, but its main focus is VM-backed tasks.

For a standard benchmark run:

```bash
export ANTHROPIC_API_KEY=<YOUR-KEY>
harbor-vm run --dataset terminal-bench@2.0 \
   --agent claude-code \
   --model anthropic/claude-opus-4-1 \
   --n-concurrent 4
```

This launches the benchmark locally using Docker. To run it on a cloud provider such as Daytona, pass the `--env` flag:

```bash
export ANTHROPIC_API_KEY=<YOUR-KEY>
export DAYTONA_API_KEY=<YOUR-KEY>
harbor-vm run --dataset terminal-bench@2.0 \
   --agent claude-code \
   --model anthropic/claude-opus-4-1 \
   --n-concurrent 100 \
   --env daytona
```

To see all supported agents and other options, run:

```bash
harbor-vm run --help
```

To explore all supported third-party benchmarks (like SWE-Bench and Aider Polyglot), run:

```bash
harbor-vm datasets list
```

To evaluate an agent and model on one of these datasets, use the following command:

```bash
harbor-vm run -d "<dataset@version>" -m "<model>" -a "<agent>"
```

## Why This Branch Exists

The upstream Harbor project supports many environment backends. This branch exists because some benchmark authors need VM support to be the primary development path rather than an optional extra.

Typical examples:

- Ceph cluster bring-up and recovery
- distributed filesystem fault injection
- tasks that need real block devices, loop devices, LVM, or kernel modules
- routing, bridging, or multi-node networking exercises
- storage admin and SRE tasks that must run in a realistic guest OS

If your task can be expressed cleanly in Docker, upstream Harbor is usually enough. If your task needs KVM guests, multi-node topologies, or privileged kernel interactions, Harbor VM is the intended distribution.

## VM Environments (libvirt/KVM)

Harbor VM is built around the **VM environment backend** powered by libvirt/KVM.

### When to use it

| Need | Use |
|------|-----|
| Standard software tasks | `type = "docker"` (default) |
| Kernel modules / block devices | `type = "vm"` |
| Multi-node cluster (e.g. Ceph, etcd) | `type = "vm-cluster"` |

### Single-node VM task

```toml
# task.toml
[environment]
type = "vm"
base_image = "ubuntu-24.04"
cpus = 4
memory_mb = 8192
storage_mb = 20480
```
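
Before booting a guest with these settings, it can help to confirm that the host has at least as many CPUs and as much RAM as the task requests. This is a rough sketch, assuming a Linux host with `nproc` and `/proc/meminfo`. The helper name `host_fits` is hypothetical, and Harbor VM may apply its own limits or overcommit rules that this check knows nothing about.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare a task's cpus / memory_mb against the host.

host_fits() {
    local need_cpus="$1" need_mem_mb="$2"
    local host_cpus host_mem_mb

    host_cpus="$(nproc)"
    # MemTotal is reported in kB; convert to MB.
    host_mem_mb="$(awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo)"

    echo "host: ${host_cpus} CPUs, ${host_mem_mb} MB RAM"
    echo "task: ${need_cpus} CPUs, ${need_mem_mb} MB RAM"

    [ "$host_cpus" -ge "$need_cpus" ] && [ "$host_mem_mb" -ge "$need_mem_mb" ]
}

# Values taken from the task.toml above: cpus = 4, memory_mb = 8192.
if host_fits 4 8192; then
    echo "host can satisfy this task's request"
else
    echo "host is smaller than this task's request" >&2
fi
```

The comparison uses total memory rather than free memory, so it only rules out hosts that could never run the task.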

Add a `vm-setup.sh` script to `environment/` to install packages. In VM tasks this replaces the Dockerfile-style setup step:

```bash
#!/bin/bash
set -euo pipefail
apt-get update  # refresh package lists so the install below can resolve packages
apt-get install -y qemu-utils nbd-client
```

### Multi-node cluster task

```toml
# task.toml
[environment]
type = "vm-cluster"
network = "192.168.100.0/24"

[[environment.nodes]]
name = "mon"
cpus = 2
memory_mb = 4096
roles = ["ceph-mon", "ceph-mgr"]

[[environment.nodes]]
name = "osd0"
cpus = 2
memory_mb = 4096
storage_mb = 20480
roles = ["ceph-osd"]
```

The agent connects to the primary (first) node via SSH. All nodes can reach each other by hostname over the virtual bridge network.
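
Hostname reachability between nodes can be spot-checked from the primary node. The sketch below is illustrative only: `check_peers` is a hypothetical helper, and it assumes the guest image provides `getent`, which resolves names through the system resolver regardless of whether they come from `/etc/hosts` or DNS.

```bash
#!/usr/bin/env bash
# Hypothetical helper, meant to run on the primary node of a vm-cluster task.

# Report whether each given hostname resolves; return non-zero if any does not.
check_peers() {
    local host failed=0
    for host in "$@"; do
        if getent hosts "$host" >/dev/null 2>&1; then
            echo "resolves:   $host"
        else
            echo "unresolved: $host"
            failed=1
        fi
    done
    return "$failed"
}
```

For the two-node example above, the call would be `check_peers mon osd0`. Name resolution is only a first step; it does not show that the services on a node are up.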

See [`examples/tasks/vm-single-node/`](examples/tasks/vm-single-node/) and [`examples/tasks/vm-cluster/`](examples/tasks/vm-cluster/) for complete examples.

## Packaging

This repository is packaged as `harbor-vm`.

Build locally with:

```bash
uv build
```

Install the built wheel as a tool with:

```bash
uv tool install dist/harbor_vm-*.whl
```

The import package remains `harbor`, while the published distribution name is `harbor-vm`.

## Citation

If you use **Harbor VM** in academic work, please cite it using the “Cite this repository” button on GitHub or adapt the following BibTeX entry:

```bibtex
@software{Harbor_Framework,
author = {{Harbor Framework Team}},
month = jan,
title = {{Harbor: A framework for evaluating and optimizing agents and models in container environments}},
url = {https://github.com/harbor-framework/harbor},
year = {2026}
}
```
