Metadata-Version: 2.4
Name: eyehands
Version: 1.2.4
Summary: Give Claude eyes and hands — screen capture + mouse/keyboard control on Windows
Author-email: Fireal <fireal6353@gmail.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/shameindemgg/eyehands
Project-URL: Repository, https://github.com/shameindemgg/eyehands
Project-URL: Issues, https://github.com/shameindemgg/eyehands/issues
Project-URL: Store, https://portal.fireal.dev
Keywords: claude,automation,windows,screen-capture,computer-use,ai-agent
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Win32 (MS Windows)
Classifier: Intended Audience :: Developers
Classifier: Operating System :: Microsoft :: Windows
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Desktop Environment
Classifier: Topic :: Software Development :: Testing
Classifier: Topic :: System :: Monitoring
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: mss>=9.0
Requires-Dist: numpy>=1.24
Requires-Dist: Pillow>=10.0
Provides-Extra: performance
Requires-Dist: bettercam>=1.0; extra == "performance"
Requires-Dist: simplejpeg>=1.7; extra == "performance"
Requires-Dist: opencv-python-headless>=4.8; extra == "performance"
Provides-Extra: ocr
Requires-Dist: easyocr>=1.7; extra == "ocr"
Provides-Extra: ui
Requires-Dist: uiautomation>=2.0; extra == "ui"
Provides-Extra: all
Requires-Dist: eyehands[ocr,performance,ui]; extra == "all"
Dynamic: license-file

# eyehands

**Give AI agents eyes and hands on Windows** -- a local HTTP server for screen capture, mouse control, and keyboard input.

eyehands runs on `localhost:7331` and exposes a simple REST API that any AI agent (Claude, GPT, local models) can call to see the screen, move the mouse, click, type, scroll, and find UI elements. Built for Windows with physical-pixel accuracy, multi-monitor support, and pointer-lock compatibility.

[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
[![Python 3.10+](https://img.shields.io/badge/Python-3.10%2B-blue.svg)](https://python.org)
[![Windows](https://img.shields.io/badge/Platform-Windows-0078D6.svg)](https://microsoft.com/windows)

---

## Quick Start

```bash
pip install eyehands
eyehands
```

The server starts on `http://localhost:7331`. (You can also run it via `python -m eyehands` if the console script isn't on your PATH.) Verify with:

```bash
curl http://localhost:7331/ping
```

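Take a screenshot. Every endpoint except `/ping` requires the bearer token printed at startup (also saved to `.eyehands-token`):

```bash
curl http://localhost:7331/screenshot \
  -H "Authorization: Bearer $(cat .eyehands-token)" \
  --output screen.jpg
```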

Click at coordinates (500, 300):

```bash
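# Every endpoint except /ping needs the bearer token saved at startup
TOKEN=$(cat .eyehands-token)
curl -X POST http://localhost:7331/move_absolute -H "Authorization: Bearer $TOKEN" -d "{\"x\":500,\"y\":300}"
curl -X POST http://localhost:7331/click -H "Authorization: Bearer $TOKEN"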
```
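
Agents that call the server from Python rather than curl can build the same requests with the standard library. This is a minimal sketch, not part of eyehands: `load_token` and `build_request` are hypothetical helper names, and the token path assumes the file sits in the working directory, as in the curl examples under [Activation](#activation). It follows that section's note that every endpoint except `/ping` needs the bearer token.

```python
import json
import urllib.request
from pathlib import Path
from typing import Optional

BASE_URL = "http://localhost:7331"  # default address from this README


def load_token(path: str = ".eyehands-token") -> str:
    """Read the bearer token saved at startup (the path is an assumption)."""
    return Path(path).read_text(encoding="utf-8").strip()


def build_request(method: str, path: str, token: str,
                  payload: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) an authenticated request to the server."""
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if payload is not None:
        # POST endpoints accept JSON bodies.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )
```

To send a request, pass the result to `urllib.request.urlopen`, for example `urllib.request.urlopen(build_request("POST", "/move_absolute", load_token(), {"x": 500, "y": 300}))`.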

---

## Installation

### From PyPI (recommended)

```bash
pip install eyehands                # core only: mss, numpy, Pillow
pip install "eyehands[performance]" # + bettercam, simplejpeg, opencv-python-headless
pip install "eyehands[ocr]"         # + easyocr (for /find endpoint)
pip install "eyehands[ui]"          # + uiautomation (for /ui/* endpoints, Pro)
pip install "eyehands[all]"         # everything
```

Then run with `eyehands` (console script) or `python -m eyehands`.

### From a git checkout

```bash
git clone https://github.com/shameindemgg/eyehands.git
cd eyehands
pip install -r requirements.txt     # or: pip install -e .
python server.py                    # dev shim; same as `python -m eyehands`
```

### Requirements

- **OS:** Windows 10/11
- **Python:** 3.10 or later
- **Display:** Any resolution, multi-monitor supported

---

## API Reference

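All POST endpoints accept JSON bodies. All coordinates are physical screen pixels. Every endpoint except `/ping` requires an `Authorization: Bearer <token>` header; the token is printed at startup and saved to `.eyehands-token` (see [Activation](#activation)).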

### Free Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/ping` | Health check |
| GET | `/cursor_pos` | Current cursor position `{x, y}` |
| GET | `/screenshot` | On-demand JPEG capture (full screen) |
| GET | `/screenshot_b64` | Screenshot as base64 JSON |
| GET | `/latest` | Latest frame from background buffer (JPEG) |
| GET | `/latest_b64` | Latest buffered frame as base64 JSON with metadata |
| GET | `/view` | Live HTML page with auto-refresh and click-through overlay |
| GET | `/find` | OCR text search -- returns screen coordinates of matched text (requires the `ocr` extra) |
| POST | `/move` | Relative mouse move `{dx, dy}` |
| POST | `/move_absolute` | Absolute mouse move `{x, y}` |
| POST | `/click` | Click `{button: "left"\|"right"\|"middle"}` |
| POST | `/double_click` | Double click `{button}` |
| POST | `/mousedown` | Press and hold mouse button `{button}` |
| POST | `/mouseup` | Release mouse button `{button}` |
| POST | `/scroll` | Scroll wheel `{dy}` (positive = up) |
| POST | `/key` | Keypress with optional modifiers `{vk, modifiers}` |
| POST | `/type_text` | Type unicode text `{text}` |
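
The field names in the table can be captured in a small client-side check, so a typo in a payload fails before any input is sent. This is a hedged sketch: `make_action` is a hypothetical helper, the field sets are copied from the table above, and applying the `left`/`right`/`middle` values to every `button` field (not just `/click`) is an assumption.

```python
# Field names per POST endpoint, copied from the Free Endpoints table.
ENDPOINT_FIELDS = {
    "/move": {"dx", "dy"},
    "/move_absolute": {"x", "y"},
    "/click": {"button"},
    "/double_click": {"button"},
    "/mousedown": {"button"},
    "/mouseup": {"button"},
    "/scroll": {"dy"},
    "/key": {"vk", "modifiers"},
    "/type_text": {"text"},
}
# Assumption: every `button` field takes the values listed for /click.
BUTTONS = {"left", "right", "middle"}


def make_action(endpoint: str, **fields) -> tuple:
    """Return (endpoint, payload) after checking fields against the table."""
    allowed = ENDPOINT_FIELDS.get(endpoint)
    if allowed is None:
        raise ValueError(f"unknown POST endpoint: {endpoint}")
    unexpected = set(fields) - allowed
    if unexpected:
        raise ValueError(f"{endpoint} does not take {sorted(unexpected)}")
    if "button" in fields and fields["button"] not in BUTTONS:
        raise ValueError(f"button must be one of {sorted(BUTTONS)}")
    return endpoint, dict(fields)
```

For example, `make_action("/scroll", dy=3)` returns `("/scroll", {"dy": 3})`, while `make_action("/click", x=1)` raises `ValueError`.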

### Pro Endpoints

Available with a [Pro license](#pro-license) ($19, one-time).

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/smooth_move` | Smooth relative mouse move `{dx, dy, steps, delay_ms}` |
| POST | `/click_at` | Move to coordinates + click in one call `{x, y, button}` |
| POST | `/click_and_wait` | Click + wait for screen content to change |
| POST | `/type_and_enter` | Type text + press Enter in one call `{text}` |
| POST | `/batch` | Execute an array of actions sequentially |
| POST | `/ui/click_element` | Find a UI element by name/type and click it |
| GET | `/ui/windows` | List all top-level windows |
| GET | `/ui/find` | Search UI Automation elements by name, type, depth |
| GET | `/ui/at` | Get UI element at screen coordinates `{x, y}` |
| GET | `/ui/tree` | Get nested UI Automation element tree |
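
The idea behind `/click_and_wait` (act, then block until the screen changes) can be illustrated on the client side. The sketch below is not the server's implementation: `wait_for_change` and `get_frame` are hypothetical names, and it treats any byte-level difference between two frames as a change, which may be too sensitive for screens with blinking cursors or animations.

```python
import hashlib
import time
from typing import Callable


def wait_for_change(get_frame: Callable[[], bytes],
                    timeout_s: float = 5.0,
                    interval_s: float = 0.1,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll get_frame() until its bytes differ from the first frame.

    Returns True if a change was seen before the timeout, else False.
    """
    baseline = hashlib.sha256(get_frame()).digest()
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        sleep(interval_s)
        if hashlib.sha256(get_frame()).digest() != baseline:
            return True
    return False
```

In practice `get_frame` would be a function that fetches `/screenshot` or `/latest` and returns the JPEG bytes; injecting it keeps the polling logic testable without a running server.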

---

## Free vs Pro

| Feature | Free | Pro |
|---------|------|-----|
| Screen capture (screenshot, latest, view) | Yes | Yes |
| Mouse control (move, click, scroll) | Yes | Yes |
| Keyboard input (key, type_text) | Yes | Yes |
| OCR text search (/find) | Yes | Yes |
| Background frame buffer (20fps) | Yes | Yes |
| Composite actions (click_at, batch, type_and_enter) | -- | Yes |
| Smooth mouse movement | -- | Yes |
| Click and wait for screen change | -- | Yes |
| Windows UI Automation (/ui/*) | -- | Yes |
| UI element click by name | -- | Yes |
| **Price** | **Free** | **$19 one-time** |

---

## Pro License

Purchase a license key at **[portal.fireal.dev](https://portal.fireal.dev)** ($19, one-time payment).

### Activation

With the server running, send your license key to `/activate`. All endpoints except `/ping` require the bearer token printed at startup (also saved to `.eyehands-token`):

```bash
curl -X POST http://localhost:7331/activate \
  -H "Authorization: Bearer $(cat .eyehands-token)" \
  -H "Content-Type: application/json" \
  -d "{\"key\":\"YOUR-LICENSE-KEY\"}"
```

The license is tied to your machine and persists across restarts. No subscription, no recurring fees.

### Updates

The server checks for new versions at startup (cached for 6 hours) and surfaces them on `/ping` and `/latest_b64`, so the calling agent (Claude, for example) can tell you when an upgrade is available. Two ways to upgrade:

```bash
# Manual: pip-based upgrade
pip install -U eyehands

# Automatic: server downloads the wheel from the portal and self-restarts
curl -X POST http://localhost:7331/update \
  -H "Authorization: Bearer $(cat .eyehands-token)"
```

If the auto-update fails for any reason, the previous version is still installed (we use `pip install --upgrade`, not `--force-reinstall`) and you can relaunch with `eyehands`, or with `python server.py` from a git checkout. The helper writes pip output to `.update_log.txt` for debugging.
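
The two decisions behind the update check (is the cached result still fresh, and is the advertised version actually newer) can be sketched as follows. This is an illustration under assumptions, not the server's code: `cache_is_fresh` and `is_newer` are hypothetical names, and the version comparison only handles plain dotted numbers such as `1.2.4`.

```python
from datetime import datetime, timedelta

CACHE_TTL = timedelta(hours=6)  # "cached for 6 hours" per the text above


def cache_is_fresh(checked_at: datetime, now: datetime) -> bool:
    """True while the last version check is younger than the TTL."""
    return now - checked_at < CACHE_TTL


def is_newer(candidate: str, installed: str) -> bool:
    """Compare dotted versions numerically, so 1.10.0 beats 1.2.4."""
    def parse(version: str) -> tuple:
        return tuple(int(part) for part in version.split("."))
    return parse(candidate) > parse(installed)
```

Comparing integer tuples rather than raw strings matters because a string comparison would rank `1.10.0` below `1.2.4`.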

---

## How It Works

```
server.py          Main HTTP server -- all endpoints, input, and capture
find.py            Standalone OCR/color CLI tool
```

### Architecture

1. **Screen capture:** A background thread captures the primary monitor at ~20fps into a rolling buffer. The capture backend is auto-selected at startup: BetterCam (DXGI, ~120fps) > DXcam (~39fps) > mss (GDI fallback).

2. **Input:** All mouse and keyboard input uses the Windows `SendInput` API via ctypes. This goes through the Raw Input pipeline, making it compatible with pointer-lock applications (games, remote desktop, Parsec). Modifier keys are sent atomically in a single `SendInput` call.

3. **Multi-monitor:** Absolute mouse moves use `MOUSEEVENTF_VIRTUALDESK` normalization (the 0--65535 range spans the whole virtual desktop, not just the primary monitor), so coordinates work correctly regardless of monitor arrangement.

4. **DPI awareness:** The process sets `Per-Monitor DPI Awareness v2` at startup. All coordinates are physical screen pixels -- no DPI scaling surprises.

5. **OCR:** The `/find` endpoint uses EasyOCR (lazy-loaded on first call) with frame-level caching -- repeated searches on an unchanged screen return instantly.

6. **UI Automation (Pro):** The `/ui/*` endpoints expose the Windows UI Automation tree, enabling agents to find and interact with native UI elements by name and control type rather than pixel coordinates.

7. **Single instance:** Only one server runs at a time, enforced via a PID file. Starting a new instance automatically kills the previous one.
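
Two of the steps above lend themselves to a short illustration: the backend priority in step 1 and the coordinate normalization in step 3. The code below is a sketch under stated assumptions, not the code in `server.py`: `pick_backend` and `to_virtualdesk` are hypothetical names, the backend check only tests whether a module is importable, and the normalization uses the common `pixel * 65535 // (size - 1)` mapping.

```python
import importlib.util
from typing import Callable, Optional, Sequence

# Import names in the priority order given in step 1.
BACKEND_PRIORITY = ("bettercam", "dxcam", "mss")


def _installed(module_name: str) -> bool:
    """True if the module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def pick_backend(priority: Sequence[str] = BACKEND_PRIORITY,
                 is_available: Callable[[str], bool] = _installed) -> Optional[str]:
    """Return the first available backend name, or None if none is found."""
    for name in priority:
        if is_available(name):
            return name
    return None


def to_virtualdesk(x: int, y: int, left: int, top: int,
                   width: int, height: int) -> tuple:
    """Map physical pixel (x, y) to the 0..65535 virtual-desktop range.

    (left, top, width, height) describe the virtual desktop; left and top
    are negative when a monitor sits left of or above the primary one.
    """
    if width < 2 or height < 2:
        raise ValueError("virtual desktop must be at least 2x2 pixels")
    nx = (x - left) * 65535 // (width - 1)
    ny = (y - top) * 65535 // (height - 1)
    return nx, ny
```

On Windows the virtual-desktop rectangle is available from `GetSystemMetrics` using `SM_XVIRTUALSCREEN` (76), `SM_YVIRTUALSCREEN` (77), `SM_CXVIRTUALSCREEN` (78), and `SM_CYVIRTUALSCREEN` (79). Keeping the math in a pure function makes it easy to check on any platform.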

### Using with Claude Code

eyehands ships with a [Claude Code skill](https://docs.anthropic.com/en/docs/claude-code/skills) (`SKILL.md`) that teaches Claude how to interact with the server. Install the skill to `~/.claude/skills/eyehands/` and Claude will automatically use OCR-based element location, UI Automation, and coordinate-based input to control your desktop.

---

## Support

- **Issues:** [github.com/shameindemgg/eyehands/issues](https://github.com/shameindemgg/eyehands/issues)
- **Store / Pro License:** [portal.fireal.dev](https://portal.fireal.dev)
- **Email:** fireal6353@gmail.com

---

## License

MIT -- see [LICENSE](LICENSE) for details.
