Metadata-Version: 2.4
Name: eyehands
Version: 1.6.0
Summary: Give Claude Code eyes and hands on Windows — local HTTP server for screen capture, mouse, keyboard, OCR, and UI Automation. Pointer-lock compatible, multi-monitor, Per-Monitor DPI v2.
Author-email: Fireal <fireal6353@gmail.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/shameindemgg/eyehands
Project-URL: Repository, https://github.com/shameindemgg/eyehands
Project-URL: Issues, https://github.com/shameindemgg/eyehands/issues
Project-URL: Store, https://portal.fireal.dev
Keywords: claude,claude-code,claude-skill,claude-plugin,mcp,agent-skill,automation,rpa,windows,windows-automation,desktop-automation,screen-capture,screenshot,computer-use,ai-agent,ai-tools,llm-tools,mouse,keyboard,sendinput,ui-automation,uia,pyautogui,autohotkey,autoit,sikuli,ocr,easyocr,parsec,remote-desktop,dxgi,bettercam
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Win32 (MS Windows)
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: End Users/Desktop
Classifier: Intended Audience :: Information Technology
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: Microsoft :: Windows :: Windows 10
Classifier: Operating System :: Microsoft :: Windows :: Windows 11
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Desktop Environment
Classifier: Topic :: Multimedia :: Graphics :: Capture :: Screen Capture
Classifier: Topic :: Office/Business
Classifier: Topic :: Scientific/Engineering :: Image Recognition
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Software Development :: Testing
Classifier: Topic :: Software Development :: User Interfaces
Classifier: Topic :: System :: Monitoring
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: mss>=9.0
Requires-Dist: numpy>=1.24
Requires-Dist: Pillow>=10.0
Provides-Extra: performance
Requires-Dist: bettercam>=1.0; extra == "performance"
Requires-Dist: simplejpeg>=1.7; extra == "performance"
Requires-Dist: opencv-python-headless>=4.8; extra == "performance"
Provides-Extra: ocr
Requires-Dist: easyocr>=1.7; extra == "ocr"
Provides-Extra: ui
Requires-Dist: uiautomation>=2.0; extra == "ui"
Provides-Extra: all
Requires-Dist: eyehands[ocr,performance,ui]; extra == "all"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Dynamic: license-file

# eyehands

**Give AI agents eyes and hands on Windows** -- a local HTTP server for screen capture, mouse control, and keyboard input.

eyehands runs on `localhost:7331` and exposes a simple REST API that any AI agent (Claude, GPT, local models) can call to see the screen, move the mouse, click, type, scroll, and find UI elements. Built for Windows with physical-pixel accuracy, multi-monitor support, and pointer-lock compatibility.

[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
[![Python 3.10+](https://img.shields.io/badge/Python-3.10%2B-blue.svg)](https://python.org)
[![Windows](https://img.shields.io/badge/Platform-Windows-0078D6.svg)](https://microsoft.com/windows)

---

## Quick Start

```bash
pip install eyehands
eyehands
```

The server starts on `http://localhost:7331`. (You can also run it via `python -m eyehands` if the console script isn't on your PATH.) Verify with:

```bash
curl http://localhost:7331/ping
```

Take a screenshot:

```bash
curl http://localhost:7331/screenshot --output screen.jpg
```

Click at coordinates (500, 300):

```bash
curl -X POST http://localhost:7331/move_absolute -d "{\"x\":500,\"y\":300}"
curl -X POST http://localhost:7331/click
```

---

## Installation

### From PyPI (recommended)

```bash
pip install eyehands                # core only: mss, numpy, Pillow
pip install "eyehands[performance]" # + bettercam, simplejpeg, opencv-python-headless
pip install "eyehands[ocr]"         # + easyocr (for /find endpoint)
pip install "eyehands[ui]"          # + uiautomation (for /ui/* endpoints, Pro)
pip install "eyehands[all]"         # everything
```

Then run with `eyehands` (console script) or `python -m eyehands`.

### From a git checkout

```bash
git clone https://github.com/shameindemgg/eyehands.git
cd eyehands
pip install -r requirements.txt     # or: pip install -e .
python server.py                    # dev shim; same as `python -m eyehands`
```

### Requirements

- **OS:** Windows 10/11
- **Python:** 3.10 or later
- **Display:** Any resolution, multi-monitor supported

---

## API Reference

All POST endpoints accept JSON bodies. All coordinates are physical screen pixels.

### Free Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/ping` | Health check |
| GET | `/cursor_pos` | Current cursor position `{x, y}` |
| GET | `/screenshot` | On-demand JPEG capture (full screen) |
| GET | `/screenshot_b64` | Screenshot as base64 JSON |
| GET | `/latest` | Latest frame from background buffer (JPEG) |
| GET | `/latest_b64` | Latest buffered frame as base64 JSON with metadata |
| GET | `/view` | Live HTML page with auto-refresh and click-through overlay |
| GET | `/find` | OCR text search -- returns screen coordinates of matched text |
| POST | `/move` | Relative mouse move `{dx, dy}` |
| POST | `/move_absolute` | Absolute mouse move `{x, y}` |
| POST | `/click` | Click `{button: "left"\|"right"\|"middle"}` |
| POST | `/double_click` | Double click `{button}` |
| POST | `/mousedown` | Press and hold mouse button `{button}` |
| POST | `/mouseup` | Release mouse button `{button}` |
| POST | `/scroll` | Scroll wheel `{dy}` (positive = up) |
| POST | `/key` | Keypress with optional modifiers `{vk, modifiers}` |
| POST | `/type_text` | Type unicode text `{text}` |

### Pro Endpoints

Available with a [Pro license](#pro-license) ($19, one-time).

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/smooth_move` | Smooth relative mouse move `{dx, dy, steps, delay_ms}` |
| POST | `/click_at` | Move to coordinates + click in one call `{x, y, button}` |
| POST | `/click_and_wait` | Click + wait for screen content to change |
| POST | `/type_and_enter` | Type text + press Enter in one call `{text}` |
| POST | `/batch` | Execute an array of actions sequentially |
| POST | `/ui/click_element` | Find a UI element by name/type and click it |
| GET | `/ui/windows` | List all top-level windows |
| GET | `/ui/find` | Search UI Automation elements by name, type, depth |
| GET | `/ui/at` | Get UI element at screen coordinates `{x, y}` |
| GET | `/ui/tree` | Get nested UI Automation element tree |

---

## Free vs Pro

| Feature | Free | Pro |
|---------|------|-----|
| Screen capture (screenshot, latest, view) | Yes | Yes |
| Mouse control (move, click, scroll) | Yes | Yes |
| Keyboard input (key, type_text) | Yes | Yes |
| OCR text search (/find) | Yes | Yes |
| Background frame buffer (20fps) | Yes | Yes |
| Composite actions (click_at, batch, type_and_enter) | -- | Yes |
| Smooth mouse movement | -- | Yes |
| Click and wait for screen change | -- | Yes |
| Windows UI Automation (/ui/*) | -- | Yes |
| UI element click by name | -- | Yes |
| **Price** | **Free** | **$19 one-time** |

---

## Pro License

Purchase a license key at **[portal.fireal.dev](https://portal.fireal.dev)** ($19, one-time payment).

### Activation

With the server running, send your license key to `/activate`. All endpoints except `/ping` require the bearer token printed at startup (also saved to `.eyehands-token`):

```bash
curl -X POST http://localhost:7331/activate \
  -H "Authorization: Bearer $(cat .eyehands-token)" \
  -H "Content-Type: application/json" \
  -d "{\"key\":\"YOUR-LICENSE-KEY\"}"
```

The license is tied to your machine and persists across restarts. No subscription, no recurring fees.

### Updates

The server checks for new versions at startup (cached for 6 hours) and surfaces them on `/ping` and `/latest_b64` so Claude can tell you when an upgrade is available. Two ways to upgrade:

```bash
# Manual: pip-based upgrade
pip install -U eyehands

# Automatic: server runs `pip install --upgrade eyehands` in a detached
# helper and relaunches via `python -m eyehands` when it's done
curl -X POST http://localhost:7331/update \
  -H "Authorization: Bearer $(cat .eyehands-token)"
```

If the auto-update fails for any reason, the previous version is still installed (we use `pip install --upgrade`, not `--force-reinstall`) and you can relaunch with `python server.py`. The helper writes pip output to `.update_log.txt` for debugging.

---

## How It Works

```mermaid
flowchart LR
    A["Claude Code<br/>(or any agent)"] -->|curl + JSON| B["eyehands HTTP<br/>127.0.0.1:7331"]
    B --> C["SendInput"]
    B --> D["FrameBuffer<br/>(DXGI / mss)"]
    B --> E["EasyOCR"]
    B --> F["UI Automation<br/>(uiautomation COM)"]
    C -->|Raw Input| G["Windows desktop<br/>(games, Parsec, apps)"]
    D -->|frame hash| E
    D -->|/latest JPEG| A
    F -->|tree walk| G
```

The agent talks JSON over HTTP. eyehands translates JSON into Win32 calls: `SendInput` for mouse and keyboard (Raw Input pipeline so pointer-lock apps see it), a background `FrameBuffer` thread for continuous screen capture, `EasyOCR` against the cached frame for text-based element lookup, and Windows UI Automation for accessibility-tree walks.

```
server.py          Main HTTP server -- all endpoints, input, and capture
find.py            Standalone OCR/color CLI tool
```

### Architecture

1. **Screen capture:** A background thread captures the primary monitor at ~20fps into a rolling buffer. The capture backend is auto-selected at startup: BetterCam (DXGI, ~120fps) > DXcam (~39fps) > mss (GDI fallback).

2. **Input:** All mouse and keyboard input uses the Windows `SendInput` API via ctypes. This goes through the Raw Input pipeline, making it compatible with pointer-lock applications (games, remote desktop, Parsec). Modifier keys are sent atomically in a single `SendInput` call.

3. **Multi-monitor:** Absolute mouse moves use `VIRTUALDESK` normalization (0--65535 range across all displays), so coordinates work correctly regardless of monitor arrangement.

4. **DPI awareness:** The process sets `Per-Monitor DPI Awareness v2` at startup. All coordinates are physical screen pixels -- no DPI scaling surprises.

5. **OCR:** The `/find` endpoint uses EasyOCR (lazy-loaded on first call) with frame-level caching -- repeated searches on an unchanged screen return instantly.

6. **UI Automation (Pro):** The `/ui/*` endpoints expose the Windows UI Automation tree, enabling agents to find and interact with native UI elements by name and control type rather than pixel coordinates.

7. **Single instance:** Only one server runs at a time, enforced via a PID file. Starting a new instance automatically kills the previous one.

### Using with Claude Code

eyehands ships with a [Claude Code skill](https://docs.anthropic.com/en/docs/claude-code/skills) (`SKILL.md`) that teaches Claude how to interact with the server. Install the skill to `~/.claude/skills/eyehands/` and Claude will automatically use OCR-based element location, UI Automation, and coordinate-based input to control your desktop.

---

## FAQ

### Does it work on macOS or Linux?

No — eyehands is Windows-only. It uses `SendInput`, Windows UI Automation, and Per-Monitor DPI v2, all of which are Windows-specific APIs. If you need cross-platform, look at [nut.js](https://nutjs.dev/), [PyAutoGUI](https://github.com/asweigart/pyautogui), or [SikuliX](https://sikulix.github.io/).

### Do I need Claude Code to use eyehands?

No. eyehands is a plain HTTP server — any agent or script that can make HTTP calls can drive it. The packaged `SKILL.md` is specific to Claude Code, but the server itself works from Cursor, a bash one-liner, a Python script, a local LLM harness, or whatever else you're running.

### Why is UI Automation behind the Pro tier?

The UIA endpoints (`/ui/*`) require the optional `uiautomation` dependency and a non-trivial amount of COM wrangling on the server side. Keeping them Pro funds the portal hosting and lets the core (screen capture + mouse + keyboard + OCR) stay free. The free tier is a legitimate tool on its own — most simple automation tasks work fine with `/find` + `/click_at`.

### Does it work with games?

Yes — that's actually one of the better use cases. Most Python automation libraries use `mouse_event`/`keybd_event`, which games and pointer-lock applications silently ignore because they subscribe to Raw Input instead of the legacy message queue. eyehands uses `SendInput` directly via ctypes, so synthetic mouse and keyboard events land in the same pipeline real hardware does. Games, Parsec remote desktop, and full-screen browsers all see eyehands input correctly.

### Does my screen data get sent anywhere?

No. Screen capture, OCR, and UIA all run locally on your machine. Screenshots never leave your PC unless *you* explicitly send one to a cloud LLM (for example, by feeding `/latest` into Claude's vision model). The only network traffic eyehands itself makes is:

- License validation (once at activation + at startup)
- Update checks (cached for 6 hours)
- Anonymous free-tier heartbeat (opt-out via `--no-telemetry`)

See the [privacy post](https://eyehands.fireal.dev/posts/ping-privacy-model) for full disclosure.

### Why a separate HTTP server instead of a Python library?

So it works from any language, so agent frameworks can share one process, and so EasyOCR's 3-second model load is paid once rather than on every call. The [local-HTTP-server-for-AI-agent-tools](https://eyehands.fireal.dev/posts/local-http-server-ai-tools) post goes into the full reasoning.

### I installed it but Claude Code isn't using it. What's wrong?

Run `eyehands --install-skill` once. This writes the `SKILL.md` into `~/.claude/skills/eyehands/` and bakes the token path into the file so Claude's curl calls authenticate correctly. Then restart your Claude Code session — it should pick up the skill on the next prompt.

### Can I run multiple eyehands servers on one machine?

No — only one instance at a time, enforced via a PID file. Starting a new instance will kill the previous one. If you need multiple environments (e.g. one in a Parsec session, one locally), the usual pattern is to reach for remote-desktop scenarios on a per-machine basis.

### Is there a way to see what eyehands is doing live?

Yes: open `http://127.0.0.1:7331/view?token=<your-token>` in a browser. You'll get a live HTML page showing the current frame at ~20 fps with click-through — you can click on the page itself and it'll forward the click through the server. Useful for debugging agent behavior in real time.

### How do I contribute?

Open an issue or PR on [github.com/shameindemgg/eyehands](https://github.com/shameindemgg/eyehands). Bugs, feature requests, and comparison-page corrections are all welcome. If you've used eyehands for something interesting, a PR adding it to `marketing/gallery.md` is a nice low-effort contribution.

---

## Comparisons

- [eyehands vs PyAutoGUI](marketing/comparisons/vs-pyautogui.md)
- [eyehands vs AutoHotkey](marketing/comparisons/vs-autohotkey.md)
- [eyehands vs AutoIt](marketing/comparisons/vs-autoit.md)
- [eyehands vs NirCmd](marketing/comparisons/vs-nircmd.md)
- [eyehands vs SikuliX](marketing/comparisons/vs-sikulix.md)
- [eyehands vs nut.js](marketing/comparisons/vs-nutjs.md)
- [eyehands vs Robot Framework](marketing/comparisons/vs-robot-framework.md)
- [eyehands vs just screenshotting with Claude](marketing/comparisons/vs-screenshots-only.md)

## Examples

- [`examples/qa_notepad.py`](examples/qa_notepad.py) — open Notepad, type, save via UIA
- [`examples/watch_ci_build.py`](examples/watch_ci_build.py) — frame-hash polling to wait for a build
- [`examples/find_window_by_uia.py`](examples/find_window_by_uia.py) — explore an unknown app's UIA tree
- [`examples/ocr_find_and_click.py`](examples/ocr_find_and_click.py) — free-tier OCR + click workflow
- [`examples/parsec_remote_drive.py`](examples/parsec_remote_drive.py) — drive a remote host via Parsec

## Support

- **Issues:** [github.com/shameindemgg/eyehands/issues](https://github.com/shameindemgg/eyehands/issues)
- **Store / Pro License:** [portal.fireal.dev](https://portal.fireal.dev)
- **Email:** fireal6353@gmail.com

---

## License

MIT -- see [LICENSE](LICENSE) for details.
