Metadata-Version: 2.4
Name: door-stabilizer
Version: 0.4.1
Summary: Door: minimal adaptive control (act/update) with online surrogate and optional N(t) readout
Author: BioQuant
License: Proprietary
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: numpy>=1.22
Provides-Extra: torch
Requires-Dist: torch>=2.0; extra == "torch"

# Door

**Door** is a small adaptive control library. You implement `plant.step(action) -> reward`, then run an incremental loop with **`Door.act()`** and **`Door.update(reward)`**. Under the hood, Door keeps an **online surrogate** of the reward (a PyTorch MLP by default, with a ridge fallback if PyTorch is missing), scores candidate actions against it, applies a few **gradient refinement** steps to the best candidate (similar in spirit to a hybrid inner refinement step), and widens exploration when reward variability spikes.
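The surrogate-then-refine idea can be pictured with a deliberately tiny sketch. It is not Door's implementation: it assumes a 1-D action, hand-picked quadratic features, a batch ridge fit rather than an online one, and the hypothetical helper names `fit_ridge` and `propose`.

```python
import random


def features(a):
    # Quadratic features for a 1-D action: [1, a, a^2] (an illustrative choice).
    return [1.0, a, a * a]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_ridge(actions, rewards, lam=1e-3):
    # Ridge normal equations: (X^T X + lam * I) w = X^T y.
    X = [features(a) for a in actions]
    n = 3
    A = [[sum(x[i] * x[j] for x in X) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(x[i] * y for x, y in zip(X, rewards)) for i in range(n)]
    return solve(A, b)


def propose(w, rng=None, low=-1.0, high=1.0, n_candidates=32,
            refine_steps=3, lr=0.1):
    rng = rng or random.Random(0)

    def predict(a):
        return w[0] + w[1] * a + w[2] * a * a

    # Score random candidates on the surrogate and keep the best one.
    best = max((rng.uniform(low, high) for _ in range(n_candidates)), key=predict)
    # Refine it with a few gradient-ascent steps on the surrogate.
    for _ in range(refine_steps):
        grad = w[1] + 2.0 * w[2] * best               # d(predicted reward)/da
        best = min(high, max(low, best + lr * grad))  # clip to the action box
    return best
```

Fitting on past `(action, reward)` pairs and then calling `propose(w)` returns an action near the surrogate's maximum; a real controller would refit or update `w` after every new reward.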

**`door.run`** is a batch alternative: one rollout with volatility widening and an optional **N(t)**-based stability summary (`StabilizeResult.show()`).
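One way to picture "volatility widening" is a rolling-window rule that enlarges the exploration width when recent rewards are much noisier than the earlier history. The sketch below is an assumption for illustration only: the class name `VolatilityWidener`, the window length, the spike ratio, and the widening factor are hypothetical and are not taken from Door.

```python
from collections import deque
from statistics import pstdev


class VolatilityWidener:
    """Hypothetical helper: scales an exploration width by recent reward volatility."""

    def __init__(self, base_sigma=0.1, window=10, ratio=2.0, widen=3.0):
        self.base_sigma = base_sigma
        self.window = window
        self.ratio = ratio    # spike threshold: recent std vs. earlier std
        self.widen = widen    # multiplier applied while a spike is detected
        self.recent = deque(maxlen=window)
        self.history = []

    def update(self, reward):
        self.recent.append(float(reward))
        self.history.append(float(reward))

    def sigma(self):
        # Not enough data for a baseline yet: keep the base width.
        if len(self.history) < 2 * self.window:
            return self.base_sigma
        recent_std = pstdev(self.recent)
        baseline_std = pstdev(self.history[:-self.window])
        if recent_std > self.ratio * max(baseline_std, 1e-12):
            return self.base_sigma * self.widen
        return self.base_sigma
```

Calling `update(reward)` after every step and sampling candidates with width `sigma()` gives a controller that explores more broadly right after the reward signal becomes erratic, for example when the plant's optimum has moved.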

## Install (local tree)

```bash
cd team-bioquant
pip install -e .
```

On PyPI: **`pip install door-stabilizer`** (import package is still **`door`**). Optional PyTorch: `pip install "door-stabilizer[torch]"` or locally `pip install -e ".[torch]"`.

## Minimal example

```python
import numpy as np
from door import Door

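class Plant:
    def step(self, a):
        # Reward is the negative squared norm of the action, so the optimum is a = 0.
        return float(-np.sum(np.asarray(a) ** 2))

p = Plant()
ctrl = Door(dim=2, action_low=-1.0, action_high=1.0, seed=0)
for _ in range(100):
    u = ctrl.act()      # ask the controller for the next action
    r = p.step(u)       # apply it to the plant and observe the reward
    ctrl.update(r)      # feed the reward back to the controller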
```

## Benchmark vs CEM-restart (drifting quadratic)

On a simple **drifting quadratic** toy problem (2D action, random-walk target, 200 steps, 48 seeds), **Door** (here with the **ridge** surrogate, the benchmark script's default, which needs no extra dependencies) is compared against a sequential **CEM-restart** baseline given the **same number of** `plant.step` **calls**:

| Metric (48 seeds) | Door | CEM-restart |
|-------------------|------|-------------|
| Mean cumulative reward | ≈ **−39.55** | ≈ −46.32 |
| Paired mean (Door − CEM) | ≈ **+6.77** | — |
| Mean reward, last 50 steps | ≈ **−0.0013** | ≈ −0.10 |

**Head-to-head** on cumulative reward per seed, Door wins on only **~44%** of seeds yet **comes out ahead on average**, because CEM-restart has heavier tails (more catastrophic runs). That is a reasonable MVP story: **better average and terminal performance**, not “wins every seed.”
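A win rate below 50% and a positive paired mean are not contradictory when one method has a few very bad runs. The snippet below shows the arithmetic on made-up numbers; the values and the names `method_a`, `method_b`, and `paired_summary` are purely illustrative and are not benchmark output.

```python
from statistics import mean


def paired_summary(a, b):
    """Per-seed comparison of two methods evaluated on the same seeds."""
    diffs = [x - y for x, y in zip(a, b)]
    return {
        "win_rate": sum(d > 0 for d in diffs) / len(diffs),  # share of seeds where a beats b
        "paired_mean": mean(diffs),                          # average of (a - b)
        "worst_a": min(a),
        "worst_b": min(b),
    }


# Made-up cumulative rewards for five seeds (NOT benchmark results).
method_a = [-40.0, -40.0, -40.0, -40.0, -40.0]
method_b = [-38.0, -39.0, -39.0, -41.0, -75.0]

summary = paired_summary(method_a, method_b)
print(summary)
```

Here `method_a` wins on 2 of 5 seeds (win rate 0.4) but its paired mean is +6.4, because the single catastrophic run of `method_b` outweighs its three narrow wins.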

Reproduce:

```bash
python benchmark_door_vs_cem_restart.py --seeds 48
```

Optional: use the **PyTorch MLP + 2 gradient refine steps** on the surrogate (closer in spirit to the hybrid HAT approach, but slower):

```bash
python benchmark_door_vs_cem_restart.py --seeds 48 --torch --refine 2
```
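For readers who want to experiment without the benchmark script, a drifting quadratic plant of the kind described above can be sketched as follows. This is an assumption-laden stand-in, not the plant used in `benchmark_door_vs_cem_restart.py`: the class name `DriftingQuadratic`, the drift scale `drift_std`, and the clipping of the target to the action box are all hypothetical choices.

```python
import random


class DriftingQuadratic:
    """Toy plant: reward = -||a - target||^2, with a random-walk target."""

    def __init__(self, dim=2, drift_std=0.02, low=-1.0, high=1.0, seed=0):
        self.rng = random.Random(seed)
        self.drift_std = drift_std    # hypothetical per-step drift scale
        self.low, self.high = low, high
        self.target = [0.0] * dim

    def step(self, action):
        # Score the action against the current optimum.
        reward = -sum((a - t) ** 2 for a, t in zip(action, self.target))
        # Then move the optimum by a Gaussian step, clipped to the action box.
        self.target = [
            min(self.high, max(self.low, t + self.rng.gauss(0.0, self.drift_std)))
            for t in self.target
        ]
        return float(reward)
```

Because it exposes the same `step(action) -> reward` interface as the minimal example, an instance can be dropped into the `act()` / `update()` loop in place of `Plant()`.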

## Trade-off (how to message it)

Door is tuned for **strong average performance and stability** across seeds, not for winning every random seed against a noisy baseline.

## Research HAT

The full **HAT** stack (distilled transformer, JEPA world, challenge simulator) in this repo is separate; it can be bridged later as an optional checkpoint path. The **`HAT` name** is kept as an alias: `from door import Door, HAT`.
