Metadata-Version: 2.4
Name: tda-repr
Version: 0.1.0
Summary: Topological and spectral analysis of neural representations.
Author: tda-repr contributors
License-Expression: MIT
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy
Requires-Dist: scipy
Requires-Dist: matplotlib
Requires-Dist: torch
Requires-Dist: torchvision
Requires-Dist: tqdm
Requires-Dist: InquirerPy>=0.3.4
Provides-Extra: nlp
Requires-Dist: transformers; extra == "nlp"
Requires-Dist: datasets; extra == "nlp"
Requires-Dist: tokenizers; extra == "nlp"
Requires-Dist: sacrebleu; extra == "nlp"
Provides-Extra: topology
Requires-Dist: gudhi; extra == "topology"
Requires-Dist: ripser; extra == "topology"
Provides-Extra: medical
Requires-Dist: medmnist; extra == "medical"
Provides-Extra: dev
Requires-Dist: pytest>=8; extra == "dev"
Requires-Dist: build; extra == "dev"
Requires-Dist: twine; extra == "dev"
Requires-Dist: ruff>=0.6.0; extra == "dev"
Requires-Dist: pre-commit>=3.8.0; extra == "dev"
Provides-Extra: all
Requires-Dist: transformers; extra == "all"
Requires-Dist: datasets; extra == "all"
Requires-Dist: tokenizers; extra == "all"
Requires-Dist: sacrebleu; extra == "all"
Requires-Dist: gudhi; extra == "all"
Requires-Dist: ripser; extra == "all"
Requires-Dist: medmnist; extra == "all"
Requires-Dist: pytest>=8; extra == "all"
Requires-Dist: build; extra == "all"
Requires-Dist: twine; extra == "all"
Requires-Dist: ruff>=0.6.0; extra == "all"
Requires-Dist: pre-commit>=3.8.0; extra == "all"
Dynamic: license-file

# tda-repr

Topological and spectral analysis toolkit for neural network representations.

`tda-repr` helps you monitor hidden-layer geometry during training and compare it
to benchmark quality metrics (loss/accuracy/F1), with reproducible logs and plots.

## Key Features

- Track layer representations on train/val/test splits.
- Compute graph/Hodge/persistent characteristics per layer and epoch.
- Compute MTopDiv distance between stages (for example, train vs val).
- Run experiments interactively from the terminal (TUI) or through a non-interactive CLI.
- Export run artifacts: JSONL logs, progress figures, correlation report.

## Installation

### Editable install (development)

```bash
pip install -e .
```

### Install with all optional extras

```bash
pip install -e ".[all]"
```

### Main extras

- `.[nlp]` for text datasets/models (`transformers`, `datasets`, `tokenizers`, `sacrebleu`)
- `.[topology]` for advanced topology backends (`gudhi`, `ripser`)
- `.[medical]` for MedMNIST support
- `.[dev]` for tests/lint/release tooling

## Quick Start

### 1) Inspect available layers

```bash
tda-repr-layers --model efficientnet_b0 --leaf_only
```

Strict include validation:

```bash
tda-repr-layers --model efficientnet_b0 --include "features.0.0,features.7,avgpool,classifier.1" --strict
```

### 2) Run an interactive experiment (TUI)

```bash
python tools/run_experiment.py --interactive --interactive_ui tui
```

In the TUI you can:
- select task/dataset/model/device;
- choose monitor layers with readable names;
- choose benchmark metrics;
- configure fine-tune mode and early-stop signals;
- go back to previous steps (`b` where supported).

### 3) Run a non-interactive experiment (CLI)

```bash
python tools/run_experiment.py \
  --no-interactive \
  --task cv \
  --dataset cifar10 \
  --model resnet18 \
  --device cuda:0 \
  --finetune full \
  --epochs 20 \
  --batch_size 256 \
  --download \
  --layer_include "conv1,layer1.*,layer2.*,layer3.*,layer4.*,avgpool,fc"
```

## Common Recipes

### Recipe A: Fine-tune only selected modules

Useful when full fine-tuning is too expensive or noisy.

```bash
python tools/run_experiment.py \
  --no-interactive \
  --task cv \
  --dataset imagenette \
  --model efficientnet_b0 \
  --device cuda:0 \
  --finetune selected_layers \
  --train_layers "features.0.0,features.7,features.8.0,avgpool,classifier.1" \
  --layer_include "features.0.0,features.7,features.8.0,avgpool,classifier.1" \
  --epochs 20 \
  --batch_size 256 \
  --download
```

### Recipe B: Parameter-pattern based fine-tune

```bash
python tools/run_experiment.py \
  --no-interactive \
  --task cv \
  --dataset cifar10 \
  --model resnet18 \
  --device cuda:0 \
  --finetune named_patterns \
  --train_patterns "layer4.*,fc.*" \
  --train_exclude_patterns "*.bn*" \
  --epochs 20 \
  --batch_size 256 \
  --download
```
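
The selection in Recipe B can be pictured as glob matching over parameter names. The sketch below is a minimal illustration only: it assumes `fnmatch`-style globs and an include-then-exclude order, and the helper `select_trainable` is hypothetical, not part of the `tda-repr` API.

```python
from fnmatch import fnmatchcase


def select_trainable(param_names, include_csv, exclude_csv=""):
    """Return names that match any include glob and no exclude glob.

    Hypothetical helper: the matching rules are an assumption, not the
    documented behaviour of --train_patterns / --train_exclude_patterns.
    """
    include = [p.strip() for p in include_csv.split(",") if p.strip()]
    exclude = [p.strip() for p in exclude_csv.split(",") if p.strip()]
    selected = []
    for name in param_names:
        if not any(fnmatchcase(name, pat) for pat in include):
            continue  # not covered by any include pattern
        if any(fnmatchcase(name, pat) for pat in exclude):
            continue  # explicitly excluded, e.g. BatchNorm parameters
        selected.append(name)
    return selected


# A small subset of ResNet-18 parameter names, used for illustration.
names = [
    "conv1.weight",
    "layer3.1.bn2.weight",
    "layer4.0.conv1.weight",
    "layer4.0.bn1.weight",
    "layer4.1.conv2.weight",
    "fc.weight",
    "fc.bias",
]
trainable = select_trainable(names, "layer4.*,fc.*", "*.bn*")
print(trainable)
```

Under these assumptions, the `layer4` convolutions and the `fc` head are selected, while `layer4.0.bn1.weight` is dropped by the exclude pattern.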

### Recipe C: Multi-signal early-stop monitor (signal only)

`--early_stop` emits an early-stop signal event and figure but does **not**
terminate training automatically. The signal is intended for post-hoc analysis and
fair comparison against fixed-epoch training.

```bash
python tools/run_experiment.py \
  --no-interactive \
  --task cv \
  --dataset cifar10 \
  --model efficientnet_b0 \
  --device cuda:0 \
  --finetune full \
  --layer_include "features.0.0,features.7,features.8.0,avgpool" \
  --early_stop \
  --early_stop_signals "features.0.0:mtopdiv_train_val:max;features.7:mtopdiv_train_val:max;features.8.0:beta1_persistent_est:max;avgpool:beta1_persistent_est:max" \
  --early_stop_aggregate any \
  --early_stop_patience 4 \
  --early_stop_start_epoch 3 \
  --epochs 20 \
  --batch_size 256 \
  --download
```
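
To make the signal specification above easier to read, here is a minimal sketch of how such a string could be parsed and evaluated. The `layer:metric:mode` layout with `;` separators is read off the example; everything else is an assumption: that `max` means "higher is a new best", that patience counts consecutive epochs without a new best, that `--early_stop_start_epoch` is a 0-based index, and that `any` fires at the earliest individual signal. All names and values are hypothetical.

```python
from typing import NamedTuple, Optional


class Signal(NamedTuple):
    layer: str
    metric: str
    mode: str  # "max" appears in the README; other modes are not documented here


def parse_signals(spec: str) -> list[Signal]:
    """Split 'layer:metric:mode;...' into Signal tuples."""
    signals = []
    for item in spec.split(";"):
        item = item.strip()
        if not item:
            continue
        # Layer names contain dots but no colons, so split from the right.
        layer, metric, mode = item.rsplit(":", 2)
        signals.append(Signal(layer, metric, mode))
    return signals


def first_stale_epoch(values, mode, patience, start_epoch) -> Optional[int]:
    """First epoch at which `patience` epochs in a row brought no new best."""
    best = None
    stale = 0
    for epoch, value in enumerate(values):
        if epoch < start_epoch:
            continue  # ignore warm-up epochs
        if mode == "max":
            improved = best is None or value > best
        else:  # assumed counterpart of "max"
            improved = best is None or value < best
        if improved:
            best, stale = value, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None


def aggregate_any(epochs) -> Optional[int]:
    """'any' aggregation: the earliest epoch at which some signal fired."""
    fired = [e for e in epochs if e is not None]
    return min(fired) if fired else None


spec = "features.0.0:mtopdiv_train_val:max;avgpool:beta1_persistent_est:max"
signals = parse_signals(spec)

# Made-up per-epoch values, keyed by layer, purely for illustration.
history = {
    "features.0.0": [0.1, 0.2, 0.3, 0.5, 0.4, 0.4, 0.3, 0.2, 0.2],
    "avgpool": [1, 2, 3, 4, 5, 6, 7, 8, 9],
}
per_signal = [
    first_stale_epoch(history[s.layer], s.mode, patience=4, start_epoch=3)
    for s in signals
]
fired_at = aggregate_any(per_signal)
print(per_signal, fired_at)
```

In this toy run the `features.0.0` series peaks at epoch 3 and then stalls for four epochs, so the combined signal fires at epoch 7 while training would continue to the configured `--epochs`.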

## Fine-Tune Modes

`--finetune` supports:

- `full` - all parameters trainable.
- `linear_probe` - classifier head only.
- `last_n_params` - unfreeze the last N parameters (`--last_n_params`).
- `named_prefixes` - unfreeze parameters by name prefixes (`--train_prefixes`).
- `named_patterns` - unfreeze by glob patterns (`--train_patterns` / `--train_exclude_patterns`).
- `selected_layers` - unfreeze explicit module list (`--train_layers`).

If `--finetune_list` is provided (a comma-separated list of modes), the modes run sequentially in one command.
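
As a rough mental model, each mode reduces to a rule for picking which parameter names stay trainable. The sketch below covers three of the modes and rests on assumptions: "last N" is taken to mean the last N entries in registration order, and prefixes are matched with `str.startswith`. The function `trainable_names` is hypothetical and not part of the package.

```python
def trainable_names(param_names, mode, *, last_n=0, prefixes=()):
    """Pick the parameter names that a fine-tune mode would leave trainable.

    Hypothetical illustration of `full`, `last_n_params` and `named_prefixes`.
    """
    names = list(param_names)
    if mode == "full":
        return names
    if mode == "last_n_params":
        # Guard against last_n == 0, because names[-0:] is the whole list.
        return names[-last_n:] if last_n > 0 else []
    if mode == "named_prefixes":
        wanted = tuple(prefixes)
        return [n for n in names if n.startswith(wanted)]
    raise ValueError(f"mode not covered by this sketch: {mode!r}")


names = ["conv1.weight", "layer4.1.conv2.weight", "fc.weight", "fc.bias"]
last_two = trainable_names(names, "last_n_params", last_n=2)
by_prefix = trainable_names(names, "named_prefixes", prefixes=["layer4.", "fc."])
print(last_two)
print(by_prefix)
```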

## Output Artifacts

Each run writes into `runs/<experiment_name>/`:

- `meta.json` - run setup and metadata
- `metrics.jsonl` - per-epoch structured logs
- `figures/fig_quality_progress.png` - benchmark metrics over epochs
- `figures/fig_repr_progress.png` - representation metrics over epochs
- `figures/fig_early_stop_metric.png` - early-stop signal curve (if enabled)
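
Because `metrics.jsonl` holds one JSON object per line, it can be read with the standard library alone. The sketch below is a minimal reader; the record keys in the demo (`epoch`, `val_acc`) are placeholders, since the actual schema is not described here.

```python
import json
import tempfile
from pathlib import Path


def load_jsonl(path):
    """Read one JSON object per non-empty line into a list of dicts."""
    records = []
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


# Demo on a throwaway file; the keys below are hypothetical placeholders.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "metrics.jsonl"
    demo.write_text(
        '{"epoch": 0, "val_acc": 0.61}\n{"epoch": 1, "val_acc": 0.70}\n',
        encoding="utf-8",
    )
    records = load_jsonl(demo)

print(len(records), sorted(records[0]))
```

For a real run, point `load_jsonl` at `runs/<experiment_name>/metrics.jsonl` and inspect `records[0].keys()` to see which fields were logged.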

### Correlation report

```bash
python tools/correlation_report.py --run_dir runs/<your_run_dir>
```

Outputs:
- `all_pairs.csv`
- `top_pairs.csv`
- `top_pairs.png`

## Python API Example

Minimal layer-hook example:

```python
import torch
from tda_repr import LayerTaps, get_model_info

info = get_model_info("resnet18")
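model = info.model.eval()  # eval mode: fixed BatchNorm statistics, no dropout
x = torch.rand(4, 3, 224, 224)  # dummy batch of four 224x224 RGB images

layer_names = ["layer1.0", "layer4.1", "fc"]
# Capture the named layers' outputs during one gradient-free forward pass.
with torch.no_grad(), LayerTaps(model, layer_names) as taps:
    _ = model(x)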

for name, tensor in taps.outputs.items():
    print(name, tuple(tensor.shape))
```

## Release Checklist

```bash
pip install -e ".[dev]"
pytest
python -m build
python -m twine check dist/*
```

Optional smoke runs:

```bash
python -m tests.smoke_model_layers
python -m tests.smoke_train_monitor
```

## Notes

- Hardware-specific guidance: `docs/hardware_notes.md`
- The package published on PyPI includes only `tda_repr*` code (not `runs/`, `data/`, `tests/`, or dot-directories).
