Metadata-Version: 2.4
Name: clip-hier
Version: 0.4.0
Summary: CLIP-based zero-shot image classification tool
Author: charles322
License: MIT
License-File: LICENSE
Keywords: clip,image-classification,open-clip,zero-shot
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Topic :: Scientific/Engineering :: Image Recognition
Requires-Python: >=3.8
Provides-Extra: inference
Requires-Dist: open-clip-torch; extra == 'inference'
Requires-Dist: pillow; extra == 'inference'
Requires-Dist: torch; extra == 'inference'
Requires-Dist: tqdm; extra == 'inference'
Description-Content-Type: text/markdown

# clip-hier

A [CLIP](https://github.com/mlfoundations/open_clip)-based **zero-shot image classification** tool: given a set of images and candidate class names, it returns the best-matching class for each image, with no training required.

## Installation

```bash
pip install "clip-hier[inference]"
```

The `[inference]` extra installs the runtime dependencies: `torch`, `open-clip-torch`, `pillow`, and `tqdm`.
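
To confirm that the optional dependencies are actually present in an environment, a small check along these lines may help. This is a sketch, not part of `clip-hier`: the helper `missing_inference_deps` and the `INFERENCE_MODULES` mapping are hypothetical names, and the mapping relies on the usual import names of these distributions (`open_clip` for `open-clip-torch`, `PIL` for `pillow`).

```python
from importlib.util import find_spec

# Hypothetical mapping: distribution name -> importable top-level module.
INFERENCE_MODULES = {
    "torch": "torch",
    "open-clip-torch": "open_clip",
    "pillow": "PIL",
    "tqdm": "tqdm",
}


def missing_inference_deps(modules=INFERENCE_MODULES):
    """Return the distribution names whose module cannot be located."""
    return [dist for dist, mod in modules.items() if find_spec(mod) is None]


missing = missing_inference_deps()
if missing:
    print("Missing optional dependencies:", ", ".join(missing))
else:
    print("All inference dependencies are importable.")
```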

## Usage

```python
from clip_hier import classify

results = classify(
    image_paths=["cat.jpg", "dog.jpg"],
    classnames=["a cat", "a dog", "a bird"],
    model_name="ViT-B-32",
    pretrained="openai",
)

for r in results:
    print(r["path"], "->", r["label"], f"{r['score']:.3f}")
# cat.jpg -> a cat 0.312
# dog.jpg -> a dog 0.298
```

### Parameters

| Parameter | Description |
| --- | --- |
| `image_paths` | List of image paths |
| `classnames` | Candidate class names (natural language) |
| `model_name` / `pretrained` | open_clip model and pretrained-weights identifiers, default `ViT-B-32` / `openai` |
| `device` | `"cuda"` / `"cpu"`, auto-selected by default |
| `templates` | Prompt templates (each containing one `{}`); averaged when multiple are given. Default `"a photo of a {}."` |
| `batch_size` | Image-encoding batch size, default 32 |

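The `templates` behaviour can be illustrated with a toy sketch. It assumes that each template is filled with the class name verbatim and that the per-template text embeddings are unit-normalized, averaged, and renormalized, which is the common CLIP prompt-ensembling recipe; the package's internals may differ. The helpers `expand_prompts`, `l2_normalize`, and `average_embeddings` are hypothetical, and the 2-D vectors stand in for real text-encoder output.

```python
import math


def expand_prompts(classname, templates):
    """Fill each template's single `{}` placeholder with the class name."""
    return [t.format(classname) for t in templates]


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def average_embeddings(embeddings):
    """Average unit-normalized embeddings, then renormalize the mean."""
    unit = [l2_normalize(e) for e in embeddings]
    dim = len(unit[0])
    mean = [sum(e[i] for e in unit) / len(unit) for i in range(dim)]
    return l2_normalize(mean)


templates = ["a photo of a {}.", "a close-up photo of a {}."]
prompts = expand_prompts("cat", templates)
print(prompts)  # ['a photo of a cat.', 'a close-up photo of a cat.']

# Toy vectors (assumption): one per prompt, in place of real text embeddings.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0]]
class_embedding = average_embeddings(toy_embeddings)
```
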
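`classify` returns a list with one dict per image, `[{"path", "label", "score"}, ...]`, where `label` is the best-matching class name and `score` is its cosine similarity with the image.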
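
How a label and score of this shape can be derived is sketched below with toy vectors. It is an illustration under stated assumptions, not the package's implementation: `cosine_similarity` and `best_match` are hypothetical helpers, and the 2-D embeddings stand in for the image and text features a CLIP model would produce.

```python
import math


def cosine_similarity(a, b):
    """Dot product of two vectors divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_match(path, image_embedding, class_embeddings):
    """Pick the class whose embedding is most similar to the image's."""
    scores = {
        name: cosine_similarity(image_embedding, emb)
        for name, emb in class_embeddings.items()
    }
    label = max(scores, key=scores.get)
    return {"path": path, "label": label, "score": scores[label]}


# Toy embeddings (assumption): one vector per candidate class.
toy_classes = {"a cat": [1.0, 0.0], "a dog": [0.0, 1.0], "a bird": [-1.0, 0.0]}
result = best_match("cat.jpg", [0.9, 0.1], toy_classes)
print(result["path"], "->", result["label"], f"{result['score']:.3f}")
# cat.jpg -> a cat 0.994
```

Because the score is a cosine similarity, it lies in [-1, 1] and is not a probability: scores across classes do not sum to 1, and values as low as the 0.3 range shown in the usage example can still mark the best match.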

## License

MIT
