Metadata-Version: 2.4
Name: vink
Version: 0.1.0a1
Summary: Vector Incremental Nano Kit — a lightweight vector database with incremental inserts, automatic exact-to-ANN switching, and explicit storage management. Add vectors anytime without rebuilding the index
Author-email: speedyk_005 <speedy40115719@gmail.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/speedyk-005/vink
Project-URL: Repository, https://github.com/speedyk-005/vink
Keywords: vector-database,ann,approximate-nearest-neighbor,similarity-search,embedding-vectors,semantic-search,rag,retrieval-augmented-generation,llm,generative-ai,product-quantization,vector-search,ai,information-retrieval,knowledge-base
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pybind11<4.0,>=3.0
Requires-Dist: rii<1.0,>=0.2.12
Requires-Dist: readerwriterlock<2.0,>=1.0.9
Requires-Dist: pysqlite3<0.7,>=0.6
Requires-Dist: pydantic<3.0,>=2.12.2
Requires-Dist: uuid6>=2025.0.0; python_version < "3.14"
Requires-Dist: loguru<1.0,>=0.7.3
Provides-Extra: dev
Requires-Dist: pytest>=8.3.5; extra == "dev"
Requires-Dist: pytest-cov>=6.2.1; extra == "dev"
Requires-Dist: pytest-mock>=3.14.1; extra == "dev"
Requires-Dist: pytest-rerunfailures>=15.0; extra == "dev"
Requires-Dist: ruff>=0.14.14; extra == "dev"
Requires-Dist: python-docstring-markdown>=0.4.0; extra == "dev"
Dynamic: license-file

# 🐦 Vink

<p align="center">
  <img src="https://github.com/speedyk-005/vink/blob/main/vink_logo.png?raw=true" alt="Vink Logo" width="300"/>
</p>

<p align="center">
  <b>V</b>ector <b>In</b>cremental <b>N</b>ano <b>K</b>it
</p>
<p align="center">
  “Vector DB that self-organizes. Auto-switch, Auto-tune, Auto-scale.”
</p>

[![Python Version](https://img.shields.io/badge/Python-3.9%20--%203.14-blue)](https://www.python.org/downloads/)
[![PyPI](https://img.shields.io/pypi/v/vink)](https://pypi.org/project/vink)
[![CodeFactor](https://www.codefactor.io/repository/github/speedyk-005/vink/badge)](https://www.codefactor.io/repository/github/speedyk-005/vink)
[![Coverage Status](https://coveralls.io/repos/github/speedyk-005/vink/badge.svg?branch=main)](https://coveralls.io/github/speedyk-005/vink?branch=main)
[![Stability](https://img.shields.io/badge/stability-pre--alpha-yellow)](https://github.com/speedyk-005/vink)
[![Tests](https://img.shields.io/badge/tests-passing-brightgreen)](https://github.com/speedyk-005/vink/actions)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)

> [!WARNING]
> This project is currently in pre-alpha.

---

## Table of Contents

<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
**Table of Contents**  *generated with [DocToc](https://github.com/thlorenz/doctoc)*

- [🤔 So What's vink Anyway? (And Why Should You Care?)](#-so-whats-vink-anyway-and-why-should-you-care)
- [📦 Installation](#-installation)
  - [The Quick & Easy Way](#the-quick--easy-way)
  - [The From-Source Way](#the-from-source-way)
- [✅ Proof It Works](#-proof-it-works)
- [🚀 Usage](#-usage)
  - [Initialization (VinkDB API)](#initialization-vinkdb-api)
    - [AnnConfig (API)](#annconfig-api)
  - [Add (API)](#add-api)
    - [With embedding callback](#with-embedding-callback)
    - [Without callback](#without-callback)
  - [Search (API)](#search-api)
    - [Without filters](#without-filters)
    - [With filters](#with-filters)
  - [Delete](#delete)
    - [Soft deletion (API)](#soft-deletion-api)
    - [Compaction (API)](#compaction-api)
  - [Stats (API)](#stats-api)
- [🚨 Exceptions (API)](#-exceptions-api)
- [🗺 Features & Roadmap](#-features--roadmap)
- [🔧 Core Dependencies](#-core-dependencies)
- [🤝 Contributing](#-contributing)
- [📜 License](#-license)

<!-- END doctoc generated TOC please keep comment here to allow auto update -->

---

## 🤔 So What's vink Anyway? (And Why Should You Care?)

Most vector databases force a trade-off: you either over-engineer for small datasets or hit a performance cliff as you scale. You’re left babysitting indices, manually tuning parameters, and praying your hardware can keep up.

**Vink** eliminates the guesswork. It automatically switches from **Exact Search** (for 100% precision) to **ANN** (for massive scale with IVF-PQ) based on dataset size and runtime latency. Whether you are running on a mobile device or a high-end server, Vink adapts its optimization strategy to your hardware and data distribution.

| Feature | Why it's awesome |
| :--- | :--- |
| ➕ **Incremental Inserts** | Add vectors anytime. Your index grows with your data, not against it. |
| 📟 **Hardware-Aware Auto-Switch** | It figures out when to ditch exact search and switch to ANN based on latency prediction. |
| ⚙️ **Self-Tuning Engine** | Background reconfiguration keeps clusters fresh as your data evolves. |
| 🎯 **Production-Ready Search** | Filtered searches, soft deletes, compact, dual-metric (Euclidean + cosine). |
| 💾 **Explicit Storage** | Disk or memory — you control where your data lives. |

Unlike enterprise solutions (Milvus, Pinecone) that require complex Docker or cloud setup, Vink runs entirely locally: there is no service to deploy, and apart from the system packages listed under Installation, `pip install` is all the setup it needs.

And that's just the start - there's plenty more to explore!

---

## 📦 Installation

First, make sure the necessary system dependencies are installed.

- **Linux only**:
  Required for building [rii](https://github.com/matsui528/rii)

  ```bash
  # Debian/Ubuntu
  sudo apt-get install python3-dev

  # RedHat/Fedora/CentOS
  sudo dnf install python3-devel -y

  # CentOS 7 and older
  sudo yum install python3-devel
  ```

- **Android/Termux**:

   ```bash
   pkg install -y tur-repo
   pkg install python-scipy
   ```

### The Quick & Easy Way

The simplest way to get started is with pip:

```bash
pip install vink
```

### The From-Source Way

Prefer building from source? You can clone and install manually for full control:

```bash
git clone https://github.com/speedyk-005/vink.git
cd vink
pip install -e .
```

(But honestly, the pip way is usually way easier!)

---

## ✅ Proof It Works

Run the demo to see auto-switch in action:

```bash
# Install and run anywhere
curl -O https://raw.githubusercontent.com/speedyk-005/vink/main/demo_poc.py
python demo_poc.py
```

The demo uses:
- `switch_latency_ms=120` (vs 300 default) — triggers switch sooner
- `dim=128`
- Batches of 10,000 vectors

The switch happens when latency exceeds `switch_latency_ms`. New vectors are buffered during the switch with zero downtime.

Results vary by hardware and system load — faster machines switch later, and running other programs will affect timing.

Example output:

```
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ Vectors ┃      Strategy      ┃ Avg Query (ms) ┃ Insert Time (s) ┃     Status     ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ 10,000  │    exact_search    │     32.486     │      0.806      │  Exact Search  │
│ 20,000  │    exact_search    │     79.690     │      0.729      │  Exact Search  │
│ 30,000  │    exact_search    │    107.419     │      0.720      │  Exact Search  │
│ 40,000  │    exact_search    │    188.063     │      0.771      │  ⚙ Building ANN │
│ 50,000  │ approximate_search │     0.000      │     10.051      │  ✓ ANN Active  │
│ 60,000  │ approximate_search │    155.239     │      1.323      │  ✓ ANN Active  │
└─────────┴────────────────────┴────────────────┴─────────────────┴────────────────┘

✓ ANN switch successfully triggered!
```
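
The switching rule described in this section can be pictured as a simple threshold check. The sketch below is only an illustration: `should_switch_to_ann` is a hypothetical helper, not part of Vink, and Vink's real decision is based on latency prediction rather than a plain average. The latencies are copied from the example table, with the demo's threshold of 120 ms.

```python
from statistics import mean


def should_switch_to_ann(recent_latencies_ms, switch_latency_ms=300.0):
    """Hypothetical rule: switch once the average recent latency exceeds the threshold."""
    if not recent_latencies_ms:
        return False  # no measurements yet, stay on exact search
    return mean(recent_latencies_ms) > switch_latency_ms


# Average query latencies (ms) from the exact_search rows of the example output.
observed = [32.486, 79.690, 107.419, 188.063]
for latency in observed:
    decision = should_switch_to_ann([latency], switch_latency_ms=120)
    print(f"{latency:8.3f} ms -> switch to ANN: {decision}")
```

Only the last measurement (188.063 ms) crosses the 120 ms threshold, which matches the row where the table reports that the ANN build starts.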

---

## 🚀 Usage

### Initialization ([VinkDB API](https://github.com/speedyk-005/vink/blob/main/API_REFERENCES.md#vink-core-VinkDB))

```python
from vink import VinkDB

# Create a database with 128-dimensional vectors
db = VinkDB("./data", dim=128)

# Or use in-memory mode (no persistence)
db = VinkDB(":memory:", dim=128)

# With custom settings
db = VinkDB(
    dir_path="./data",
    dim=384,
    metric="euclidean",       # or "cosine" (default: euclidean)
    force_exact=False,         # or True to disable ANN (default: False)
    ann_config=None,           # AnnConfig for PQ/OPQ (default: auto-generated)
    switch_latency_ms=300,    # ms threshold for ANN switch (default: 300)
    embedding_callback=None,  # fn to generate embeddings from content
    overwrite=False,          # overwrite existing index (default: False)
    verbose=False              # enable verbose output (default: False)
)
```

#### AnnConfig ([API](https://github.com/speedyk-005/vink/blob/main/API_REFERENCES.md#vink-models-AnnConfig))

Want custom ANN settings?

```python
from vink import AnnConfig

config = AnnConfig(
    num_subspaces=16,        # number of sub-vectors (default: 32)
    quantizer="pq",           # "pq" or "opq" (default: pq)
    codebook_size=128,        # centroids per subspace (default: 256)
)
db = VinkDB("./data", dim=384, ann_config=config)

# print all available options:
AnnConfig.help()
```
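
To get a feel for what `num_subspaces` and `codebook_size` trade off, here is a rough back-of-the-envelope sketch based on how product quantization works in general. The helper `pq_code_bytes` is hypothetical, and the assumption that each code is stored in whole bytes may not match how Vink or rii lay out their data.

```python
def pq_code_bytes(num_subspaces: int, codebook_size: int) -> int:
    """Bytes per encoded vector, assuming one whole-byte code per subspace."""
    bits_per_code = (codebook_size - 1).bit_length()  # 128 -> 7 bits, 256 -> 8 bits
    bytes_per_code = (bits_per_code + 7) // 8         # round up to whole bytes
    return num_subspaces * bytes_per_code


dim = 384
raw_bytes = dim * 4  # one float32 per dimension

for num_subspaces, codebook_size in [(16, 128), (32, 256)]:
    # Standard PQ splits each vector into equal-sized sub-vectors.
    assert dim % num_subspaces == 0, "dim should be divisible by num_subspaces"
    code_bytes = pq_code_bytes(num_subspaces, codebook_size)
    print(
        f"num_subspaces={num_subspaces}, codebook_size={codebook_size}: "
        f"sub-vector dim={dim // num_subspaces}, "
        f"{code_bytes} bytes per vector vs {raw_bytes} raw "
        f"({raw_bytes // code_bytes}x smaller)"
    )
```

Fewer subspaces or smaller codebooks give more compact codes at the cost of a coarser approximation, so the right values depend on how much recall you are willing to give up.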

### Add ([API](https://github.com/speedyk-005/vink/blob/main/API_REFERENCES.md#vink-core-VinkDB-add))

Records need:

- `content` (required): text to store
- `embedding` (required if no callback): list of floats or numpy array, shape `(d,)` or `(1, d)`
- `id` (optional): valid UUIDv7
- `metadata` (optional): dict of key-value pairs

Provide embeddings directly or use a callback to generate them on the fly.
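
If you supply your own `id` values, it can help to check them before calling `add`. The sketch below uses only the standard library; `is_uuid7` is a hypothetical helper, and Vink's own validation (which raises `InvalidIdError`) may be stricter than this.

```python
import uuid


def is_uuid7(value) -> bool:
    """Return True if value parses as an RFC-variant UUID whose version field is 7."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False  # not a well-formed UUID string at all
    return parsed.version == 7


print(is_uuid7("0192a5b4-7f3c-7d6e-9a1b-2c3d4e5f6a7b"))  # ID format used in this README
print(is_uuid7(str(uuid.uuid4())))                       # a random UUIDv4
print(is_uuid7("not-a-uuid"))
```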

#### With embedding callback

```python
db = VinkDB("./data", dim=384, embedding_callback=my_embedding_fn)

# Just provide content — embeddings generated automatically
db.add([
    {"content": "Hello world", "metadata": {"source": "doc1"}},
    {"content": "Another text"},
])
```
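
The snippet above leaves `my_embedding_fn` undefined. As a placeholder for experiments, here is a toy, deterministic stand-in built from the standard library. It assumes the callback receives one string and returns one vector of length `dim`; this README does not state the exact signature Vink expects, so check the API reference before relying on it. The vectors carry no semantic meaning, so use a real embedding model for anything beyond smoke tests.

```python
import hashlib
import math

DIM = 384  # must match the dim passed to VinkDB


def my_embedding_fn(text: str) -> list:
    """Toy embedding: stretch SHA-256 digests of the text to DIM floats, then L2-normalize."""
    values = []
    counter = 0
    while len(values) < DIM:
        digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
        values.extend(byte / 255.0 - 0.5 for byte in digest)  # map bytes to [-0.5, 0.5]
        counter += 1
    values = values[:DIM]
    norm = math.sqrt(sum(v * v for v in values)) or 1.0
    return [v / norm for v in values]


vector = my_embedding_fn("Hello world")
print(len(vector))
```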

#### Without callback

Provide embeddings directly:

```python
db.add([
    {"content": "Hello world", "embedding": [0.1] * 384, "metadata": {"source": "doc1"}},
    {"content": "Another text", "embedding": [0.2] * 384},
])
```

### Search ([API](https://github.com/speedyk-005/vink/blob/main/API_REFERENCES.md#vink-core-VinkDB-search))

Results include:

- `id`: vector ID
- `content`: text content
- `distance`: similarity score (lower is closer for euclidean)
- `metadata`: key-value pairs
- `embedding`: (only if `include_vectors=True`)
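
To make the `distance` field more concrete, here is a plain-Python sketch of the two metrics this README names. It assumes cosine distance is reported as `1 - cosine similarity`, which is a common convention but is not confirmed here; both function names are illustrative and not part of Vink.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity (assumed convention); 0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
```

Under both definitions a smaller value means a closer match, and cosine distance ignores vector length while Euclidean distance does not.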

#### Without filters

```python
# Basic search
results = db.search(query_vec=[0.1] * 384, top_k=5)

# Include embeddings in results
results = db.search(query_vec=[0.1] * 384, include_vectors=True)
```

#### With filters

Filter syntax supports `==`, `!=`, `>`, `<`, `>=`, `<=` with strings, numbers, and booleans. More operators coming in future updates.

```python
results = db.search(
    query_vec=[0.1] * 384,
    top_k=10,
    filters=["source == 'doc1'", "score >= 50", "new == True"]
)
```
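
To illustrate what these filter strings mean, the sketch below evaluates them against a metadata dict in pure Python. It is not how Vink implements filtering (the dependency list suggests filtering runs as SQLite queries), and it assumes that multiple filters are combined with AND, which this README does not state. The `matches` helper is hypothetical.

```python
import ast
import operator
import re

# Comparison operators listed in this README, mapped to Python functions.
_OPS = {
    "==": operator.eq, "!=": operator.ne,
    ">=": operator.ge, "<=": operator.le,
    ">": operator.gt, "<": operator.lt,
}
# Two-character operators come first so ">=" is not read as ">" followed by "=".
_FILTER = re.compile(r"^\s*(\w+)\s*(==|!=|>=|<=|>|<)\s*(.+?)\s*$")


def matches(metadata, filters):
    """Return True if metadata satisfies every filter (AND semantics are an assumption)."""
    for expr in filters:
        parsed = _FILTER.match(expr)
        if parsed is None:
            raise ValueError(f"cannot parse filter: {expr!r}")
        key, op, literal = parsed.groups()
        try:
            expected = ast.literal_eval(literal)  # 'doc1' -> str, 50 -> int, True -> bool
        except (ValueError, SyntaxError):
            raise ValueError(f"unsupported literal in filter: {expr!r}") from None
        if key not in metadata:
            return False
        try:
            if not _OPS[op](metadata[key], expected):
                return False
        except TypeError:  # e.g. comparing a string with a number
            return False
    return True


record = {"source": "doc1", "score": 72, "new": True}
print(matches(record, ["source == 'doc1'", "score >= 50", "new == True"]))
print(matches(record, ["score < 50"]))
```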

### Delete

#### Soft deletion ([API](https://github.com/speedyk-005/vink/blob/main/API_REFERENCES.md#vink-core-VinkDB-soft_delete))

Soft-delete vectors by ID without rebuilding the index — fast and efficient.

```python
# IDs come from search results or when adding
db.soft_delete(["0192a5b4-7f3c-7d6e-9a1b-2c3d4e5f6a7b", "0192a5b4-7f3c-7d6e-9a1b-2c3d4e5f6a7c"])
```

#### Compaction ([API](https://github.com/speedyk-005/vink/blob/main/API_REFERENCES.md#vink-core-VinkDB-compact))

Actually remove soft-deleted records and reclaim storage:

```python
db.compact()
```

> [!WARNING]
> Compaction can take 20-200+ seconds under the `approximate_search` strategy, depending on data size. Run it during maintenance windows or off-peak hours. If too few vectors remain to retrain the codec, the rebuild is skipped.
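
The relationship between soft deletion and compaction can be pictured with a small in-memory model. This is a conceptual sketch only: `TombstoneStore` is a hypothetical class, not Vink's implementation, and it ignores the index rebuild that makes real compaction slow.

```python
class TombstoneStore:
    """Toy store: soft_delete marks IDs as deleted, compact physically removes them."""

    def __init__(self):
        self._records = {}     # id -> content
        self._deleted = set()  # tombstones: IDs hidden from reads

    def add(self, record_id, content):
        self._records[record_id] = content

    def soft_delete(self, ids):
        # Cheap: only records a marker, the data stays where it is.
        for record_id in ids:
            if record_id in self._records:
                self._deleted.add(record_id)

    def visible_ids(self):
        return [rid for rid in self._records if rid not in self._deleted]

    def compact(self):
        # Expensive in a real index: the data is actually dropped here.
        removed = len(self._deleted)
        for record_id in self._deleted:
            del self._records[record_id]
        self._deleted.clear()
        return removed

    def stats(self):
        return {
            "active_count": len(self._records) - len(self._deleted),
            "deleted_count": len(self._deleted),
        }


store = TombstoneStore()
for record_id in ("a", "b", "c"):
    store.add(record_id, f"content {record_id}")
store.soft_delete(["b"])
print(store.visible_ids(), store.stats())
print(store.compact(), store.stats())
```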

### Stats ([API](https://github.com/speedyk-005/vink/blob/main/API_REFERENCES.md#vink-core-VinkDB-stats))

Get database statistics:

```python
stats = db.stats()
# {
#     "version": "...",
#     "dimension": 128,
#     "metric": "euclidean",
#     "strategy": "exact_search",
#     "last_saved_at": "...",
#     "last_deleted_at": "...",
#     "active_count": 1000,
#     "deleted_count": 5
# }
```
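
One possible use of these numbers is deciding when a compaction is worth its cost. The sketch below works on a plain dict shaped like the output above; `deleted_ratio`, `should_compact`, and the 20% threshold are illustrative assumptions, not Vink features or recommendations.

```python
def deleted_ratio(stats: dict) -> float:
    """Fraction of stored records that are soft-deleted."""
    total = stats["active_count"] + stats["deleted_count"]
    return stats["deleted_count"] / total if total else 0.0


def should_compact(stats: dict, threshold: float = 0.2) -> bool:
    """Hypothetical policy: compact once soft-deleted records exceed the threshold."""
    return deleted_ratio(stats) > threshold


example = {"active_count": 1000, "deleted_count": 5}  # counts from the sample above
print(f"deleted ratio: {deleted_ratio(example):.4f}, compact: {should_compact(example)}")
```

In an application, the dict would come from `db.stats()`, and `db.compact()` would be called when the check passes.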

---

## 🚨 Exceptions ([API](https://github.com/speedyk-005/vink/blob/main/API_REFERENCES.md#vink-exceptions))

Something went wrong?

| Exception | When it hits |
| :--- | :--- |
| `InvalidInputError` | Bad data or invalid params |
| `VectorDimensionError` | Embedding dim mismatch |
| `InvalidIdError` | Malformed UUIDv7 |
| `FilterError` | Bad filter syntax |

---

## 🗺 Features & Roadmap

- [x] Incremental Inserts
- [x] Hardware-Aware Auto-Switch
- [x] Soft deletes + compact
- [x] Save/Load
- [ ] Filter DSL
  - [x] Basic filters: quick comparisons
  - [ ] Complex filters: content matching, null checks, date/time literals, ...
- [ ] Recovery: recover soft-deleted vectors
- [ ] Collections: Multi-collection support for managing multiple indices
- [ ] CLI: command-line interface
- [ ] REST API: HTTP API for remote vector operations
- [ ] Integrations: LangChain, LlamaIndex, and other integrations

---

## 🔧 Core Dependencies

- [rii](https://github.com/matsui528/rii) — C++ ANN library with pybind11 bindings (IVF-PQ index storage)
- [nanopq](https://github.com/matsui528/nanopq) — Pure Python PQ encoding/decoding
- [scipy](https://scipy.org) — Scientific computing (distance calculations)
- [numpy](https://numpy.org) — Numerical computing
- [SQLite](https://sqlite.org) — Metadata storage (content, embeddings, metadata), filtering queries

---

## 🤝 Contributing

Bug fixes, features, docs — all welcome. Check out [CONTRIBUTING.md](https://github.com/speedyk-005/vink/blob/main/CONTRIBUTING.md) for the full details.

---

## 📜 License

Check out the [LICENSE](https://github.com/speedyk-005/vink/blob/main/LICENSE) file for all the details.

> MIT License. Use freely, modify boldly, and credit appropriately! (We're not that legendary... yet 😉)

