Metadata-Version: 2.4
Name: rt-seg
Version: 0.1.0
Summary: rt_seg is a Python 3.12.x package for segmenting reasoning traces into coherent chunks and (optionally) assigning a label to each chunk.
Author: Leon Hammerla, Bhuvanesh Verma
License: MIT License
        
        Copyright (c) 2026 Leon Hammerla
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://github.com/LeonHammerla/RT-SEG
Project-URL: Issues, https://github.com/LeonHammerla/RT-SEG/issues
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: accelerate
Requires-Dist: aiohappyeyeballs
Requires-Dist: aiohttp
Requires-Dist: aiosignal
Requires-Dist: annotated-types
Requires-Dist: anyio
Requires-Dist: attrs
Requires-Dist: bertopic
Requires-Dist: blis
Requires-Dist: catalogue
Requires-Dist: certifi
Requires-Dist: charset-normalizer
Requires-Dist: click
Requires-Dist: cloudpathlib
Requires-Dist: confection
Requires-Dist: contourpy
Requires-Dist: cycler
Requires-Dist: cymem
Requires-Dist: datasets
Requires-Dist: dill
Requires-Dist: emoji
Requires-Dist: filelock
Requires-Dist: fonttools
Requires-Dist: frozenlist
Requires-Dist: fsspec
Requires-Dist: h11
Requires-Dist: hdbscan
Requires-Dist: hf-xet
Requires-Dist: httpcore
Requires-Dist: httpx
Requires-Dist: huggingface-hub
Requires-Dist: idna
Requires-Dist: iniconfig
Requires-Dist: Jinja2
Requires-Dist: joblib
Requires-Dist: kiwisolver
Requires-Dist: linkify-it-py
Requires-Dist: llvmlite
Requires-Dist: markdown-it-py
Requires-Dist: MarkupSafe
Requires-Dist: mdit-py-plugins
Requires-Dist: mdurl
Requires-Dist: mpmath
Requires-Dist: multidict
Requires-Dist: multiprocess
Requires-Dist: murmurhash
Requires-Dist: narwhals
Requires-Dist: networkx
Requires-Dist: nltk
Requires-Dist: numba
Requires-Dist: numpy
Requires-Dist: packaging
Requires-Dist: pandas
Requires-Dist: pillow
Requires-Dist: platformdirs
Requires-Dist: pluggy
Requires-Dist: preshed
Requires-Dist: propcache
Requires-Dist: protobuf
Requires-Dist: psutil
Requires-Dist: pyarrow
Requires-Dist: pydantic
Requires-Dist: pydantic_core
Requires-Dist: Pygments
Requires-Dist: pynndescent
Requires-Dist: pyparsing
Requires-Dist: python-dateutil
Requires-Dist: PyYAML
Requires-Dist: regex
Requires-Dist: requests
Requires-Dist: rich
Requires-Dist: safetensors
Requires-Dist: scikit-learn
Requires-Dist: scipy
Requires-Dist: sentence-transformers
Requires-Dist: six
Requires-Dist: smart_open
Requires-Dist: spacy
Requires-Dist: spacy-legacy
Requires-Dist: spacy-loggers
Requires-Dist: srsly
Requires-Dist: stanza
Requires-Dist: surrealdb
Requires-Dist: sympy
Requires-Dist: textual
Requires-Dist: thinc
Requires-Dist: threadpoolctl
Requires-Dist: tiktoken
Requires-Dist: tokenizers
Requires-Dist: torch
Requires-Dist: tqdm
Requires-Dist: transformers
Requires-Dist: typer-slim
Requires-Dist: typing-inspection
Requires-Dist: typing_extensions
Requires-Dist: uc-micro-py
Requires-Dist: umap-learn
Requires-Dist: urllib3
Requires-Dist: wasabi
Requires-Dist: weasel
Requires-Dist: websockets
Requires-Dist: wrapt
Requires-Dist: xxhash
Requires-Dist: yarl
Requires-Dist: alabaster
Provides-Extra: cuda
Requires-Dist: cuda-bindings; extra == "cuda"
Requires-Dist: cuda-pathfinder; extra == "cuda"
Requires-Dist: triton; extra == "cuda"
Requires-Dist: nvidia-cublas-cu12; extra == "cuda"
Requires-Dist: nvidia-cuda-cupti-cu12; extra == "cuda"
Requires-Dist: nvidia-cuda-nvrtc-cu12; extra == "cuda"
Requires-Dist: nvidia-cuda-runtime-cu12; extra == "cuda"
Requires-Dist: nvidia-cudnn-cu12; extra == "cuda"
Requires-Dist: nvidia-cufft-cu12; extra == "cuda"
Requires-Dist: nvidia-cufile-cu12; extra == "cuda"
Requires-Dist: nvidia-curand-cu12; extra == "cuda"
Requires-Dist: nvidia-cusolver-cu12; extra == "cuda"
Requires-Dist: nvidia-cusparse-cu12; extra == "cuda"
Requires-Dist: nvidia-cusparselt-cu12; extra == "cuda"
Requires-Dist: nvidia-nccl-cu12; extra == "cuda"
Requires-Dist: nvidia-nvjitlink-cu12; extra == "cuda"
Requires-Dist: nvidia-nvshmem-cu12; extra == "cuda"
Requires-Dist: nvidia-nvtx-cu12; extra == "cuda"
Provides-Extra: viz
Requires-Dist: matplotlib; extra == "viz"
Requires-Dist: plotly; extra == "viz"
Requires-Dist: seaborn; extra == "viz"
Requires-Dist: KDEpy; extra == "viz"
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: build; extra == "dev"
Requires-Dist: twine; extra == "dev"
Requires-Dist: wheel; extra == "dev"
Requires-Dist: setuptools; extra == "dev"
Requires-Dist: Cython; extra == "dev"
Dynamic: license-file

<p align="center">
  <img src="docs/assets/logo.svg" width="30%" style="max-width: 400px;">
</p>

# RT-SEG — Reasoning Trace Segmentation

`rt_seg` is a **Python 3.12.x** package for segmenting *reasoning traces* into coherent chunks and (optionally) assigning a label to each chunk.

The main entry point is `RTSeg` (defined in `rt_segmentation.seg_factory`).

It orchestrates one or more **segmentation engines** and — if multiple engines are used — an **offset aligner** that fuses their boundaries into a single segmentation.

---

# Installation

## Install from PyPI (once published)

```bash
pip install rt-seg
```

## Development Install (repo checkout)

```bash
python3.12 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

---

# Core Concepts

## What `RTSeg` Returns

`RTSeg(trace)` produces:

* `offsets`: `list[tuple[int, int]]` — character offsets into the trace
* `labels`: `list[str]` — one label per segment

You can reconstruct segments via:

```python
segments = [trace[s:e] for (s, e) in offsets]
```
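
If you want to sanity-check the returned offsets before slicing, a small validator helps. This sketch assumes offsets are sorted and non-overlapping, which is a reasonable expectation for a segmentation but not a guarantee stated here; `check_offsets` is a hypothetical helper, not part of `rt_seg`:

```python
def check_offsets(offsets: list[tuple[int, int]], trace_len: int) -> bool:
    """Sanity-check segment offsets: in-range, well-formed (start < end),
    and non-overlapping in order. Assumption, not a documented guarantee."""
    prev_end = 0
    for s, e in offsets:
        if not (0 <= s < e <= trace_len) or s < prev_end:
            return False
        prev_end = e
    return True

print(check_offsets([(0, 5), (6, 12)], 12))   # True
print(check_offsets([(0, 5), (4, 12)], 12))   # False (overlap)
```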

---

## Segmentation Base Unit (`seg_base_unit`)

Most engines operate on a base segmentation first:

* `"clause"` (default) → finer granularity
* `"sent"` → coarser segmentation
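
Roughly, the granularity difference looks like this, sketched with naive regex splitting (the package's actual clause/sentence splitters are more sophisticated; `naive_base_units` is illustrative only):

```python
import re

def naive_base_units(trace: str, unit: str) -> list[str]:
    """Toy illustration of base-unit granularity, NOT rt_seg's real splitter."""
    if unit == "sent":
        # Split only on sentence-final punctuation.
        return [s for s in re.split(r"(?<=[.!?])\s+", trace) if s]
    # "clause": additionally split on commas and semicolons.
    return [c.strip() for c in re.split(r"(?<=[.!?,;])\s+", trace) if c.strip()]

trace = "First we plan, then we verify. Finally we conclude."
print(naive_base_units(trace, "sent"))    # 2 sentences
print(naive_base_units(trace, "clause"))  # 3 clauses
```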

---

# Quickstart — Single Engine

```python
from rt_seg import RTSeg
from rt_seg import RTRuleRegex

trace = "First step... Then second step... Finally conclude."

segmentor = RTSeg(
    engines=RTRuleRegex,
    seg_base_unit="clause",
)

offsets, labels = segmentor(trace)

for (s, e), label in zip(offsets, labels):
    print(label, "=>", trace[s:e])
```

---

# Multiple Engines + Late Fusion

If you pass multiple engines, you must provide an **aligner**.

```python
from rt_seg import RTSeg
from rt_seg import RTRuleRegex
from rt_seg import RTBERTopicSegmentation
from rt_seg import OffsetFusionGraph

trace = "First step... Then second step... Finally conclude."

segmentor = RTSeg(
    engines=[RTRuleRegex, RTBERTopicSegmentation],
    aligner=OffsetFusionGraph,
    label_fusion_type="concat",  # or "majority"
    seg_base_unit="clause",
)

offsets, labels = segmentor(trace)
```

## Label Fusion Modes

* `"majority"` — choose most frequent label
* `"concat"` — concatenate labels (useful for debugging)
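
The two modes can be pictured as follows; `fuse_labels` is a hypothetical helper (the separator used by `"concat"` is an assumption, not rt_seg's internal implementation):

```python
from collections import Counter

def fuse_labels(labels: list[str], mode: str = "majority") -> str:
    """Sketch of the two label-fusion modes described above."""
    if mode == "majority":
        # Most frequent label wins.
        return Counter(labels).most_common(1)[0][0]
    if mode == "concat":
        # Join all engine labels; separator choice is an assumption.
        return "+".join(labels)
    raise ValueError(f"unknown mode: {mode}")

print(fuse_labels(["plan", "plan", "verify"], "majority"))  # plan
print(fuse_labels(["plan", "verify"], "concat"))            # plan+verify
```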

---

# Available Engines

## Rule-Based

* `RTRuleRegex`
* `RTNewLine`

## Probabilistic

* `RTLLMForcedDecoderBased`
* `RTLLMSurprisal`
* `RTLLMEntropy`
* `RTLLMTopKShift`
* `RTLLMFlatnessBreak`

## LLM Discourse / Reasoning Schemas

* `RTLLMThoughtAnchor`
* `RTLLMReasoningFlow`
* `RTLLMArgument`

## LLM

* `RTLLMOffsetBased`
* `RTLLMSegUnitBased`

## PRM-Based

* `RTPRMBase`

## Topic / Semantic / NLI

* `RTBERTopicSegmentation`
* `RTEmbeddingBasedSemanticShift`
* `RTEntailmentBasedSegmentation`
* `RTZeroShotSeqClassification`
* `RTZeroShotSeqClassificationRF`
* `RTZeroShotSeqClassificationTA`

---

# Engine Configuration

You can override engine parameters at call time:

```python
offsets, labels = segmentor(
    trace,
    model_name="Qwen/Qwen2.5-7B-Instruct",
    chunk_size=200,
)
```

---

# Available Aligners

* `OffsetFusionGraph`
* `OffsetFusionFuzzy`
* `OffsetFusionIntersect`
* `OffsetFusionMerge`
* `OffsetFusionVoting`

| Strategy               | Behavior               |
| ---------------------- | ---------------------- |
| Intersect              | Conservative           |
| Merge                  | Permissive             |
| Voting / Graph / Fuzzy | Balanced (recommended) |
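
The table can be made concrete by viewing each engine's output as a set of boundary positions. This is a simplification: the real aligners fuse `(start, end)` offset pairs, and the graph/fuzzy strategies use more elaborate matching than shown here:

```python
from collections import Counter

def intersect_boundaries(engine_bounds: list[set[int]]) -> set[int]:
    # Conservative: keep a boundary only if every engine proposes it.
    return set.intersection(*engine_bounds)

def merge_boundaries(engine_bounds: list[set[int]]) -> set[int]:
    # Permissive: keep every boundary any engine proposes.
    return set.union(*engine_bounds)

def vote_boundaries(engine_bounds: list[set[int]], min_votes: int = 2) -> set[int]:
    # Balanced: keep boundaries proposed by at least `min_votes` engines.
    counts = Counter(b for bounds in engine_bounds for b in bounds)
    return {b for b, n in counts.items() if n >= min_votes}

a, b, c = {10, 20, 30}, {10, 20}, {10, 25, 30}
print(sorted(intersect_boundaries([a, b, c])))  # [10]
print(sorted(merge_boundaries([a, b, c])))      # [10, 20, 25, 30]
print(sorted(vote_boundaries([a, b, c])))       # [10, 20, 30]
```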

---

# Implementing a Custom Engine

```python
from typing import Tuple, List
from rt_seg import SegBase

class MyEngine(SegBase):
    @staticmethod
    def _segment(trace: str, **kwargs) -> Tuple[List[tuple[int, int]], List[str]]:
        offsets = [(0, len(trace))]
        labels = ["UNK"]
        return offsets, labels
```

## Using Base Offsets

```python
base_offsets = SegBase.get_base_offsets(trace, seg_base_unit="clause")
```
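
A custom engine might, for example, merge consecutive base units into larger segments. The fixed-size grouping below is a hypothetical strategy (real engines derive boundaries from model signals); it only assumes base offsets are ordered `(start, end)` pairs like those returned above:

```python
def group_base_offsets(
    base_offsets: list[tuple[int, int]], group_size: int = 2
) -> list[tuple[int, int]]:
    """Merge runs of `group_size` consecutive base-unit offsets into one
    segment spanning from the first start to the last end."""
    merged = []
    for i in range(0, len(base_offsets), group_size):
        group = base_offsets[i:i + group_size]
        merged.append((group[0][0], group[-1][1]))
    return merged

base = [(0, 12), (13, 30), (31, 45), (46, 60)]
print(group_base_offsets(base))  # [(0, 30), (31, 60)]
```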

---

# Implementing a Custom Aligner

```python
from typing import List, Tuple

class MyOffsetFusion:
    @staticmethod
    def fuse(engine_offsets: List[List[Tuple[int, int]]], **kwargs):
        return engine_offsets[0]
```

---

# Running the TUI (Without Docker)

```bash
python3.12 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python -m tui
```

If the module invocation fails, run the script directly:

```bash
python src/tui.py
```

---

# SurrealDB (Optional — Reproducible Experiments)

SurrealDB is required only for the full experiment pipeline; the segmentation API works without it.

---

## 1️⃣ Start SurrealDB (Docker Recommended)

```bash
docker run --rm -it \
  -p 8000:8000 \
  -v "$(pwd)/data:/data" \
  surrealdb/surrealdb:latest \
  start --user root --pass root file:/data/surreal.db
```

Endpoints:

* WebSocket: `ws://127.0.0.1:8000/rpc`
* HTTP: `http://127.0.0.1:8000`

---

## 2️⃣ Import Database Snapshot

```bash
surreal import \
  --endpoint ws://127.0.0.1:8000/rpc \
  --username root \
  --password root \
  --namespace NR \
  --database RT \
  ./data/YOUR_EXPORT_FILE.surql
```

⚠️ Make sure the `--namespace`/`--database` values match your `data/sdb_login.json` configuration.

---

## 3️⃣ Configure `data/sdb_login.json`

```json
{
  "user": "root",
  "pwd": "root",
  "ns": "NR",
  "db": "RT",
  "url": "ws://127.0.0.1:8000/rpc"
}
```

---

## 4️⃣ Run Experiment Scripts

```bash
python src/eval_main.py
python src/evo.py
```

---

# Docker + GPU Setup

## Requirements

* Linux
* NVIDIA GPU
* NVIDIA driver
* Docker
* NVIDIA Container Toolkit

Verify:

```bash
nvidia-smi
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```

---

## CUDA Compatibility Rule

The host driver's supported CUDA version must be **greater than or equal to** the container's CUDA toolkit version:

| Host | Container | Result |
| ---- | --------- | ------ |
| 12.8 | 12.4      | ✅      |
| 12.8 | 13.1      | ❌      |
| 13.x | 12.4      | ✅      |
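
The rule in the table reduces to a version-tuple comparison (assuming simple `major.minor` version strings):

```python
def cuda_compatible(host: str, container: str) -> bool:
    """Host driver CUDA >= container CUDA, compared numerically
    (so "12.10" correctly beats "12.4")."""
    as_tuple = lambda v: tuple(int(x) for x in v.split("."))
    return as_tuple(host) >= as_tuple(container)

print(cuda_compatible("12.8", "12.4"))  # True
print(cuda_compatible("12.8", "13.1"))  # False
print(cuda_compatible("13.0", "12.4"))  # True
```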

---

## Build Image

```bash
docker build -f docker/Dockerfile -t rt-seg:gpu .
```

---

## Run

```bash
./run_tui_app_docker.sh
```

Internally:

```bash
docker run -it --rm --gpus all rt-seg:gpu
```

---

# Summary

RT-SEG provides:

* Modular segmentation engines
* Late fusion strategies
* LLM-based reasoning segmentation
* Reproducible DB-backed experiments
* GPU Docker deployment

---
