Metadata-Version: 2.4
Name: tha-str-runner
Version: 0.1.0
Summary: A small Python library that normalizes and slugifies strings — works on single values or CSV-style row dicts.
License: MIT
License-File: LICENSE
Keywords: csv,normalize,rows,slugify,string
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Typing :: Typed
Requires-Python: >=3.10
Provides-Extra: dev
Requires-Dist: mypy; extra == 'dev'
Requires-Dist: pytest; extra == 'dev'
Requires-Dist: ruff; extra == 'dev'
Description-Content-Type: text/markdown

# tha-str-runner

[![CI](https://github.com/tha-guy-nate/tha-str-runner/actions/workflows/ci.yml/badge.svg)](https://github.com/tha-guy-nate/tha-str-runner/actions/workflows/ci.yml)

A small Python library that normalizes and slugifies strings — works on single values or CSV-style row dicts.

## Install

```bash
pip install tha-str-runner
```

## Quick start

```python
from tha_str_runner import ThaStr

formatter = ThaStr()

# Single value — normalize
ThaStr.format_str("  Hello World  ", case="lower")          # "hello world"
ThaStr.format_str("foo & bar", replace={"&": "and"})        # "foo and bar"
ThaStr.format_str("foo123", replace={r"\d+": "#"}, regex=True)  # "foo#"

# Single value — slugify
ThaStr.slugify("Hello World")                               # "hello-world"
ThaStr.slugify("café au lait")                              # "cafe-au-lait"
ThaStr.slugify("School 001", prefix="org-")                 # "org-school-001"

# Row dicts — normalize
rows = [
    {"Org BK": "school-001", "Name": "  Lincoln Elementary  "},
    {"Org BK": "school-002", "Name": "ROOSEVELT MIDDLE"},
]

result = formatter.format_str_rows(rows, column="Name", case="title")
# [{"Org BK": "school-001", "Name": "Lincoln Elementary"}, ...]

# Row dicts — slugify (combine columns into a key)
result = formatter.slugify_rows(rows, columns=["Org BK", "Name"], out_column="Slug")
# [{"Org BK": "school-001", ..., "Slug": "school-001-lincoln-elementary"}, ...]
```

## API

### `ThaStr`

```python
ThaStr()
```

### `ThaStr.format_str()`

```python
ThaStr.format_str(
    value: str,
    *,
    strip: bool = True,           # strip leading/trailing whitespace
    case: str | None = None,      # "upper" | "lower" | "title" | None
    replace: dict[str, str] | None = None,  # {old: new} substitutions
    regex: bool = False,          # treat replace keys as regex patterns
) -> str
```

`format_str` can be called on the class or on an instance. It raises `StrError` if `case` is not one of the values listed above.
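
To make the parameters concrete, here is a minimal standalone sketch of the kind of pipeline the signature describes. It is not the library's implementation: the name `format_str_sketch`, the order of operations (strip, then replace, then case), and the use of `ValueError` in place of `StrError` are all assumptions made for illustration.

```python
import re

# Hypothetical lookup table for the three documented case values.
CASE_FUNCS = {"upper": str.upper, "lower": str.lower, "title": str.title}


def format_str_sketch(value, *, strip=True, case=None, replace=None, regex=False):
    """Illustrative stand-in for ThaStr.format_str (assumed behaviour)."""
    if case is not None and case not in CASE_FUNCS:
        # The real method raises StrError; ValueError stands in here.
        raise ValueError(f"invalid case: {case!r}")
    if strip:
        value = value.strip()
    for old, new in (replace or {}).items():
        # Assumption: regex=True treats each key as a pattern for re.sub,
        # otherwise keys are replaced literally.
        value = re.sub(old, new, value) if regex else value.replace(old, new)
    if case is not None:
        value = CASE_FUNCS[case](value)
    return value


print(format_str_sketch("  Hello World  ", case="lower"))                  # hello world
print(format_str_sketch("foo123", replace={r"\d+": "#"}, regex=True))      # foo#
```

If the real method applies case before replacements, results could differ for case-sensitive `replace` keys, so treat the ordering here as a guess.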

### `formatter.format_str_rows()`

```python
formatter.format_str_rows(
    rows,                              # list of row dicts
    column,                            # column containing strings
    *,
    strip=True,
    case=None,
    replace=None,
    regex=False,
    out_column=None,                   # write to a new column instead of overwriting
    on_error="error",                  # "error" | "skip" | "blank"
    skip_statuses=["error", "warning"],
) -> list[dict]
```

Results are also stored in `formatter.rows`.

#### `on_error`

| Value | Behaviour |
|---|---|
| `"error"` | Sets `row status` to `"error"`, fills in `message`, and sets the output column to `""` |
| `"skip"` | Row returned unchanged |
| `"blank"` | Output column set to `""`, row status untouched |
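
The three modes in the table can be sketched as a small helper that decides what a failed row looks like. This is a hedged illustration only: `apply_on_error` is a hypothetical function, and the literal keys `"row status"` and `"message"` are assumed from the table rather than confirmed against the library.

```python
def apply_on_error(row, out_column, exc, on_error="error"):
    """Return a copy of `row` reflecting a failed conversion (hypothetical helper)."""
    if on_error not in ("error", "skip", "blank"):
        raise ValueError(f"invalid on_error: {on_error!r}")
    updated = dict(row)
    if on_error == "skip":
        return updated  # row returned unchanged
    # Both "error" and "blank" clear the output column.
    updated[out_column] = ""
    if on_error == "error":
        # Assumption: status and message are stored under these literal keys.
        updated["row status"] = "error"
        updated["message"] = str(exc)
    return updated


failure = TypeError("expected a string")
print(apply_on_error({"Name": 42}, "Name", failure, on_error="blank"))  # {'Name': ''}
```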

### `ThaStr.slugify()`

```python
ThaStr.slugify(
    value: str,
    *,
    sep: str = "-",       # separator between slug segments
    prefix: str = "",     # prepended to the slug
    suffix: str = "",     # appended to the slug
) -> str
```

Normalizes unicode to ASCII, lowercases, replaces non-alphanumeric runs with `sep`, and strips leading/trailing separators.

```python
ThaStr.slugify("café au lait")              # "cafe-au-lait"
ThaStr.slugify("Hello World", sep="_")      # "hello_world"
ThaStr.slugify("abc", prefix="id-")         # "id-abc"
```
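
For readers curious how those four steps fit together, the following is a minimal sketch using only the standard library. It is an approximation, not the library's code: `slugify_sketch` is a hypothetical name, NFKD decomposition is one common way to reduce accented characters to ASCII, and applying `prefix` and `suffix` after slugifying is an assumption.

```python
import re
import unicodedata


def slugify_sketch(value, *, sep="-", prefix="", suffix=""):
    """Illustrative stand-in for ThaStr.slugify (assumed behaviour)."""
    # Decompose accented characters, then drop anything outside ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", value)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Lowercase and collapse each run of non-alphanumerics into one separator.
    # A function replacement avoids backslash handling in `sep`.
    slug = re.sub(r"[^a-z0-9]+", lambda _match: sep, ascii_text.lower())
    # Trim separators left at either end.
    slug = slug.strip(sep)
    # Assumption: prefix and suffix are attached verbatim after slugifying.
    return f"{prefix}{slug}{suffix}"


print(slugify_sketch("café au lait"))                # cafe-au-lait
print(slugify_sketch("School 001", prefix="org-"))   # org-school-001
```

Characters with no ASCII decomposition are simply dropped by this approach, which is one reason a dedicated transliteration library may be preferable for non-Latin input.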

### `formatter.slugify_rows()`

```python
formatter.slugify_rows(
    rows,                              # list of row dicts
    columns,                           # column name, or list of column names to combine
    out_column,                        # output column (always a new/derived column)
    *,
    sep="-",
    prefix="",
    suffix="",
    on_error="error",                  # "error" | "skip" | "blank"
    skip_statuses=["error", "warning"],
) -> list[dict]
```

When `columns` is a list, values are joined with `sep` before slugifying.

Results are also stored in `formatter.rows`.
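
The join-then-slugify behaviour for a list of columns can be pictured with the sketch below. `combined_slug` is a hypothetical helper, and its slug step is a simplified ASCII-only stand-in for `ThaStr.slugify`, so it shows the shape of the operation rather than the library's exact output for every input.

```python
import re


def combined_slug(row, columns, sep="-"):
    """Join one or more column values with `sep`, then slugify (assumed behaviour)."""
    # Accept a single column name or a list of names.
    names = [columns] if isinstance(columns, str) else list(columns)
    joined = sep.join(str(row[name]) for name in names)
    # Simplified slug step: lowercase, collapse non-alphanumeric runs, trim ends.
    return re.sub(r"[^a-z0-9]+", lambda _match: sep, joined.lower()).strip(sep)


row = {"Org BK": "school-001", "Name": "  Lincoln Elementary  "}
print(combined_slug(row, ["Org BK", "Name"]))  # school-001-lincoln-elementary
```

Because runs of non-alphanumeric characters collapse to a single separator, stray whitespace in a column does not produce doubled separators in the combined slug.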

### Composing with `tha-csv-runner`

```python
from tha_csv_runner import ThaCSV
from tha_str_runner import ThaStr

runner = ThaCSV()
runner.read("Step 1 of 3", "input.csv", ["Org BK", "Name"])

formatter = ThaStr()
enriched = formatter.format_str_rows(
    rows=runner.rows,
    column="Name",
    case="title",
    strip=True,
)
enriched = formatter.slugify_rows(
    rows=enriched,
    columns=["Org BK", "Name"],
    out_column="Slug",
    prefix="org-",
)

runner.write("Step 3 of 3", "output.csv", rows=enriched)
```

## Alternatives

This library is intentionally limited in scope: it applies a fixed set of common normalization operations (strip, case, replace) and slugifies values, with row-level integration for the `tha-*` ecosystem. For more flexible string handling:

- [**python-slugify**](https://github.com/un33k/python-slugify) — robust slugification with full unicode transliteration support and configurable stopword removal
- [**stringcase**](https://github.com/okunishinishi/python-stringcase) — case conversion (camel, snake, pascal, etc.) beyond upper/lower/title
- [**ftfy**](https://ftfy.readthedocs.io) — fixes broken unicode text (mojibake, garbled encodings) before normalization
- [**regex**](https://pypi.org/project/regex/) (third-party drop-in replacement for the stdlib `re` module) — more powerful regex engine when `re` isn't enough for complex replace patterns

Choose this library when you want strip/case/replace normalization, slugification, and per-row error capture in one package that slots into the `tha-*` pipeline. None of the alternatives above combines all three with the `row status` pattern.

## License

MIT
