Metadata-Version: 2.4
Name: tha-num-runner
Version: 0.1.0
Summary: A small Python library that cleans and parses numeric strings — strips currency symbols, commas, and casts to int or float, on single values or CSV-style row dicts.
License: MIT
License-File: LICENSE
Keywords: csv,currency,number,numeric,parse,rows
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Typing :: Typed
Requires-Python: >=3.10
Provides-Extra: dev
Requires-Dist: mypy; extra == 'dev'
Requires-Dist: pytest; extra == 'dev'
Requires-Dist: ruff; extra == 'dev'
Description-Content-Type: text/markdown

# tha-num-runner

[![CI](https://github.com/tha-guy-nate/tha-num-runner/actions/workflows/ci.yml/badge.svg)](https://github.com/tha-guy-nate/tha-num-runner/actions/workflows/ci.yml)

A small Python library that cleans and parses numeric strings: it strips currency symbols and commas, then casts the result to `int` or `float`. It works on single values or on CSV-style row dicts.

## Install

```bash
pip install tha-num-runner
```

## Quick start

```python
from tha_num_runner import ThaNum

formatter = ThaNum()

# Single value
ThaNum.format_num("$1,234.56")                        # 1234.56
ThaNum.format_num("£2,000.00", cast="int")            # 2000
ThaNum.format_num("(500.75)", round_to=1)             # -500.8
ThaNum.format_num("€9.99", cast="int")                # 9

# Row dicts
rows = [
    {"Org BK": "school-001", "Budget": "$1,200.00"},
    {"Org BK": "school-002", "Budget": "£800.50"},
]

result = formatter.format_num_rows(rows, column="Budget", cast="float", round_to=2)
# [{"Org BK": "school-001", "Budget": 1200.0}, ...]
```

## Cleaned automatically

| Input | Output |
|---|---|
| `"$1,234.56"` | `1234.56` |
| `"£2,000"` | `2000.0` |
| `"€9.99"` | `9.99` |
| `"(500)"` | `-500.0` |
| `"($1,200.00)"` | `-1200.0` |
| `"  42  "` | `42.0` |

Supported currency symbols: `$`, `€`, `£`, `¥`, `₹`, `₩`, `₽`, `₺`, `₫`, `฿`, `₱`, `₴`
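
The cleaning rules in the table can be approximated in a few lines. This is only a sketch of the idea, not the library's actual implementation: `clean_number` and `CURRENCY_SYMBOLS` are hypothetical names, and the order of the steps is an assumption.

```python
# Currency symbols copied from the supported list above.
CURRENCY_SYMBOLS = "$€£¥₹₩₽₺₫฿₱₴"


def clean_number(text: str) -> float:
    """Hypothetical stand-in for the cleaning steps; not the library's code."""
    s = text.strip()
    # Accounting-style negatives: "(500)" means -500.
    negative = s.startswith("(") and s.endswith(")")
    if negative:
        s = s[1:-1].strip()
    # Delete currency symbols and comma thousand separators in one pass.
    s = s.translate(str.maketrans("", "", CURRENCY_SYMBOLS + ","))
    value = float(s)  # raises ValueError if nothing numeric is left
    return -value if negative else value


print(clean_number("($1,200.00)"))  # -1200.0
```

Stripping the parentheses before the currency symbol is what lets `"($1,200.00)"` and `"(500)"` go through the same path.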

## API

### `ThaNum`

```python
ThaNum()
```

### `ThaNum.format_num()`

```python
ThaNum.format_num(
    value: str | int | float,
    *,
    strip_currency: bool = True,   # remove currency symbols
    strip_commas: bool = True,     # remove comma thousand separators
    round_to: int | None = None,   # decimal places to round to
    cast: str = "float",           # "float" | "int"
) -> float | int
```

`format_num` can also be called on an instance. It raises `NumError` when the input cannot be parsed or when `cast` is not `"float"` or `"int"`.

Parenthetical negatives (`(100)`) are converted to negative numbers automatically.
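
The way `round_to`, `cast`, and the error cases might interact can be sketched as follows. Everything here is an assumption made for illustration: `to_number` is a hypothetical helper, the local `NumError` class only stands in for the library's exception, and rounding before casting (with `int()` truncating) is inferred from the Quick start examples rather than documented.

```python
from __future__ import annotations


class NumError(ValueError):
    """Local stand-in for the library's NumError; the real base class may differ."""


def to_number(text: str, *, round_to: int | None = None, cast: str = "float") -> float | int:
    """Parse an already-cleaned numeric string, then round and cast it."""
    if cast not in ("float", "int"):
        raise NumError(f"invalid cast: {cast!r}")
    try:
        value = float(text)
    except ValueError as exc:
        # Wrap the built-in error so callers only need to catch one type.
        raise NumError(f"cannot parse {text!r}") from exc
    if round_to is not None:
        value = round(value, round_to)
    # int() truncates toward zero, so "9.99" becomes 9.
    return int(value) if cast == "int" else value


print(to_number("9.99", cast="int"))  # 9

try:
    to_number("n/a")
except NumError as exc:
    print(f"rejected: {exc}")
```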

### `formatter.format_num_rows()`

```python
formatter.format_num_rows(
    rows,                              # list of row dicts
    column,                            # column containing numeric strings
    *,
    strip_currency=True,
    strip_commas=True,
    round_to=None,
    cast="float",
    out_column=None,                   # write to a new column instead of overwriting
    on_error="error",                  # "error" | "skip" | "blank"
    skip_statuses=["error", "warning"],
) -> list[dict]
```

Results are also stored in `formatter.rows`.

#### `on_error`

| Value | Behaviour |
|---|---|
| `"error"` | `row status="error"`, `message=...`, output column set to `""` |
| `"skip"` | Row returned unchanged |
| `"blank"` | Output column set to `""`, row status untouched |

### Composing with `tha-csv-runner`

```python
from tha_csv_runner import ThaCSV
from tha_num_runner import ThaNum

runner = ThaCSV()
runner.read("Step 1 of 2", "input.csv", ["Org BK", "Budget"])

formatter = ThaNum()
enriched = formatter.format_num_rows(
    rows=runner.rows,
    column="Budget",
    cast="float",
    round_to=2,
)

runner.write("Step 2 of 2", "output.csv", rows=enriched)
```

## Alternatives

This library is intentionally narrow in scope. It handles one specific pattern: cleaning messy numeric strings from CSV exports and casting them to Python numbers, with row-level error capture for the `tha-*` ecosystem. For more general needs:

- [**babel**](https://babel.pocoo.org) — locale-aware number parsing (`babel.numbers.parse_number`) that handles locale-specific decimal and grouping separators
- [**price-parser**](https://github.com/scrapinghub/price-parser) — extracts prices and currency from arbitrary text, useful when the format is completely unknown
- [**pandas**](https://pandas.pydata.org) — `pd.to_numeric()` with `errors="coerce"` for vectorized numeric coercion on DataFrames

Choose this library when you want currency stripping, comma cleaning, and parenthetical-negative handling together with per-row error capture that slots into the `tha-*` pipeline. None of the alternatives above combines all of these with the `row status` pattern.

## License

MIT
