Metadata-Version: 2.3
Name: streval
Version: 0.2.0
Summary: Tool for evaluating structured outputs
Author: ruankie
Requires-Dist: pydantic>=2.12.5
Requires-Python: >=3.12
Project-URL: Homepage, https://github.com/ruankie/streval
Project-URL: Documentation, https://ruankie.github.io/streval/
Project-URL: Repository, https://github.com/ruankie/streval
Project-URL: Issues, https://github.com/ruankie/streval/issues
Description-Content-Type: text/markdown

# Streval

[![documentation](https://img.shields.io/badge/docs-mkdocs-708FCC.svg?style=flat)](https://ruankie.github.io/streval/)
[![pypi version](https://img.shields.io/pypi/v/streval.svg)](https://pypi.org/project/streval/)
[![GitHub stars](https://img.shields.io/github/stars/ruankie/streval)](https://github.com/ruankie/streval/stargazers)
[![GitHub license](https://img.shields.io/github/license/ruankie/streval)](https://github.com/ruankie/streval/blob/main/LICENSE)

**Streval** is a simple Python library for evaluating structured extraction results, such as data pulled from documents or generated by LLMs.  

It allows you to **compare predicted structured objects** (Python dicts or Pydantic models) to a set of **ground truth objects** and get a detailed **accuracy summary and breakdown**.

## Key Features

* Strict matching only (binary comparison):  
  * Cheap, fast, deterministic, and simple  
  * Currently **no fuzzy matching**; a field is either correct or incorrect  
* Provides **field-level accuracy**: average correctness across all fields and a per-field accuracy breakdown  
* Provides **object-level accuracy**: shows how many predictions are exact matches to the ground truth  
* Penalizes **missing fields** in the prediction  
* Penalizes **extra fields** not in the ground truth  
* Handles nested dictionaries and lists recursively  
* Accepts **Pydantic models or plain dicts** for both predictions and ground truth  

---

## How it Works

1. You provide a **set of ground truth objects** and a **list of corresponding predictions** (comparisons are done element-wise).  
2. Streval recursively compares each field of each prediction with the corresponding ground truth.
3. You get a structured **accuracy summary** including:  
    * Field-level accuracy (average accuracy across all fields)  
    * Object-level accuracy (percentage of exact prediction object matches)  
    * Per-field breakdown (accuracy for each field)

This makes it easy to see **exactly where your extraction method succeeds or fails**.

## Tests

Run tests with coverage:

```bash
uv run pytest --cov=src
```

## Build Docs

Documentation is built using MkDocs.

- To build the static site:

    ```bash
    uv run mkdocs build
    ```

- To serve the documentation locally:

    ```bash
    uv run mkdocs serve
    ```

- Publish docs to GitHub Pages:

    ```bash
    uv run mkdocs gh-deploy
    ```

## Publish to PyPi

1. Bump version (e.g. minor bump):

    ```bash
    uv version --bump minor
    ```

1. Build source distribution

    ```bash
    uv build
    ```

1. Publish to PyPi

    ```bash
    uv publish
    ```
