Metadata-Version: 2.4
Name: vcti-fileloader
Version: 1.0.0
Summary: File loader framework with pluggable descriptors, validators, and a registry for format-specific data loading
Author: Visual Collaboration Technologies Inc.
License: Proprietary
Project-URL: Homepage, https://github.com/vcollab/vcti-python-fileloader
Project-URL: Repository, https://github.com/vcollab/vcti-python-fileloader
Project-URL: Issues, https://github.com/vcollab/vcti-python-fileloader/issues
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Typing :: Typed
Requires-Python: <3.15,>=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.24
Requires-Dist: vcti-plugin-catalog>=1.0.0
Requires-Dist: vcti-array-tree>=1.0.0
Provides-Extra: test
Requires-Dist: pytest; extra == "test"
Requires-Dist: pytest-cov; extra == "test"
Provides-Extra: lint
Requires-Dist: ruff; extra == "lint"
Provides-Extra: typecheck
Requires-Dist: mypy; extra == "typecheck"
Dynamic: license-file

# vcti-fileloader

A protocol-based framework for loading hierarchical scientific and engineering
data from files. It defines a **standard interface** that all file-format loaders
must implement, and a **registry** for discovering and managing them at runtime.

This package is fully typed (`py.typed`) and safe for strict type checkers
(mypy `--strict`, pyright).

## Why this package exists

Applications that work with simulation and CAE data need to load many file
formats — HDF5, VTK, OpenFOAM, proprietary binary, etc. Each format has its
own library, its own API, and its own way of representing a tree of nodes,
metadata, and heavy data arrays.

vcti-fileloader solves this by defining a **single, uniform protocol** that
every loader plugin implements. Application code programs against the protocol,
not the format. Adding support for a new file format means writing a new loader
plugin — no changes to application code.

```
┌─────────────────────────────────────────────────────┐
│                  Application Code                   │
│         (uses Loader protocol + Registry)           │
└──────────────┬──────────────────────┬───────────────┘
               │                      │
       ┌───────▼───────┐      ┌───────▼───────┐
       │  HDF5 Loader  │      │  VTK Loader   │  ... (one per format)
       │  (plugin pkg) │      │  (plugin pkg) │
       └───────────────┘      └───────────────┘
```

## Key Concepts

### Loader (Protocol)

The central interface. Any class that implements these methods satisfies
the protocol — no base class inheritance required (PEP 544 structural subtyping):

| Method | Purpose |
|--------|---------|
| `load(path, **options)` | Open a file and return an opaque data handle |
| `unload(data)` | Release file handles and memory (**idempotent**) |
| `can_load(path)` | Lightweight check — can this loader handle the file? |
| `load_tree(data)` | Extract the node hierarchy as a NumPy structured array |
| `load_node_info(data)` | Extract lightweight node metadata (name, type) |
| `load_attributes(data, node_ids)` | Extract key-value attributes per node |
| `load_dataset(data, node_id)` | Extract a heavy data array as a `DataNode` |

Each loader also carries optional **validator** and **setup** hooks:

- `LoaderValidator.validate()` — returns `True` if all runtime dependencies
  (e.g., h5py, vtk) are available.
- `LoaderSetup.setup()` — configures paths, environment variables, or
  component versions before first use.
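
The structural-subtyping point can be illustrated with a small sketch. `LoaderLike` below is a hypothetical stand-in that mirrors only three of the methods from the table; it is not the real `Loader` protocol, and the exact signatures and return types in `vcti.fileloader` may differ. `TextLoader` is likewise an invented toy class.

```python
from pathlib import Path
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class LoaderLike(Protocol):
    """Hypothetical stand-in mirroring three methods of the Loader protocol."""

    def load(self, path: Path, **options: Any) -> Any: ...
    def unload(self, data: Any) -> None: ...
    def can_load(self, path: Path) -> bool: ...


class TextLoader:
    """Toy loader: no base class, it only provides the expected methods."""

    def can_load(self, path: Path) -> bool:
        # Lightweight check: inspect the extension only, never open the file.
        return path.suffix.lower() == ".txt"

    def load(self, path: Path, **options: Any) -> dict[str, Any]:
        # The returned handle is opaque to callers; here it is a plain dict.
        return {"path": path, "open": True}

    def unload(self, data: dict[str, Any]) -> None:
        # Idempotent: marking an already-closed handle as closed is a no-op.
        data["open"] = False


loader = TextLoader()
# runtime_checkable makes isinstance() verify that the methods exist by name.
is_loader = isinstance(loader, LoaderLike)
```

Static type checkers perform the same structural match at analysis time, which is why a plugin does not need to import or inherit from a shared base class.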

### LoaderDescriptor

Wraps a `Loader` instance with registry metadata — a unique `id`, a
human-readable `name`, and filterable `attributes` (e.g., `supported_formats`).

### LoaderRegistry

A typed registry of `LoaderDescriptor` entries. Register loaders at startup,
then look them up by id or query by attributes at runtime.

**Key behaviours** (inherited from `vcti-plugin-catalog`):

- `register()` raises `DuplicateEntryError` if the id already exists.
- `get()` raises `EntryNotFoundError` if the id is not found.
- `find()` returns `None` instead of raising for missing ids.
- `lookup` property provides attribute-based filtering via `vcti-lookup`.

### NodeID

Type alias (`int`) for node identifiers used across the protocol, exported
from the package for use in type annotations.

### DataNode

`load_dataset()` returns a `DataNode` from `vcti-array-tree`. A DataNode has:

- `.data` — the NumPy array containing the heavy data.
- `.attributes` — a `dict[str, Any]` of metadata for the node.

See the [vcti-array-tree documentation](https://pypi.org/project/vcti-array-tree/)
for full details.

## Data Flow

```
Register ──► Discover ──► Validate & Setup ──► Load ──► Query ──► Unload
```

1. **Register** — Each loader plugin registers a `LoaderDescriptor` with
   the shared `LoaderRegistry`.
2. **Discover** — Application code looks up a loader by id or filters by
   attributes (e.g., find all loaders that support `"hdf5-file"`).
3. **Validate & Setup** — Call `validator.validate()` and `setup.setup()`
   to ensure the runtime environment is ready.
4. **Load** — `loader.load(path)` opens the file and returns an opaque handle.
5. **Query** — Use `load_tree`, `load_node_info`, `load_attributes`, and
   `load_dataset` to extract structure and data from the handle.
6. **Unload** — `loader.unload(handle)` releases resources.  This is
   **idempotent** — calling it twice on the same handle must not raise.
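
The Discover step can be sketched as a plain filter over descriptor attributes. This is an illustration only: it does not use the registry's `lookup` property, whose API is not shown here. `ToyDescriptor`, `loaders_for_format`, and the `"vtk-file"` format id are hypothetical names introduced for the example.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToyDescriptor:
    """Hypothetical descriptor carrying only an id and attributes."""

    id: str
    attributes: dict[str, Any] = field(default_factory=dict)


def loaders_for_format(
    descriptors: list[ToyDescriptor], fmt: str
) -> list[ToyDescriptor]:
    """Return the descriptors whose supported_formats attribute lists fmt."""
    return [
        d for d in descriptors
        if fmt in d.attributes.get("supported_formats", [])
    ]


catalog = [
    ToyDescriptor("hdf5-h5py-loader", {"supported_formats": ["hdf5-file"]}),
    ToyDescriptor("vtk-loader", {"supported_formats": ["vtk-file"]}),
    ToyDescriptor("multi-loader", {"supported_formats": ["hdf5-file", "vtk-file"]}),
]

# Registration order is preserved, so callers can treat it as a priority order.
hdf5_ids = [d.id for d in loaders_for_format(catalog, "hdf5-file")]
```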

## Lifecycle Contracts

- Call `can_load(path)` **before** `load()` to prevent `UnsupportedFormatError`.
- Call `validator.validate()` and `setup.setup()` **before** the first `load()`.
- `unload()` is **idempotent** — safe to call multiple times on the same handle.
- After `unload()`, calling any `load_*` method on that handle is **undefined**.
- `load()` may be called multiple times with different paths; each returns
  an independent handle.
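
One way to honour these contracts in application code is to wrap `load()` and `unload()` in a context manager. The sketch below is an assumption about how a caller might do this, not a helper shipped by this package; `StubLoader` and `loaded` are hypothetical names, and the integer handle is just one possible opaque handle type.

```python
from contextlib import contextmanager
from pathlib import Path
from typing import Any, Iterator


class StubLoader:
    """Hypothetical loader used only to make the sketch runnable."""

    def __init__(self) -> None:
        self.open_handles: set[int] = set()
        self._next_id = 0

    def load(self, path: Path, **options: Any) -> int:
        # Each call returns an independent handle, even for the same path.
        self._next_id += 1
        self.open_handles.add(self._next_id)
        return self._next_id

    def unload(self, data: int) -> None:
        # set.discard() ignores missing items, which makes unload idempotent.
        self.open_handles.discard(data)


@contextmanager
def loaded(loader: StubLoader, path: Path, **options: Any) -> Iterator[int]:
    """Yield a handle and always unload it, even if the body raises."""
    handle = loader.load(path, **options)
    try:
        yield handle
    finally:
        loader.unload(handle)


loader = StubLoader()
with loaded(loader, Path("simulation.h5")) as handle:
    was_open = handle in loader.open_handles

loader.unload(handle)  # a second unload on the same handle must not raise
```

Because the handle is only bound after `load()` returns, a failing `load()` never reaches `unload()`, and no `load_*` call can happen after the `with` block has released the handle.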

## Installation

```bash
pip install "vcti-fileloader>=1.0.0"

```

### In `pyproject.toml` dependencies

```toml
dependencies = [
    "vcti-fileloader>=1.0.0",
]
```

## Quick Start

```python
from pathlib import Path

from vcti.fileloader import LoaderDescriptor, LoaderRegistry

# At startup: register available loaders
registry = LoaderRegistry()
registry.register(LoaderDescriptor(
    id="hdf5-h5py-loader",
    name="HDF5 Loader (h5py)",
    loader=my_h5py_loader,                              # implements Loader protocol
    attributes={"supported_formats": ["hdf5-file"]},
))

# At runtime: discover, validate, load
desc = registry.get("hdf5-h5py-loader")
desc.loader.validator.validate()                         # check dependencies
desc.loader.setup.setup()                                # configure environment

handle = desc.loader.load(Path("simulation.h5"))
tree   = desc.loader.load_tree(handle)                   # node hierarchy
info   = desc.loader.load_node_info(handle)              # node names and types
attrs  = desc.loader.load_attributes(handle)             # per-node attributes
node   = desc.loader.load_dataset(handle, node_id=1)     # heavy data array
desc.loader.unload(handle)                               # release resources
```

## Error Handling

All exceptions inherit from `LoaderError`, so callers can catch broadly
or handle specific failure modes:

| Exception | When to raise / catch |
|-----------|----------------------|
| `LoaderError` | Base — catches any loader failure |
| `LoadError` | File cannot be opened or parsed (I/O errors, corrupt files) |
| `UnloadError` | Resource cleanup failed |
| `UnsupportedFormatError` | Loader does not recognise the file format. Prefer `can_load()` first |
| `ValidationError` | `validator.validate()` detected missing dependencies |
| `SetupError` | `setup.setup()` could not configure the environment |

**Distinguishing `LoadError` vs `UnsupportedFormatError`:** Use
`UnsupportedFormatError` when the loader does not recognise the format at
all (wrong extension, unknown magic bytes). Use `LoadError` when the
format is recognised but the content cannot be read (truncated file,
incompatible version, permission error).
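
A loader author might apply this distinction roughly as follows. The exception classes here are local stand-ins so the sketch runs on its own (a real plugin would import them from `vcti.fileloader`), and `check_hdf5_header` is a hypothetical helper. The check is simplified: it looks for the 8-byte HDF5 signature at offset 0 only, although the format also allows it at later offsets after a user block.

```python
from pathlib import Path


class LoaderError(Exception):
    """Stand-in for the package's base exception."""


class LoadError(LoaderError):
    """Stand-in: format recognised, content unreadable."""


class UnsupportedFormatError(LoaderError):
    """Stand-in: format not recognised at all."""


HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # 8-byte HDF5 format signature


def check_hdf5_header(path: Path) -> None:
    """Raise the appropriate error if path cannot be loaded as HDF5."""
    try:
        with path.open("rb") as fh:
            # Read one byte past the signature to detect an empty body.
            head = fh.read(len(HDF5_SIGNATURE) + 1)
    except OSError as exc:
        # Missing file, permission error, and similar I/O failures.
        raise LoadError(f"cannot read {path}: {exc}") from exc

    if not head.startswith(HDF5_SIGNATURE):
        # Unknown magic bytes: this loader does not recognise the format.
        raise UnsupportedFormatError(f"{path} is not an HDF5 file")

    if len(head) == len(HDF5_SIGNATURE):
        # Right format, but nothing follows the signature.
        raise LoadError(f"{path} is truncated after the signature")
```

Callers that catch `LoaderError` handle both cases; callers that want to try another loader on an unrecognised format can catch `UnsupportedFormatError` specifically.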

### Error handling example

```python
from pathlib import Path

from vcti.fileloader import (
    LoaderError,
    LoadError,
    UnsupportedFormatError,
    ValidationError,
    SetupError,
)

# Validate before loading
if not desc.loader.validator.validate():
    raise ValidationError("Missing h5py — install with: pip install h5py")

if not desc.loader.setup.setup():
    raise SetupError("Could not configure HDF5 library paths")

# Load with error handling
path = Path("data.h5")
if not desc.loader.can_load(path):
    print(f"Loader {desc.id} cannot handle {path}")
else:
    handle = None  # stays None if load() raises before returning a handle
    try:
        handle = desc.loader.load(path)
        tree = desc.loader.load_tree(handle)
    except LoadError as e:
        print(f"Failed to read file: {e}")
    except LoaderError as e:
        print(f"Unexpected loader error: {e}")
    finally:
        # Without this guard, a failed load() would leave `handle` unbound
        # and the cleanup itself would raise NameError.
        if handle is not None:
            desc.loader.unload(handle)
```

## What this package does NOT do

- **No concrete loaders** — This is the interface only. Actual file reading
  (HDF5, VTK, etc.) lives in separate loader plugin packages.
- **No data transformation** — Data is returned as-is from the loader.
- **No caching** — Caching strategies belong at the application level.

## Further Reading

- [Common Patterns](docs/patterns.md) — Validator/setup implementation,
  multi-loader registration, error handling, and naming conventions.
- [Design & Concepts](docs/design.md) — Architecture, protocol rationale,
  and package boundaries.

## Dependencies

- [numpy](https://numpy.org/) (>=1.24)
- [vcti-plugin-catalog](https://pypi.org/project/vcti-plugin-catalog/) (>=1.0.0) — Descriptor and Registry base classes
- [vcti-array-tree](https://pypi.org/project/vcti-array-tree/) (>=1.0.0) — `DataNode` returned by `load_dataset`

## Versioning

This package follows [Semantic Versioning](https://semver.org/).  Breaking
changes to the `Loader` protocol (adding required methods, changing
signatures) will only occur in major version bumps.  Downstream loader
plugins should pin to a compatible major version (e.g., `vcti-fileloader>=1.0,<2`).
