Metadata-Version: 2.4
Name: scd-analysis
Version: 1.0.0
Summary: Severe Chronic Disease (SCD) Analysis Pipeline for Danish National Registers
Author-email: Tobias Kragholm <tobias.kragholm@example.com>
License: MIT
Project-URL: Homepage, https://github.com/tkragholm/scd-analysis
Project-URL: Repository, https://github.com/tkragholm/scd-analysis
Project-URL: Documentation, https://scd-analysis.readthedocs.io/
Project-URL: Bug Tracker, https://github.com/tkragholm/scd-analysis/issues
Keywords: epidemiology,healthcare,chronic-disease,danish-registers,population-health,medical-research,data-analysis,biostatistics
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Healthcare Industry
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: polars>=0.20.0
Requires-Dist: pandas>=1.5.0
Requires-Dist: numpy>=1.21.0
Requires-Dist: pathlib2>=2.3.0; python_version < "3.11"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=22.0.0; extra == "dev"
Requires-Dist: flake8>=5.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: pre-commit>=2.20.0; extra == "dev"
Provides-Extra: docs
Requires-Dist: sphinx>=5.0.0; extra == "docs"
Requires-Dist: sphinx-rtd-theme>=1.0.0; extra == "docs"
Requires-Dist: myst-parser>=0.18.0; extra == "docs"
Provides-Extra: performance
Requires-Dist: pyarrow>=10.0.0; extra == "performance"
Requires-Dist: fastparquet>=0.8.0; extra == "performance"
Dynamic: license-file

# SCD Analysis - Severe Chronic Disease Analysis Pipeline

[![Python Version](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

A Python package for analyzing severe chronic diseases (SCD) using Danish national health registers. It provides a complete pipeline for processing, analyzing, and matching epidemiological register data, and relies on lazy evaluation to keep memory usage low on large datasets.

### Basic Usage

```python
from scd_analysis import get_default_config, run_scd_pipeline
from scd_analysis.pipeline import run_descriptive_analysis

# Run with default configuration
final_data = run_scd_pipeline()

# Customize configuration
config = get_default_config()
config["age_cutoff"] = 5
config["study_period"]["end_year"] = 2020

final_data = run_scd_pipeline(config)

# Basic descriptive analysis
summary_stats = run_descriptive_analysis(final_data)
print(summary_stats)
```
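
The configuration shown above is a nested dictionary, so overrides can also be collected in one place and merged in recursively. The sketch below is not part of `scd_analysis`: `merge_config` is a hypothetical helper, and the `defaults` dictionary is a stand-in for `get_default_config()` with placeholder values (only `age_cutoff` and `study_period.end_year` appear in this README; `start_year` is assumed for illustration).

```python
from copy import deepcopy


def merge_config(defaults: dict, overrides: dict) -> dict:
    """Return a new dict with `overrides` merged recursively into `defaults`."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge one level deeper.
            merged[key] = merge_config(merged[key], value)
        else:
            # Scalars, lists, and new keys simply replace the default.
            merged[key] = deepcopy(value)
    return merged


# Placeholder defaults (hypothetical values, not the package's real defaults).
defaults = {
    "age_cutoff": 6,
    "study_period": {"start_year": 2000, "end_year": 2018},
}

overrides = {"age_cutoff": 5, "study_period": {"end_year": 2020}}
config = merge_config(defaults, overrides)
```

Merging into a copy leaves the defaults untouched, which matters if the same default dictionary is reused across several pipeline runs.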

### Advanced Usage

```python
from scd_analysis import get_default_config
from scd_analysis.data import process_lpr_data, process_mfr_data
from scd_analysis.socioeconomic import SocioeconomicProcessor

# Process specific components
config = get_default_config()

# Process hospital data
df_lpr = process_lpr_data(config)

# Process socioeconomic data (pass a modified config to use custom settings)
socio_processor = SocioeconomicProcessor(config)
df_socio = socio_processor.process(df_lpr)
```

## Package Structure

- **`scd_analysis.config`**: Configuration management
- **`scd_analysis.data`**: Core data processing modules
- **`scd_analysis.socioeconomic`**: Socioeconomic data processing (SEPLINE-compliant)
- **`scd_analysis.pipeline`**: Pipeline orchestration and analysis
- **`scd_analysis.utils`**: Utility functions and helpers

## Data Requirements

This package is designed to work with Danish national health registers:

- **LPR**: Hospital discharge register
- **MFR**: Birth register
- **BEF**: Population register
- **AKM**: Employment register
- **FAIK**: Income register
- **UDDF**: Education register
- **DOD/VNDS**: Death and migration registers

Data should be provided as Parquet files (single files or partitioned datasets).
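
Because a register may be either a single file or a partitioned directory, it can help to normalize both layouts before reading. The following is a minimal sketch using only the standard library; `parquet_source` is a hypothetical helper rather than a function of this package, and it assumes partitioned datasets are directories containing `.parquet` files.

```python
from pathlib import Path


def parquet_source(path: str) -> str:
    """Return a file path or glob pattern covering a Parquet source."""
    p = Path(path)
    if p.is_dir():
        # Partitioned dataset: match every Parquet file below the directory.
        return str(p / "**" / "*.parquet")
    if p.suffix == ".parquet":
        # Single file: use the path unchanged.
        return str(p)
    raise ValueError(f"Expected a .parquet file or a directory, got: {path}")
```

The returned string can then be handed to any Parquet reader that accepts glob patterns.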

## Performance Benefits

- **Lazy Evaluation**: Only loads necessary data into memory
- **Predicate Pushdown**: Filters applied at file level
- **Partitioned Support**: Efficient processing of time-partitioned data
- **Parallel Processing**: Automatic parallelization of operations
- **Memory Optimization**: Streaming processing for large datasets

## Key Features

### Socioeconomic Processing

- SEPLINE-compliant ethnicity categorization (A1, A2, B1, B2, C1, C2)
- Danish regional and municipal classifications
- Population density and urbanization categories
- Family structure and cohabitation status

### SCD Analysis

- Automated severe chronic disease flagging
- Age-appropriate diagnosis criteria
- Temporal analysis capabilities
- Cohort matching and controls
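
As a rough illustration of what cohort matching involves, the sketch below pairs each case with controls that share the same birth year and sex, drawing without replacement. This is a generic exact-matching example and not the algorithm used by `scd_analysis`; `match_controls`, the matching keys, the 1:2 ratio, and the toy records are all assumptions made for the example.

```python
import random
from collections import defaultdict


def match_controls(cases, pool, keys=("birth_year", "sex"), ratio=2, seed=0):
    """Exact-match each case to up to `ratio` controls, without replacement."""
    rng = random.Random(seed)

    # Group candidate controls by their matching key.
    strata = defaultdict(list)
    for person in pool:
        strata[tuple(person[k] for k in keys)].append(person)
    for members in strata.values():
        rng.shuffle(members)

    matches = {}
    for case in cases:
        candidates = strata[tuple(case[k] for k in keys)]
        n = min(ratio, len(candidates))
        # pop() removes a control so it cannot be reused for another case.
        matches[case["id"]] = [candidates.pop()["id"] for _ in range(n)]
    return matches


# Toy records (made up for illustration).
cases = [
    {"id": "c1", "birth_year": 2005, "sex": "F"},
    {"id": "c2", "birth_year": 2006, "sex": "M"},
]
pool = [
    {"id": "p1", "birth_year": 2005, "sex": "F"},
    {"id": "p2", "birth_year": 2005, "sex": "F"},
    {"id": "p3", "birth_year": 2005, "sex": "F"},
    {"id": "p4", "birth_year": 2006, "sex": "M"},
    {"id": "p5", "birth_year": 2007, "sex": "M"},
]

matches = match_controls(cases, pool)
```

A fixed seed keeps the draw reproducible; cases with fewer eligible controls than the requested ratio simply receive fewer matches.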

### Data Quality

- Comprehensive validation and quality checks
- Missing data reporting
- Data lineage tracking
- Performance monitoring
