Metadata-Version: 2.4
Name: aipw-analysis
Version: 0.1.0
Summary: Utilities for preprocessing and estimating AIPW / DoubleML analyses.
Author: Corey Gelb-Bicknell
License-Expression: MIT
Project-URL: Homepage, https://github.com/coreygb1/2604_aipw
Project-URL: Repository, https://github.com/coreygb1/2604_aipw
Project-URL: Issues, https://github.com/coreygb1/2604_aipw/issues
Keywords: econometrics,aipw,doubleml,causal-inference,census
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: polars==1.26.0
Requires-Dist: PyYAML>=6.0
Requires-Dist: scikit-learn>=1.5.0
Requires-Dist: doubleml>=0.10.0
Requires-Dist: pandas>=2.2.0
Dynamic: license-file

# AIPW Analysis Baseline

This repo provides a safer baseline for preprocessing and estimation:

- paths are environment-driven instead of hardcoded to one machine
- program and experiment configs are validated before work starts
- preprocessing derives source requirements from `configs/programs/*.yaml`
- estimation settings live alongside the other run-level experiment specs in `configs/experiments.yaml`

## Setup

```bash
python -m venv .venv
source .venv/bin/activate
pip install -e .
```

Or install from PyPI instead of from a local checkout:

```bash
pip install aipw-analysis
```

## Environment variables

These are optional overrides when your data does not live under the repo's `data/` directory.

```bash
export AIPW_DATA_DIR="/path/to/data"
export AIPW_DATABANK_GLOB="/path/to/databank/paths=*.parquet"
export AIPW_SOURCE_YU_2025="/path/to/yu_2025.parquet"
export AIPW_SOURCE_WGU="/path/to/wgu.parquet"
```
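
A minimal sketch of how such an override might be resolved, assuming the repo's `data/` directory is the fallback. The helper name `resolve_data_dir` and its `repo_root` parameter are hypothetical and are not taken from this package's code; only the `AIPW_DATA_DIR` variable name comes from the list above.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_data_dir(repo_root: Path, env: Optional[Mapping[str, str]] = None) -> Path:
    """Return AIPW_DATA_DIR if it is set and non-empty, else <repo_root>/data.

    Hypothetical helper for illustration; the package may resolve paths differently.
    """
    env = os.environ if env is None else env
    override = env.get("AIPW_DATA_DIR", "").strip()
    if override:
        # An explicit override wins over the in-repo default.
        return Path(override).expanduser()
    return repo_root / "data"
```

Passing the environment as a mapping keeps the lookup easy to test without touching the real process environment.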

## Workflow

Process one source at a time:

```bash
python scripts/0_process_data.py --source yu_2025
python scripts/0_process_data.py --source wgu
```

Run estimation, optionally for a single program with `--program`:

```bash
python scripts/1_fit_and_estimate.py
python scripts/1_fit_and_estimate.py --program yu_ged
```

Experiment runs are assembled from named specs in `configs/experiments.yaml`:

- `covariate_sets` choose which covariates enter the model
- `nuisance_models` choose the ML models for `ml_g` and `ml_m`
- `estimation_specs` choose DoubleML settings like `n_folds`, `n_rep`, and `score`
- `inference_specs` choose inference settings like `alpha`
- each `grid` block crosses those named specs into concrete runs
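
The crossing described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the package's implementation: the dictionary stands in for the parsed contents of `configs/experiments.yaml`, and the spec names (`base`, `lasso`, `default`, `ci95`), the covariate names, the layout of `grid`, and the `expand_grid` helper are all hypothetical.

```python
from itertools import product

# Hypothetical stand-in for the parsed configs/experiments.yaml.
# Spec names, covariate names, and the grid layout are assumptions.
experiments = {
    "covariate_sets": {"base": ["age", "sex"], "full": ["age", "sex", "income"]},
    "nuisance_models": {"lasso": {}, "forest": {}},
    "estimation_specs": {"default": {"n_folds": 5, "n_rep": 1}},
    "inference_specs": {"ci95": {"alpha": 0.05}},
    "grid": [
        {
            "covariate_sets": ["base", "full"],
            "nuisance_models": ["lasso", "forest"],
            "estimation_specs": ["default"],
            "inference_specs": ["ci95"],
        }
    ],
}

SPEC_KINDS = ("covariate_sets", "nuisance_models", "estimation_specs", "inference_specs")


def expand_grid(config):
    """Cross the named specs in each grid block into a list of concrete runs."""
    runs = []
    for block in config["grid"]:
        # Fail early if a block refers to a spec name that is not defined.
        for kind in SPEC_KINDS:
            for name in block[kind]:
                if name not in config[kind]:
                    raise KeyError(f"unknown {kind} entry: {name!r}")
        # One run per combination of names, in the order of SPEC_KINDS.
        for combo in product(*(block[kind] for kind in SPEC_KINDS)):
            runs.append(dict(zip(SPEC_KINDS, combo)))
    return runs


runs = expand_grid(experiments)
print(len(runs))  # 2 covariate sets x 2 nuisance models x 1 x 1 = 4
```

Each run here holds only spec names; looking up the named settings is left to whatever consumes the run.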

## Current limits

- `scripts/1_fit_and_estimate.py` currently guards against multi-period data. The exact DoubleML staggered-adoption specification still needs to be confirmed before those estimates can be trusted.
- The estimator requires `doubleml` and `pandas`, which are installed through `pip install -e .`.
- Tests cover config and run-spec validation, not the full econometric workflow.
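
For illustration, a guard of the kind mentioned in the first limit could look like the sketch below. It assumes each observation carries a period value and that the guard fails hard rather than warning. The function name `assert_single_period` is hypothetical, and the actual check in `scripts/1_fit_and_estimate.py` may differ.

```python
def assert_single_period(periods):
    """Raise if the observations span more than one distinct period.

    Hypothetical guard for illustration; `periods` is any iterable of
    comparable period labels (for example, years).
    """
    distinct = sorted(set(periods))
    if len(distinct) > 1:
        raise ValueError(
            f"multi-period data is not supported yet; found periods {distinct}"
        )
    # Return the single period, or None when there are no observations.
    return distinct[0] if distinct else None
```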

## Publishing

Build locally:

```bash
python -m pip install --upgrade build
python -m build
```

This repo also includes a GitHub Actions workflow at `.github/workflows/publish.yml` for PyPI Trusted Publishing. On PyPI, add a Trusted Publisher for:

- owner: `coreygb1`
- repository: `2604_aipw`
- workflow: `publish.yml`
- environment: `pypi`

Then publish by pushing a tag like `v0.1.0`.
