Metadata-Version: 2.4
Name: taintly
Version: 0.0.0
Summary: Zero-dependency CI/CD pipeline security auditor for GitHub Actions and GitLab CI
Author: Asaf Yashayev
License-Expression: MIT
Project-URL: Homepage, https://github.com/Nellur35/taintly
Project-URL: Repository, https://github.com/Nellur35/taintly
Project-URL: Issues, https://github.com/Nellur35/taintly/issues
Keywords: security,cicd,github-actions,gitlab,supply-chain,audit,owasp
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: System Administrators
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Security
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Topic :: Software Development :: Testing
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: pytest-timeout>=2.3; extra == "dev"
Requires-Dist: coverage>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=5.0; extra == "dev"
Requires-Dist: hypothesis>=6.100; extra == "dev"
Requires-Dist: mypy>=1.14; extra == "dev"
Dynamic: license-file

# taintly

[![PyPI](https://img.shields.io/pypi/v/taintly.svg)](https://pypi.org/project/taintly/)
[![Downloads](https://img.shields.io/pypi/dm/taintly.svg)](https://pypi.org/project/taintly/)
[![Python](https://img.shields.io/pypi/pyversions/taintly.svg)](https://pypi.org/project/taintly/)
[![License](https://img.shields.io/pypi/l/taintly.svg)](https://github.com/Nellur35/taintly/blob/main/LICENSE)

taintly is a security scanner for CI/CD pipelines. It reads GitHub Actions, GitLab CI, and Jenkins configuration and reports misconfigurations mapped to the [OWASP CI/CD Top 10](https://owasp.org/www-project-top-10-ci-cd-security-risks/). Detection patterns are borrowed from [zizmor](https://github.com/woodruffw/zizmor), [poutine](https://github.com/boostsecurityio/poutine), and [actionlint](https://github.com/rhysd/actionlint).

Pure Python 3.10+. Zero runtime dependencies. No telemetry.

## Network behaviour

Local scans make no network calls. `--fix` calls `git ls-remote` to resolve action tags to commit SHAs. `--platform-audit`, `--github-org`, `--gitlab-group`, and `--transitive` call the GitHub or GitLab API and need a token.

## Quickstart

```bash
pip install taintly
taintly /path/to/your/repo
```

Or source-install for development:

```bash
git clone https://github.com/Nellur35/taintly
cd taintly
pip install -e ".[dev]"
python -m taintly /path/to/your/repo
```

### CI integration in one line

**GitHub Actions:**

```yaml
- uses: Nellur35/taintly@v1
  with:
    fail-on: HIGH
```

**GitLab CI (16.11+):**

```yaml
include:
  - component: $CI_SERVER_FQDN/nellur35/taintly/taintly@v1
    inputs:
      fail-on: HIGH
```

See [`docs/INTEGRATION.md`](docs/INTEGRATION.md) for nine copy-pasteable patterns, including SARIF upload, baseline+diff, scheduled scans, platform posture audit, pre-commit, auto-fix PR, and org-wide scan.

### Remote-scan modes

```bash
GITHUB_TOKEN=ghp_... taintly --github-org my-org
GITLAB_TOKEN=glpat-... taintly --gitlab-group my-group
taintly /path/to/repo --platform jenkins   # --platform jenkins is required: Jenkins has no single discovery file
```

## What it detects

<!-- AUTOGEN:summary -->
188 file-based rules and 21 platform-posture checks. The file-based rules break down as 86 GitHub, 54 GitLab, and 48 Jenkins, and include a dedicated AI / ML category (43 rules across the three platforms, plus multi-platform taint analysis) for workflows that run model loads or AI coding agents. The posture checks (12 GitHub, 9 GitLab) only run in `--platform-audit` mode.
<!-- /AUTOGEN:summary -->

A few categories worth calling out:

AI / ML risk. 43 rules across the three platforms, plus a taint flow, covering: model deserialisation (`torch.load` without `weights_only=True`, `keras.load_model` without `safe_mode=True`, `joblib.load`, `dill`, `cloudpickle`, `np.load(allow_pickle=True)`); HuggingFace supply chain (`trust_remote_code=True`, unpinned `revision=`); AI coding-agent exposure (`uses: anthropics/claude-code-action@...` / agentic CLIs on fork-triggerable events, dangerous flags like `--dangerously-skip-permissions`, fork-identity-guard-aware exploitability); MCP server hygiene (unpinned `npx`-loaded servers, privileged scopes like `server-filesystem` / `server-bash` on fork events); and LLM output reaching a shell interpreter or `$GITHUB_ENV` / `$GITHUB_OUTPUT`. In addition, `TAINT-GH-005` treats AI agent step outputs as a first-class taint source with provenance chains (`agent:anthropics/claude-code-action -> steps.review.outputs.summary -> run: echo "${{ ... }}"`). See the `ai_ml_model_risk` cluster in reports for the AI-specific remediation narrative.

Living Off The Pipeline (LOTP) detection. Eight rules flag build tools such as npm, pip, make, cargo, and gradle when they run in jobs that check out pull-request code. This is the pattern used in the Ultralytics attack (December 2024).

Multi-stage taint analysis (`TAINT-GH-001` through `TAINT-GH-005`, plus `TAINT-GL-001` through `TAINT-GL-003` for GitLab). Follows attacker-controlled GitHub or GitLab contexts through the common laundering patterns before they reach a shell `run:` / `script:` block:

| Rule | Flow |
|------|------|
| `TAINT-GH-001` | Shallow: `${{ tainted }}` → `env: VAR` → `run: $VAR`. Handles compound expressions like `${{ github.head_ref \|\| github.ref }}` (the dominant idiom in real workflows). |
| `TAINT-GH-002` | Multi-hop env: `A: ${{ tainted }}`, `B: ${{ env.A }}`, ..., `run: $LAST`. Fixed-point resolver, any chain depth, declaration-order independent. |
| `TAINT-GH-003` | `$GITHUB_ENV` dynamic write: an earlier step does `echo "NAME=..." >> $GITHUB_ENV` and a later step's `run:` references `$NAME`. Order-sensitive — only later steps see the write. |
| `TAINT-GH-004` | Step output chain: a step with `id:` writes `>> $GITHUB_OUTPUT` (or legacy `::set-output`) and a downstream step references `${{ steps.<id>.outputs.<name> }}` inside a shell `run:`. |
| `TAINT-GH-005` | Agent step output: an AI agent step's output is treated as a taint source and followed through `${{ steps.<id>.outputs.<name> }}` into a shell `run:`, with the agent named at the head of the provenance chain. |

Each finding's snippet shows the full provenance chain — for example `taint: github.event.comment.body -> env.RAW -> env.MID -> $GITHUB_ENV.COMMENT -> echo "$COMMENT"` — so reviewers see exactly how the taint propagated. Out of scope (documented in the module): cross-job `needs.<id>.outputs.<name>`, `workflow_call` / `workflow_run` flows, artefact propagation, shell-quoting analysis.
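The multi-hop case (`TAINT-GH-002`) is easiest to see as a fixed-point iteration: keep marking variables tainted until a full pass marks nothing new. The following is a minimal sketch of that idea, not the code in `taint.py`. The function name, the two-entry context list, and the substring match are simplifying assumptions.

```python
import re

# Matches ${{ env.NAME }} references inside an env value.
ENV_REF = re.compile(r"\$\{\{\s*env\.([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")

# Illustrative subset of attacker-controlled contexts (assumption).
TAINTED_CONTEXTS = ("github.event.comment.body", "github.head_ref")


def resolve_tainted_env(env: dict[str, str]) -> dict[str, list[str]]:
    """Map each tainted env var to its provenance chain."""
    chains: dict[str, list[str]] = {}

    # Seed: variables assigned directly from a tainted context.
    # A plain substring test is naive but enough for the sketch.
    for name, value in env.items():
        for ctx in TAINTED_CONTEXTS:
            if ctx in value:
                chains[name] = [ctx, f"env.{name}"]
                break

    # Propagate through ${{ env.X }} hops until a pass changes nothing.
    changed = True
    while changed:
        changed = False
        for name, value in env.items():
            if name in chains:
                continue
            for ref in ENV_REF.findall(value):
                if ref in chains:
                    chains[name] = chains[ref] + [f"env.{name}"]
                    changed = True
                    break
    return chains


# LAST is declared before the variables it depends on, on purpose.
example_env = {
    "LAST": "${{ env.MID }}",
    "MID": "${{ env.RAW }}",
    "RAW": "${{ github.event.comment.body }}",
    "SAFE": "hello",
}
chains = resolve_tainted_env(example_env)
print("taint: " + " -> ".join(chains["LAST"]))
```

Iterating to a fixed point is what makes the result independent of declaration order and chain depth: each pass can only add variables, so the loop terminates after at most one pass per variable.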

Platform posture scanning (`--platform-audit`). Calls the GitHub or GitLab API to check repository and organization settings: branch protection, the default `GITHUB_TOKEN` permission, fork-PR approval, CODEOWNERS coverage, and GitLab variable Protected/Masked flags. Reads both classic branch protection and GitHub repository rulesets.

Transitive action analysis (`--transitive`). For each SHA-pinned action, fetches the action's `action.yml` and flags sub-actions referenced by mutable tags. Catches the "outer pin doesn't protect inner pins" gap in composite actions.
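A rough sketch of the inner check, assuming the composite action's `action.yml` has already been fetched as text. The regex-based extraction and the name `mutable_sub_actions` are illustrative assumptions; taintly's `transitive.py` may work differently.

```python
import re

# A `uses:` key, optionally at the start of a list item.
USES = re.compile(r"^\s*(?:-\s*)?uses:\s*([^\s#]+)", re.MULTILINE)
FULL_SHA = re.compile(r"^[0-9a-f]{40}$")


def mutable_sub_actions(action_yml: str) -> list[str]:
    """Return sub-action references that are not pinned to a full commit SHA."""
    flagged = []
    for target in USES.findall(action_yml):
        target = target.strip("'\"")
        # Local paths and container images are outside this sketch.
        if target.startswith("./") or target.startswith("docker://"):
            continue
        _, sep, ref = target.partition("@")
        if not sep or not FULL_SHA.match(ref):
            flagged.append(target)
    return flagged


COMPOSITE = """\
runs:
  using: composite
  steps:
    - uses: actions/checkout@v4
    - name: cache
      uses: actions/cache@0123456789abcdef0123456789abcdef01234567
    - uses: ./local-step
"""

flagged = mutable_sub_actions(COMPOSITE)
```

Here the outer workflow could pin this composite action to a SHA and still execute whatever `actions/checkout@v4` resolves to at run time, which is the gap the flag is meant to surface.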

Every rule carries STRIDE categories, a two-sentence attack narrative, and, where one applies, a reference to a real-world incident (Trivy March 2026, Ultralytics December 2024, tj-actions March 2025, and so on).

## How findings are reported

taintly reports distinct risks, not rule-by-rule hits.

Findings that describe the same underlying weakness collapse into one root-cause cluster called a finding family. A workflow that pins a reusable workflow to `@main` triggers three correlated rules; the report shows one cluster called "Mutable dependency references" with the three rule IDs underneath, not three independent findings.
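Conceptually, clustering is a group-by over a rule-to-family mapping. The sketch below shows the shape of that step under stated assumptions: the `FAMILY_OF` table and the choice to key clusters on family alone are hypothetical, and the real definitions live in `families.py`.

```python
from collections import defaultdict

# Hypothetical rule-to-family mapping, for illustration only.
FAMILY_OF = {
    "SEC3-GH-001": "Mutable dependency references",
    "SEC3-GH-002": "Mutable dependency references",
    "SEC2-GH-001": "Identity / Access",
}


def cluster(rule_ids: list[str]) -> dict[str, list[str]]:
    """Group rule hits into one cluster per finding family."""
    clusters: dict[str, list[str]] = defaultdict(list)
    for rule_id in rule_ids:
        family = FAMILY_OF.get(rule_id, "Uncategorised")
        if rule_id not in clusters[family]:
            clusters[family].append(rule_id)
    return dict(clusters)


hits = ["SEC3-GH-001", "SEC3-GH-002", "SEC2-GH-001", "SEC3-GH-001"]
clusters = cluster(hits)
for family, rules in clusters.items():
    print(f"{family}: {', '.join(rules)}")
```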

Each finding carries three signals:

| Signal | What it means |
|---|---|
| Severity | How bad the policy violation is on paper. Doesn't change across workflows. |
| Confidence | How sure the detector is. Exact syntactic matches are `high`. Heuristics like shallow taint or pattern-based secret detection are `medium` or `low`. |
| Exploitability | How much leverage the workflow's context gives an attacker. A finding in a release workflow with write permissions is `high`. The same finding in a sandboxed cron job is `low`. |

Patterns whose safety depends on design intent are routed into a review-needed bucket instead of being shown as confirmed issues. A bare `pull_request_target` with `permissions: {}` and no checkout is in this bucket. So is a placeholder password used to initialise a local keychain.

The score (`--score`) deducts per cluster, weighted by `confidence × exploitability`. Review-needed clusters don't deduct. Below the score, a security debt profile labels each finding family Strong, Moderate, Weak, or Needs review:

```
Score: 88/100 (B)
Distinct risks: 3 confirmed cluster(s)

Security debt profile:
  Mutable dependency references          — Weak  (1 finding(s), expl:high)
  Privileged PR-trigger exposure         — Strong
  Credential persistence                 — Moderate  (2 finding(s), expl:medium)
  Identity / Access                      — Strong
  Release / artifact integrity           — Strong
  ...
```
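The deduction rule can be sketched as follows. The numeric weights and the base penalty are invented for the example (the scheme in `scorer.py` is not reproduced here), so the resulting number is not meant to match the sample output above.

```python
# Hypothetical weights and base penalty; taintly's real values may differ.
WEIGHT = {"high": 1.0, "medium": 0.5, "low": 0.25}
BASE_PENALTY = 10.0


def score(clusters: list[dict]) -> float:
    """Start at 100 and deduct once per confirmed cluster."""
    total = 100.0
    for c in clusters:
        if c.get("review_needed"):
            continue  # review-needed clusters never deduct
        total -= BASE_PENALTY * WEIGHT[c["confidence"]] * WEIGHT[c["exploitability"]]
    return max(total, 0.0)


example_clusters = [
    {"family": "Mutable dependency references",
     "confidence": "high", "exploitability": "high"},
    {"family": "Credential persistence",
     "confidence": "medium", "exploitability": "medium"},
    {"family": "Privileged PR-trigger exposure",
     "confidence": "high", "exploitability": "high", "review_needed": True},
]
result = score(example_clusters)
```

Deducting per cluster rather than per rule hit means three correlated rules firing on one `@main` reference cost the same as one.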

## Coverage

<!-- AUTOGEN:coverage -->
| Category | Description | GitHub | GitLab | Jenkins |
|----------|-------------|--------|--------|---------|
| SEC-1 | Insufficient Flow Control | 1 | 2 | 2 |
| SEC-2 | Inadequate IAM | 3 | 3 | 3 |
| SEC-3 | Dependency Chain Abuse | 6 | 5 | 5 |
| SEC-4 | Poisoned Pipeline Execution | 23 | 9 | 8 |
| SEC-5 | Insufficient PBAC | 2 | 1 | 1 |
| SEC-6 | Insufficient Credential Hygiene | 8 | 9 | 8 |
| SEC-7 | Insecure System Configuration | 4 | 1 | 3 |
| SEC-8 | Ungoverned 3rd Party Services | 4 | 3 | 4 |
| SEC-9 | Improper Artifact Integrity | 5 | 3 | 3 |
| SEC-10 | Insufficient Logging | 4 | 2 | 1 |
| AI / ML | AI model, agent, MCP | 21 | 13 | 9 |
| TAINT | Multi-stage taint flows | 5 | 3 | 1 |

Plus 21 platform-posture rules (`PLAT-GH-001/002/005/007/008/009/010/011/012/013/014/016`, `PLAT-GL-001/002/003/004/008/009/010/011/012`).
<!-- /AUTOGEN:coverage -->

A few of the highest-signal rules:

| Rule ID | Sev | What it catches | Linked incident |
|---------|-----|-----------------|-----------------|
| LOTP-GH-001 | CRIT | Build tool running in a job that checks out PR head code | Ultralytics Dec 2024 |
| TAINT-GH-001 | CRIT | Shallow env-mediated injection: `${{ tainted }}` → env → `run:` | Ultralytics Dec 2024 |
| TAINT-GH-002 | CRIT | Multi-hop env propagation through `${{ env.X }}` indirections | |
| TAINT-GH-003 | CRIT | `$GITHUB_ENV` dynamic-write bridge between steps | |
| TAINT-GH-004 | CRIT | `$GITHUB_OUTPUT` / `::set-output` step output chain | |
| SEC4-GH-001 | CRIT | `pull_request_target` with checkout of an untrusted ref | Trivy Mar 2026 |
| SEC4-GH-006 | CRIT | Attacker-controlled context written to `$GITHUB_ENV` | Ultralytics Dec 2024 |
| SEC4-GH-011 | CRIT | LOTP tools running inside `pull_request_target` | Ultralytics Dec 2024 |
| SEC3-GH-002 | CRIT | Action pinned to a live branch (`@main`) | tj-actions Mar 2025 |
| SEC3-GH-003 | CRIT | Reference to a known compromised action | Trivy Mar 2026 |
| SEC3-GH-001 | HIGH | Action pinned to a tag rather than a SHA | tj-actions Mar 2025 |
| SEC2-GH-001 | HIGH | `permissions: write-all` at workflow level | |
| SEC4-GH-012 | HIGH | `secrets: inherit` forwarding all caller secrets | |
| SEC6-GH-006 | HIGH | Base64-decoded payload piped to a shell | Ultralytics, Trivy |
| SEC7-GH-003 | HIGH | Self-hosted runner used in `pull_request` context | |
| SEC4-JK-003 | CRIT | Dynamic `evaluate()` / `Eval.me()` in a Jenkins pipeline | |
| AI-GH-001   | CRIT | HuggingFace `trust_remote_code=True` executes code from the model repo | |
| AI-GH-003   | HIGH | `torch.load(...)` without `weights_only=True` — pickle RCE on model load | CVE-2022-45907 |
| AI-GH-005   | HIGH | LLM SDK / agent action + attacker-controlled PR / issue context | |
| AI-GH-006   | HIGH | AI coding-agent action (`claude-code-action`, `aider`, …) on a fork-triggerable event | |
| AI-GH-007   | HIGH | LLM output piped into a shell interpreter or `$GITHUB_ENV` | |
| AI-GH-009   | CRIT | Dangerous agent flags (`--dangerously-skip-permissions`, `--yolo`, `allowedTools: "*"`) on fork event | |
| AI-GH-012   | HIGH | Privileged-scope MCP server (`server-filesystem`, `server-github`, `server-bash`) on fork event | |
| AI-GH-014   | HIGH | AI agent step output reaches shell / `$GITHUB_ENV` via `steps.<id>.outputs.*` | |
| TAINT-GH-005 | HIGH | Deep taint: agent step output → `run:` (full provenance chain) | |
| PLAT-GH-001 | CRIT | Default branch unprotected (rulesets-aware) | |
| PLAT-GH-007 | HIGH | Default `GITHUB_TOKEN` permission is read/write | |
| PLAT-GL-003 | HIGH | GitLab CI/CD variable not flagged Protected | |

STRIDE abbreviations used in rule metadata: **E** elevation of privilege, **T** tampering, **I** information disclosure, **S** spoofing, **R** repudiation, **D** denial of service.

The full catalog is in [docs/RULES.md](docs/RULES.md).

## Usage

```
python -m taintly [path] [options]
```

The non-obvious flags:

| Flag | What it does |
|---|---|
| `--score` | Print the score, distinct-risk count, and debt profile |
| `--format {text,json,csv,sarif,html}` | Pick the output format. SARIF 2.1.0 output carries the finding family, confidence, exploitability, and review-needed metadata in each result's `properties` block |
| `--fix` / `--fix-dry-run` | Apply or preview safe auto-fixes (SHA pinning, `persist-credentials`, `permissions:` blocks) |
| `--fix-npm-ignore-scripts` | Opt-in. Adds `--ignore-scripts` to every npm/yarn/pnpm install. Changes build semantics, so it isn't part of the default `--fix` set. |
| `--guide [RULE_ID]` | Step-by-step remediation guide |
| `--platform-audit` | API-based posture check. Use with `--github-repo` or `--gitlab-project`. |
| `--baseline [FILE]` / `--diff [FILE]` | Save current findings as a baseline, then later report only new ones |
| `--transitive` | Walk into composite actions and check their sub-actions |
| `--token-stdin` | Read the API token from stdin so it never appears in `ps` or shell history |

Run `python -m taintly --help` for the full list.

### Examples

```bash
# Local scan, HIGH and above, with the score
python -m taintly . --min-severity HIGH --score

# SARIF for GitHub Advanced Security
python -m taintly . --format sarif > results.sarif

# Self-contained HTML report
python -m taintly . --format html > report.html

# Apply safe auto-fixes
python -m taintly . --fix

# Platform-posture audit of one repo, token piped from a secrets manager
vault kv get -field=token secret/github | \
  python -m taintly --platform-audit --github-repo my-org/my-repo --token-stdin

# Run the precision corpus as a regression gate
python -m benchmark corpus
```

### Configuration

Drop a `.taintly.yml` at the repository root for per-project defaults:

```yaml
version: 1

# Only report HIGH and above
min-severity: HIGH

# Exit 1 when any CRITICAL finding fires (CI gate)
fail-on: CRITICAL

# Exclude rules entirely
exclude-rules:
  - SEC2-GH-001

# Suppress findings, with optional justification, owner, and expiry
ignore:
  - SEC4-GH-002                  # bare rule ID — suppresses everywhere
  - id: SEC3-GH-001              # rule scoped to a path
    path: legacy/
  - id: SEC4-GH-002              # justified, time-limited exception
    path: .github/workflows/internal.yml
    reason: internal workflow with no fork triggers
    owner: platform-security@example.com
    expires: 2026-09-01          # warning printed after this date
```

CLI flags override config values. Suppressions whose `expires` date has passed still apply, but a warning is printed on every scan so they don't become permanent silent exceptions.
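The expiry behaviour amounts to a date comparison over the `ignore:` entries. A minimal sketch, assuming the config has already been parsed into Python objects; the function name and warning text are hypothetical.

```python
from datetime import date


def expired_suppressions(ignore: list, today: date) -> list[str]:
    """Return one warning per suppression whose `expires` date has passed.

    Expired entries are only reported here; they are not removed, which
    mirrors the "still apply, but warn" behaviour described above.
    """
    warnings = []
    for entry in ignore:
        # Bare rule IDs and entries without `expires` never expire.
        if not isinstance(entry, dict) or "expires" not in entry:
            continue
        expires = entry["expires"]
        if isinstance(expires, str):
            expires = date.fromisoformat(expires)
        if today > expires:
            warnings.append(
                f"{entry['id']}: suppression expired on {expires.isoformat()}"
            )
    return warnings


ignore_entries = [
    "SEC4-GH-002",
    {"id": "SEC3-GH-001", "path": "legacy/"},
    {"id": "SEC4-GH-002", "path": ".github/workflows/internal.yml",
     "expires": "2026-09-01"},
]
late = expired_suppressions(ignore_entries, date(2026, 9, 2))
```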

### Suppressions

There are two suppression mechanisms: inline comments for one-off cases, and the `ignore:` list in the config file (shown under Configuration) for anything that needs tracking. The inline forms are:

```yaml
# Suppress all rules on this line
- uses: actions/checkout@v4  # taintly: ignore

# Suppress one or more specific rules
- uses: actions/checkout@v4  # taintly: ignore[SEC4-GH-005]
- uses: actions/checkout@v4  # taintly: ignore[SEC3-GH-001,SEC4-GH-005]
```
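For readers wiring the same comment syntax into their own tooling, the inline forms above can be recognised with a single regular expression. This is a sketch of one way to do it; the pattern and `is_suppressed` are assumptions, not taintly's parser.

```python
import re

# "# taintly: ignore" with an optional [RULE,RULE,...] list.
IGNORE = re.compile(r"#\s*taintly:\s*ignore\b(?:\[([^\]]*)\])?")


def is_suppressed(line: str, rule_id: str) -> bool:
    """True if an inline comment on `line` suppresses `rule_id`."""
    match = IGNORE.search(line)
    if match is None:
        return False
    listed = match.group(1)
    if listed is None:
        return True  # bare form covers every rule on the line
    return rule_id in {item.strip() for item in listed.split(",")}


bare = "- uses: actions/checkout@v4  # taintly: ignore"
scoped = "- uses: actions/checkout@v4  # taintly: ignore[SEC3-GH-001,SEC4-GH-005]"
plain = "- uses: actions/checkout@v4"
```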

### Baseline mode

Save current findings, then report only new ones:

```bash
python -m taintly . --baseline       # writes .taintly-baseline.json
python -m taintly . --diff           # report only findings not in the baseline
```
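One common way to implement this kind of diff is to fingerprint each finding and report only fingerprints absent from the baseline. The sketch below assumes a baseline that is simply a JSON list of fingerprints and a fingerprint built from rule ID, file, and snippet. Both are hypothetical; the actual layout of `.taintly-baseline.json` is not documented here.

```python
import hashlib
import json


def fingerprint(finding: dict) -> str:
    """Stable ID for a finding; omits the line number so that unrelated
    edits above a finding do not make it look new (a design assumption)."""
    key = "\x00".join(
        [finding["rule_id"], finding["file"], finding["snippet"].strip()]
    )
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def write_baseline(findings: list[dict]) -> str:
    return json.dumps(sorted({fingerprint(f) for f in findings}))


def new_findings(current: list[dict], baseline_json: str) -> list[dict]:
    known = set(json.loads(baseline_json))
    return [f for f in current if fingerprint(f) not in known]


old = {"rule_id": "SEC3-GH-001", "file": "ci.yml",
       "snippet": "uses: actions/checkout@v4", "line": 12}
moved = dict(old, line=15)  # same finding, shifted by an unrelated edit
fresh = {"rule_id": "SEC2-GH-001", "file": "ci.yml",
         "snippet": "permissions: write-all", "line": 3}

baseline = write_baseline([old])
reported = new_findings([moved, fresh], baseline)
```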

## SARIF with GitHub Advanced Security

```yaml
- name: Run taintly
  run: python -m taintly . --format sarif > taintly.sarif

- name: Upload results
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: taintly.sarif
```

Each SARIF result carries a `properties` block with `finding_family`, `confidence`, `exploitability`, and `review_needed`. GitHub Advanced Security and the GitLab security dashboard both preserve these so you can filter the dashboard by cluster or context, not just severity.
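The same properties can be used to post-process the SARIF file outside a dashboard. A minimal sketch: the property names come from the paragraph above, while the value types (a string family name, a boolean `review_needed`) and the sample document are assumptions.

```python
def results_in_family(
    sarif: dict, family: str, include_review_needed: bool = False
) -> list[dict]:
    """Select SARIF results belonging to one finding family."""
    selected = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            props = result.get("properties", {})
            if props.get("finding_family") != family:
                continue
            if props.get("review_needed") and not include_review_needed:
                continue
            selected.append(result)
    return selected


# Hand-written stand-in for json.load(open("taintly.sarif")).
sample_sarif = {
    "version": "2.1.0",
    "runs": [{
        "results": [
            {"ruleId": "SEC3-GH-001",
             "properties": {"finding_family": "Mutable dependency references",
                            "confidence": "high", "exploitability": "high",
                            "review_needed": False}},
            {"ruleId": "SEC3-GH-002",
             "properties": {"finding_family": "Mutable dependency references",
                            "confidence": "high", "exploitability": "low",
                            "review_needed": True}},
            {"ruleId": "SEC2-GH-001",
             "properties": {"finding_family": "Identity / Access",
                            "confidence": "high", "exploitability": "medium",
                            "review_needed": False}},
        ],
    }],
}

confirmed = results_in_family(sample_sarif, "Mutable dependency references")
```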

## Architecture

```
taintly/
├── __main__.py             CLI entry point
├── models.py               Rule, Finding, pattern types
├── engine.py               Scan engine
├── families.py             Finding-family definitions and clustering
├── workflow_context.py     Per-file signal extraction for exploitability
├── scorer.py               Cluster-based score and debt profile
├── fixes.py                Auto-fix implementations
├── taint.py                Multi-stage taint analyzer (env, $GITHUB_ENV, $GITHUB_OUTPUT chains)
├── transitive.py           Composite-action sub-call analysis
├── yaml_path.py            Structural YAML path extractor
├── platform/               API-based posture scanning
├── reporters/              text, json, csv, sarif, html, score_text
└── rules/                  github, gitlab, jenkins rule packs
benchmark/
├── corpus/                 Labeled precision regression scenarios
└── corpus_runner.py        Runner for the corpus
```

Rules are pure data (`@dataclass Rule`). Adding a rule means adding an entry to a rules file — no engine changes. Five pattern types cover the detection surface: `RegexPattern` (line-level), `ContextPattern` (co-occurrence), `SequencePattern` (ordered absence), `BlockPattern` (indent-scoped), and `PathPattern` (structural YAML path queries).
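To illustrate what "rules as data" means in practice, here is a stripped-down sketch with a single pattern type. The field names, the regex, and the `scan` helper are hypothetical; the real `Rule` and `RegexPattern` in `models.py` carry more metadata and may be shaped differently.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RegexPattern:
    """Line-level pattern: fires on any line the regex matches."""
    regex: str


@dataclass(frozen=True)
class Rule:
    id: str
    severity: str
    title: str
    pattern: RegexPattern


# Adding a rule is just adding another entry to this list.
RULES = [
    Rule(
        id="SEC2-GH-001",
        severity="HIGH",
        title="`permissions: write-all` at workflow level",
        pattern=RegexPattern(r"^\s*permissions:\s*write-all\s*$"),
    ),
]


def scan(rules: list[Rule], text: str) -> list[tuple[str, int]]:
    """Return (rule_id, line_number) for every match; the engine stays generic."""
    lines = text.splitlines()
    hits = []
    for rule in rules:
        compiled = re.compile(rule.pattern.regex)
        for number, line in enumerate(lines, start=1):
            if compiled.search(line):
                hits.append((rule.id, number))
    return hits


workflow = "name: ci\npermissions: write-all\njobs: {}\n"
hits = scan(RULES, workflow)
```

Keeping detection logic out of the rule objects is what lets the engine stay unchanged as the catalog grows.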

## Testing

```
<!-- AUTOGEN:testing -->
File-based rules:  188 (86 GitHub, 54 GitLab, 48 Jenkins; 43 AI/ML + 9 taint)
Platform rules:    21 (12 GitHub, 9 GitLab)
<!-- /AUTOGEN:testing -->
Pytest tests:      787 passed, 15 skipped
Self-test samples: 479 positive, 530 negative across 171 rules
Mutation kills:    100.0% (6300/6300 across 7 operators; known gaps documented in self_test.py)
Corpus scenarios:  5 labeled scenarios, all passing
Real-world scan:   26 AI findings across 7 of 22 public AI/ML repos
```

The `tests/` directory has four layers:

| Layer | Purpose |
|---|---|
| `tests/unit/` | Models, engine, reporters, families, scorer |
| `tests/integration/` | Golden-file contract: hardened workflows must produce zero findings |
| `tests/evasion/` | Documented semantic bypasses, kept visible so gaps are tracked |
| `tests/fuzz/` | 43 adversarial inputs (YAML bombs, null bytes, RTL unicode, ReDoS payloads) |

The precision corpus under `benchmark/corpus/` is a separate gate. Each scenario is a labeled workflow with expected score range, required and forbidden families, and expected exploitability tier. Run it with `python -m benchmark corpus`.

```bash
pip install pytest pytest-cov
pytest tests/ --cov=taintly --cov-fail-under=60
```

## Requirements

Python 3.10+. No third-party dependencies.

## Project resources

| Document | What's in it |
|---|---|
| [`docs/INTEGRATION.md`](docs/INTEGRATION.md) | Nine copy-pasteable CI patterns (GitHub Actions, GitLab CI, Jenkins, pre-commit) |
| [`docs/RULES.md`](docs/RULES.md) | Rule-ID reference — what each rule detects, mapped to OWASP CI/CD Top 10 |
| [`docs/ROADMAP.md`](docs/ROADMAP.md) | 12-week plan to v1.0 and beyond |
| [`CONTRIBUTING.md`](CONTRIBUTING.md) | Dev setup, test harness, rule-authoring template |
| [`CHANGELOG.md`](CHANGELOG.md) | Version history |
| [`SECURITY.md`](SECURITY.md) | Disclosure policy |

## License

MIT
