Metadata-Version: 2.4
Name: vds-nutrition-labels
Version: 0.1.0
Summary: A CLI tool for building vulnerability dataset nutrition labels.
Project-URL: Homepage, https://github.com/gOATiful/Vulnerability-Dataset-Nutrition-Labels
Project-URL: Repository, https://github.com/gOATiful/Vulnerability-Dataset-Nutrition-Labels
Project-URL: Issues, https://github.com/gOATiful/Vulnerability-Dataset-Nutrition-Labels/issues
Author-email: Torge Hinrichs <torge.hinrichs@tuhh.de>
Maintainer-email: Torge Hinrichs <torge.hinrichs@tuhh.de>
License-Expression: MIT
License-File: LICENSE
Keywords: cli,dataset-analysis,nutrition-labels,security,vulnerability-datasets
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Security
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.9
Requires-Dist: lizard>=1.17.10
Requires-Dist: tiktoken>=0.7.0
Requires-Dist: tomli>=2.0.0
Requires-Dist: tqdm>=4.66.0
Description-Content-Type: text/markdown

# Vulnerability Dataset Nutrition Labels

A command-line tool for building vulnerability dataset nutrition labels from configuration files.

## Features

- Generates vulnerability dataset nutrition labels from a TOML config file
- Creates a config template with a built-in command
- Runs both quality and structural dataset analyses
- Supports train/test/validation split metadata and field mappings

## Installation

From a local checkout of the repository, install the package in editable mode with pip:

```bash
python -m pip install -e .
```

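`tomli` is declared as a dependency, so pip normally installs it automatically. Python 3.9 and 3.10 do not ship the standard-library `tomllib` module, so if `tomli` is missing on those versions, install it manually for TOML support: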

```bash
python -m pip install tomli
```
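For context, scripts that read the same config file on both older and newer interpreters commonly use the import fallback below. This is a minimal sketch of that general pattern, not necessarily how `vds-nutrition-labels` imports its parser internally; the inline TOML string is only an illustration.

```python
# Prefer the standard-library parser (Python 3.11+); fall back to tomli,
# which exposes the same load()/loads() API on Python 3.9 and 3.10.
try:
    import tomllib
except ModuleNotFoundError:
    import tomli as tomllib

# Illustrative input only; a real config would be read from a file
# opened in binary mode and passed to tomllib.load().
config = tomllib.loads('[dataset]\nname = "Example Dataset"\n')
print(config["dataset"]["name"])
```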

## Usage

Create a default configuration template:

```bash
vds-nutrition-labels --create-config-template
```

Or create it at a specific location:

```bash
vds-nutrition-labels --create-config-template path/to/vds-config.toml
```

Run the full analysis pipeline using a config file:

```bash
vds-nutrition-labels --config vds-config.toml --output vds_card_analysis.json --run_analysis
```

Enable verbose output:

```bash
vds-nutrition-labels --config vds-config.toml --output vds_card_analysis.json --run_analysis --verbose
```
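When the analysis has to run from a script, for example once per dataset, the same command can be assembled in Python. The sketch below is hypothetical glue code: `build_command` is a helper name introduced here, and it uses only the flags shown above. It runs the tool only if the executable and the config file are both present.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(config_path, output_path, verbose=False):
    """Assemble the argument list for a full analysis run (hypothetical helper)."""
    cmd = [
        "vds-nutrition-labels",
        "--config", config_path,
        "--output", output_path,
        "--run_analysis",
    ]
    if verbose:
        cmd.append("--verbose")
    return cmd


if __name__ == "__main__":
    cmd = build_command("vds-config.toml", "vds_card_analysis.json", verbose=True)
    if shutil.which(cmd[0]) is None:
        print("vds-nutrition-labels is not on PATH; install the package first.")
    elif not Path("vds-config.toml").is_file():
        print("vds-config.toml not found; create it from the template first.")
    else:
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.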

## Configuration

The CLI expects a TOML file with a top-level `[dataset]` section. The built-in template includes:

- `name`, `description`, `version`, `license`
- `has_runable_code_or_test_cases`
- `languages`
- `files` section for `train`, `test`, and `valid` dataset paths
- `fields` section mapping field names such as `function`, `label`, `cve`, `cwe`, and `project` to the names used in the dataset
- `analysis.quality_metrics` including `completeness`, `diversity`, `balance`, `timespan`, `uniqueness`, and `cross_contamination`
- `analysis.structural_metrics` including `loc`, `tokens`, and `cyclomatic_complexity`

Example:

```toml
[dataset]
name = "Example Dataset"
description = "Dataset of labeled functions for vulnerability detection"
version = "1.0.0"
license = "MIT"
has_runable_code_or_test_cases = false
languages = "c,c++,python"

[dataset.files]
train = "path/to/trainfile"
test = "path/to/testfile"
valid = "path/to/validfile"

[dataset.fields]
function = "func"
label = "label"
vuln_label_value = 1
cve = "cve"
cwe = "cwe"
project = "project_url"

[dataset.analysis.quality_metrics]
completeness = true
diversity = true
balance = true
timespan = true
uniqueness = true
cross_contamination = true

[dataset.analysis.structural_metrics]
loc = true
tokens = true
cyclomatic_complexity = true
node_diversity = true
```
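To check a config before running the tool, the file can be parsed and inspected in a few lines of Python. The sketch below is not the tool's own loader. `enabled_metrics` and `is_vulnerable` are hypothetical helpers, and it assumes that `vuln_label_value` is the value in the `label` column that marks a vulnerable sample. It parses a shortened copy of the example from a string; for a file on disk, open it in binary mode (`"rb"`) and call `tomllib.load`.

```python
try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # Python 3.9 / 3.10

# Shortened copy of the example config, with one metric switched off.
EXAMPLE = """
[dataset]
name = "Example Dataset"
languages = "c,c++,python"

[dataset.fields]
function = "func"
label = "label"
vuln_label_value = 1

[dataset.analysis.quality_metrics]
completeness = true
timespan = false

[dataset.analysis.structural_metrics]
loc = true
"""


def enabled_metrics(config):
    """Return {group: [metric names set to true]} from a parsed config."""
    if "dataset" not in config:
        raise KeyError("config is missing the top-level [dataset] section")
    analysis = config["dataset"].get("analysis", {})
    return {
        group: sorted(name for name, on in metrics.items() if on is True)
        for group, metrics in analysis.items()
    }


def is_vulnerable(record, fields):
    """Assumed semantics: a record is vulnerable if its label equals vuln_label_value."""
    return record[fields["label"]] == fields["vuln_label_value"]


config = tomllib.loads(EXAMPLE)
fields = config["dataset"]["fields"]

print(enabled_metrics(config))
print(is_vulnerable({"func": "int f() { return 0; }", "label": 1}, fields))
```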

## Project

- Package name: `vds-nutrition-labels`
- Entry point: `vds_nutrition_labels.cli:main`
- License: MIT

## Contributing

Contributions, bug reports, and feature requests are welcome. Please open an issue or submit a pull request on the GitHub repository.
