Metadata-Version: 2.4
Name: experl-core
Version: 0.1.0.dev6
Summary: Train large language models using reinforcement learning from human feedback.
Home-page: https://github.com/vjkhambe/experl-core
Author: Vijay Khambe
Author-email: vj.khambe@gmail.com
Keywords: transformers,huggingface,language modeling,post-training,rlhf,sft,dpo,ppo
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: accelerate==1.11.0
Requires-Dist: datasets==4.3.0
Requires-Dist: transformers==4.57.1
Requires-Dist: trl==0.24.0
Requires-Dist: peft==0.18.0
Requires-Dist: mlflow==3.5.1
Requires-Dist: hydra-core==1.3.2
Requires-Dist: werkzeug==2.3.8
Provides-Extra: cuda
Requires-Dist: bitsandbytes==0.48.2; extra == "cuda"
Requires-Dist: torchvision==0.24.0; extra == "cuda"
Dynamic: license-file

# 🧠 Experl — RLHF Framework for Post-Training Large Language Models

---

## 📜 Overview

Experl is a framework designed for post-training large language and foundation models using Reinforcement Learning from Human Feedback (RLHF) techniques such as Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO).

Built on top of Hugging Face TRL and Transformers, Experl provides a flexible and extensible platform for training, evaluation, and experimentation in RLHF-based fine-tuning workflows.
It integrates seamlessly with:

- 🤗 TRL – for trainer abstractions used in RL pipelines
- 🤗 Transformers – for model and tokenizer management
- ⚙️ Hydra – for structured, hierarchical configuration management
- 📊 MLflow – for experiment tracking, metrics logging, and dataset and model versioning


Experl reduces the boilerplate involved in RLHF post-training, so researchers and engineers can focus on experimentation, comparison, and innovation.

---
## 🚀 Getting Started

## Installation

### Python Package

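Install the library using `pip`. The `cuda` extra additionally installs `bitsandbytes` and `torchvision`; omit `[cuda]` to install only the core dependencies. The quotes keep shells such as zsh from interpreting the square brackets:

```bash
pip install "experl-core[cuda]"
```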
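
The package metadata declares `Requires-Python: >=3.12`, so it can help to confirm the interpreter version before installing. A minimal sketch, assuming `python3` is the interpreter you intend to install into (the `PY_OK` variable is only illustrative):

```bash
# Exit status 0 means the interpreter is 3.12 or newer; non-zero otherwise
# (including the case where python3 is not on PATH).
if python3 -c 'import sys; raise SystemExit(sys.version_info < (3, 12))' 2>/dev/null; then
  PY_OK=yes
else
  PY_OK=no
fi
echo "Python >= 3.12 available: ${PY_OK}"
```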

### From source

If you want to use the latest features before an official release, you can install Experl from source:

```bash
pip install git+https://github.com/vjkhambe/experl-core.git
```

### Repository

If you want to use the examples, clone the repository with the following command:

```bash
git clone https://github.com/vjkhambe/experl-core.git
```
---

## Command Line Interface (CLI)

You can use the Experl Command Line Interface (CLI) to quickly get started with post-training methods like Proximal Policy Optimization (PPO) or Direct Preference Optimization (DPO):

**PPO:**

```bash
experl ppo \
++model_name_or_path=google/gemma-3-270m-it \
++max_seq_length=128 \
++reward_model_name_or_path=google/gemma-3-270m-it
```

**DPO:**

```bash
experl dpo \
++model_name_or_path=google/gemma-3-270m-it \
++max_seq_length=128 \
++judge.model_name_or_path=google/gemma-3-270m-it
```
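
The `++key=value` arguments above look like Hydra command-line overrides, where the `++` prefix sets a key whether or not it already exists in the configuration. Assuming that is how the `experl` CLI parses them, a small dry-run sketch like the following can help when one base model is shared between the policy and the reward model. The `MODEL`, `SEQ_LEN`, and `CMD` variable names are illustrative and not part of Experl:

```bash
MODEL="google/gemma-3-270m-it"
SEQ_LEN=128

# Build the argument list once so each override stays a single, quoted word.
set -- experl ppo \
  "++model_name_or_path=${MODEL}" \
  "++max_seq_length=${SEQ_LEN}" \
  "++reward_model_name_or_path=${MODEL}"

# Dry run: print the assembled command instead of executing it.
CMD="$*"
echo "$CMD"
```

Running `"$@"` in place of the final `echo` would execute the assembled command.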

## 🧾 Experiment Tracking with MLflow

Experl uses [MLflow](https://mlflow.org/) for experiment tracking, metrics visualization, and model versioning.

During training, every run is logged automatically to MLflow under `experl_output/mlflow_runs`.

To start the MLflow UI locally and explore your PPO/DPO results:

```bash
mlflow ui --backend-store-uri experl_output/mlflow_runs
```
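
If no training run has been logged yet, the store directory may not exist. A minimal sketch of a wrapper that checks for it first: `launch_mlflow_ui` is a hypothetical helper name and not part of Experl, while `--backend-store-uri` and `--port` are standard `mlflow ui` options:

```bash
# Hypothetical helper: start the MLflow UI only if the run directory exists.
# $1 = run directory (defaults to the path used above), $2 = port (default 5000).
launch_mlflow_ui() {
  mlui_runs_dir="${1:-experl_output/mlflow_runs}"
  mlui_port="${2:-5000}"
  if [ ! -d "$mlui_runs_dir" ]; then
    echo "No MLflow runs found at $mlui_runs_dir; run a training command first." >&2
    return 1
  fi
  mlflow ui --backend-store-uri "$mlui_runs_dir" --port "$mlui_port"
}
```

Calling `launch_mlflow_ui` with no arguments uses the default directory and port; `launch_mlflow_ui experl_output/mlflow_runs 5001` serves the same runs on another port.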

## Citation

```bibtex
@misc{vjkhambe2025experl,
  author = {Vijay Khambe},
  title = {Experl: Reinforcement Learning from Human Feedback Framework},
  year = {2025},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/vjkhambe/experl-core}},
}
```

## License

This repository's source code is available under the [Apache-2.0 License](LICENSE).
