Metadata-Version: 2.1
Name: mpitree
Version: 0.0.3
Summary: A Parallel Decision Tree implementation using MPI
Home-page: https://github.com/duong-jason/mpitree
Author: Jason Duong
Author-email: my.toe.ben@gmail.com
License: MIT
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Provides-Extra: testing
License-File: LICENSE

![Unit Tests](https://github.com/duong-jason/mpilearn/workflows/Unit%20Tests/badge.svg)
![Formatter](https://github.com/duong-jason/mpilearn/workflows/Lint/badge.svg)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![linting: pylint](https://img.shields.io/badge/linting-pylint-yellowgreen)](https://github.com/PyCQA/pylint)


# mpitree

A Parallel Decision Tree implementation using MPI *(Message Passing Interface)*.

## How it Works

<p align="center">
  <img src="https://github.com/duong-jason/mpitree/blob/main/images/psplit.png" alt="Example of process splits"/>
</p>

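At every *interior* node of the decision tree, the processes assigned to that node (their number varies from node to node) determine the best feature to split on. Let $n$ be the number of processes at the node and $p$ be the number of levels (branches) of the chosen split. The processes are divided into $p$ *groups*, one per level, and each group works on its own level independently of the other groups. Processes are assigned to groups in a block-cyclic (round-robin) fashion: the process with rank $r$ joins group $\left\lfloor r / \lfloor n/p \rfloor \right\rfloor \bmod p$, and its new rank within that group is its position among the group's members, ordered by original rank.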

In the diagram above, the root node has eight processes, $p_0, p_1, \dots, p_7$, and the split has three distinct feature levels, $l_0, l_1, l_2$. Since $\lfloor 8/3 \rfloor = 2$, processes are dealt out to the groups in blocks of two, so $p_6$ and $p_7$ wrap around to Group $1$. Writing each member as a (process, rank) pair, Group $1$ consists of $\{(0,0), (1,1), (6,2), (7,3)\}$, Group $2$ of $\{(2,0), (3,1)\}$, and Group $3$ of $\{(4,0), (5,1)\}$.
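
The assignment rule described above can be checked against this example with a few lines of plain Python. This is a sketch, not mpitree's code: `group_of` and `assign_groups` are hypothetical helper names, and falling back to a block size of 1 when there are fewer processes than levels is an assumption (the formula alone would divide by zero in that case).

```python
from typing import Dict, List, Tuple


def group_of(rank: int, n_procs: int, n_levels: int) -> int:
    """Block-cyclic group index of a rank: floor(rank / floor(n/p)) mod p."""
    # Assumption: with fewer processes than levels, use a block size of 1.
    block = max(1, n_procs // n_levels)
    return (rank // block) % n_levels


def assign_groups(n_procs: int, n_levels: int) -> Dict[int, List[Tuple[int, int]]]:
    """Map each group index to its (process, local rank) pairs."""
    groups: Dict[int, List[Tuple[int, int]]] = {g: [] for g in range(n_levels)}
    # Visiting ranks in increasing order keeps each group's local ranks sorted.
    for rank in range(n_procs):
        members = groups[group_of(rank, n_procs, n_levels)]
        members.append((rank, len(members)))  # local rank = position in the group
    return groups


if __name__ == "__main__":
    for g, members in assign_groups(8, 3).items():
        print(f"Group {g + 1}: {members}")
```

Running it prints the three groups listed above, for example `Group 1: [(0, 0), (1, 1), (6, 2), (7, 3)]`. With mpi4py, one common way to form such groups is `comm.Split(color, key)`: passing the group index as `color` and the original rank as `key` gives each new communicator local ranks ordered by original rank, matching `assign_groups`. Whether mpitree uses exactly this call is not stated here.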

Note that processes are split only at an *interior* node. Consequently, a leaf node may be assigned more than one process, since deciding whether a node is pure (and therefore a leaf) does not depend on how many processes are assigned to it.
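
The README does not say which purity measure mpitree uses; the sketch below assumes Shannon entropy plus a `max_depth` stopping rule (mirroring the `criterion={'max_depth': 3}` argument in the example further down, although how mpitree applies it is an assumption). The helper names `entropy` and `is_leaf` are hypothetical. The point shows up in the signatures: the leaf decision depends only on a node's labels and depth, never on how many processes hold the node.

```python
import math
from collections import Counter
from typing import Hashable, Optional, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of the class labels that reach a node."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def is_leaf(
    labels: Sequence[Hashable], depth: int, max_depth: Optional[int] = None
) -> bool:
    """Leaf if the node is pure or the (assumed) depth limit is reached.

    The number of processes assigned to the node is deliberately not an input.
    """
    if entropy(labels) == 0.0:
        return True
    return max_depth is not None and depth >= max_depth
```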

After the processes are split at an interior node, each group builds the sub-tree for its level, and the processes of the original (parent) group wait for one another to finish. When a group finishes, it has built the sub-tree along one path from the root, and its local group is de-allocated. The sub-trees are then recursively gathered back to the root process.
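
To make the recursion concrete, here is a minimal serial simulation in plain Python with no MPI. It is not mpitree's implementation: the categorical toy data, splitting on features in a fixed order instead of selecting the best one, the majority-label fallback, and the rule for a level that receives no process are all assumptions for illustration. Each node records which ranks were assigned to it, and returning each child sub-tree to its parent stands in for gathering it toward the root.

```python
from collections import Counter
from pprint import pprint
from typing import Any, Dict, List


def build(
    rows: List[Dict[str, Any]],
    labels: List[Any],
    features: List[str],
    procs: List[int],
) -> Dict[str, Any]:
    """Serially simulate the recursive process splitting; returns a nested dict."""
    # Leaf test: pure node or no features left. It never looks at len(procs),
    # so a leaf may still be assigned several processes.
    if len(set(labels)) == 1 or not features:
        return {"label": Counter(labels).most_common(1)[0][0], "procs": procs}

    # Hypothetical: split on features in a fixed order instead of the best one.
    feature, rest = features[0], features[1:]
    levels = sorted({row[feature] for row in rows})
    block = max(1, len(procs) // len(levels))

    # Block-cyclic assignment of processes to levels by their local rank i.
    members: Dict[Any, List[int]] = {lvl: [] for lvl in levels}
    for i, proc in enumerate(procs):
        members[levels[(i // block) % len(levels)]].append(proc)

    children: Dict[Any, Dict[str, Any]] = {}
    for lvl in levels:
        idx = [k for k, row in enumerate(rows) if row[feature] == lvl]
        # Assumption: a level that received no process is built by procs[0].
        sub_procs = members[lvl] or procs[:1]
        children[lvl] = build(
            [rows[k] for k in idx], [labels[k] for k in idx], rest, sub_procs
        )
    # Returning each sub-tree to its parent stands in for gathering it to the root.
    return {"feature": feature, "procs": procs, "children": children}


# Hypothetical toy categorical data, built with 8 simulated processes.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
    {"outlook": "overcast", "windy": "no"},
]
labels = ["no", "no", "yes", "no", "yes"]
tree = build(rows, labels, ["outlook", "windy"], list(range(8)))
pprint(tree)
```

Here the `overcast` leaf ends up holding processes 0, 1, 6 and 7, which illustrates how a leaf node may consist of more than one process. In an actual MPI version, the per-level groups would typically be communicators (for example from mpi4py's `Comm.Split`) that are freed once their sub-tree has been sent back to the parent group.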

## Installation

Using [GitHub](https://github.com/duong-jason/mpitree)
```bash
git clone https://github.com/duong-jason/mpitree.git
cd mpitree
```

Using [pip](https://pypi.org/project/mpitree/)
```bash
pip install mpitree
```

## Example

```python
from mpi4py import MPI
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

from mpitree.parallel_decision_tree import (
    ParallelDecisionTreeClassifier,
    world_comm,
    world_rank,
)

if __name__ == "__main__":
    iris = load_iris(as_frame=True)

    X_train, X_test, y_train, y_test = train_test_split(
        iris.data, iris.target, test_size=0.20, random_state=42
    )

    world_comm.Barrier()
    start_time = MPI.Wtime()

    pdt = ParallelDecisionTreeClassifier(criterion={'max_depth': 3})
    pdt.fit(X_train, y_train)

    score = pdt.score(X_test, y_test)

    end_time = MPI.Wtime()
    if not world_rank:
        print(pdt)
        print(f"Accuracy: {score:.2%}")
        print(f"Parallel Execution Time: {end_time - start_time:.3f}s")
```

Assuming the script above is saved as `main.py`, launch it on five processes with `mpiexec`:

```bash
$ mpiexec -n 5 python3 main.py

petal length (cm) (< 2.45)
        0
        petal length (cm) (< 4.75)
                petal width (cm) (< 1.65)
                        1
                        2
                petal width (cm) (< 1.75)
                        2
                        2
Accuracy: 96.67%
Parallel Execution Time: 1.895s
```

## Unit Tests

```bash
python3 -m pytest
```

## Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

## License

[MIT](https://github.com/duong-jason/mpitree/blob/main/LICENSE)
