Metadata-Version: 2.1
Name: palindrome-tree
Version: 1.2.0
Summary: Gradient boosted decision tree palindrome predictor, used to locate regions for further investigation thru http://palindromes.ibp.cz/
Home-page: https://github.com/patrikkaura/palindrome-tree
License: MIT
Keywords: DNA,palindrome
Author: jaromir.kratochvil
Author-email: 171433@vutbr.cz
Requires-Python: >=3.7.1,<4.0.0
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Dist: pandas (>=1.3.5,<2.0.0)
Requires-Dist: requests (>=2.26.0,<3.0.0)
Requires-Dist: scikit-learn (>=1.0.2,<2.0.0)
Requires-Dist: xgboost (>=1.5.1,<2.0.0)
Project-URL: Repository, https://github.com/patrikkaura/palindrome-tree
Description-Content-Type: text/markdown

# Palindrome tree

Palindrome tree analyzes inverted repeats in DNA sequences using decision trees. The tool scans the provided sequence and uses a gradient boosted decision tree to flag regions with a high probability of containing a palindrome, which filters out a large portion of the data. The flagged regions can then be analyzed through the API of [Palindrome Analyzer](http://dx.doi.org/10.1016/j.bbrc.2016.09.015). DNA Analyser is a web-based server for nucleotide sequence analysis, developed in cooperation between the Department of Informatics, Mendel’s University in Brno, and the Institute of Biophysics, Academy of Sciences of the Czech Republic.

## Requirements

Palindrome tree requires Python 3.7.1 or newer (below 4.0).

## Installation

Install palindrome tree from the [PyPI](https://pypi.org/project/palindrome-tree/) repository.

```commandline
pip install palindrome-tree
```

## Usage

First create a `PalindromeTree` analyzer instance, imported from the main package `palindrome_tree`.

```python
from palindrome_tree import PalindromeTree

tree = PalindromeTree()
```
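
The examples in the following sections pass the raw contents of a text file to `analyse`. Whether `analyse` tolerates line breaks, lowercase letters, or a FASTA header line is not stated here, so it may be safer to normalize the text first. The sketch below is one way to do that; `load_sequence` is a hypothetical helper and not part of `palindrome_tree`.

```python
def load_sequence(text: str) -> str:
    """Join the nucleotide lines of raw file contents into one uppercase string.

    Hypothetical helper (not part of palindrome_tree): lines starting with
    ">" are treated as FASTA headers and skipped.
    """
    lines = [
        line.strip()
        for line in text.splitlines()
        if not line.startswith(">")
    ]
    return "".join(lines).upper()


# Made-up input: a header line followed by a sequence split over two lines.
raw = ">example header\nttTGTAGAGACAGG\nGTCTTGCTGTG\n"
sequence = load_sequence(raw)
print(sequence)
```

The cleaned string can then be passed as the `sequence` argument in place of `sequence_file.read()`.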

### Predict regions (without API validation)

To predict regions with possible palindromes, call `analyse` with `check_with_api=False`.

```python
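# Read the whole sequence file; the context manager closes it afterwards.
with open("/path/to/sequence/name.txt", "r") as sequence_file:
    sequence = sequence_file.read()

# Predict candidate regions only, without validating them through the API.
results = tree.analyse(
    sequence=sequence,
    check_with_api=False,
)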
```
The results are returned as a `pd.DataFrame` and stored in the `results` variable.

|    |   position | sequence                       |
|---:|-----------:|:-------------------------------|
|  0 |          8 | TTTGTAGAGACAGGGTCTTGCTGTGTTTCC |
|  1 |         10 | TGTAGAGACAGGGTCTTGCTGTGTTTCCCA |
|  2 |         49 | CGAACTCCTGGCCTCTAGGCAATCCTCCCA |
|  3 |        102 | ATCCCACTCTTTTTTGAAAAATAAAATCTA |
|  4 |        105 | CCACTCTTTTTTGAAAAATAAAATCTACCA |
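
Neighbouring rows in the table above overlap, for example the windows at positions 8 and 10. If one row per contiguous region is more convenient, the windows can be merged. The sketch below assumes that `position` is the start offset of each window and that the window length equals the length of `sequence`; both are read off the sample output rather than documented behaviour. `merge_windows` is a hypothetical helper, and the pairs it expects could be built with `zip(results["position"], results["sequence"])`.

```python
def merge_windows(windows):
    """Merge overlapping or touching (position, sequence) windows.

    Returns a list of (start, end) tuples with an exclusive end offset.
    Hypothetical helper, not part of palindrome_tree.
    """
    regions = []
    for start, sequence in sorted(windows):
        end = start + len(sequence)
        if regions and start <= regions[-1][1]:
            # The window starts inside (or right at the end of) the last region.
            regions[-1][1] = max(regions[-1][1], end)
        else:
            regions.append([start, end])
    return [tuple(region) for region in regions]


# Rows copied from the table above.
windows = [
    (8, "TTTGTAGAGACAGGGTCTTGCTGTGTTTCC"),
    (10, "TGTAGAGACAGGGTCTTGCTGTGTTTCCCA"),
    (49, "CGAACTCCTGGCCTCTAGGCAATCCTCCCA"),
    (102, "ATCCCACTCTTTTTTGAAAAATAAAATCTA"),
    (105, "CCACTCTTTTTTGAAAAATAAAATCTACCA"),
]
regions = merge_windows(windows)
print(regions)  # [(8, 40), (49, 79), (102, 135)]
```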

### Predict regions (with API validation)

To predict regions with possible palindromes and then validate them through the API, call `analyse` with `check_with_api=True`.

```python
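# Read the whole sequence file; the context manager closes it afterwards.
with open("/path/to/sequence/name.txt", "r") as sequence_file:
    sequence = sequence_file.read()

# Predict candidate regions and validate them through the API.
results = tree.analyse(
    sequence=sequence,
    check_with_api=True,
)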
```
The results are again returned as a `pd.DataFrame` and stored in the `results` variable.

|    |   original_index | after   | before   |   mismatches | opposite   |   position | sequence   | signature   | spacer   | stability_NNModel                                                                |
|---:|-----------------:|:--------|:---------|-------------:|:-----------|-----------:|:-----------|:------------|:---------|:---------------------------------------------------------------------------------|
|  0 |                0 | CC      | TTTGT    |            2 | CTGTGTTT   |          5 | AGAGACAG   | 8-7-2       | GGTCTTG  | {'cruciform': -5.74, 'linear': -27.590000000000003, 'delta': 21.85}              |
|  1 |                0 | TGCTG   | TTTGT    |            2 | GGGTCT     |          5 | AGAGAC     | 6-1-2       | A        | {'cruciform': -2.54, 'linear': -13.84, 'delta': 11.3}                            |
|  2 |                0 | GTGTT   | TGTAG    |            2 | CTTGCT     |          7 | AGACAG     | 6-3-2       | GGT      | {'cruciform': -1.94, 'linear': -17.509999999999998, 'delta': 15.569999999999999} |
|  3 |                0 | TTCC    | TAGAG    |            2 | CTGTGT     |          9 | ACAGGG     | 6-5-2       | TCTTG    | {'cruciform': -3.7399999999999998, 'linear': -20.99, 'delta': 17.25}             |
|  4 |                1 | CCCA    | TGT      |            2 | CTGTGTTT   |          3 | AGAGACAG   | 8-7-2       | GGTCTTG  | {'cruciform': -5.74, 'linear': -27.590000000000003, 'delta': 21.85}              |
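
The sketch below shows one possible way to read the columns of the table above. It rests on interpretations inferred from the sample rows, not on documented behaviour: `signature` appears to be arm length, spacer length, and mismatch count joined by dashes; `mismatches` appears to count positions where `opposite` differs from the reverse complement of `sequence`; and `stability_NNModel` is assumed to hold a Python dictionary. The helper names are hypothetical. Rows of this shape could be obtained with `results.to_dict("records")`.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def count_mismatches(arm, opposite):
    """Count positions where `opposite` differs from the reverse complement of `arm`."""
    reverse_complement = arm.translate(COMPLEMENT)[::-1]
    return sum(a != b for a, b in zip(reverse_complement, opposite))


def build_signature(row):
    """Rebuild the assumed "<arm length>-<spacer length>-<mismatches>" signature."""
    return f"{len(row['sequence'])}-{len(row['spacer'])}-{row['mismatches']}"


def flatten_stability(row):
    """Copy the row, lifting the stability dictionary's keys to top-level fields."""
    flat = {key: value for key, value in row.items() if key != "stability_NNModel"}
    flat.update(row["stability_NNModel"])
    return flat


# Two rows copied from the table above (only the columns used here).
rows = [
    {"sequence": "AGAGACAG", "spacer": "GGTCTTG", "opposite": "CTGTGTTT",
     "mismatches": 2, "signature": "8-7-2",
     "stability_NNModel": {"cruciform": -5.74, "linear": -27.590000000000003, "delta": 21.85}},
    {"sequence": "AGAGAC", "spacer": "A", "opposite": "GGGTCT",
     "mismatches": 2, "signature": "6-1-2",
     "stability_NNModel": {"cruciform": -2.54, "linear": -13.84, "delta": 11.3}},
]

flat_rows = [flatten_stability(row) for row in rows]
recomputed = [count_mismatches(row["sequence"], row["opposite"]) for row in rows]
signatures = [build_signature(row) for row in rows]

# Rank candidates by the "delta" value, largest first.
ranked = sorted(flat_rows, key=lambda row: row["delta"], reverse=True)
print(recomputed, signatures, [row["sequence"] for row in ranked])
```

For these two rows the recomputed mismatch counts and rebuilt signatures agree with the values in the table, which supports the interpretation but does not prove it for every case.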

## Dependencies

* xgboost = "^1.5.1"
* pandas = "^1.3.5"
* scikit-learn = "^1.0.2"
* requests = "^2.26.0"

## Authors

* **Patrik Kaura** - *Main developer* - [patrikkaura](https://gitlab.com/PatrikKaura/)

* **Jaromir Kratochvil** - *Developer* - [jaromirkratochvil](https://github.com/kratjar)

* **Jiří Šťastný** - *Supervisor*

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.


