Metadata-Version: 2.4
Name: cortado-ms
Version: 0.16.2
Summary: Confidence estimation for extreme-scale proteomics experiments
Home-page: https://github.com/seerbio/cortado
Author: Seth Just
Author-email: sjust@seer.bio
Project-URL: Bug Tracker, https://github.com/seerbio/cortado/issues
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE.md
Requires-Dist: numpy>=1.18.1
Requires-Dist: pandas>=1.3.0
Requires-Dist: pyspark[sql]<4,>=3.3.1
Requires-Dist: wheely-mammoth<2.0.0,>=0.16.1
Requires-Dist: crema-ms<2.0.0,>=0.0.6
Requires-Dist: importlib-metadata>=5.1.0
Requires-Dist: scipy<2.0.0
Provides-Extra: docs
Requires-Dist: numpydoc>=1.0.0; extra == "docs"
Requires-Dist: sphinx-argparse>=0.2.5; extra == "docs"
Requires-Dist: sphinx-rtd-theme>=0.5.0; extra == "docs"
Requires-Dist: nbsphinx>=0.7.1; extra == "docs"
Requires-Dist: ipykernel>=5.3.0; extra == "docs"
Requires-Dist: recommonmark>=0.5.0; extra == "docs"
Provides-Extra: test
Requires-Dist: pytest; extra == "test"
Requires-Dist: polars; extra == "test"
Provides-Extra: dev
Requires-Dist: pre-commit>=2.7.1; extra == "dev"
Requires-Dist: black>=20.8b1; extra == "dev"
Dynamic: license-file

<img alt="cortado logo" src="./docs/_static/cortado_logo.png" height="128" align="left" style="margin: 8px">

**cortado** is a Python package that implements several methods for estimating
false discovery rates (FDR) in extreme-scale mass spectrometry proteomics
experiments. It focuses on gathering and reducing datasets so that efficient
FDR estimation implementations can be applied to them. It also supports
peptide-centric searching through the "mixture-maximum" (MixMax) approach,
which works for spectrum-centric searches as well.

cortado is designed to be easy to use and flexible. The following modes are
currently supported:
- Global PSM FDR
- Global peptide or precursor FDR
- MixMax _q_-value estimation
- Protein scoring by rolling up PSM scores
- Protein _q_-value estimation

## Installation

cortado requires Python 3.8+ and can be installed with pip:

```shell
pip install cortado-ms
```

## Basic Usage

To use `cortado`, first collect a dataset of peptide-spectrum matches (PSMs).
The [`wheely-mammoth` library](https://github.com/seerbio/wheely-mammoth),
which is installed alongside `cortado-ms`, makes this easy:

```pycon
>>> from wheely.mammoth.parsers import read_encyclopedia_features
>>> ds = read_encyclopedia_features("data/*.features.txt")
```

You can then compute _q_-values using the `assign_confidence` function:

```pycon
>>> from cortado import assign_confidence
>>> conf = assign_confidence(ds, "primary", desc=False)
>>> print(f"pi0={100*conf.pi0:.2f}%")
pi0=36.74%
>>> conf.data.toPandas()[["primary", "sequence", "q-value"]]
       primary                                           sequence   q-value
0    43.783160                             -.LSLEGDHSTPPSAYGSVK.-  0.000000
1    41.212322                           -.NGPLEVAGAAVSAGHGLPAK.-  0.000000
2    36.081665                               -.IMDPNIVGSEHYDVAR.-  0.000000
3    34.730186                  -.TTHYTPLAC[+57.0214635]GSNPLKR.-  0.000000
4    34.666630                             -.SGPKPFSAPKPQTSPSPK.-  0.000000
..         ...                                                ...       ...
827   8.150498                                     -.FGVEQDRMDK.-  0.414025
828   8.065474                                     -.QYYYSADGSR.-  0.416572
829   8.028464                                     -.APHDFQFVQK.-  0.416572
830   7.946067                                     -.KLYEDAQMAR.-  0.417337
831   7.632076  -.LTC[+57.0214635]PC[+57.0214635]C[+57.0214635...  0.420368

[832 rows x 3 columns]
```
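
A common next step is to keep only the rows below a chosen _q_-value
threshold. The sketch below assumes the results have already been converted
with `toPandas()` as shown above. To stay self-contained it builds a small
stand-in DataFrame from four rows of that output, and the 1% cutoff is an
arbitrary example value rather than a recommendation.

```python
import pandas as pd

# Stand-in for conf.data.toPandas(): four rows copied from the output above.
results = pd.DataFrame(
    {
        "primary": [43.783160, 41.212322, 8.150498, 8.065474],
        "sequence": [
            "-.LSLEGDHSTPPSAYGSVK.-",
            "-.NGPLEVAGAAVSAGHGLPAK.-",
            "-.FGVEQDRMDK.-",
            "-.QYYYSADGSR.-",
        ],
        "q-value": [0.000000, 0.000000, 0.414025, 0.416572],
    }
)

threshold = 0.01  # example cutoff (1% FDR); choose what suits your analysis

# Boolean indexing keeps only the rows at or below the threshold.
accepted = results[results["q-value"] <= threshold]
print(f"{len(accepted)} of {len(results)} rows pass q <= {threshold}")
```

For very large datasets, converting everything to pandas first may not be
practical, so treat this as a pattern for inspecting small result sets.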
