PyPairs Documentation¶
PyPairs - A Python scRNA-Seq classifier¶
This is a Python reimplementation of the Pairs algorithm described by A. Scialdone et al. (2015). The original paper is available at: <https://doi.org/10.1016/j.ymeth.2015.06.021>
A supervised machine learning algorithm that classifies single cells based on their transcriptomic signal. Initially created to predict cell cycle phase from scRNA-Seq data, the algorithm can also be applied to other biological groupings.
Built to be fully compatible with Scanpy [Wolf18].
Code available on GitHub.
Authors¶
- Antonio Scialdone - original algorithm
- Ron Fechtner - implementation and extension in Python
Release notes¶
Versions¶
Version 3.1.0, Apr 4, 2019¶
- New feature:
  - Multithreading now available for pairs.cyclone()
- Minor changes and fixes:
  - pairs.sandbag() now significantly faster
  - pairs.sandbag() more stable in terms of memory access
Version 3.0.1 - 3.0.13, Mar 13, 2019¶
- Various bug fixes, including:
  - Bioconda compatibility
  - Dataset loading
  - Cache file required
  - Cell Cycle specific scoring
Version 3.0.0, Jan 18, 2019 - Jan 31, 2019¶
Version 2.0.1 - 2.0.6, Nov 22, 2018¶
- Minor bug fixes and improvements.
Version 2.0.0, Aug 14, 2018¶
Version 1.0.1 - 1.0.3, Jul 29, 2018¶
- Bug fixes and improvements. (Mostly bugs though)
- Added multi-core processing
Version 1.0.0, Mar 4, 2018¶
- Speed and performance improvements.
Version 0.1, Feb 22, 2018¶
- Simple Python reimplementation of the Pairs algorithm.
- Included sandbag() and cyclone() algorithms
Getting Started¶
Installation¶
This package is hosted on PyPI (https://pypi.org/project/pypairs/) and can be installed on any system running Python 3 via pip:
pip install pypairs
Alternatively, pypairs can be installed using Conda (most easily obtained via the Miniconda Python distribution):
conda install -c bioconda pypairs
Minimal Example¶
The datasets module provides an example scRNA-Seq dataset and default marker pairs for cell cycle prediction:
from pypairs import pairs, datasets
# Load samples from the Oscope scRNA-Seq dataset with known cell cycle phase
training_data = datasets.leng15(mode='sorted')
# Run sandbag() to identify marker pairs
marker_pairs = pairs.sandbag(training_data, fraction=0.6)
# Load samples from the Oscope scRNA-Seq dataset without known cell cycle phase
testing_data = datasets.leng15(mode='unsorted')
# Run cyclone() to score and predict cell cycle classes
result = pairs.cyclone(testing_data, marker_pairs)
# Further downstream analysis
print(result)
Documentation¶
To use PyPairs, import the package as follows:
from pypairs import pairs, datasets, settings, utils
Sandbag¶
This function implements the training step of the pair-based prediction method described by Scialdone et al. (2015) [Scialdone15].
To illustrate, consider classification of cells into G1 phase.
Pairs of marker genes are identified with sandbag(), where the expression of the first gene in the training data is greater than the second in G1 phase but less than the second in all other phases.
pairs.sandbag (data[, annotation, …]) |
Calculate ‘marker pairs’ from a genecount matrix. |
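The selection rule described above can be sketched in plain Python. This is a minimal illustration, not the PyPairs implementation: the helper find_marker_pairs, the toy genes A, B, C and the reading of fraction as "minimum share of cells in a category that must show the expected ordering" are all assumptions made for this example.

```python
from itertools import permutations


def find_marker_pairs(expr, labels, target, fraction=0.6):
    """Return ordered gene pairs (first, second) that mark `target`.

    expr   : dict mapping gene name -> list of expression values, one per cell
    labels : list with the known category of each cell
    A pair is kept if first > second in at least `fraction` of the target
    cells and first < second in at least `fraction` of the cells of every
    other category (assumed interpretation of `fraction`).
    """
    categories = sorted(set(labels))
    selected = []
    for first, second in permutations(expr, 2):
        keep = True
        for cat in categories:
            cells = [k for k, lab in enumerate(labels) if lab == cat]
            if cat == target:
                hits = sum(expr[first][k] > expr[second][k] for k in cells)
            else:
                hits = sum(expr[first][k] < expr[second][k] for k in cells)
            if hits / len(cells) < fraction:
                keep = False
                break
        if keep:
            selected.append((first, second))
    return selected


# Toy training data: two G1 cells followed by two S cells (hypothetical values)
expr = {
    "A": [5, 6, 1, 1],  # high in G1, low in S
    "B": [1, 2, 5, 6],  # low in G1, high in S
    "C": [3, 3, 3, 3],  # constant
}
labels = ["G1", "G1", "S", "S"]

g1_pairs = find_marker_pairs(expr, labels, "G1")
s_pairs = find_marker_pairs(expr, labels, "S")
```

Note that the constant gene C still takes part in marker pairs: only the ordering within a pair matters, not the absolute expression level.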
Cyclone¶
For each cell, cyclone() calculates the proportion of all marker pairs where the expression of the first gene is greater than the second in the new data (pairs with the same expression are ignored). A high proportion suggests that the cell is likely to belong to this category, as the expression ranking in the new data is consistent with that in the training data. Proportions are not directly comparable between phases due to the use of different sets of gene pairs for each phase. Instead, proportions are converted into scores that account for the size and precision of the proportion estimate. The same process is repeated for all phases, using the corresponding set of marker pairs in marker_pairs.
pairs.cyclone (data[, marker_pairs, …]) |
Score samples for each category based on marker pairs. |
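The per-cell proportion described for cyclone() can be illustrated as follows. This is a sketch under stated assumptions: pair_proportion, the toy marker pairs and the toy cell are hypothetical, and the subsequent conversion of proportions into scores is not shown.

```python
def pair_proportion(cell, pair_list):
    """Proportion of marker pairs whose first gene is higher than the second.

    cell      : dict mapping gene name -> expression value in one cell
    pair_list : list of (first, second) gene name tuples for one category
    Pairs with equal expression are ignored. Returns None if no pair is
    informative.
    """
    hits = 0
    informative = 0
    for first, second in pair_list:
        a, b = cell[first], cell[second]
        if a == b:
            continue  # tied pairs carry no ordering information
        informative += 1
        if a > b:
            hits += 1
    return hits / informative if informative else None


# Hypothetical marker pairs per category and one new, unlabelled cell
marker_pairs = {
    "G1": [("A", "B"), ("A", "C"), ("C", "B")],
    "S": [("B", "A"), ("B", "C"), ("C", "A")],
}
cell = {"A": 4, "B": 1, "C": 4}

proportions = {
    category: pair_proportion(cell, pair_list)
    for category, pair_list in marker_pairs.items()
}
```

In this toy case the pair involving A and C is tied and dropped for both categories, so each proportion is computed from the two remaining pairs.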
While this method is described for cell cycle phase classification, any biological groupings can be used here. However, for non-cell cycle phase groupings users should manually apply their own score thresholds for assigning cells into specific groups.
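A user-defined threshold for custom groupings might look like the sketch below. The helper assign_category, the cutoff of 0.5 and the "unassigned" fallback label are illustrative assumptions, not values taken from PyPairs; a suitable threshold depends on the data.

```python
def assign_category(scores, threshold=0.5, fallback="unassigned"):
    """Pick the best-scoring category, or `fallback` if no score is high enough.

    scores    : dict mapping category name -> score for one cell
    threshold : minimum score required to accept the best category
                (hypothetical default, to be tuned by the user)
    """
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else fallback


confident = assign_category({"typeA": 0.8, "typeB": 0.3})
uncertain = assign_category({"typeA": 0.4, "typeB": 0.3})
```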
Datasets¶
datasets.leng15 ([mode, gene_sub, sample_sub]) |
Single-cell RNA-seq data of hESCs used to evaluate Oscope [Leng15] |
datasets.default_cc_marker ([dataset]) |
Cell cycle marker pairs derived from [Leng15] with the default sandbag() settings. |
Quality Assessment¶
utils.evaluate_prediction (prediction, reference) |
Calculates F1 Score, Recall and Precision of a cyclone() prediction. |
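For reference, the three metrics can be computed per category as in this sketch. It uses the standard definitions and is not the code behind utils.evaluate_prediction(); the function name and the toy labels are assumptions.

```python
def precision_recall_f1(prediction, reference, category):
    """Precision, recall and F1 score for one category.

    prediction, reference : equally long lists of category labels per cell
    Undefined ratios (zero denominator) are reported as 0.0.
    """
    pairs = list(zip(prediction, reference))
    tp = sum(p == category and r == category for p, r in pairs)
    fp = sum(p == category and r != category for p, r in pairs)
    fn = sum(p != category and r == category for p, r in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Toy labels for five cells (hypothetical)
prediction = ["G1", "G1", "S", "G2M", "S"]
reference = ["G1", "S", "S", "G2M", "G2M"]

s_metrics = precision_recall_f1(prediction, reference, "S")
```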
Utils¶
utils.export_marker (marker, fname[, defaultpath]) |
Export marker pairs to json-File. |
utils.load_marker (fname[, defaultpath]) |
Load marker pairs from json-File. |
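The idea of storing marker pairs as JSON can be sketched with the standard library alone. The file layout used here (one list of gene pairs per category) and the helpers save_pairs and load_pairs are assumptions for illustration; the format written by utils.export_marker() may differ.

```python
import json
import os
import tempfile


def save_pairs(marker_pairs, path):
    """Write a dict of category -> list of gene pairs to a JSON file."""
    with open(path, "w") as handle:
        json.dump(marker_pairs, handle, indent=2)


def load_pairs(path):
    """Read the file back; JSON has no tuples, so pairs are converted again."""
    with open(path) as handle:
        raw = json.load(handle)
    return {cat: [tuple(pair) for pair in pairs] for cat, pairs in raw.items()}


# Hypothetical marker pairs
marker_pairs = {"G1": [("A", "B"), ("A", "C")], "S": [("B", "A")]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "marker_pairs.json")
    save_pairs(marker_pairs, path)
    restored = load_pairs(path)
```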
Settings¶
The default directories for saving figures and caching files.
settings.figdir |
Directory for saving figures (default: './figures/' ). |
settings.cachedir |
Directory for cache files (default: './cache/' ). |
The verbosity of logging output, where verbosity levels have the following meaning: 0='error', 1='warning', 2='info', 3='hint'
settings.verbosity |
Verbosity level (default: 1). |
Print versions of packages that might influence numerical results.
log.print_versions () |
Versions that might influence the numerical results. |
References¶
[Leng15] | Leng et al. (2015), Oscope identifies oscillatory genes in unsynchronized single-cell RNA-seq experiments, Nature Methods. |
[Scialdone15] | Scialdone et al. (2015), Computational assignment of cell-cycle stage from single-cell transcriptome data, Methods. |
[Wolf18] | Wolf et al. (2018), SCANPY: large-scale single-cell gene expression data analysis, Genome Biology. |