Alevin efficiently estimates accurate gene abundances from dscRNA-seq data.

Malik, Laraib; Patro, Rob; Smith, Tom; Srivastava, Avi; Sudbery, Ian

Authors: Laraib Malik, Rob Patro, Tom Smith, Avi Srivastava, Ian Sudbery
Publication date: 1 March 2019
Publisher: Genome Biol

Abstract

We introduce alevin, a fast end-to-end pipeline to process droplet-based single-cell RNA sequencing data, performing cell barcode detection, read mapping, unique molecular identifier (UMI) deduplication, gene count estimation, and cell barcode whitelisting. Alevin's approach to UMI deduplication considers transcript-level constraints on the molecules from which UMIs may have arisen and accounts for both gene-unique reads and reads that multimap between genes. This addresses the inherent bias in existing tools, which discard gene-ambiguous reads, and improves the accuracy of gene abundance estimates. Alevin is also considerably faster than existing gene quantification approaches, typically by a factor of eight, while using less memory.
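
The bias described above can be illustrated with a toy sketch. This is not alevin's algorithm: the abstract states that alevin uses transcript-level constraints, which are not modelled here. The read tuples, gene names, and both function names are hypothetical, and the proportional split of ambiguous UMIs is an assumed stand-in chosen only to show how keeping multimapping reads changes the counts.

```python
from collections import defaultdict

# Toy reads (hypothetical data): (cell barcode, UMI, genes the read maps to).
reads = [
    ("AAAC", "TTGA", {"GeneA"}),
    ("AAAC", "TTGA", {"GeneA"}),           # PCR duplicate of the read above
    ("AAAC", "CCGT", {"GeneA"}),
    ("AAAC", "GGAT", {"GeneA", "GeneB"}),  # multimaps between two genes
    ("AAAC", "ACGT", {"GeneB"}),
    ("AAAC", "TACG", {"GeneA", "GeneB"}),  # multimaps between two genes
]


def dedup_discarding_ambiguous(reads):
    """Count distinct UMIs per (cell, gene), dropping gene-ambiguous reads."""
    umis = defaultdict(set)
    for cell, umi, genes in reads:
        if len(genes) == 1:
            (gene,) = genes
            umis[(cell, gene)].add(umi)
    return {key: float(len(seen)) for key, seen in umis.items()}


def dedup_sharing_ambiguous(reads):
    """Keep ambiguous UMIs too, splitting each one across its candidate
    genes in proportion to the gene-unique counts (an assumed heuristic)."""
    unique = dedup_discarding_ambiguous(reads)
    counts = dict(unique)
    seen = set()
    for cell, umi, genes in reads:
        key = (cell, umi, frozenset(genes))
        if len(genes) == 1 or key in seen:
            continue  # unique reads are already counted; skip duplicates
        seen.add(key)
        weights = {g: unique.get((cell, g), 0.0) for g in genes}
        total = sum(weights.values())
        for g in sorted(genes):
            # Fall back to an even split when no gene has unique evidence.
            share = weights[g] / total if total > 0 else 1.0 / len(genes)
            counts[(cell, g)] = counts.get((cell, g), 0.0) + share
    return counts


naive = dedup_discarding_ambiguous(reads)
shared = dedup_sharing_ambiguous(reads)
print("discarding ambiguous reads:", naive)
print("sharing ambiguous reads:   ", shared)
```

On this toy input, the discarding strategy accounts for three molecules, while the sharing strategy accounts for all five distinct UMIs. The two lost molecules are the kind of systematic undercount the abstract attributes to tools that drop gene-ambiguous reads.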