PPLine: An Automated Pipeline for SNP, SAP, and Splice Variant Detection in the Context of Proteogenomics
The fundamental mission of the Chromosome-Centric Human Proteome
Project (C-HPP) is the study of human proteome diversity, including
rare variants. Liver tissues, HepG2 cells, and plasma were selected
as major objects for C-HPP studies. The proteogenomic approach,
a recently introduced technique, is a powerful method for predicting
and validating proteoforms coming from alternative splicing, mutations,
and transcript editing. We developed PPLine, a Python-based proteogenomic
pipeline that provides automated discovery of single-amino-acid
polymorphisms (SAP), indels, and alternatively spliced variants from raw
transcriptome and exome sequence data, single-nucleotide polymorphism (SNP)
annotation and filtration, and the prediction of proteotypic peptides
(available at https://sourceforge.net/projects/ppline). In this work,
we performed deep transcriptome sequencing of HepG2 cells and liver
tissues using two platforms: Illumina HiSeq and Applied Biosystems
SOLiD. Using PPLine, we revealed 7756 SAPs and indels for HepG2 cells
and liver (including 659 variants not annotated in dbSNP). We found
17 indels in transcripts associated with the translation of alternate
reading frames (ARFs) longer than 300 bp. The ARF products of two genes,
<i>SLMO1</i> and <i>TMEM8A</i>, demonstrate signatures of a caspase-binding
domain and a Gcn5-related <i>N</i>-acetyltransferase.
Alternative splicing analysis predicted novel proteoforms encoded
by 203 (liver) and 475 (HepG2) genes according to both Illumina and
SOLiD data. The results of the present work represent a basis for
subsequent proteomic studies by the C-HPP consortium.
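
To illustrate how a SNP in a coding sequence gives rise to a SAP, the core translation step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PPLine's actual implementation: the function name `snp_to_sap`, its arguments, and the toy sequence are hypothetical, and the input is assumed to be an in-frame coding sequence on the forward strand translated with the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def snp_to_sap(cds, pos, alt):
    """Return the amino-acid change caused by a single-base substitution.

    cds -- in-frame coding sequence (hypothetical input, forward strand)
    pos -- 0-based position of the substituted base within cds
    alt -- alternate base at that position

    Returns a string such as 'E2V', or None for a synonymous change.
    A '*' in the result denotes a stop codon.
    """
    offset = pos % 3
    codon_start = pos - offset
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]

    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return None  # synonymous SNP: no SAP at the protein level

    residue_number = codon_start // 3 + 1  # 1-based protein position
    return f"{ref_aa}{residue_number}{alt_aa}"


# Toy coding sequence (not real data): ATG GAG TTT encodes M-E-F.
toy_cds = "ATGGAGTTT"
print(snp_to_sap(toy_cds, 4, "T"))  # GAG -> GTG
print(snp_to_sap(toy_cds, 5, "A"))  # GAG -> GAA
```

A real pipeline would additionally have to map genomic coordinates onto transcripts, handle reverse-strand genes and indels, and filter variants by read support; none of that is attempted here.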