28 research outputs found

    From identification to validation to gene count

    The current GENCODE gene count of ~30,000, including 21,727 protein-coding and 8,483 RNA genes, is significantly lower than the 100,000 genes anticipated by early estimates. Accurate annotation of protein-coding and non-coding genes and pseudogenes is essential in calculating the true gene count and gaining insight into human evolution. As part of the GENCODE Consortium, the HAVANA team produces high quality manual gene annotation, which forms the basis for the reference gene set being used by the ENCODE project and provides a rich annotation of alternative splice variants and assignment of functional potential. However, the protein-coding potential of some splice variants is uncertain and valid splice variants can remain unannotated if they are absent from current cDNA libraries. Recent technological developments in sequencing and mass spectrometry have created a vast amount of new transcript and protein data that facilitate the identification and validation of new and existing transcripts, while harboring their own limitations and problems.

    Quantitative HDL Proteomics Identifies Peroxiredoxin-6 as a Biomarker of Human Abdominal Aortic Aneurysm

    High-density lipoproteins (HDLs) are complex protein and lipid assemblies whose composition is known to change in diverse pathological situations. Analysis of the HDL proteome can thus provide insight into the main mechanisms underlying abdominal aortic aneurysm (AAA) and potentially detect novel systemic biomarkers. We performed a multiplexed quantitative proteomics analysis of HDLs isolated from plasma of AAA patients (N = 14) and control study participants (N = 7). Validation was performed by western-blot (HDL), immunohistochemistry (tissue), and ELISA (plasma). HDL from AAA patients showed elevated expression of peroxiredoxin-6 (PRDX6), HLA class I histocompatibility antigen (HLA-I), retinol-binding protein 4, and paraoxonase/arylesterase 1 (PON1), whereas alpha-2 macroglobulin and C4b-binding protein were decreased. The main pathways associated with HDL alterations in AAA were oxidative stress and immune-inflammatory responses. In AAA tissue, PRDX6 colocalized with neutrophils, vascular smooth muscle cells, and lipid oxidation. Moreover, plasma PRDX6 was higher in AAA (N = 47) than in controls (N = 27), reflecting increased systemic oxidative stress. Finally, a positive correlation was recorded between PRDX6 and AAA diameter. The analysis of the HDL proteome demonstrates that redox imbalance is a major mechanism in AAA, identifying the antioxidant PRDX6 as a novel systemic biomarker of AAA.
    We thank Simon Bartlett for language and scientific editing. This study was supported by the Spanish Ministry of Economy and Competitiveness (MINECO) (SAF2016-80843-R, BIO2012-37926 and BIO2015-67580-P), Fondo de Investigaciones Sanitarias ISCiii-FEDER (PRB2) (IPT13/0001, ProteoRed, Redes RIC RD12/0042/00038 and RD12/0042/0056, Biobancos RD09/0076/00101 and CA12/00371), Centro de Investigacion Biomedica en Red de Diabetes y Enfermedades Metabolicas Asociadas (CIBERDEM), and FRIAT. The CNIC is supported by the Spanish Ministry of Economy and Competitiveness (MINECO) and the Pro-CNIC Foundation, and is a Severo Ochoa Center of Excellence (MINECO award SEV-2015-0505).

    Comprehensive Quantification of the Modified Proteome Reveals Oxidative Heart Damage in Mitochondrial Heteroplasmy

    Post-translational modifications hugely increase the functional diversity of proteomes. Recent algorithms based on ultratolerant database searching are forging a path to unbiased analysis of peptide modifications by shotgun mass spectrometry. However, these approaches identify only one-half of the modified forms potentially detectable and do not map the modified residue. Moreover, tools for the quantitative analysis of peptide modifications are currently lacking. Here, we present a suite of algorithms that allows comprehensive identification of detectable modifications, pinpoints the modified residues, and enables their quantitative analysis through an integrated statistical model. These developments were used to characterize the impact of mitochondrial heteroplasmy on the proteome and on the modified peptidome in several tissues from 12-week-old mice. Our results reveal that heteroplasmy mainly affects cardiac tissue, inducing oxidative damage to proteins of the oxidative phosphorylation system, and provide a molecular mechanism explaining the structural and functional alterations produced in heart mitochondria.
    We thank Simon Bartlett (CNIC) for English editing. This study was supported by competitive grants from the Spanish Ministry of Economy and Competitiveness (MINECO) (BIO2015-67580-P) through the Carlos III Institute of Health-Fondo de Investigacion Sanitaria (PRB2, IPT13/0001-ISCIII-SGEFI/FEDER; ProteoRed), by Fundacion La Marato TV3, and by FP7-PEOPLE-2013-ITN "Next-Generation Training in Cardiovascular Research and Innovation-Cardionext." N.B. is a FP7-PEOPLE-2013-ITN-Cardionext Fellow. The CNIC is supported by the MINECO and the Pro-CNIC Foundation, and is a Severo Ochoa Center of Excellence (MINECO Award SEV-2015-0505).

    SQANTI: extensive characterization of long-read transcript sequences for quality control in full-length transcriptome identification and quantification

    High-throughput sequencing of full-length transcripts using long reads has paved the way for the discovery of thousands of novel transcripts, even in well-annotated mammalian species. The advances in sequencing technology have created a need for studies and tools that can characterize these novel variants. Here, we present SQANTI, an automated pipeline for the classification of long-read transcripts that can assess the quality of data and the preprocessing pipeline using 47 unique descriptors. We apply SQANTI to a neuronal mouse transcriptome using Pacific Biosciences (PacBio) long reads and illustrate how the tool is effective in characterizing and describing the composition of the full-length transcriptome. We perform extensive evaluation of ToFU PacBio transcripts by PCR to reveal that a substantial number of the novel transcripts are technical artifacts of the sequencing approach and that SQANTI quality descriptors can be used to engineer a filtering strategy to remove them. Most novel transcripts in this curated transcriptome are novel combinations of existing splice sites, resulting more frequently in novel ORFs than novel UTRs, and are enriched in both general metabolic and neural-specific functions. We show that these new transcripts have a major impact on the correct quantification of transcript levels by state-of-the-art short-read-based quantification algorithms. By comparing our iso-transcriptome with public proteomics databases, we find that alternative isoforms are elusive to proteogenomics detection. SQANTI allows the user to maximize the analytical outcome of long-read technologies by providing the tools to deliver quality-evaluated and curated full-length transcriptomes.

    Inference of Functional Relations in Predicted Protein Networks with a Machine Learning Approach

    Background: Molecular biology is currently facing the challenging task of functionally characterizing the proteome. The large number of possible protein-protein interactions and complexes, the variety of environmental conditions and cellular states in which these interactions can be reorganized, and the multiple ways in which a protein can influence the function of others require the development of experimental and computational approaches to analyze and predict functional associations between proteins as part of their activity in the interactome. Methodology/Principal Findings: We have studied the possibility of constructing a classifier to combine the output of several protein interaction prediction methods. The AODE (Averaged One-Dependence Estimators) machine learning algorithm is a suitable choice in this case: it provides better results than the individual prediction methods and performs better than the other alternative methods tested in this experimental setup. To illustrate the potential use of this new AODE-based Predictor of Protein InterActions (APPIA) when analyzing high-throughput experimental data, we show how it helps to filter the results of published high-throughput proteomic studies, ranking functionally related pairs in a significant way. Availability: All the predictions of the individual methods and of the combined APPIA predictor, together with the datasets of functional associations used, are available at http://ecid.bioinfo.cnio.es/. Conclusions: We propose a strategy that integrates the main current computational techniques used to predict functional associations into a unified classifier system, specifically focusing on the evaluation of poorly characterized protein pairs. We selected the AODE classifier as the appropriate tool to perform this task. AODE is particularly useful for extracting valuable information from large, unbalanced and heterogeneous data sets. The combination of the information provided by five interaction prediction methods with some simple sequence features in APPIA is useful in establishing reliability values and helpful in prioritizing functional interactions that can be further experimentally characterized.
    This work was funded by the BioSapiens (grant number LSHG-CT-2003-503265) and the Experimental Network for Functional Integration (ENFIN) Networks of Excellence (contract number LSHG-CT-2005-518254), by Consolider BSC (grant number CSD2007-00050) and by the project “Functions for gene sets” from the Spanish Ministry of Education and Science (BIO2007-66855). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

    Analyzing the First Drafts of the Human Proteome

    This letter analyzes two large-scale proteomics studies published in the same issue of Nature. At the time of the release, both studies were portrayed as draft maps of the human proteome and great advances in the field. As with the initial publication of the human genome, these papers have broad appeal and will no doubt lead to a great deal of further analysis by the scientific community. However, we were intrigued by the number of protein-coding genes detected by the two studies, numbers that far exceeded what has been reported for the multinational Human Proteome Project effort. We carried out a simple quality test on the data using the olfactory receptor family. A high-quality proteomics experiment that does not specifically analyze nasal tissues should not expect to detect many peptides for olfactory receptors. Neither of the studies carried out experiments on nasal tissues, yet we found peptide evidence for more than 100 olfactory receptors in the two studies. These results suggest that the two studies are substantially overestimating the number of protein-coding genes they identify. We conclude that the experimental data from these two studies should be used with caution.
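
The olfactory receptor check described in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the function name, the input format (a list of gene symbols with peptide evidence plus a list of sampled tissues), and the regular expression for HGNC-style olfactory receptor symbols (such as OR1A1 or OR51E2) are all assumptions made for illustration, and the gene list is made up.

```python
import re

# Assumption: olfactory receptor genes are recognized by HGNC-style symbols,
# i.e. "OR" + family number + subfamily letters + member number, with an
# optional trailing "P" for pseudogenes (e.g. OR1A1, OR51E2, OR7E14P).
OR_SYMBOL = re.compile(r"^OR\d+[A-Z]+\d+P?$")


def olfactory_receptor_check(detected_genes, sampled_tissues):
    """Count olfactory receptor genes with peptide evidence (hypothetical helper).

    Olfactory receptors found in a dataset that sampled no nasal tissue are
    treated as a warning sign of false-positive identifications.
    """
    or_genes = sorted({g for g in detected_genes if OR_SYMBOL.match(g)})
    nasal_sampled = any(
        "nasal" in t.lower() or "olfactory" in t.lower() for t in sampled_tissues
    )
    return {
        "or_genes": or_genes,
        "or_count": len(or_genes),
        "nasal_tissue_sampled": nasal_sampled,
        "suspicious": bool(or_genes) and not nasal_sampled,
    }


# Illustrative input only; ORM1 and ORAI1 are not olfactory receptors
# and must not be counted.
detected = ["ALB", "OR1A1", "OR51E2", "ORM1", "ORAI1", "TP53", "OR1A1"]
report = olfactory_receptor_check(detected, ["liver", "heart"])
print(report)
```

In practice the count would be compared against the total number of reported protein-coding genes; a bare symbol pattern is only a rough proxy for membership in the olfactory receptor family.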

    SanXoT: a modular and versatile package for the quantitative analysis of high-throughput proteomics experiments

    SUMMARY: Mass spectrometry-based proteomics has had a formidable development in recent years, increasing the amount of data handled and the complexity of the statistical resources needed. Here we present SanXoT, an open-source, standalone software package for the statistical analysis of high-throughput, quantitative proteomics experiments. SanXoT is based on our previously developed weighted spectrum, peptide and protein statistical model and has been specifically designed to be modular, scalable and user-configurable. SanXoT allows limitless workflows that adapt to most experimental setups, including quantitative protein analysis in multiple experiments, systems biology, quantification of post-translational modifications and comparison and merging of experimental data from technical or biological replicates. AVAILABILITY AND IMPLEMENTATION: Download links for the SanXoT Software Package, source code and documentation are available at https://wikis.cnic.es/proteomica/index.php/SSP. CONTACT: [email protected] or [email protected]. SUPPLEMENTARY INFORMATION: Supplementary information is available at Bioinformatics online.
    This study was supported by competitive grants [BIO2012-37926, BIO2015-67580-P] from the Spanish Ministry of Economy, Industry and Competitiveness (MEIC), [grant IPT13/0001] (ProteoRed, PRB2, ISCIII-SGEFI/ERDF), [grant IPT17/0019] (ProteoRed, PRB3, ISCIII-SGEFI/ERDF), the Fundacio La Marato de TV3, and the European Commission FP7 (FP7-PEOPLE-2013-ITN Next generation training in cardiovascular research and innovation-CardioNext). The Centro Nacional de Investigaciones Cardiovasculares Carlos III is supported by the Spanish Ministry of Economy, Industry and Competitiveness (MEIC) and the Pro-CNIC Foundation, and is a Severo Ochoa Center of Excellence (MEIC award SEV-2015-0505).