35 research outputs found

Proteogenómica y splicing alternativo (Proteogenomics and alternative splicing)

Author: Ezkurdia Garmendia Iakes
Publication date: 01/01/2016

Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Facultad de Ciencias, Departamento de Biología Molecular. Date of defense: 8 February 2016.
The manual annotation of protein-coding genes is based on many diverse sources of evidence. Most support comes from computational predictions, genomic evidence and experimental expression at the transcript level. Finding experimental evidence for the expression of proteins remains a difficult technical challenge, but mass spectrometry technology has advanced considerably in the past two decades, becoming an important tool for genomic annotation projects. Mass spectrometry also enables the refinement and validation of coding genes and alternative transcripts, and the detection of novel coding regions. Proteogenomics, a discipline that unites genomics and proteomics, requires the development of computational methods and strategies for large-scale data analysis. The main objective of this thesis was to develop computational methods for processing and analyzing genomic and proteomic data. Several strategies to analyze large-scale proteomic data were designed to achieve this goal. In the first part, workflows designed to search, validate and curate results from a variety of sources of proteomic data were applied as part of a pilot study. The characterization of alternative splice isoforms in human and mouse experiments highlighted three overrepresented groups: ribonucleoproteins, alternative isoforms generated from homologous exons, and those generated from small indels. The pilot study was later extended using a larger experimental proteomic data set. This second analysis confirmed that most genes express a dominant protein and demonstrated that splicing events detected at the protein level rarely break conserved functional domains. The large-scale study confirmed that more than 20% of splice isoforms are generated from homologous exons. Many of these alternative homologous exons are tissue specific and all are remarkably conserved, highlighting their relevance at the cellular level. Finally, peptides from eight large-scale proteomic experiments were used to characterize a main experimental isoform. This main proteomics isoform matches those selected by two orthogonal methods, one predicted from conservation and from protein functional and structural features, and the other annotated by manual annotators based on genomic evidence. The results show clearly that almost all genes have a principal protein isoform regardless of tissue, and confirm the suitability of APPRIS for predicting principal isoforms.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Biblos-e Archivo

From identification to validation to gene count

Author: Aken Bronwen
Amid Clara
Carninci Piero
Ezkurdia Iakes
Frankish Adam
Gilbert James
Gingeras Thomas R.
Guigó Serra Roderic
Harrow Jennifer
HAVANA
Hubbard Tim J.
Kokocinski Felix
Searle Stephen
Tress Michael
White Simon
Publication venue: BioMed Central
Publication date: 01/01/2010

The current GENCODE gene count of ~30,000, including 21,727 protein-coding and 8,483 RNA genes, is significantly lower than the 100,000 genes anticipated by early estimates. Accurate annotation of protein-coding and non-coding genes and pseudogenes is essential in calculating the true gene count and gaining insight into human evolution. As part of the GENCODE Consortium, the HAVANA team produces high-quality manual gene annotation, which forms the basis for the reference gene set being used by the ENCODE project and provides a rich annotation of alternative splice variants and assignment of functional potential. However, the protein-coding potential of some splice variants is uncertain, and valid splice variants can remain unannotated if they are absent from current cDNA libraries. Recent technological developments in sequencing and mass spectrometry have created a vast amount of new transcript and protein data that facilitate the identification and validation of new and existing transcripts, while harboring their own limitations and problems.

Crossref

Cold Spring Harbor Laboratory Institutional Repository

PubMed Central

UPF Digital Repository

King's Research Portal

Comprehensive Quantification of the Modified Proteome Reveals Oxidative Heart Damage in Mitochondrial Heteroplasmy

Author: Bagwan Navratan
Bonzon-Kulichenko Elena
Calvo Enrique
Enriquez José Antonio
Ezkurdia Iakes
Latorre-Pellicer Ana
Lechuga-Vieco Ana V.
Magni Ricardo
Michalakopoulos Spiros
Rodriguez Jose Manuel
Trevisan-Herraz Marco
Vazquez Jesus
Publication venue: 'Elsevier BV'
Publication date: 01/01/2018

Post-translational modifications hugely increase the functional diversity of proteomes. Recent algorithms based on ultratolerant database searching are forging a path to unbiased analysis of peptide modifications by shotgun mass spectrometry. However, these approaches identify only one-half of the modified forms potentially detectable and do not map the modified residue. Moreover, tools for the quantitative analysis of peptide modifications are currently lacking. Here, we present a suite of algorithms that allows comprehensive identification of detectable modifications, pinpoints the modified residues, and enables their quantitative analysis through an integrated statistical model. These developments were used to characterize the impact of mitochondrial heteroplasmy on the proteome and on the modified peptidome in several tissues from 12-week-old mice. Our results reveal that heteroplasmy mainly affects cardiac tissue, inducing oxidative damage to proteins of the oxidative phosphorylation system, and provide a molecular mechanism explaining the structural and functional alterations produced in heart mitochondria.
We thank Simon Bartlett (CNIC) for English editing. This study was supported by competitive grants from the Spanish Ministry of Economy and Competitiveness (MINECO) (BIO2015-67580-P) through the Carlos III Institute of Health-Fondo de Investigacion Sanitaria (PRB2, IPT13/0001-ISCIII-SGEFI/FEDER; ProteoRed), by Fundacion La Marato TV3, and by FP7-PEOPLE-2013-ITN "Next-Generation Training in Cardiovascular Research and Innovation-Cardionext." N.B. is a FP7-PEOPLE-2013-ITN-Cardionext Fellow. The CNIC is supported by the MINECO and the Pro-CNIC Foundation, and is a Severo Ochoa Center of Excellence (MINECO Award SEV-2015-0505).

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

REPISALUD

Quantitative HDL Proteomics Identifies Peroxiredoxin-6 as a Biomarker of Human Abdominal Aortic Aneurysm

Author: Burillo Elena
Camafeita Emilio
Egido Jesus
Ezkurdia Iakes
Jorge Inmaculada
Luis Martin-Ventura Jose
Martinez-Lopez Diego
Meilhac Olivier
Michel Jean-Baptiste
Miguel Blanco-Colio Luis
Trevisan-Herraz Marco
Vazquez Jesus
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

High-density lipoproteins (HDLs) are complex protein and lipid assemblies whose composition is known to change in diverse pathological situations. Analysis of the HDL proteome can thus provide insight into the main mechanisms underlying abdominal aortic aneurysm (AAA) and potentially detect novel systemic biomarkers. We performed a multiplexed quantitative proteomics analysis of HDLs isolated from plasma of AAA patients (N = 14) and control study participants (N = 7). Validation was performed by western blot (HDL), immunohistochemistry (tissue), and ELISA (plasma). HDL from AAA patients showed elevated expression of peroxiredoxin-6 (PRDX6), HLA class I histocompatibility antigen (HLA-I), retinol-binding protein 4, and paraoxonase/arylesterase 1 (PON1), whereas alpha-2 macroglobulin and C4b-binding protein were decreased. The main pathways associated with HDL alterations in AAA were oxidative stress and immune-inflammatory responses. In AAA tissue, PRDX6 colocalized with neutrophils, vascular smooth muscle cells, and lipid oxidation. Moreover, plasma PRDX6 was higher in AAA (N = 47) than in controls (N = 27), reflecting increased systemic oxidative stress. Finally, a positive correlation was recorded between PRDX6 and AAA diameter. The analysis of the HDL proteome demonstrates that redox imbalance is a major mechanism in AAA, identifying the antioxidant PRDX6 as a novel systemic biomarker of AAA.
We thank Simon Bartlett for language and scientific editing. This study was supported by the Spanish Ministry of Economy and Competitiveness (MINECO) (SAF2016-80843-R, BIO2012-37926 and BIO2015-67580-P), Fondo de Investigaciones Sanitarias ISCiii-FEDER (PRB2) (IPT13/0001, ProteoRed, Redes RIC RD12/0042/00038 and RD12/0042/0056, Biobancos RD09/0076/00101 and CA12/00371), Centro de Investigacion Biomedica en Red de Diabetes y Enfermedades Metabolicas Asociadas (CIBERDEM), and FRIAT. The CNIC is supported by the Spanish Ministry of Economy and Competitiveness (MINECO) and the Pro-CNIC Foundation, and is a Severo Ochoa Center of Excellence (MINECO award SEV-2015-0505).

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

PubMed Central

REPISALUD

SQANTI: extensive characterization of long-read transcript sequences for quality control in full-length transcriptome identification and quantification

Author: Conesa Ana
de la Fuente Lorena
del Risco Hector
Edelmann Mariola
Ezkurdia Iakes
Ferrell Marc
Macchietto Marissa
Martens Lennart
Marti Cristina
Mellado Maravillas
Moreno-Manzano Victoria
Mortazavi Ali
Pardo-Palacios Francisco Jose
Pereira Cécile
Rodriguez-Navarro Susana
Tardaguila Manuel
Tress Michael
Vazquez Jesus
Verheggen Kenneth
Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 01/01/2018

High-throughput sequencing of full-length transcripts using long reads has paved the way for the discovery of thousands of novel transcripts, even in well-annotated mammalian species. The advances in sequencing technology have created a need for studies and tools that can characterize these novel variants. Here, we present SQANTI, an automated pipeline for the classification of long-read transcripts that can assess the quality of data and the preprocessing pipeline using 47 unique descriptors. We apply SQANTI to a neuronal mouse transcriptome using Pacific Biosciences (PacBio) long reads and illustrate how the tool is effective in characterizing and describing the composition of the full-length transcriptome. We perform extensive evaluation of ToFU PacBio transcripts by PCR to reveal that an important number of the novel transcripts are technical artifacts of the sequencing approach and that SQANTI quality descriptors can be used to engineer a filtering strategy to remove them. Most novel transcripts in this curated transcriptome are novel combinations of existing splice sites, resulting more frequently in novel ORFs than novel UTRs, and are enriched in both general metabolic and neural-specific functions. We show that these new transcripts have a major impact in the correct quantification of transcript levels by state-of-the-art short-read-based quantification algorithms. By comparing our iso-transcriptome with public proteomics databases, we find that alternative isoforms are elusive to proteogenomics detection. SQANTI allows the user to maximize the analytical outcome of long-read technologies by providing the tools to deliver quality-evaluated and curated full-length transcriptomes

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Ghent University Academic Bibliography

eScholarship - University of California

REPISALUD

Digital.CSIC

Inference of Functional Relations in Predicted Protein Networks with a Machine Learning Approach

Author: A Enright
A Valencia
Alfonso Valencia
Beatriz García-Jiménez
C Alfarano
C Drummond
CM Bishop
Cv Mering
D Juan
David Juan
DE Rumelhart
E Frank
E Morett
EA León
Eduardo Andrés-León
EM Marcotte
F Pazos
G Butland
GF Cooper
GH John
GI Webb
H Hermjakob
Iakes Ezkurdia
IH Witten
IM Keseler
J Wu
JG Cleary
L Breiman
L Salwinski
LJ Lu
M Arifuzzaman
M Kanehisa
M Pellegrini
M Sahami
N Friedman
P Bowers
R Hoffmann
RC Edgar
RR Bouckaert
SF Altschul
Shin-Han Shiu
T Dandekar
T Sato
Y Freund
Y Qi
Publication venue: Public Library of Science
Publication date: 01/01/2010

Background: Molecular biology is currently facing the challenging task of functionally characterizing the proteome. The large number of possible protein-protein interactions and complexes, the variety of environmental conditions and cellular states in which these interactions can be reorganized, and the multiple ways in which a protein can influence the function of others require the development of experimental and computational approaches to analyze and predict functional associations between proteins as part of their activity in the interactome. Methodology/Principal Findings: We have studied the possibility of constructing a classifier that combines the outputs of several protein interaction prediction methods. The AODE (Averaged One-Dependence Estimators) machine learning algorithm is a suitable choice in this case: it provides better results than the individual prediction methods and performs better than the other alternative methods tested in this experimental setup. To illustrate the potential use of this new AODE-based Predictor of Protein InterActions (APPIA) when analyzing high-throughput experimental data, we show how it helps to filter the results of published high-throughput proteomic studies, ranking functionally related pairs in a significant way. Availability: All the predictions of the individual methods and of the combined APPIA predictor, together with the datasets of functional associations used, are available at http://ecid.bioinfo.cnio.es/. Conclusions: We propose a strategy that integrates the main current computational techniques used to predict functional associations into a unified classifier system, specifically focusing on the evaluation of poorly characterized protein pairs. We selected the AODE classifier as the appropriate tool to perform this task. AODE is particularly useful for extracting valuable information from large, unbalanced and heterogeneous data sets. The combination of the information provided by five interaction prediction methods with some simple sequence features in APPIA is useful in establishing reliability values and helps to prioritize functional interactions that can be further experimentally characterized.
This work was funded by the BioSapiens (grant number LSHG-CT-2003-503265) and the Experimental Network for Functional Integration (ENFIN) Networks of Excellence (contract number LSHG-CT-2005-518254), by Consolider BSC (grant number CSD2007-00050) and by the project “Functions for gene sets” from the Spanish Ministry of Education and Science (BIO2007-66855). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Public Library of Science (PLOS)

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Directory of Open Access Journals

PubMed Central

e-Archivo (Univ. Carlos III de Madrid e-Archivo)