35 research outputs found
Advances in Bioinformatics : contributions to high-throughput proteomics-based identification, quantification and systems biology
Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Facultad de Ciencias, Departamento de Biología Molecular. Date of defence: 27-03-2017.
The analysis of high-throughput proteomics data presents the challenge of extracting biological meaning from a wealth of protein identifications and quantifications. In the last decade, technology in this area has undergone a major transformation that required a continuous and enormous development of bioinformatic tools to establish the foundations of the algorithms to be used in the coming years of proteomics research. In this work we present three papers that represent three milestones in this endeavour.
In the first publication, we present an in-depth analysis of the performance and influence of peptide identification search algorithms following the appearance of high-resolution, high-accuracy mass spectrometers. It is shown that, in many relevant cases, using smaller precursor ion mass tolerances to identify peptides leads to an increased number of incorrectly identified peptides, an increase that is greatly underestimated by the false discovery rate (FDR). Here we propose a change in the search algorithm, consisting of the use of wide mass windows followed by a post-scoring mass filtering.
The second publication is dedicated to the WSPP (initialism for Weighted Spectrum, Peptide, Protein) statistical model for the analysis of high-throughput quantitative proteomics experiments. The model can be used with a wide range of combinations of stable isotope labelling (SIL) techniques and mass spectrometers. Additionally, it provides a general statistical framework for these experiments: its unique capacity to separate the different sources of variance allows results to be compared across laboratories and the error to be interpreted at different levels.
In the third and final paper, we present an innovative method to perform systems biology analyses from the proteomics perspective, which considers the degree of coordination of a proteome and builds on the statistical basis provided by the WSPP model. This was possible after developing the Generic Integration Algorithm (GIA), which allows quantitative information to be integrated from any lower level to any higher level (rather than being limited to the traditional spectrum-peptide-protein workflow). All these models are implemented in SanXoT, a software package developed to allow the practical use of the mentioned models in quantitative proteomics.
These three steps in the research of high-throughput proteomics represented a dramatic change in the way proteomes were analysed in our laboratory, and opened countless possibilities for further development and enhancement of this research topic.
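The wide-window search with post-scoring mass filtering described in the first publication can be illustrated with a minimal sketch. This is not the thesis implementation: the `Candidate` type, the function names, and the tolerance values (`wide_tol_da`, `post_tol_ppm`) are hypothetical, and the score is assumed to come from any scoring function where higher is better.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    peptide: str
    mass: float   # theoretical peptide mass in Da
    score: float  # match score from some scoring function (higher is better)


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def wide_window_search(
    precursor_mass: float,
    candidates: List[Candidate],
    wide_tol_da: float = 1.5,    # hypothetical wide search window
    post_tol_ppm: float = 10.0,  # hypothetical post-scoring filter
) -> Optional[Candidate]:
    """Pick the best-scoring candidate in a wide mass window, then
    accept it only if its mass error passes a narrow filter."""
    in_window = [c for c in candidates
                 if abs(c.mass - precursor_mass) <= wide_tol_da]
    if not in_window:
        return None
    best = max(in_window, key=lambda c: c.score)
    # The mass filter is applied after scoring, not before it.
    if abs(ppm_error(precursor_mass, best.mass)) > post_tol_ppm:
        return None
    return best
```

In this sketch, a spectrum whose best-scoring candidate lies far from the precursor mass is discarded rather than being forced onto a lower-scoring candidate that happens to fall inside a narrow window.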
Evolutionary analysis of gene ages across TADs associates chromatin topology with whole-genome duplications
Topologically associated domains (TADs) are interaction subnetworks of chromosomal regions in 3D genomes. TAD boundaries frequently coincide with genome breaks while boundary deletion is under negative selection, suggesting that TADs may facilitate genome rearrangements and evolution. We show that genes co-localize by evolutionary age in humans and mice, resulting in TADs having different proportions of younger and older genes. We observe a major transition in the age co-localization patterns between the genes born during vertebrate whole-genome duplications (WGDs) or before and those born afterward. We also find that genes recently duplicated in primates and rodents are more frequently essential when they are located in old-enriched TADs and interact with genes that last duplicated during the WGD. Therefore, the evolutionary relevance of recent genes may increase when located in TADs with established regulatory networks. Our data suggest that TADs could play a role in organizing ancestral functions and evolutionary novelty.
Dermatoscopic Features of Naevi During Pregnancy-A Mini Review
Changes in melanocytic naevi and development of new naevi have been reported in pregnant women. The association between pregnancy and melanoma is a controversial topic. We conducted this review to identify the dermatoscopic changes that occur in naevi during pregnancy and that could help distinguish benign from suspicious lesions. Medline, Scopus, and Embase databases were searched for clinical studies on dermatoscopic characteristics of melanoma and naevus in pregnancy. Six cohort studies, covering a total of 258 patients with 1,167 examined skin lesions, fulfilled the inclusion criteria. None of the patients developed melanoma. Development of new naevi, when reported, was observed in less than half of the participants. The most frequently observed dermatoscopic change among the studies was an increase in the number of dots. Development of new vessels, hypo- and hyperpigmentations and changes in the pigment network were commonly described changes. The included studies were heterogeneous, which prevented head-to-head comparisons between them. Larger and more robust studies of dermatoscopic evaluation of naevi in pregnant women are needed to determine high-risk dermatoscopic characteristics.
Enrichment of ATP Binding Proteins Unveils Proteomic Alterations in Human Macrophage Cell Death, Inflammatory Response, and Protein Synthesis after Interaction with Candida albicans
Macrophages are involved in the primary human response to Candida albicans. After pathogen recognition, signaling pathways are activated, leading to the production of cytokines, chemokines, and antimicrobial peptides. ATP-binding proteins are crucial for this regulation. Here, a quantitative proteomic and phosphoproteomic approach was carried out for the study of human macrophage ATP-binding proteins after interaction with C. albicans. From a total of 547 nonredundant quantified proteins, 137 were ATP-binding proteins and 59 were detected as differentially abundant. Of the differentially abundant ATP-binding proteins, 6 were kinases (MAP2K2, SYK, STK3, MAP3K2, NDKA, and SRPK1), most of them involved in signaling pathways. Furthermore, 85 phosphopeptides were quantified. Macrophage proteomic alterations, including an increase in protein synthesis with a consistent decrease in proteolysis, were observed. In addition, macrophages showed changes in proteins of endosomal trafficking together with mitochondrial proteins, including some involved in the response to oxidative stress. Regarding cell death mechanisms, an increase of antiapoptotic over pro-apoptotic signals is suggested. Furthermore, a high pro-inflammatory response was detected, together with no upregulation of key miRNAs involved in the negative feedback of this response. These findings illustrate a strategy to deepen the knowledge of the complex interactions between the host and the clinically important pathogen C. albicans.
Quantitative HDL Proteomics Identifies Peroxiredoxin-6 as a Biomarker of Human Abdominal Aortic Aneurysm
High-density lipoproteins (HDLs) are complex protein and lipid assemblies whose composition is known to change in diverse pathological situations. Analysis of the HDL proteome can thus provide insight into the main mechanisms underlying abdominal aortic aneurysm (AAA) and potentially detect novel systemic biomarkers. We performed a multiplexed quantitative proteomics analysis of HDLs isolated from plasma of AAA patients (N = 14) and control study participants (N = 7). Validation was performed by western-blot (HDL), immunohistochemistry (tissue), and ELISA (plasma). HDL from AAA patients showed elevated expression of peroxiredoxin-6 (PRDX6), HLA class I histocompatibility antigen (HLA-I), retinol-binding protein 4, and paraoxonase/arylesterase 1 (PON1), whereas alpha-2 macroglobulin and C4b-binding protein were decreased. The main pathways associated with HDL alterations in AAA were oxidative stress and immune-inflammatory responses. In AAA tissue, PRDX6 colocalized with neutrophils, vascular smooth muscle cells, and lipid oxidation. Moreover, plasma PRDX6 was higher in AAA (N = 47) than in controls (N = 27), reflecting increased systemic oxidative stress. Finally, a positive correlation was recorded between PRDX6 and AAA diameter. The analysis of the HDL proteome demonstrates that redox imbalance is a major mechanism in AAA, identifying the antioxidant PRDX6 as a novel systemic biomarker of AAA.
We thank Simon Bartlett for language and scientific editing. This study was supported by the Spanish Ministry of Economy and Competitiveness (MINECO) (SAF2016-80843-R, BIO2012-37926 and BIO2015-67580-P), Fondo de Investigaciones Sanitarias ISCiii-FEDER (PRB2) (IPT13/0001, ProteoRed, Redes RIC RD12/0042/00038 and RD12/0042/0056, Biobancos RD09/0076/00101 and CA12/00371), Centro de Investigacion Biomedica en Red de Diabetes y Enfermedades Metabolicas Asociadas (CIBERDEM), and FRIAT. The CNIC is supported by the Spanish Ministry of Economy and Competitiveness (MINECO) and the Pro-CNIC Foundation, and is a Severo Ochoa Center of Excellence (MINECO award SEV-2015-0505).
Comprehensive Quantification of the Modified Proteome Reveals Oxidative Heart Damage in Mitochondrial Heteroplasmy
Post-translational modifications hugely increase the functional diversity of proteomes. Recent algorithms based on ultratolerant database searching are forging a path to unbiased analysis of peptide modifications by shotgun mass spectrometry. However, these approaches identify only one-half of the modified forms potentially detectable and do not map the modified residue. Moreover, tools for the quantitative analysis of peptide modifications are currently lacking. Here, we present a suite of algorithms that allows comprehensive identification of detectable modifications, pinpoints the modified residues, and enables their quantitative analysis through an integrated statistical model. These developments were used to characterize the impact of mitochondrial heteroplasmy on the proteome and on the modified peptidome in several tissues from 12-week-old mice. Our results reveal that heteroplasmy mainly affects cardiac tissue, inducing oxidative damage to proteins of the oxidative phosphorylation system, and provide a molecular mechanism explaining the structural and functional alterations produced in heart mitochondria.
We thank Simon Bartlett (CNIC) for English editing. This study was supported by competitive grants from the Spanish Ministry of Economy and Competitiveness (MINECO) (BIO2015-67580-P) through the Carlos III Institute of Health-Fondo de Investigacion Sanitaria (PRB2, IPT13/0001-ISCIII-SGEFI/FEDER; ProteoRed), by Fundacion La Marato TV3, and by FP7-PEOPLE-2013-ITN "Next-Generation Training in Cardiovascular Research and Innovation-Cardionext." N.B. is a FP7-PEOPLE-2013-ITN-Cardionext Fellow. The CNIC is supported by the MINECO and the Pro-CNIC Foundation, and is a Severo Ochoa Center of Excellence (MINECO Award SEV-2015-0505).
Evolutionarily conserved enhancer-associated features within the MYEOV locus suggest a regulatory role for this non-coding DNA region in cancer
The myeloma overexpressed gene (MYEOV) has been proposed to be a proto-oncogene due to high RNA transcript levels found in multiple cancers, including myeloma, breast, lung, pancreas and esophageal cancer. The presence of an open reading frame (ORF) in humans and other primates suggests protein-coding potential. Yet, we still lack evidence of a functional MYEOV protein. It remains undetermined how MYEOV overexpression affects cancerous tissues. In this work, we show that MYEOV has likely originated and may still function as an enhancer, regulating CCND1 and LTO1. Firstly, MYEOV 3′ enhancer activity was confirmed in humans using publicly available ATAC-STARR-seq data, performed on B-cell-derived GM12878 cells. We detected enhancer histone marks H3K4me1 and H3K27ac overlapping MYEOV in multiple healthy human tissues, which include B cells, liver and lung tissue. The analysis of 3D genome datasets revealed chromatin interactions between a MYEOV-3′-putative enhancer and the proto-oncogene CCND1. BLAST searches and multi-sequence alignment results showed that DNA sequence from this human enhancer element is conserved from the amphibians/amniotes divergence, with a 273 bp conserved region also found in all mammals, and even in chickens, where it is consistently located near the corresponding CCND1 orthologues. Furthermore, we observed conservation of an active enhancer state in the MYEOV orthologues of four non-human primates, dogs, rats, and mice. When studying this homologous region in mice, where the ORF of MYEOV is absent, we not only observed an enhancer chromatin state but also found interactions between the mouse enhancer homolog and Ccnd1 using 3D-genome interaction data. This is similar to the interaction observed in humans and, interestingly, coincides with CTCF binding sites in both species. Taken together, this suggests that MYEOV is a primate-specific gene with a de novo ORF that originated at an evolutionarily older enhancer region. 
This deeply conserved putative enhancer element could regulate CCND1 in both humans and mice, opening the possibility of studying MYEOV regulatory functions in cancer using non-primate animal models.
A Novel Systems-Biology Algorithm for the Analysis of Coordinated Protein Responses Using Quantitative Proteomics
The coordinated behavior of proteins is central to systems biology. However, the underlying mechanisms are poorly known and methods to analyze coordination by conventional quantitative proteomics are still lacking. We present the Systems Biology Triangle (SBT), a new algorithm that allows the study of protein coordination by pairwise quantitative proteomics. The Systems Biology Triangle detected statistically significant coordination in diverse biological models of very different nature and subjected to different kinds of perturbations. The Systems Biology Triangle also revealed with unprecedented molecular detail an array of coordinated, early protein responses in vascular smooth muscle cells treated at different times with angiotensin-II. These responses included activation of protein synthesis, folding, turnover, and muscle contraction, consistent with a differentiated phenotype, as well as the induction of migration and the repression of cell proliferation and secretion. Remarkably, the majority of the altered functional categories were protein complexes, interaction networks, or metabolic pathways. These changes could not be detected by other algorithms widely used by the proteomics community, and the vast majority of the proteins involved have not previously been described as regulated by AngII. The unique capabilities of the Systems Biology Triangle to detect functional protein alterations produced by the coordinated action of proteins in pairwise quantitative proteomics experiments make this algorithm an attractive choice for the biological interpretation of results on a routine basis.
Epigenomic translocation of H3K4me3 broad domains over oncogenes following hijacking of super-enhancers
Chromosomal translocations are important drivers of hematological malignancies whereby proto-oncogenes are activated by juxtaposition with super-enhancers, a process often called enhancer hijacking. We analysed the epigenomic consequences of rearrangements between the super-enhancers of the immunoglobulin heavy locus (IGH) and proto-oncogene CCND1 that are common in B cell malignancies. By integrating BLUEPRINT epigenomic data with DNA breakpoint detection, we characterised the normal chromatin landscape of the human IGH locus and its dynamics after pathological genomic rearrangement. We detected an H3K4me3 broad domain (BD) within the IGH locus of healthy B cells that was absent in samples with IGH-CCND1 translocations. The appearance of H3K4me3-BD over CCND1 in the latter was associated with overexpression and extensive chromatin accessibility of its gene body. We observed similar cancer-specific H3K4me3-BDs associated with super-enhancer hijacking of other common oncogenes in B cell (MAF, MYC and FGFR3/NSD2) and in T-cell malignancies (LMO2, TLX3 and TAL1). Our analysis suggests that H3K4me3-BDs can be created by super-enhancers and supports the new concept of epigenomic translocation, where the relocation of H3K4me3-BDs from cell identity genes to oncogenes accompanies the translocation of super-enhancers.
Data Analysis for Data Independent Acquisition
Mass spectrometry-based proteomics using soft ionization techniques has been used successfully to identify large numbers of proteins from complex biological samples. However, reproducible quantification across a large number of samples is still highly challenging with commonly used "shotgun proteomics", which uses stochastic sampling of the peptide analytes (data dependent acquisition; DDA) to analyze samples. Recently, data independent acquisition (DIA) methods have been investigated for their potential for reproducible protein quantification, since they deterministically sample all peptide analytes in every single run. This increases reproducibility and sensitivity, reduces the number of missing values and removes stochasticity from the acquisition process. However, one of the major challenges for wider adoption of DIA has been data analysis. In this chapter we will introduce the five most well-known of these techniques, as well as their data analysis methods, classified either as targeted or untargeted; then, we will discuss briefly the meaning of the false discovery rate (FDR) in DIA experiments, to finally close the chapter with a review of the current challenges in this subject.
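As a point of reference for the FDR discussion mentioned above, the following minimal sketch shows a simple target-decoy estimate at a score threshold. It is an assumption for illustration only: the chapter's own treatment of FDR in DIA is not reproduced here, and the function name and the plain decoy/target ratio are hypothetical choices.

```python
from typing import Sequence


def fdr_at_threshold(
    target_scores: Sequence[float],
    decoy_scores: Sequence[float],
    threshold: float,
) -> float:
    """Estimate the FDR among target matches scoring at or above
    `threshold` as (decoys passing) / (targets passing), capped at 1."""
    n_targets = sum(1 for s in target_scores if s >= threshold)
    n_decoys = sum(1 for s in decoy_scores if s >= threshold)
    if n_targets == 0:
        return 0.0  # nothing accepted, so nothing can be a false discovery
    return min(1.0, n_decoys / n_targets)


targets = [9.0, 8.0, 7.0, 6.0, 5.0]
decoys = [6.5, 3.0, 2.0]
print(fdr_at_threshold(targets, decoys, 6.0))  # 1 decoy / 4 targets
```

Raising the threshold until the estimate falls below a chosen level (for example 1%) is the usual way such an estimate is turned into an acceptance cut-off.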
