18 research outputs found

    Advancing computational methods for mass spectrometry-based proteomics, metabolomics, and analysis of multi-omics datasets

    The current century has seen unprecedented advances in the biological sciences, owed to immense technological progress in analytical tools and methods and to the emergence of the fast-developing fields of omics and bioinformatics. Omics allows the collection of holistic data on several different biomolecule classes, and bioinformatics makes it possible to explore and understand the vast amounts of data produced. The most mature omics fields, in terms of both hardware and software, are genomics and transcriptomics, based on next-generation sequencing (NGS) technologies. With the introduction of electrospray ionization and high-resolution mass spectrometry, liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) has enabled significant leaps in the fields of metabolomics and proteomics. One promising method for LC-MS/MS-based proteomics is data-independent acquisition (DIA), which requires advanced data analysis algorithms. MaxDIA, within the MaxQuant software for the processing of LC-MS/MS-based proteomics data, is introduced here. It comes with an accurate false discovery rate estimation of the peptide and protein identification based on measured and predicted spectrum libraries. When compared to the state of the art, MaxDIA also delivers comprehensive proteome coverage and lower coefficients of variation in protein quantification. Bioinformatics tools for the analysis of metabolomics data generally follow the same principles and steps as proteomics software, but due to the huge numbers of metabolites and the immense complexity of metabolomics data, much work is still needed to bring metabolomics software to the level of maturity of its proteomics equivalents.
MaxQuant is a time-tested and widely accepted software for the processing of proteomics data. It was first recognized for its cutting-edge nonlinear recalibration, which yields superior precursor mass accuracy and thereby significantly improves peptide identification. Following this direction, a new metabolite library-based mass recalibration algorithm within MaxQuant is introduced here to improve mass accuracy in metabolomics data. The many types of omics data available today present a great opportunity for developing approaches that combine such data in order to infer new knowledge, often termed multi-omics studies. A robust approach to this end is to utilize prior knowledge on the relationships between the major biomolecules in question, which are often depicted as networks in which the nodes represent biomolecules and the edges correspond to interactions. To implement this approach, Metis is introduced, a new plugin for the Perseus software aimed at analyzing quantitative multi-omics data based on metabolic pathways. This thesis includes four publications, the first of which is a review article on computational metabolomics that forms part of the introduction:
1. Hamzeiy, Hamid, and Jürgen Cox. 2017. “What Computational Non-Targeted Mass Spectrometry-Based Metabolomics Can Gain from Shotgun Proteomics.” Current Opinion in Biotechnology 43: 141–46. https://doi.org/10.1016/j.copbio.2016.11.014.
2. Sinitcyn, Pavel, Shivani Tiwary, Jan Rudolph, Petra Gutenbrunner, Christoph Wichmann, Şule Yılmaz, Hamid Hamzeiy, Favio Salinas, and Jürgen Cox. 2018. “MaxQuant Goes Linux.” Nature Methods 15 (6): 401. https://doi.org/10.1038/s41592-018-0018-y.
3. Pavel Sinitcyn, Hamid Hamzeiy, Favio Salinas Soto, Daniel Itzhak, Frank McCarthy, Christoph Wichmann, Martin Steger, Uli Ohmayer, Ute Distler, Stephanie Kaspar-Schoenefeld, Nikita Prianichnikov, Şule Yılmaz, Jan Daniel Rudolph, Stefan Tenzer, Yasset Perez-Riverol, Nagarjuna Nagaraj, Sean J. Humphrey and Jürgen Cox. “MaxDIA enables highly sensitive and accurate library-based and library-free data-independent acquisition proteomics.” Submitted to Nature Biotechnology, 2020.
4. Hamid Hamzeiy, Daniela Ferretti, Maria S. Robles, and Jürgen Cox. “Perseus plugin ‘Metis’ for metabolic pathway-centered quantitative multi-omics data analysis supporting static and time-series experimental designs.” Submitted to Cell Systems, 202

    Perseus plugin “Metis” for metabolic-pathway-centered quantitative multi-omics data analysis for static and time-series experimental designs

    We introduce Metis, a new plugin for the Perseus software aimed at analyzing quantitative multi-omics data based on metabolic pathways. Data from different omics types are connected through the reactions of a genome-scale metabolic-pathway reconstruction. Metabolite concentrations connect through the reactants, while transcript, protein, and protein post-translational modification (PTM) data are associated through the enzymes catalyzing the reactions. Supported experimental designs include static comparative studies and time-series data. As an example of the latter, we combine circadian mouse liver multi-omics data and study the contribution of cycles of the phosphoproteome and metabolome to the regulation of enzyme activity. Our analysis resulted in 52 pairs of cycling phosphosites and metabolites connected through a reaction. The time lags between phosphorylation and metabolite peaks show non-uniform behavior, indicating a major contribution of phosphorylation to the modulation of enzymatic activity.
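The lag between a phosphosite peak and a metabolite peak on a daily cycle is a circular quantity, so plain subtraction misleads near the day boundary. The following is a minimal sketch of how such lags could be computed and summarized; it is not the Metis implementation. The function names, the 24 h period, and the use of the mean resultant length as a rough indicator of non-uniformity are all assumptions made for illustration.

```python
import cmath
import math

PERIOD_H = 24.0  # assumed circadian period in hours


def circular_lag(phos_peak_h, metab_peak_h, period=PERIOD_H):
    """Signed lag (metabolite minus phosphosite) wrapped to [-period/2, period/2)."""
    half = period / 2.0
    return (metab_peak_h - phos_peak_h + half) % period - half


def mean_resultant_length(lags_h, period=PERIOD_H):
    """Length of the mean unit vector of the lags mapped onto a circle.

    Values near 0 are consistent with lags spread evenly over the cycle;
    values near 1 mean the lags cluster around a single phase.
    """
    if not lags_h:
        raise ValueError("need at least one lag")
    vectors = [cmath.exp(2j * math.pi * lag / period) for lag in lags_h]
    return abs(sum(vectors) / len(vectors))


if __name__ == "__main__":
    # Hypothetical peak times in hours (phosphosite, metabolite), illustration only.
    pairs = [(22.0, 2.0), (3.0, 7.0), (10.0, 14.0)]
    lags = [circular_lag(p, m) for p, m in pairs]
    print(lags)
    print(mean_resultant_length(lags))
```

Wrapping the difference keeps a phosphosite peaking at 22 h and a metabolite peaking at 2 h four hours apart rather than twenty. A formal uniformity test (for example a Rayleigh test) would be a natural next step but is left out of this sketch.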

    MaxDIA enables library-based and library-free data-independent acquisition proteomics

    MaxDIA is a software platform for analyzing data-independent acquisition (DIA) proteomics data within the MaxQuant software environment. Using spectral libraries, MaxDIA achieves deep proteome coverage with substantially better coefficients of variation in protein quantification than other software. MaxDIA is equipped with accurate false discovery rate (FDR) estimates on both library-to-DIA match and protein levels, including when using whole-proteome predicted spectral libraries. This is the foundation of discovery DIA: hypothesis-free analysis of DIA samples without a library and with reliable FDR control. MaxDIA performs three- or four-dimensional feature detection of fragment data, and scoring of matches is augmented by machine learning on the features of an identification. MaxDIA’s bootstrap DIA workflow performs multiple rounds of matching with increasing quality of recalibration and stringency of matching to the library. Combining MaxDIA with either of two new technologies, BoxCar acquisition and trapped ion mobility spectrometry, leads to deep and accurate proteome quantification.
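The abstract does not describe how the FDR is estimated. One widely used approach in proteomics is target-decoy counting, and the sketch below illustrates that general idea only; it is not MaxDIA's algorithm. The names `Match`, `q_values`, and `accepted_targets`, and the convention that a higher score is better, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    score: float    # assumed: higher means a better library-to-DIA match
    is_decoy: bool  # True if the match is to a decoy library entry


def q_values(matches):
    """Return (match, q_value) pairs sorted by descending score.

    The FDR at a score threshold is estimated as decoys / targets among
    all matches at or above it; the q-value of a match is the minimum
    FDR over all thresholds that still accept it.
    """
    ranked = sorted(matches, key=lambda m: m.score, reverse=True)
    fdrs = []
    decoys = targets = 0
    for m in ranked:
        if m.is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # Enforce monotonicity by sweeping from the lowest score upward.
    running_min = float("inf")
    qs = [0.0] * len(ranked)
    for i in range(len(ranked) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qs[i] = running_min
    return list(zip(ranked, qs))


def accepted_targets(matches, fdr_threshold=0.01):
    """Target matches whose q-value is at or below the threshold."""
    return [m for m, q in q_values(matches)
            if not m.is_decoy and q <= fdr_threshold]
```

The same counting can be repeated at the protein level by scoring proteins instead of individual matches. Tied scores are not treated specially in this sketch.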

    Can MiRBase Provide Positive Data for Machine Learning for the Detection of MiRNA Hairpins?

    Experimental detection and validation of miRNAs is a tedious, time-consuming, and expensive process. Computational methods for miRNA gene detection are being developed so that the number of candidates that need experimental validation can be reduced to a manageable amount. Computational methods involve homology-based and ab initio algorithms. Both approaches depend on positive and negative training examples. Positive examples are usually derived from miRBase, the main resource for experimentally validated miRNAs. We encountered some problems with miRBase, which we report here. Among them: the folds presented in miRBase are not always the fold with the minimum free energy, some entries do not seem to conform to expectations of miRNAs, and some external accession numbers are not valid. In addition, we compared the prediction accuracy for the same negative dataset when the positive data came from miRBase or miRTarBase and found that the latter led to more precise prediction models. We suggest that miRBase introduce automated facilities for ensuring data quality to overcome these problems.



    Visualization and Analysis of miRNAs Implicated in Amyotrophic Lateral Sclerosis Within Gene Regulatory Pathways

    Hamzeiy H, Suluyayla R, Brinkrolf C, Janowski SJ, Hofestädt R, Allmer J. Visualization and Analysis of miRNAs Implicated in Amyotrophic Lateral Sclerosis Within Gene Regulatory Pathways. Studies in Health Technology and Informatics. 2018;253:183-187.
    MicroRNAs (miRNAs) are post-transcriptionally active regulators of gene expression, approximately 22 nucleotides long, that play active roles in modulating cellular processes. Gene regulation and miRNA regulation are intertwined, and the main aim of this study is to facilitate the analysis of miRNAs within gene regulatory pathways. VANESA enables the reconstruction of biological pathways and supports visualization and simulation. To support integrative miRNA and gene pathway analyses, a custom database of experimentally proven miRNAs, integrating data from miRBase, TarBase and miRTarBase, was added to DAWIS-M.D., which is the main data source for VANESA. Analysis of human KEGG pathways within DAWIS-M.D. showed that 661 miRNAs (about one third of recorded human miRNAs) account for 65,474 interactions. hsa-miR-335-5p targets the most genes in our system (2,544), while the most targeted gene (with 71 miRNAs) is NUFIP2 (Nuclear Fragile X Mental Retardation Protein Interacting Protein 2). Amyotrophic Lateral Sclerosis (ALS), a complex neurodegenerative disease, was chosen as a proof-of-concept model. Using our system, it was possible to reduce the several hundred genes and miRNAs initially associated with ALS to eight genes, 19 miRNAs and 31 interactions. This highlights the effectiveness of the implemented system in distilling important information from otherwise hard-to-access, highly convoluted and vast regulatory networks.

    Visualization and Analysis of MicroRNAs within KEGG Pathways using VANESA

    Hamzeiy H, Suluyayla R, Brinkrolf C, Janowski SJ, Hofestädt R, Allmer J. Visualization and Analysis of MicroRNAs within KEGG Pathways using VANESA. Journal of Integrative Bioinformatics. 2017;14(1).
    MicroRNAs (miRNAs) are small RNA molecules which are known to take part in post-transcriptional regulation of gene expression. Here, VANESA, an existing platform for the reconstruction, visualization, and analysis of large biological networks, has been further expanded to include all experimentally validated human miRNAs available within miRBase, TarBase and miRTarBase. This is done by integrating a custom hybrid miRNA database into DAWIS-M.D., VANESA's main data source, enabling the visualization and analysis of miRNAs within large biological pathways such as those found within the Kyoto Encyclopedia of Genes and Genomes (KEGG). Interestingly, 99.15% of human KEGG pathways either contain genes which are targeted by miRNAs or harbor them. This is mainly due to the high number of interaction partners that each miRNA can have (e.g., hsa-miR-335-5p targets 2544 genes and 71 miRNAs target NUFIP2). We demonstrate the usability of our system by analyzing the measles virus KEGG pathway as a proof-of-principle model and further highlight the importance of integrating miRNAs (both experimentally validated and predicted) into biological networks for the elucidation of novel miRNA-mRNA interactions of biological importance.
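Figures such as the number of genes targeted by one miRNA, the number of miRNAs targeting one gene, and the share of pathways touched by miRNAs can all be derived from a flat list of miRNA-gene interactions. The sketch below shows one way to do that bookkeeping; it is not VANESA or DAWIS-M.D. code. The function names and the toy identifiers are hypothetical, and it counts only pathways containing targeted genes, not pathways that harbor miRNA genes themselves.

```python
from collections import defaultdict


def degree_tables(interactions):
    """Map each miRNA to its target genes and each gene to its miRNAs."""
    targets_of = defaultdict(set)
    regulators_of = defaultdict(set)
    for mirna, gene in interactions:
        targets_of[mirna].add(gene)
        regulators_of[gene].add(mirna)
    return targets_of, regulators_of


def fraction_of_pathways_with_targets(pathways, regulators_of):
    """Share of pathways containing at least one miRNA-targeted gene."""
    if not pathways:
        return 0.0
    hit = sum(
        1 for genes in pathways.values()
        if any(g in regulators_of for g in genes)
    )
    return hit / len(pathways)


if __name__ == "__main__":
    # Toy placeholder data, not taken from miRBase, TarBase, miRTarBase or KEGG.
    interactions = [("miR-a", "GENE1"), ("miR-a", "GENE2"), ("miR-b", "GENE2")]
    pathways = {"pathway_1": {"GENE1", "GENE3"}, "pathway_2": {"GENE4"}}

    targets_of, regulators_of = degree_tables(interactions)
    top_mirna = max(targets_of, key=lambda m: len(targets_of[m]))
    top_gene = max(regulators_of, key=lambda g: len(regulators_of[g]))
    print(top_mirna, len(targets_of[top_mirna]))
    print(top_gene, len(regulators_of[top_gene]))
    print(fraction_of_pathways_with_targets(pathways, regulators_of))
```

Using sets means an interaction reported by more than one source database is counted once, which matters when several resources are merged into a single hybrid table.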

    Search for SCA2 blood RNA biomarkers highlights Ataxin-2 as strong modifier of the mitochondrial factor PINK1 levels

    Ataxin-2 (ATXN2) polyglutamine domain expansions of large size result in an autosomal dominantly inherited multi-system-atrophy of the nervous system named spinocerebellar ataxia type 2 (SCA2), while expansions of intermediate size act as polygenic risk factors for motor neuron disease (ALS and FTLD) and perhaps also for Levodopa-responsive Parkinson's disease (PD).