    Software Tools and Approaches for Compound Identification of LC-MS/MS Data in Metabolomics.

    The annotation of small molecules remains a major challenge in untargeted mass spectrometry-based metabolomics. Here we critically discuss structure elucidation approaches and software designed to help during the annotation of unknown compounds. Only by elucidating unknown metabolites first is it possible to biologically interpret complex systems, to map compounds to pathways and to create reliable predictive metabolic models for translational and clinical research. These strategies include the construction and quality of tandem mass spectral databases such as the coalition of MassBank repositories and investigations of MS/MS matching confidence. We present in silico fragmentation tools such as MS-FINDER, CFM-ID, MetFrag, ChemDistiller and CSI:FingerID that can annotate compounds from existing structure databases and that have been used in the CASMI (critical assessment of small molecule identification) contests. Furthermore, the use of retention time models from liquid chromatography and the utility of collision cross-section modelling from ion mobility experiments are covered. Workflows and published examples of successfully annotated unknown compounds are included.

    Updates in metabolomics tools and resources: 2014-2015

    Data processing and interpretation represent the most challenging and time-consuming steps in high-throughput metabolomic experiments, regardless of the analytical platforms (MS or NMR spectroscopy based) used for data acquisition. Improved machinery in metabolomics generates increasingly complex datasets that create the need for more and better processing and analysis software and in silico approaches to understand the resulting data. However, a comprehensive source of information describing the utility of the most recently developed and released metabolomics resources (in the form of tools, software, and databases) is currently lacking. Thus, here we provide an overview of freely available and open-source tools, algorithms, and frameworks to make both upcoming and established metabolomics researchers aware of recent developments, in an attempt to advance and facilitate data processing workflows in their metabolomics research. The major topics include tools and resources for data processing, data annotation, and data visualization in MS and NMR-based metabolomics. Most tools described in this review are dedicated to untargeted metabolomics workflows; however, some more specialist tools are described as well. All tools and resources described, including their analytical and computational platform dependencies, are summarized in an overview table.

    DiviK: Divisive intelligent K-means for hands-free unsupervised clustering in biological big data

    Investigation of molecular heterogeneity provides insights about tumor origin and metabolomics. The increasing amount of data gathered makes manual analysis infeasible. Automated unsupervised learning approaches are used for this purpose. However, this kind of analysis requires a lot of experience with setting its hyperparameters and usually upfront knowledge of the number of expected substructures. Moreover, the numerous measured molecules require an additional step of feature engineering to provide valuable results. In this work we propose DiviK: a scalable auto-tuning algorithm for segmentation of high-dimensional datasets, and a method to assess the quality of the unsupervised analysis. DiviK is validated on two separate high-throughput datasets acquired by Mass Spectrometry Imaging in 2D and 3D. The proposed algorithm could be one of the default choices to consider during initial exploration of Mass Spectrometry Imaging data. With comparable clustering quality, it brings the possibility of focusing on different levels of dataset nuance, while requiring no upfront specification of the number of expected structures. Finally, due to its simplicity, DiviK is easily generalizable to an even more flexible framework, with another clustering algorithm used instead of k-means. A generic implementation is freely available under the Apache 2.0 license at https://github.com/gmrukwa/divik. Comment: 8 pages, 7 figures

    Lipidomic Analysis of Glioblastoma Multiforme Using Mass Spectrometry

    Glioblastoma multiforme (GBM) is the most common and malignant form of primary brain tumors. It is highly invasive and current treatment options have not improved the survival rate over the past twenty years. Novel approaches and technologies from systems biology have the potential to identify biomarkers that could serve as new therapeutic targets for GBM. This study employed lipid profiling technology to investigate lipid biomarkers in ectopic and orthotopic human GBM xenograft models. Primary patient cell lines, GBM10 and GBM43, were injected into the flank and the right cerebral hemisphere of NOD/SCID mice. Tumors were harvested from the brain and flank, and proteins, metabolites, and lipids were extracted from each sample. Reverse-phase high performance liquid chromatography coupled with Fourier transform ion cyclotron resonance mass spectrometry (LC-FTMS) was used to analyze the lipid profiles of tumor samples. Statistical and clustering analyses were performed to detect differences. Over 500 lipids were identified in each tumor model, and lipids with the greatest fold effect in the comparison of ectopic versus orthotopic tumor models fell predominantly into four main classes of lipids: glycosphingolipids, glycerophosphoethanolamines, triradylglycerols, and glycerophosphoserines. Lipidomic analysis revealed differences in glycosphingolipid and triglyceride profiles when the same tumor was propagated in the flank versus the brain. These results underscore the importance of the surrounding physiological environment on tumor development and are consistent with the hypothesis that specific classes of lipids are critical for GBM tumor growth in different anatomical sites.

    BASIS: High-performance bioinformatics platform for processing of large-scale mass spectrometry imaging data in chemically augmented histology

    Mass Spectrometry Imaging (MSI) holds significant promise in augmenting digital histopathologic analysis by generating highly robust big data about the metabolic, lipidomic and proteomic molecular content of the samples. In the process, a vast quantity of unrefined data, which can amount to several hundred gigabytes per tissue section, is produced. Managing, analysing and interpreting this data is a significant challenge and represents a major barrier to the translational application of MSI. Existing data analysis solutions for MSI rely on a set of heterogeneous bioinformatics packages that are not scalable for the reproducible processing of large-scale (hundreds to thousands) biological sample sets. Here, we present a computational platform (pyBASIS) capable of optimized and scalable processing of MSI data for improved information recovery and comparative analysis across tissue specimens using machine learning and related pattern recognition approaches. The proposed solution also provides a means of seamlessly integrating experimental laboratory data with downstream bioinformatics interpretation/analyses, resulting in a truly integrated system for translational MSI.

    Information processing for mass spectrometry imaging

    Mass Spectrometry Imaging (MSI) is a sensitive analytical tool for detecting and spatially localising thousands of ions generated across intact tissue samples. The datasets produced by MSI are large both in the number of measurements collected and the total data volume, which effectively prohibits manual analysis and interpretation. However, these datasets can provide insights into tissue composition and variation, and can help identify markers of health and disease, so the development of computational methods is required to aid their interpretation. To address the challenges of high dimensional data, randomised methods were explored for making data analysis tractable and were found to provide a powerful set of tools for applying automated analysis to MSI datasets. Random projections provided over 90% dimensionality reduction of MALDI MSI datasets, making them amenable to visualisation by image segmentation. Randomised basis construction was investigated for dimensionality reduction and data compression. Automated data analysis was developed that could be applied to data compressed to 1% of its original size, including segmentation and factorisation, providing a direct route to the analysis and interpretation of MSI datasets. Evaluation of these methods alongside established dimensionality reduction pipelines on simulated and real-world datasets showed they could reproducibly extract the chemo-spatial patterns present.

    Multi-Class Cancer Subtyping in Salivary Gland Carcinomas with MALDI Imaging and Deep Learning

    Simple Summary: The correct diagnosis of different salivary gland carcinomas is important for a prognosis. This diagnosis is imprecise if it is based only on clinical symptoms and histological methods. Mass spectrometry imaging can provide information about the molecular composition of sample tissues. Using a deep-learning method, we analyzed the mass spectrometry imaging data of 25 patients. Using this workflow we could accurately predict the tumor type in each patient sample. Abstract: Salivary gland carcinomas (SGC) are a heterogeneous group of tumors. The prognosis varies strongly according to tumor type, and even the distinction between benign and malignant tumors is challenging. Adenoid cystic carcinoma (AdCy) is one subgroup of SGCs that is prone to late metastasis. This makes accurate tumor subtyping an important task. Matrix-assisted laser desorption/ionization (MALDI) imaging is a label-free technique capable of providing spatially resolved information about the abundance of biomolecules according to their mass-to-charge ratio. We analyzed tissue microarrays (TMAs) of 25 patients (including six different SGC subtypes and a healthy control group of six patients) with high mass resolution MALDI imaging using a 12-Tesla magnetic resonance mass spectrometer. The high mass resolution allowed us to accurately detect single masses with strong contributions to each class prediction. To address the added complexity created by the high mass resolution and multiple classes, we propose a deep-learning model. We showed that our deep-learning model provides a per-class classification accuracy of greater than 80% with little preprocessing. Based on this classification, we employed methods of explainable artificial intelligence (AI) to gain further insights into the spectrometric features of AdCys.

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.