815 research outputs found

    Metabolomics-based biomarker discovery for bee health monitoring : a proof of concept study concerning nutritional stress in Bombus terrestris

    Bee pollinators are exposed to multiple natural and anthropogenic stressors. Understanding the effects of a single stressor in the complex environmental context of antagonistic/synergistic interactions is critical to pollinator monitoring and may serve as an early warning system before a pollination crisis. This study aimed to methodically improve the diagnosis of bee stressors using a simultaneous untargeted and targeted metabolomics-based approach. Analysis of 84 Bombus terrestris hemolymph samples identified 8 metabolites, retained as potential biomarkers, that showed excellent discrimination for nutritional stress. In parallel, 8 significantly altered metabolites, as revealed by targeted profiling, were also assigned as candidate biomarkers. Furthermore, machine learning algorithms were applied to these two biomarker sets, whereby the eight untargeted components showed the best classification performance, with sensitivity and specificity up to 99% and 100%, respectively. Based on pathway and biochemistry analysis, we propose that gluconeogenesis contributed significantly to blood sugar stability in bumblebees maintained on a low-carbohydrate diet. Taken together, this study demonstrates that metabolomics-based biomarker discovery holds promising potential for improving bee health monitoring and for identifying stressors related to energy intake as well as other environmental stressors.
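
The sensitivity and specificity quoted in this abstract are standard confusion-matrix ratios. The sketch below shows how they are computed; it uses made-up labels, not the study's data, and the function name and the label encoding (1 = nutritionally stressed, 0 = control) are assumptions for illustration only.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for a binary classification."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # stressed sample correctly flagged
            else:
                fn += 1  # stressed sample missed
        else:
            if pred == positive:
                fp += 1  # control sample wrongly flagged
            else:
                tn += 1  # control sample correctly passed
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical labels: 4 stressed bees, 4 controls, one stressed bee missed.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

Here one of four stressed samples is missed and no control is misclassified, so sensitivity is 3/4 and specificity is 4/4.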

    A Hybrid Approach for Mining Metabolomic Data

    In this paper, we introduce a hybrid approach for analyzing metabolomic data about type 2 diabetes. The identification of biomarkers that are indicative of the disease is very important and can be guided by data mining methods. The data to be analyzed are massive and complex and are organized around a small set of individuals and a large set of variables (attributes). In this study, we based our experiments on a combination of efficient numerical supervised methods, namely Support Vector Machines (SVM), Random Forests (RF), and ANOVA, and a symbolic unsupervised method, namely Formal Concept Analysis (FCA). The data mining strategy is based on ten specific classification processes, which are organized around three main operations: filtering, feature selection, and post-processing. The numerical methods are mainly used in filtering and feature selection, while FCA is mainly used for visualization and interpretation purposes. The first results are encouraging and show that the present strategy is well adapted to the mining of such complex biological data and the identification of potential predictive biomarkers.
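
As an illustration of the filtering step, the sketch below ranks variables by their one-way ANOVA F statistic, one plausible way ANOVA can be used to filter attributes when there are few individuals and many variables. It is not the authors' pipeline: the function names, the metabolite names, and the toy values are all hypothetical.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F statistic for one variable across two or more groups."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    grand_mean = mean(values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Variation of the samples around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(table, labels):
    """Return variable names sorted from most to least discriminative."""
    scores = {name: anova_f(column, labels) for name, column in table.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: three cases and three controls, two variables.
labels = ["case", "case", "case", "control", "control", "control"]
table = {
    "metabolite_A": [5.0, 6.0, 7.0, 1.0, 2.0, 3.0],  # differs between groups
    "metabolite_B": [2.0, 3.0, 4.0, 2.0, 3.0, 4.0],  # identical in both groups
}
ranking = rank_features(table, labels)
```

A filter of this kind would keep only the top-ranked variables before handing them to a classifier such as an SVM or a Random Forest.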

    Explainable Artificial Intelligence Paves the Way in Precision Diagnostics and Biomarker Discovery for the Subclass of Diabetic Retinopathy in Type 2 Diabetics

    Diabetic retinopathy (DR), a common ocular microvascular complication of diabetes, contributes significantly to diabetes-related vision loss. This study addresses the imperative need for early diagnosis of DR and precise treatment strategies based on the explainable artificial intelligence (XAI) framework. The study integrated clinical, biochemical, and metabolomic biomarkers associated with the following classes: non-DR (NDR), non-proliferative diabetic retinopathy (NPDR), and proliferative diabetic retinopathy (PDR) in type 2 diabetes (T2D) patients. To create machine learning (ML) models, 10% of the data was divided into validation sets and 90% into discovery sets. The validation dataset was used for hyperparameter optimization and feature selection stages, while the discovery dataset was used to measure the performance of the models. A 10-fold cross-validation technique was used to evaluate the performance of ML models. Biomarker discovery was performed using minimum redundancy maximum relevance (mRMR), Boruta, and explainable boosting machine (EBM). The proposed predictive framework compares the results of eXtreme Gradient Boosting (XGBoost), natural gradient boosting for probabilistic prediction (NGBoost), and EBM models in determining the DR subclass. The hyperparameters of the models were optimized using Bayesian optimization. Combining EBM feature selection with XGBoost, the optimal model achieved (91.25 ± 1.88)% accuracy, (89.33 ± 1.80)% precision, (91.24 ± 1.67)% recall, (89.37 ± 1.52)% F1-score, and (97.00 ± 0.25)% area under the ROC curve (AUROC). According to the EBM explanation, the six most important biomarkers in determining the course of DR were tryptophan (Trp), phosphatidylcholine diacyl C42:2 (PC.aa.C42.2), butyrylcarnitine (C4), tyrosine (Tyr), hexadecanoyl carnitine (C16) and total dimethylarginine (DMA). The identified biomarkers may provide a better understanding of the progression of DR, paving the way for more precise and cost-effective diagnostic and treatment strategies.
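
Metrics reported as "mean ± standard deviation" typically come from averaging a score over the folds of a k-fold cross-validation. The sketch below shows that bookkeeping for 10 folds. It is not the study's pipeline: a toy nearest-centroid classifier stands in for XGBoost, the folds are contiguous rather than shuffled or stratified, and the data and all function names are made up for illustration.

```python
from statistics import mean, stdev


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_centroids(xs, ys):
    """Toy stand-in model: the mean of a single feature per class."""
    sums = {}
    for x, y in zip(xs, ys):
        total, count = sums.get(y, (0.0, 0))
        sums[y] = (total + x, count + 1)
    return {label: total / count for label, (total, count) in sums.items()}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


def cross_validate(xs, ys, k=10):
    """Return (mean accuracy, standard deviation of accuracy) over k folds."""
    accuracies = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = fit_centroids(train_x, train_y)
        hits = sum(predict(model, xs[i]) == ys[i] for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return mean(accuracies), stdev(accuracies)


# Hypothetical, well-separated data: classes alternate so every fold has both.
ys = [i % 2 for i in range(20)]
xs = [(1.0 if y == 0 else 5.0) + 0.1 * i for i, y in enumerate(ys)]
mean_acc, std_acc = cross_validate(xs, ys, k=10)
```

With real, noisier data the per-fold accuracies differ, and their spread is what the ± term summarizes.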

    Software Tools and Approaches for Compound Identification of LC-MS/MS Data in Metabolomics.

    The annotation of small molecules remains a major challenge in untargeted mass spectrometry-based metabolomics. We here critically discuss structured elucidation approaches and software that are designed to help during the annotation of unknown compounds. Only by elucidating unknown metabolites first is it possible to biologically interpret complex systems, to map compounds to pathways and to create reliable predictive metabolic models for translational and clinical research. These strategies include the construction and quality of tandem mass spectral databases such as the coalition of MassBank repositories and investigations of MS/MS matching confidence. We present in silico fragmentation tools such as MS-FINDER, CFM-ID, MetFrag, ChemDistiller and CSI:FingerID that can annotate compounds from existing structure databases and that have been used in the CASMI (critical assessment of small molecule identification) contests. Furthermore, the use of retention time models from liquid chromatography and the utility of collision cross-section modelling from ion mobility experiments are covered. Workflows and published examples of successfully annotated unknown compounds are included
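
To make the idea of MS/MS library matching concrete, the sketch below scores two fragment spectra with a binned cosine similarity, a widely used baseline for spectral comparison. It is not the scoring function of MassBank or of any of the tools named above; the peak lists, the 1 Da bin width, and the function names are assumptions for illustration.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Sum the intensities of (m/z, intensity) peaks that fall in the same bin."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_score(spec_a, spec_b, bin_width=1.0):
    """Cosine similarity between two binned spectra, from 0 (disjoint) to 1."""
    a = bin_spectrum(spec_a, bin_width)
    b = bin_spectrum(spec_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical spectra as (m/z, intensity) pairs.
query = [(100.1, 50.0), (150.2, 100.0), (200.3, 25.0)]
library_hit = [(100.4, 50.0), (150.0, 100.0), (200.9, 25.0)]
unrelated = [(80.0, 10.0), (120.5, 90.0)]

score_hit = cosine_score(query, library_hit)
score_miss = cosine_score(query, unrelated)
```

Fixed-width binning is the simplest option but splits peaks that straddle a bin edge; tolerance-based peak matching avoids that at the cost of more bookkeeping.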

    Elements About Exploratory, Knowledge-Based, Hybrid, and Explainable Knowledge Discovery

    Knowledge Discovery in Databases (KDD) and especially pattern mining can be interpreted along several dimensions, namely data, knowledge, problem-solving and interactivity. These dimensions are not disconnected and have a direct impact on the quality, applicability, and efficiency of KDD. Accordingly, we discuss some objectives of KDD based on these dimensions, namely exploration, knowledge orientation, hybridization, and explanation. The data space and the pattern space can be explored in several ways, depending on specific evaluation functions and heuristics, possibly related to domain knowledge. Furthermore, numerical data are complex and supervised numerical machine learning methods are usually the best candidates for efficiently mining such data. However, the workings and output of numerical methods are most of the time hard to understand, while symbolic methods are usually more intelligible. This calls for hybridization, combining numerical and symbolic mining methods to improve the applicability and interpretability of KDD. Moreover, suitable explanations about the operating models and possible subsequent decisions should complete KDD, and this is far from being the case at the moment. To illustrate these dimensions and objectives, we analyze a concrete case about the mining of biological data, where we characterize these dimensions and their connections. We also discuss dimensions and objectives in the framework of Formal Concept Analysis and we draw some perspectives for future research.

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues

    Analytical methods based on ion mobility and mass spectrometry for metabolomics

    Travelling wave ion mobility spectrometry (TWIMS) in combination with ultra-high performance liquid chromatography (UHPLC) and mass spectrometry (MS) has been applied successfully for the untargeted, global metabolic profiling of biofluids such as mouse plasma and saliva. Methods based on UHPLC-MS alone and in combination with ion mobility spectrometry (UHPLC-IM-MS) have been developed and validated for the untargeted metabolite profiling of saliva, obtained non-invasively by passive drool. Three separate metabolic profiling studies have been carried out in conjunction with bioinformatics strategies to identify potential metabolomic biomarker ions that are associated with efficacy of rice bran in colorectal cancer, physiological stress and that have the potential for the diagnosis of asthma. The advantages offered by the utility of ion mobility in UHPLC-MS based metabolic profiling studies, including the increased analytical space, mass spectral clean-up of contaminants such as PEG post-UHPLC-IM-MS analysis, enhancement of the selectivity of targeted metabolites as well as the potential for the identification of metabolites by comparison of ion mobility drift times have been highlighted. Ten potential metabolic biomarker ions of asthma have been identified from the moderate asthmatics from untargeted metabolite profiling of saliva by UHPLC-MS. A predictive model based on partial least squares discriminant analysis (PLS-DA) has been constructed using these ten discriminant ions, which demonstrates good predictive capability for moderate asthmatics and controls. Potential metabolic biomarker ions of physiological stress have been identified through untargeted metabolite profiling analysis of saliva samples collected before and after exercise by UHPLC-IM-MS. Valerolactam has been identified as a potential biomarker of physiological stress from saliva by comparison of retention time, ion mobility drift time and MS/MS spectra with a standard of δ-valerolactam

    Research in Metabolomics via Nuclear Magnetic Resonance Spectroscopy: Data Mining, Biochemistry and Clinical Chemistry

    Metabolomics entails the comprehensive characterization of the ensemble of endogenous and exogenous metabolites present in a biological specimen. Metabolites represent, at the same time, the downstream output of the genome and the upstream input from various external factors, such as the environment, lifestyle, and diet. Therefore, in the last few years, metabolomic phenotyping has provided unique insights into the fundamental and molecular causes of several physiological and pathophysiological conditions. In parallel, metabolomics has been demonstrating an emerging role in monitoring the influence of different manufacturing procedures on food quality and food safety. In light of the above, this collection includes the latest research from various fields of NMR-based metabolomics applications ranging from biomedicine to data mining and food chemistry

    Development of computational tools for the analysis of 2D-nuclear magnetic resonance data

    Master's dissertation in Bioinformatics. Metabolomics is one of the omics sciences that has been gaining a lot of interest due to its potential for correlating an organism’s biochemical activity with its phenotype. The applications of metabolomics are being extended as new techniques reveal new information on metabolic profiles and molecules, thus elucidating biological, chemical and functional knowledge. The main techniques that collect data are based on mass spectrometry and nuclear magnetic resonance (NMR) spectroscopy. The latter has the advantage of analyzing a sample in vivo without damaging it, and while its sensitivity is pointed out as a disadvantage, multidimensional NMR delivers a solution to this issue. It adds layers of information, generating new data that requires advanced bioinformatics methods in order to extract biological meaning. Since multidimensional NMR has different approaches within itself, the need to establish an integrated framework that allows researchers to load their data and extract relevant knowledge has become more imperative over the years. Also, establishing common data analysis pipelines for one-dimensional and multidimensional NMR remains a challenge in current scientific research, hindering reproducibility across research groups. In recent work from the host group, specmine, an R package for metabolomics and spectral data analysis/mining, has been developed to wrap and deliver key metabolomic methods that allow a researcher to perform a complete analysis. In this dissertation, tools integrated in specmine were developed to read, visualize and analyze two-dimensional (2D) NMR. A new specmine structure was created for this type of data, easing interpretation and data visualization. In terms of visualization, a novel approach towards three-dimensional environments enables users to interact with their data, allowing peak hovering or identification of rich resonance regions. 
When the user does not specify an input, the samples to plot are selected on a signal-to-noise ratio scale, so that samples with opposite signal-to-noise ratios are plotted. A method to perform peak detection on 2D NMR based on local maximum search was implemented to obtain a data structure that best benefits from specmine’s functionalities. These include preprocessing, univariate and multivariate analysis as well as machine learning and feature selection methods. The 2D NMR functions were validated using experimental data from two scientific papers, available on metabolomic databases, applying the necessary preprocessing steps to compare spectra and results. These data gave rise to two case studies from different NMR sources, Bruker and Varian, which reinforces specmine’s flexibility. The case studies were carried out using mainly specmine and other packages for specific processing steps, such as probabilistic quotient normalization. A pipeline to analyze 2D NMR was added to specmine, in the form of a vignette, to provide a guideline for the newly developed functionalities.
    Metabolomics is one of the omics sciences that has been attracting considerable interest because of its potential to correlate an organism's biochemical activity with its phenotype. The applications of metabolomics are growing steadily as new techniques reveal new information about metabolic and molecular profiles, elucidating biological, chemical and functional knowledge. The main techniques for collecting this type of data are based on mass spectrometry and nuclear magnetic resonance (NMR). The latter has the advantage of analyzing a sample in vivo without damaging it, and while its sensitivity has been pointed out as a disadvantage, the multidimensional NMR approach improves on the traditional version. By measuring other nuclei, it adds layers of information, generating a new type of data that requires advanced bioinformatics methods to extract biological meaning. The existence of several approaches to multidimensional NMR creates a growing need for a tool that integrates this type of data, so that researchers can carry out their analysis effectively. In addition, consolidating common pipelines for analyzing one-dimensional and multidimensional NMR data remains a challenge for scientific research, hindering the reproducibility of results across research groups. In recent work by the host group, a package for the R environment focused on metabolomics and data analysis/mining was developed. This package, specmine, has been improved since its development and works as a tool that brings together different methods, allowing a complete analysis of a given dataset. Based on this package, an integrated web platform, WebSpecmine, was developed more recently with the same purpose, providing an easier and friendlier user interface. In this dissertation, tools for reading, visualizing and analyzing two-dimensional (2D) NMR were developed with their integration into specmine in mind. A new structure was added to the package, easing the interpretation and schematization of the data. As for visualization, a novel approach to three-dimensional environments lets users interact with their data by identifying spectral regions of interest or recognizing peaks. When the user gives no specification, the visualization of 2D spectra is based on a signal-to-noise ratio scale that, as a first step, shows the samples with the largest and smallest difference between signal and noise. A method for peak detection in 2D NMR based on a search for local maxima was also implemented. This operation aims to obtain a simplified data structure that best benefits from specmine's functionalities. These include preprocessing operations, univariate and multivariate analyses, feature selection methods and machine learning. The functions developed for 2D NMR were validated with experimental data collected from two scientific papers, available in metabolomics databases, to which the preprocessing steps needed to allow comparison of results were applied. These data gave rise to two case studies covering different NMR instruments, Bruker and Varian, thereby reinforcing specmine's flexibility regarding the types of data it can read. These case studies were carried out mainly with specmine; however, external packages were needed for specific processing steps, such as probabilistic quotient normalization. A pipeline for analyzing 2D NMR data was added to specmine in the form of a vignette, a long-form documentation format suited to packages implemented in R. This provides the user with a set of procedures geared toward the correct use of the implemented functionalities.
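
The local-maximum peak search mentioned in this abstract can be sketched as follows. This is not specmine's implementation (specmine is an R package); it is a minimal Python illustration that assumes the 2D spectrum is a plain grid of intensities, compares each point with its eight neighbours, and uses a hypothetical intensity threshold to suppress noise.

```python
def detect_peaks_2d(matrix, threshold=0.0):
    """Return (row, col, intensity) for points strictly above all 8 neighbours."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0
    peaks = []
    for r in range(n_rows):
        for c in range(n_cols):
            value = matrix[r][c]
            if value <= threshold:
                continue  # ignore points at or below the noise threshold
            neighbours = [
                matrix[rr][cc]
                for rr in range(max(0, r - 1), min(n_rows, r + 2))
                for cc in range(max(0, c - 1), min(n_cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                peaks.append((r, c, value))
    return peaks


# Hypothetical 5 x 5 intensity grid with two resonances.
spectrum = [
    [0, 1, 0, 0, 0],
    [1, 9, 1, 0, 0],
    [0, 1, 0, 2, 0],
    [0, 0, 2, 7, 2],
    [0, 0, 0, 2, 0],
]
peaks = detect_peaks_2d(spectrum, threshold=3)
```

The resulting peak list is a much smaller structure than the full grid, which is the kind of simplification the abstract describes before downstream preprocessing and analysis. Because the comparison is strict, flat-topped peaks with equal neighbouring values are not reported by this sketch.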