8,089 research outputs found

    Development of an integrated computational platform for metabolomics data analysis and knowledge extraction

    Master's dissertation in Computing Engineering. In the last few years, biological and biomedical research has been generating large amounts of quantitative data, driven by the surge of high-throughput techniques that can quantify different types of molecules in the cell. While transcriptomics and proteomics, which measure gene expression and protein abundance respectively, are the most mature, metabolomics, the quantification of small compounds, has emerged in recent years as an advantageous alternative in many applications. As with other omics data, metabolomics poses important challenges regarding the extraction of relevant knowledge from typically large amounts of data. To respond to these challenges, an integrated computational platform for metabolomics data analysis and knowledge extraction was created to facilitate the use of several methods of visualization, data analysis and data mining. In the first stage of the project, a state-of-the-art review was conducted to assess the existing methods and computational tools in the field, and to identify what was missing or difficult to use for a common user without computational expertise. This step helped determine which strategies to adopt and which main functionalities to develop in the software. R was chosen as the supporting framework, given the ease of creating and documenting data analysis scripts and the possibility of developing new packages that add new functions, while taking advantage of the numerous resources created by the vibrant R community. The next step was to develop an R package with an integrated set of functions for conducting a metabolomics data analysis pipeline with reduced effort, allowing users to explore the data, apply different data analysis methods and visualize their results, thereby supporting the extraction of relevant knowledge from metabolomics data.
Regarding data analysis, the package includes functions for loading data from different formats and for pre-processing, as well as different methods for univariate and multivariate data analysis, including t-tests, analysis of variance, correlations, principal component analysis and clustering. It also includes a large set of machine learning methods, with distinct models for classification and regression, as well as feature selection methods. The package supports the analysis of metabolomics data from infrared, ultraviolet-visible and nuclear magnetic resonance spectroscopies. The package has been validated on real examples, considering three case studies, including the analysis of data from natural products such as bee propolis and cassava, as well as metabolomics data from cancer patients. Each of these datasets was analyzed using the developed package with different analysis pipelines, and HTML reports that include both the analysis scripts and their results were generated using the documentation features provided by the package.

    Updates in metabolomics tools and resources: 2014-2015

    Data processing and interpretation represent the most challenging and time-consuming steps in high-throughput metabolomic experiments, regardless of the analytical platforms (MS or NMR spectroscopy based) used for data acquisition. Improved machinery in metabolomics generates increasingly complex datasets that create the need for more and better processing and analysis software and in silico approaches to understand the resulting data. However, a comprehensive source of information describing the utility of the most recently developed and released metabolomics resources (in the form of tools, software, and databases) is currently lacking. Thus, here we provide an overview of freely available and open-source tools, algorithms, and frameworks to make both upcoming and established metabolomics researchers aware of recent developments, in an attempt to advance and facilitate data processing workflows in their metabolomics research. The major topics include tools and resources for data processing, data annotation, and data visualization in MS and NMR-based metabolomics. Most tools described in this review are dedicated to untargeted metabolomics workflows; however, some more specialist tools are described as well. All tools and resources described, including their analytical and computational platform dependencies, are summarized in an overview table.

    Toward collaborative open data science in metabolomics using Jupyter Notebooks and cloud computing

    Background: A lack of transparency and reporting standards in the scientific community has led to increasing and widespread concerns relating to reproduction and integrity of results. As an omics science, which generates vast amounts of data and relies heavily on data science for deriving biological meaning, metabolomics is highly vulnerable to irreproducibility. The metabolomics community has made substantial efforts to align with FAIR data standards by promoting open data formats, data repositories, online spectral libraries, and metabolite databases. Open data analysis platforms also exist; however, they tend to be inflexible and rely on the user to adequately report their methods and results. To enable FAIR data science in metabolomics, methods and results need to be transparently disseminated in a manner that is rapid, reusable, and fully integrated with the published work. To ensure broad use within the community, such a framework also needs to be inclusive and intuitive for both computational novices and experts alike. Aim of Review: To encourage metabolomics researchers from all backgrounds to take control of their own data science, mould it to their personal requirements, and enthusiastically share resources through open science. Key Scientific Concepts of Review: This tutorial introduces the concept of interactive web-based computational laboratory notebooks. The reader is guided through a set of experiential tutorials specifically targeted at metabolomics researchers, based around the Jupyter Notebook web application, GitHub data repository, and Binder cloud computing platform.

    Software Tools and Approaches for Compound Identification of LC-MS/MS Data in Metabolomics.

    The annotation of small molecules remains a major challenge in untargeted mass spectrometry-based metabolomics. We here critically discuss structured elucidation approaches and software that are designed to help during the annotation of unknown compounds. Only by elucidating unknown metabolites first is it possible to biologically interpret complex systems, to map compounds to pathways and to create reliable predictive metabolic models for translational and clinical research. These strategies include the construction and quality of tandem mass spectral databases such as the coalition of MassBank repositories and investigations of MS/MS matching confidence. We present in silico fragmentation tools such as MS-FINDER, CFM-ID, MetFrag, ChemDistiller and CSI:FingerID that can annotate compounds from existing structure databases and that have been used in the CASMI (critical assessment of small molecule identification) contests. Furthermore, the use of retention time models from liquid chromatography and the utility of collision cross-section modelling from ion mobility experiments are covered. Workflows and published examples of successfully annotated unknown compounds are included

    Toward a Standardized Strategy of Clinical Metabolomics for the Advancement of Precision Medicine

    Despite the tremendous success, pitfalls have been observed in every step of a clinical metabolomics workflow, which impedes the internal validity of the study. Furthermore, the demand for logistics, instrumentation, and computational resources for metabolic phenotyping studies has far exceeded our expectations. In this conceptual review, we will cover inclusive barriers of a metabolomics-based clinical study and suggest potential solutions in the hope of enhancing study robustness, usability, and transferability. The importance of quality assurance and quality control procedures is discussed, followed by a practical rule containing five phases, including two additional "pre-pre-" and "post-post-" analytical steps. In addition, we will elucidate the potential involvement of machine learning and demonstrate that the need for automated data mining algorithms to improve the quality of future research is undeniable. Consequently, we propose a comprehensive metabolomics framework, along with an appropriate checklist refined from current guidelines and our previously published assessment, in an attempt to accurately translate achievements in metabolomics into clinical and epidemiological research. Furthermore, the integration of multifaceted multi-omics approaches, with metabolomics as the pillar member, is urgently needed. When combined with other social or nutritional factors, we can gather complete omics profiles for a particular disease. Our discussion reflects the current obstacles and potential solutions toward the progressing trend of utilizing metabolomics in clinical research to create the next-generation healthcare system.

    Metabolomics methods for the synthetic biology of secondary metabolism

    Many microbial secondary metabolites are of high biotechnological value for medicine, agriculture, and the food industry. Bacterial genome mining has revealed numerous novel secondary metabolite biosynthetic gene clusters, which encode the potential to synthesize a large diversity of compounds that have never been observed before. The stimulation or “awakening” of this cryptic microbial secondary metabolism has naturally attracted the attention of synthetic microbiologists, who exploit recent advances in DNA sequencing and synthesis to achieve unprecedented control over metabolic pathways. One of the indispensable tools in the synthetic biology toolbox is metabolomics, the global quantification of small biomolecules. This review illustrates the pivotal role of metabolomics for the synthetic microbiology of secondary metabolism, including its crucial role in novel compound discovery in microbes, the examination of side products of engineered metabolic pathways, as well as the identification of major bottlenecks for the overproduction of compounds of interest, especially in combination with metabolic modeling. We conclude by highlighting remaining challenges and recent technological advances that will drive metabolomics towards fulfilling its potential as a cornerstone technology of synthetic microbiology

    Editorial overview: recent innovations in the metabolomics revolution

    No abstract available

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues