Software Tools and Approaches for Compound Identification of LC-MS/MS Data in Metabolomics.
The annotation of small molecules remains a major challenge in untargeted mass spectrometry-based metabolomics. Here we critically discuss structure elucidation approaches and software designed to help annotate unknown compounds. Only by first elucidating unknown metabolites is it possible to biologically interpret complex systems, to map compounds to pathways and to create reliable predictive metabolic models for translational and clinical research. These strategies include the construction and quality of tandem mass spectral databases, such as the coalition of MassBank repositories, and investigations of MS/MS matching confidence. We present in silico fragmentation tools such as MS-FINDER, CFM-ID, MetFrag, ChemDistiller and CSI:FingerID that can annotate compounds from existing structure databases and that have been used in the CASMI (critical assessment of small molecule identification) contests. Furthermore, the use of retention time models from liquid chromatography and the utility of collision cross-section modelling from ion mobility experiments are covered. Workflows and published examples of successfully annotated unknown compounds are included.
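The review discusses MS/MS matching confidence without fixing a particular scoring function. As an illustration only, the sketch below assumes the widely used cosine similarity between two centroided spectra, each given as a list of (m/z, intensity) pairs, with a hypothetical absolute fragment tolerance of 0.01 Da and greedy one-to-one peak matching. It is not the scoring used by any specific tool named above.

```python
import math
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def cosine_score(query: List[Peak], reference: List[Peak], tol: float = 0.01) -> float:
    """Greedy peak-matched cosine similarity between two centroided MS/MS spectra.

    tol is an assumed absolute m/z tolerance in Da, chosen for illustration.
    """
    # Collect every peak pair that agrees within the tolerance, scored by intensity product.
    candidates = []
    for i, (mz_q, int_q) in enumerate(query):
        for j, (mz_r, int_r) in enumerate(reference):
            if abs(mz_q - mz_r) <= tol:
                candidates.append((int_q * int_r, i, j))

    # Accept the highest-scoring pairs first; each peak is matched at most once.
    candidates.sort(reverse=True)
    used_q, used_r = set(), set()
    dot = 0.0
    for product, i, j in candidates:
        if i in used_q or j in used_r:
            continue
        used_q.add(i)
        used_r.add(j)
        dot += product

    # Normalize by the Euclidean norms of the full intensity vectors.
    norm_q = math.sqrt(sum(inten * inten for _, inten in query))
    norm_r = math.sqrt(sum(inten * inten for _, inten in reference))
    if norm_q == 0.0 or norm_r == 0.0:
        return 0.0
    return dot / (norm_q * norm_r)


if __name__ == "__main__":
    query = [(100.0, 3.0), (150.0, 4.0)]
    library = [(100.005, 3.0), (200.0, 4.0)]
    print(cosine_score(query, library))  # prints 0.36: only the m/z 100 peaks match, 9 / (5 * 5)
```

A score of 1.0 means identical relative intensity patterns; unmatched peaks lower the score because they still count toward each spectrum's norm.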
Exploring Machine Learning for Untargeted Metabolomics Using Molecular Fingerprints
Background
Metabolomics, the study of substrates and products of cellular metabolism, offers valuable insights into an organism's state under specific conditions and has the potential to revolutionise preventive healthcare and pharmaceutical research. However, analysing large metabolomics datasets remains challenging, with available methods relying on limited and incompletely annotated metabolic pathways.
Methods
This study, inspired by well-established methods in drug discovery, applies machine learning to metabolite fingerprints to explore how metabolite structure relates to responses under experimental conditions beyond known pathways, shedding light on metabolic processes. It evaluates how effectively fingerprints represent metabolites, addressing challenges such as class imbalance, data sparsity, high dimensionality, duplicate structural encodings, and feature interpretability. Feature importance analysis is then applied to reveal the key chemical configurations affecting classification, identifying groups of related metabolites.
Results
The approach is tested on two datasets: one on ataxia telangiectasia and another on endothelial cells under low oxygen. Machine learning on molecular fingerprints predicts metabolite responses effectively, and feature importance analysis aligns with known metabolic pathways while revealing new groups of affected metabolites for further study.
Conclusion
The presented approach leverages the strengths of drug discovery to address critical issues in metabolomics research and aims to bridge the gap between these two disciplines. This work lays the foundation for future research in this direction, which could explore alternative structural encodings and machine learning models.
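To make the fingerprint representation concrete, here is a minimal sketch that assumes each metabolite's fingerprint is already available as a set of on-bit indices (for example, from a circular fingerprint tool); the metabolite names m1 to m4 and their bit sets are hypothetical. It expands fingerprints into fixed-length binary feature vectors for a classifier, reports how sparse those features are, groups metabolites whose structures collapse onto the same fingerprint (the duplicate-encoding challenge named in the Methods), and computes Tanimoto similarity, the standard set-overlap measure for fingerprints in drug discovery. It is not the study's pipeline.

```python
from collections import defaultdict
from typing import Dict, List, Set


def to_dense(on_bits: Set[int], n_bits: int) -> List[int]:
    """Expand a set of on-bit indices into a fixed-length 0/1 feature vector."""
    vec = [0] * n_bits
    for bit in on_bits:
        if not 0 <= bit < n_bits:
            raise ValueError(f"bit {bit} outside fingerprint length {n_bits}")
        vec[bit] = 1
    return vec


def density(fingerprints: Dict[str, Set[int]], n_bits: int) -> float:
    """Fraction of all feature cells that are set; low values indicate sparse data."""
    total_on = sum(len(bits) for bits in fingerprints.values())
    return total_on / (len(fingerprints) * n_bits)


def duplicate_groups(fingerprints: Dict[str, Set[int]]) -> List[List[str]]:
    """Groups of metabolites whose structures map to an identical fingerprint."""
    by_fp: Dict[frozenset, List[str]] = defaultdict(list)
    for name, bits in fingerprints.items():
        by_fp[frozenset(bits)].append(name)
    return sorted(sorted(names) for names in by_fp.values() if len(names) > 1)


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Shared on-bits over the union of on-bits (0.0 for two empty sets, by convention here)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


if __name__ == "__main__":
    fps = {"m1": {1, 5, 9}, "m2": {1, 5, 9}, "m3": {2, 5}, "m4": {0}}
    X = [to_dense(bits, 10) for bits in fps.values()]  # one feature row per metabolite
    print(density(fps, 10))                 # 9 on-bits / 40 cells = 0.225
    print(duplicate_groups(fps))            # [['m1', 'm2']]
    print(tanimoto(fps["m1"], fps["m3"]))   # 1 shared / 4 in union = 0.25
```

Duplicate groups matter for classification because metabolites sharing one feature vector but carrying different response labels cannot be separated by any model trained on those features alone.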
Computational Inferences of Mutations Driving Mesenchymal Differentiation in Glioblastoma
This dissertation reviews the development and implementation of integrative systems biology methods designed to identify driver mutations from high-throughput array data derived from human patients. Analyzing vast amounts of genomic and genetic data in the context of complex human genetic diseases such as glioblastoma is a daunting task. Mutations number in the hundreds, if not thousands, and only an unknown handful contribute significantly to the disease. The goal of this project was to develop novel computational methods to identify, from these data, candidate mutations that drive the molecular differentiation of glioblastoma into the mesenchymal subtype, the most aggressive glioblastoma subtype, with the poorest prognosis.
A global role for KLF1 in erythropoiesis revealed by ChIP-seq in primary erythroid cells
KLF1 regulates a diverse suite of genes to direct erythroid cell differentiation from bipotent progenitors. To determine the local cis-regulatory contexts and transcription factor networks in which KLF1 operates, we performed KLF1 ChIP-seq in the mouse. We found at least 945 sites in the genome of E14.5 fetal liver erythroid cells that are occupied by endogenous KLF1. Many of these recovered sites reside in erythroid gene promoters such as Hbb-b1, but the majority are distant from any known gene. Our data suggest that KLF1, functioning primarily as a transcriptional activator, directly regulates most aspects of terminal erythroid differentiation, including production of alpha- and beta-globin protein chains, heme biosynthesis, coordination of proliferation and anti-apoptotic pathways, and construction of the red cell membrane and cytoskeleton. Additionally, we suggest new mechanisms for KLF1 cooperation with other transcription factors, in particular the erythroid transcription factor GATA1, to maintain homeostasis in the erythroid compartment.
Blueprint: describing the complexity of metabolic regulation through the reconstruction of integrated metabolic and regulatory models
PhD thesis in Biomedical Engineering
Genome-Scale Metabolic (GEM) models can predict the phenotypic behavior of organisms. However, these models can lead to incorrect predictions, as certain metabolic processes are controlled by regulatory mechanisms. Accordingly, many methodologies have been developed to extend the reconstruction and analysis of GEM models via the integration of Transcriptional Regulatory Networks (TRNs). Nevertheless, reconstructing integrated genome-scale regulatory and metabolic models for diverse prokaryotes is still an open challenge.
In this work, we propose several tools to assist the reconstruction and analysis of regulatory and metabolic models. We start by describing BioISO (Biological networks constraint-based In Silico Optimization), a novel tool to assist the manual curation of GEM models. BioISO uses a recursive relation-like algorithm and Flux Balance Analysis (FBA) to evaluate and guide the debugging of in silico phenotype predictions. Hence, this tool can reduce the number of artifacts in GEM models, decreasing the burden of model refinement and curation.
A state-of-the-art repository of TRNs for prokaryotes was implemented to support the reconstruction and integration of TRNs into GEM models. The ProTReND (Prokaryotic Transcriptional Regulatory Network Database) repository comprises several tools to extract and process regulatory information available in several external resources. More importantly, this repository contains a data integration system to unify the regulatory data into standardized TRNs at the genome scale. In addition, ProTReND provides a web application with full access to the regulatory data.
Finally, we have developed a new modeling framework in MEWpy to define, simulate and analyze GEnome-scale Regulatory and Metabolic (GERM) models. The GERM model framework can read a GEM model, as well as a TRN, from different file formats. This framework assembles a GERM model using the regulatory interactions and the Genes-Proteins-Reactions (GPR) rules encoded in the GEM model and TRN. In addition, this modeling framework supports several methods of phenotype prediction designed for regulatory-metabolic models.
I would like to thank Fundação para a Ciência e Tecnologia for the Ph.D. studentship I was awarded (SFRH/BD/139198/2018).
Cheminformatics for genome-scale metabolic reconstructions
Genome-scale metabolic reconstructions are an important resource in the study of metabolism. They provide both a system-level and a component-level view of the biochemical transformations of metabolites. As more reconstructions are created, it remains a challenge to integrate and reason about their contents. This thesis focuses on the development of computational methods to allow on-demand comparison and alignment of metabolic reconstructions.
A novel method is introduced that uses chemical structure representations to identify equivalent metabolites between reconstructions. A graph-theoretic representation allows the method to identify, and reason about, metabolites that do not match exactly. A key advantage is that the method uses the contents of reconstructions directly and does not rely on the creation or use of a common reference.
An interactive desktop application is introduced for annotating reconstructions with chemical structure representations. The application assists in the creation and curation of metabolic information using manual, semi-automated, and automated methods. Chemical structure representations can be retrieved, drawn, or generated to allow precise metabolite annotation.
Processing chemical information requires efficient, optimised algorithms. Several areas are addressed, and implementations have been contributed to the Chemistry Development Kit. Rings are a fundamental property of chemical structures; therefore, multiple ring definitions and fast algorithms for them are explored. Conversion and standardisation between structure representations present a challenge; efficient algorithms to determine aromaticity, assign a Kekulé form, and generate tautomers are detailed.
Many enzymes are selective and specific with respect to stereochemistry. Methods for the identification, depiction, comparison, and description of stereochemistry are described.
The project was funded by Unilever, the Biotechnology and Biological Sciences Research Council [BB/I532153/1], and the European Molecular Biology Laboratory.
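The thesis explores multiple ring definitions without spelling them out here. One quantity that several of those definitions share is the number of independent rings, the circuit rank |E| - |V| + C of the molecular graph (bonds minus atoms plus connected components); any minimum cycle basis, including an SSSR, contains exactly that many rings. The sketch below is a minimal illustration, not the Chemistry Development Kit implementation, and assumes a hydrogen-suppressed graph given as an atom count plus a list of bonds, each listed once (so a double bond is a single edge).

```python
from typing import List, Tuple


def ring_count(n_atoms: int, bonds: List[Tuple[int, int]]) -> int:
    """Circuit rank |E| - |V| + C of a molecular graph, via union-find.

    Each bond that joins two atoms already connected by other bonds closes one
    new independent ring, so counting such bonds gives the circuit rank.
    """
    parent = list(range(n_atoms))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    rings = 0
    for a, b in bonds:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            rings += 1  # bond closes a cycle
        else:
            parent[root_a] = root_b  # bond merges two fragments
    return rings


if __name__ == "__main__":
    benzene = [(i, (i + 1) % 6) for i in range(6)]
    naphthalene = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
                   (4, 6), (6, 7), (7, 8), (8, 9), (9, 5)]
    cubane = [(0, 1), (1, 2), (2, 3), (3, 0), (4, 5), (5, 6), (6, 7), (7, 4),
              (0, 4), (1, 5), (2, 6), (3, 7)]
    print(ring_count(6, benzene), ring_count(10, naphthalene), ring_count(8, cubane))  # 1 2 5
```

Cubane shows why a single ring definition is not enough: all six of its four-membered faces are equally valid rings, but a minimum cycle basis keeps only five of them, so an SSSR must arbitrarily omit one.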
Random Walk Applied to Heterogeneous Drug-Target Networks for Predicting Biological Outcomes
Thesis (Ph.D.) - Indiana University, Informatics and Computing, 2016
Prediction of unknown drug-target interactions from bioassay data is critical not only for understanding these interactions but also for developing new drugs and repurposing old ones. Conventional methods for predicting such interactions can be divided into 2D-based and 3D-based methods. 3D methods are more computationally expensive and require more manual interpretation, whereas 2D methods, such as machine learning and similarity search using chemical fingerprints, are fast. One problem with traditional machine learning methods for predicting drug-target pairs is that they require labeled examples of both true and false interactions, which makes the selection of negative samples a major difficulty for supervised learning: unknown drug-target interactions are treated as false interactions, which may reduce the predictive accuracy of the model. Network-based methods have become an effective tool for predicting drug-target interactions because they avoid this negative sampling problem. In this dissertation, I will describe traditional machine learning methods and 3D pharmacophore modeling methods for drug-target prediction and will show how these methods work in a drug discovery scenario. I will then introduce a new framework for drug-target prediction based on Random Walk with Restart (RWR) over bipartite networks of drug-target relations. RWR integrates various networks, including drug-drug similarity networks, protein-protein similarity networks and drug-target interaction networks, into a heterogeneous network capable of predicting novel drug-target relations. I will show that the choice of chemical features used to measure drug-drug similarity does not affect performance in predicting interactions, and I will further evaluate the performance of RWR on an external dataset from the ChEMBL database.
I will also describe further applications of the RWR approach to multilayered networks built from biological data, such as diseases, tissue-based gene expression data, protein complexes and metabolic pathways, to predict associations between human diseases and metabolic pathways, which are crucial in drug discovery. I have further developed netpredictor, a software package in R (standalone and web versions) for unipartite and bipartite networks, which implements network-based predictive algorithms and network properties for drug-target prediction. This package will also be described.
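The abstract describes RWR only at a high level. As a hedged illustration, the sketch below assumes the standard RWR update p_next = (1 - r) * W * p + r * p0, where W is the column-normalized adjacency matrix of the heterogeneous network and p0 places all restart mass on the query drug. The whole block matrix is normalized at once and the restart probability r = 0.7 is an arbitrary choice; the dissertation's framework and the netpredictor package may weight or normalize the drug-drug, protein-protein and drug-target subnetworks differently, and the function names here are hypothetical, not netpredictor's API.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def build_heterogeneous(dd: Matrix, tt: Matrix, dt: Matrix) -> Matrix:
    """Assemble the block matrix [[DD, DT], [DT^T, TT]]: drugs first, then targets."""
    n_d, n_t = len(dd), len(tt)
    n = n_d + n_t
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n_d):
        for j in range(n_d):
            adj[i][j] = dd[i][j]              # drug-drug similarity
        for j in range(n_t):
            adj[i][n_d + j] = dt[i][j]        # known drug-target interactions
            adj[n_d + j][i] = dt[i][j]        # same edges, target-to-drug direction
    for i in range(n_t):
        for j in range(n_t):
            adj[n_d + i][n_d + j] = tt[i][j]  # protein-protein similarity
    return adj


def column_normalize(adj: Matrix) -> Matrix:
    """Turn edge weights into transition probabilities (each non-empty column sums to 1)."""
    n = len(adj)
    w = [[0.0] * n for _ in range(n)]
    for j in range(n):
        col_sum = sum(adj[i][j] for i in range(n))
        if col_sum > 0:
            for i in range(n):
                w[i][j] = adj[i][j] / col_sum
    return w


def rwr(adj: Matrix, seeds: Sequence[int], restart: float = 0.7,
        tol: float = 1e-10, max_iter: int = 1000) -> List[float]:
    """Iterate p <- (1 - restart) * W p + restart * p0 until the L1 change is below tol."""
    n = len(adj)
    w = column_normalize(adj)
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)  # restart mass spread evenly over the seed nodes
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(w[i][j] * p[j] for j in range(n)) + restart * p0[i]
               for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


if __name__ == "__main__":
    dd = [[0.0, 0.8], [0.8, 0.0]]   # two drugs, similar to each other
    tt = [[0.0, 0.5], [0.5, 0.0]]   # two targets, moderately similar
    dt = [[1.0, 0.0], [0.0, 0.0]]   # only drug 0 -> target 0 is known
    scores = rwr(build_heterogeneous(dd, tt, dt), seeds=[1])  # query drug 1
    print([round(s, 3) for s in scores[2:]])  # scores of target 0 and target 1
```

Higher scores on target nodes suggest candidate targets for the query drug. In this toy network drug 1 has no known targets, yet target 0 outranks target 1 because the walk reaches it through the similar drug 0; no negative examples are needed, which is the advantage the abstract attributes to network-based methods.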
Gene Expression Response to Stony Coral Tissue Loss Disease Transmission in M. cavernosa and O. faveolata From Florida
Since 2014, corals within Florida's Coral Reef have been dying at an unprecedented rate due to stony coral tissue loss disease (SCTLD). Here we describe the transcriptomic outcomes of three different SCTLD transmission experiments performed at the Smithsonian Marine Station and Mote Marine Laboratory between 2019 and 2020 on the corals Orbicella faveolata and Montastraea cavernosa. Overall, diseased O. faveolata had 2194 differentially expressed genes (DEGs) compared with healthy colonies, whereas diseased M. cavernosa had 582 DEGs compared with healthy colonies. Many significant DEGs were implicated in immunity, extracellular matrix rearrangement, and apoptosis. These included, but were not limited to, peroxidase, collagen, Bax-like, fibrinogen-like, protein tyrosine kinase, and transforming growth factor beta genes. A gene module was identified that was significantly correlated with disease transmission. This module contained many apoptosis and immune genes with high module membership, indicating that a complex apoptosis and immune response occurs in corals during SCTLD transmission. Overall, we found that O. faveolata and M. cavernosa exhibit an immune, apoptosis, and tissue rearrangement response to SCTLD. We propose that future studies should examine early time points of infection, before lesions appear, to understand the activating mechanisms involved in SCTLD.