1,071 research outputs found

    Characterisation of data resources for in silico modelling: benchmark datasets for ADME properties.

    Introduction: The cost of in vivo and in vitro screening of the ADME properties of compounds has motivated efforts to develop a range of in silico models. At the heart of the development of any computational model are the data; high-quality data are essential for developing robust and accurate models. The characteristics of a dataset, such as its availability, size, format and the type of chemical identifiers used, influence the modelability of the data. Areas covered: This review explores the usefulness of publicly available ADME datasets for researchers to use in the development of predictive models. More than 140 ADME datasets were collated from publicly available resources, and the modelability of 31 selected datasets was assessed using specific criteria derived in this study. Expert opinion: Publicly available datasets differ significantly in information content and presentation. From a modelling perspective, datasets should be of adequate size and available in a user-friendly format, with all chemical structures associated with one or more chemical identifiers suitable for automated processing (e.g. CAS number, SMILES string or InChIKey). Recommendations for assessing dataset suitability for modelling and for publishing data in an appropriate format are discussed.
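
    The identifier requirement in the abstract above can be illustrated with a minimal sketch. It is not the review's own assessment criteria: the record layout (keys `cas`, `smiles`, `inchikey`) and the function names are assumptions made for illustration. The CAS check-digit rule and the InChIKey layout (14, 10 and 1 uppercase letters) are standard; SMILES strings are only tested for presence, since real validation would need a cheminformatics toolkit.

```python
import re

# CAS Registry Number layout: 2-7 digits, 2 digits, 1 check digit.
CAS_RE = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")
# Standard InChIKey layout: 14 letters, 10 letters, 1 letter.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def is_valid_cas(value):
    """Check the format and the check digit of a CAS number."""
    match = CAS_RE.match(value.strip())
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weights run 1, 2, 3, ... from the rightmost digit before the check digit.
    total = sum(int(d) * w for w, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


def is_valid_inchikey(value):
    """Check only the layout of an InChIKey, not its hash content."""
    return bool(INCHIKEY_RE.match(value.strip()))


def identifier_coverage(records):
    """Fraction of records carrying at least one usable identifier.

    `records` is a list of dicts; the key names are hypothetical.
    A SMILES entry counts as usable if it is non-empty (an assumption).
    """
    if not records:
        return 0.0
    usable = 0
    for rec in records:
        cas = rec.get("cas") or ""
        key = rec.get("inchikey") or ""
        smiles = (rec.get("smiles") or "").strip()
        if is_valid_cas(cas) or is_valid_inchikey(key) or smiles:
            usable += 1
    return usable / len(records)


sample = [
    {"cas": "7732-18-5"},                        # water, valid check digit
    {"inchikey": "XLYOFNOQVPJJNP-UHFFFAOYSA-N"},  # water, valid layout
    {"cas": "7732-18-4"},                        # wrong check digit, no fallback
]
coverage = identifier_coverage(sample)
```

    A coverage well below 1.0 would flag a dataset whose structures cannot all be processed automatically.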

    Molecular Similarity and Xenobiotic Metabolism

    MetaPrint2D, a new software tool implementing a data-mining approach for predicting sites of xenobiotic metabolism, has been developed. The algorithm is based on a statistical analysis of the occurrences of atom-centred circular fingerprints in both substrates and metabolites. This approach has undergone extensive evaluation and been shown to be of comparable accuracy to current best-in-class tools, but is able to make much faster predictions, for the first time enabling chemists to explore the effects of structural modifications on a compound’s metabolism in a highly responsive and interactive manner. MetaPrint2D is able to assign a confidence score to the predictions it generates, based on the availability of relevant data and the degree to which a compound is modelled by the algorithm. In the course of the evaluation of MetaPrint2D, a novel metric for assessing the performance of site of metabolism predictions has been introduced. This overcomes the bias introduced by molecule size and the number of sites of metabolism inherent to the most commonly reported metrics used to evaluate site of metabolism predictions. This data-mining approach to site of metabolism prediction has been augmented by a set of reaction type definitions to produce MetaPrint2D-React, enabling prediction of the types of transformations a compound is likely to undergo and the metabolites that are formed. This approach has been evaluated against both historical data and metabolic schemes reported in a number of recently published studies. Results suggest that the ability of this method to predict metabolic transformations is highly dependent on the relevance of the training set data to the query compounds. MetaPrint2D has been released as an open source software library, and both MetaPrint2D and MetaPrint2D-React are available for chemists to use through the Unilever Centre for Molecular Science Informatics website. ----Boehringer-Ingelhie
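
    The idea of scoring sites from fingerprint occurrence statistics can be sketched as follows. This is a toy illustration, not MetaPrint2D's implementation: atom environments are stood in for by plain string keys, and the scoring rule (reaction-centre count divided by substrate count, then scaled within a molecule) is an assumption chosen for simplicity. All names are hypothetical.

```python
from collections import Counter


def occurrence_ratios(substrate_envs, reaction_centre_envs):
    """Ratio of reaction-centre occurrences to substrate occurrences.

    Both arguments are lists of hashable environment keys (a stand-in
    for circular fingerprints) collected over a training database.
    """
    substrate_counts = Counter(substrate_envs)
    centre_counts = Counter(reaction_centre_envs)
    # A Counter returns 0 for environments never seen as a reaction centre.
    return {env: centre_counts[env] / n for env, n in substrate_counts.items()}


def rank_sites(atom_envs, ratios):
    """Score each atom of a query molecule, scaled so the best site is 1.0.

    Environments absent from the training data score 0.0 (an assumption;
    a real tool would more likely report low confidence instead).
    """
    scores = [ratios.get(env, 0.0) for env in atom_envs]
    top = max(scores, default=0.0)
    if top == 0.0:
        return [0.0] * len(scores)
    return [s / top for s in scores]


# Hypothetical training data: four aromatic carbons and two methoxy
# carbons seen in substrates; one and two of them metabolised respectively.
training_substrates = ["C.ar", "C.ar", "C.ar", "C.ar", "CH3-O", "CH3-O"]
training_centres = ["C.ar", "CH3-O", "CH3-O"]
ratios = occurrence_ratios(training_substrates, training_centres)
site_scores = rank_sites(["C.ar", "CH3-O", "N.am"], ratios)
```

    Because scoring reduces to dictionary lookups, a query of this kind is cheap, which is consistent with the interactive use described in the abstract.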

    Computational Approaches to Drug Profiling and Drug-Protein Interactions

    Despite substantial increases in R&D spending within the pharmaceutical industry, de novo drug design has become a time-consuming endeavour. High attrition rates led to a long period of stagnation in drug approvals. Due to the extreme costs associated with introducing a drug to the market, locating and understanding the reasons for clinical failure is key to future productivity. As part of this PhD, three main contributions were made in this respect. First, the web platform LigNFam enables users to interactively explore similarity relationships between ‘drug-like’ molecules and the proteins they bind. Secondly, two deep-learning-based binding site comparison tools were developed, competing with the state of the art on benchmark datasets. The models are able to predict off-target interactions and potential candidates for target-based drug repurposing. Finally, the open-source ScaffoldGraph software was presented for the analysis of hierarchical scaffold relationships and has already been used in multiple projects, including integration into a virtual screening pipeline to increase the tractability of ultra-large screening experiments. Together, and with existing tools, the contributions made will aid in the understanding of drug-protein relationships, particularly in the fields of off-target prediction and drug repurposing, helping to design better drugs faster.

    Coarse-grained modeling for molecular discovery: Applications to cardiolipin-selectivity

    The development of novel materials is pivotal for addressing global challenges such as achieving sustainability, technological progress, and advancements in medical technology. Traditionally, developing or designing new molecules was a resource-intensive endeavor, often reliant on serendipity. Given the vast space of chemically feasible drug-like molecules, estimated at between 10^6 and 10^100 compounds, traditional in vitro techniques fall short. Consequently, in silico tools such as virtual screening and molecular modeling have gained increasing recognition. However, the computational cost and the limited precision of the utilized molecular models still limit computational molecular design. This thesis aimed to enhance the molecular design process by integrating multiscale modeling and free energy calculations. Employing a coarse-grained model allowed us to efficiently traverse a significant portion of chemical space and reduce the sampling time required by molecular dynamics simulations. The physics-informed nature of the applied Martini force field and its level of retained structural detail make the model a suitable starting point for the focused learning of molecular properties. We applied our proposed approach to a cardiolipin bilayer, posing a relevant and challenging problem and facilitating reasonable comparison to experimental measurements. We identified promising molecules with defined properties within the resolution limit of a coarse-grained representation. Furthermore, we were able to bridge the gap from in silico predictions to in vitro and in vivo experiments, supporting the validity of the theoretical concept. The findings underscore the potential of multiscale modeling and free-energy calculations in enhancing molecular discovery and design and offer a promising direction for future research.

    Big-Data Science in Porous Materials: Materials Genomics and Machine Learning

    By combining metal nodes with organic linkers we can potentially synthesize millions of possible metal organic frameworks (MOFs). At present, we have libraries of over ten thousand synthesized materials and millions of in-silico predicted materials. The fact that we have so many materials opens many exciting avenues to tailor-make a material that is optimal for a given application. However, from an experimental and computational point of view we simply have too many materials to screen using brute-force techniques. In this review, we show that having so many materials allows us to use big-data methods as a powerful technique to study these materials and to discover complex correlations. The first part of the review gives an introduction to the principles of big-data science. We emphasize the importance of data collection, methods to augment small data sets, and how to select appropriate training sets. An important part of this review is the different approaches that are used to represent these materials in feature space. The review also includes a general overview of the different ML techniques, but as most applications in porous materials use supervised ML, our review is focused on the different approaches for supervised ML. In particular, we review the different methods to optimize the ML process and how to quantify the performance of the different methods. In the second part, we review how the different approaches of ML have been applied to porous materials. In particular, we discuss applications in the field of gas storage and separation, the stability of these materials, their electronic properties, and their synthesis. The range of topics illustrates the large variety of topics that can be studied with big-data science. Given the increasing interest of the scientific community in ML, we expect this list to rapidly expand in the coming years.

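    Los receptores para el reconocimiento de patrones moleculares: aportaciones de la química computacional para el diseño de fármacos y la modulación de la inmunidad innata [Molecular pattern recognition receptors: contributions of computational chemistry to drug design and the modulation of innate immunity]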

    Unpublished thesis of the Universidad Complutense de Madrid, Facultad de Farmacia, Departamento de Química Orgánica y Farmacéutica, defended on 18/11/2019. In this thesis we studied the molecular recognition processes of receptors involved in innate immunity. More specifically, we focused on two different types of lectins, galectins and DC-SIGN, and on Toll-like receptor 4 (TLR4). We made use of computational techniques, including docking and virtual screening, molecular dynamics simulations, conformational analysis and quantum mechanics calculations. The work is organized into several chapters, summarized as follows: Chapter 1 covers the current knowledge and perspectives on protein-carbohydrate molecular recognition and the design of new biologically active modulators of immunity receptors, in particular galectins, DC-SIGN and Toll-like receptor 4. Chapter 2 describes the state-of-the-art methods in molecular modeling and computational chemistry applied to the study of molecular recognition processes and drug design...

    Kinetic model construction using chemoinformatics

    Kinetic models of chemical processes not only provide an alternative to costly experiments; they also have the potential to accelerate the pace of innovation in developing new chemical processes or in improving existing ones. Kinetic models are most powerful when they reflect the underlying chemistry by incorporating elementary pathways between individual molecules. The downside of this high level of detail is that the complexity and size of the models also steadily increase, such that the models eventually become too difficult to be manually constructed. Instead, computers are programmed to automate the construction of these models, and make use of graph theory to translate chemical entities such as molecules and reactions into computer-understandable representations. This work studies the use of automated methods to construct kinetic models. More particularly, the need to account for the three-dimensional arrangement of atoms in molecules and reactions of kinetic models is investigated and illustrated by two case studies. First of all, the thermal rearrangement of two monoterpenoids, cis- and trans-2-pinanol, is studied. A kinetic model that accounts for the differences in reactivity and selectivity of both pinanol diastereomers is proposed. Secondly, a kinetic model for the pyrolysis of the fuel “JP-10” is constructed and highlights the use of state-of-the-art techniques for the automated estimation of thermochemistry of polycyclic molecules. A new code is developed for the automated construction of kinetic models and takes advantage of the advances made in the field of chemo-informatics to tackle fundamental issues of previous approaches. Novel algorithms are developed for three important aspects of automated construction of kinetic models: the estimation of symmetry of molecules and reactions, the incorporation of stereochemistry in kinetic models, and the estimation of thermochemical and kinetic data using scalable structure-property methods. Finally, the application of the code is illustrated by the automated construction of a kinetic model for alkylsulfide pyrolysis.
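
    The graph-based representation mentioned in the abstract above can be illustrated with a short sketch. It is not the thesis's algorithm: it stores a molecule as an element list plus a bond list and finds topologically equivalent atoms by iterative neighbourhood refinement, in the spirit of the Morgan algorithm. The function names are hypothetical, hydrogens and bond orders are ignored, and stereochemistry is not handled. Refinement of this kind is also known to leave some non-equivalent atoms in the same class for highly regular graphs.

```python
def _rank(labels):
    """Map each distinct label to a dense integer class index."""
    order = {label: index for index, label in enumerate(sorted(set(labels)))}
    return [order[label] for label in labels]


def symmetry_classes(elements, bonds):
    """Group atoms into classes of topologically equivalent atoms.

    elements: list of element symbols, one per atom.
    bonds: list of (i, j) index pairs; bond orders are ignored here.
    Returns a list with one class index per atom.
    """
    n = len(elements)
    neighbours = [[] for _ in range(n)]
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)

    # Initial invariant: element symbol and number of neighbours.
    classes = _rank([(elements[i], len(neighbours[i])) for i in range(n)])

    while True:
        # Refine each atom by its own class and its neighbours' classes.
        refined = [
            (classes[i], tuple(sorted(classes[j] for j in neighbours[i])))
            for i in range(n)
        ]
        new_classes = _rank(refined)
        # Classes can only split, never merge, so an unchanged count
        # means the partition is stable.
        if len(set(new_classes)) == len(set(classes)):
            return new_classes
        classes = new_classes


# Heavy-atom graphs: propane (C-C-C) and ethanol (C-C-O).
propane = symmetry_classes(["C", "C", "C"], [(0, 1), (1, 2)])
ethanol = symmetry_classes(["C", "C", "O"], [(0, 1), (1, 2)])
```

    For propane the two terminal carbons fall into one class, while ethanol's three heavy atoms are all distinct. Class sizes of this kind are one ingredient of symmetry numbers, which enter rate and entropy estimates.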

    Quantitative structure fate relationships for multimedia environmental analysis

    Key physicochemical properties for a wide spectrum of chemical pollutants are unknown. This thesis analyses the prospect of assessing the environmental distribution of chemicals directly from supervised learning algorithms using molecular descriptors, rather than from multimedia environmental models (MEMs) using several physicochemical properties estimated from QSARs. Dimensionless compartmental mass ratios of 468 validation chemicals were compared, in logarithmic units, between: a) SimpleBox 3, a Level III MEM, propagating random property values within statistical distributions of widely recommended QSARs; and b) Support Vector Regressions (SVRs), acting as Quantitative Structure-Fate Relationships (QSFRs), linking mass ratios to molecular weight and constituent counts (atoms, bonds, functional groups and rings) for training chemicals. The best predictions were obtained for test and validation chemicals found to lie within the domain of applicability of the QSFRs, as evidenced by low MAE and high q2 values (in air, MAE ≤ 0.54 and q2 ≥ 0.92; in water, MAE ≤ 0.27 and q2 ≥ 0.92).
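
    The two statistics quoted in the abstract above, MAE and q2, can be computed as in the sketch below. The q2 definition used here (one minus the ratio of the squared prediction error to the squared deviation from a reference mean) is a common one, but the thesis's exact variant is not stated in the abstract, so this is an assumption. The function names and sample values are illustrative only.

```python
def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def q_squared(observed, predicted, reference_mean=None):
    """Predictive squared correlation coefficient, q2 = 1 - PRESS / SS.

    reference_mean defaults to the mean of the observed values; some
    definitions use the training-set mean instead, which can be passed in.
    """
    if reference_mean is None:
        reference_mean = sum(observed) / len(observed)
    press = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    total = sum((o - reference_mean) ** 2 for o in observed)
    return 1.0 - press / total


# Illustrative log-unit mass ratios, not data from the thesis.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
mae = mean_absolute_error(observed, predicted)
q2 = q_squared(observed, predicted)
```

    Because the mass ratios are compared in logarithmic units, an MAE of 0.27 corresponds to a typical error of a little under a factor of two (10^0.27 ≈ 1.9).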

    Revelation of Yin-Yang Balance in Microbial Cell Factories by Data Mining, Flux Modeling, and Metabolic Engineering

    The long-held assumption of never-ending rapid growth in biotechnology, and especially in synthetic biology, has recently been questioned due to a lack of substantial return on investment. One of the main reasons for failures in synthetic biology and metabolic engineering is the metabolic burden that results in resource losses. Metabolic burden is defined as the portion of a host cell's resources, either energy molecules (e.g., NADH, NADPH and ATP) or carbon building blocks (e.g., amino acids), that is used to maintain the engineered components (e.g., pathways). As a result, the effectiveness of synthetic biology tools depends heavily on the cell's capability to carry the metabolic burden. Although genetic modifications can effectively engineer cells and redirect carbon fluxes toward diverse products, the cell's limited ATP supply is often insufficient to support diverse microbial activities, including product synthesis. Here, I employ an ancient Chinese philosophy (Yin-Yang) to describe two contrary forces that are interconnected and interdependent, where Yin represents energy metabolism in the form of ATP, and Yang represents carbon metabolism. To decipher the Yin-Yang balance and its implications for microbial cell factories, this dissertation applied metabolic engineering, flux analysis, and data mining tools to reveal cell physiological responses under different genetic and environmental conditions. Firstly, a combined approach of FBA and 13C-MFA was employed to investigate several engineered isobutanol-producing strains and examine their carbon and energy metabolism. The results indicated that isobutanol overproduction strongly competed for biomass building blocks, and thus the addition of nutrients (yeast extract) to support cell growth is essential for a high yield of isobutanol. Based on the analysis of isobutanol production, the 'Yin-Yang' theory has been proposed to illustrate the importance of carbon and energy balance in engineered strains. The effects of metabolic burden and respiration efficiency (P/O ratio) on biofuel production were determined by FBA simulation. The discovery of an energy cliff explained failures in bioprocess scale-ups. The simulation also predicted that fatty acid production is more sensitive to changes in the P/O ratio than alcohol production. Based on that prediction, fatty acid-producing strains were engineered with the insertion of Vitreoscilla hemoglobin (VHb) to overcome the intracellular energy limitation by improving oxygen uptake and respiration efficiency. The results confirmed our hypothesis and revealed different levels of trade-off between the burden and the benefit of the various introduced genetic components. In addition, a series of computational tools has been developed to accelerate the application of fluxomics research. Microbesflux has been rebuilt, upgraded, and moved to a commercial server. A platform for fluxomics studies as well as an open source 13C-MFA tool (WUFlux) has been developed. Further, a computational platform that integrates machine learning, logic programming, and constrained programming has been developed. This platform gives fast predictions of microbial central metabolism with decent accuracy. Lastly, a framework has been built to integrate Big Data technology and text mining to interpret concepts and technology trends based on a literature survey. Case studies have been performed, and informative results have been obtained through this Big Data framework within five minutes. In summary, 13C-MFA and flux balance analysis are tools for quantifying cell energy and carbon metabolism (i.e., the Yin-Yang balance), leading to the rational design of robust high-producing microbial cell factories. Developing advanced computational tools will facilitate the application of fluxomics research and literature analysis.