568 research outputs found

    Methods for the Analysis of Matched Molecular Pairs and Chemical Space Representations

    Compound optimization is a complex process where different properties are optimized to increase the biological activity and therapeutic effects of a molecule. Frequently, the structure of molecules is modified in order to improve their property values. Therefore, computational analysis of the effects of structure modifications on property values is of great importance for the drug discovery process. It is also essential to analyze chemical space, i.e., the set of all chemically feasible molecules, in order to find subsets of molecules that display favorable property values. This thesis aims to expand the computational repertoire to analyze the effect of structure alterations and visualize chemical space. Matched molecular pairs are defined as pairs of compounds that share a large common substructure and only differ by a small chemical transformation. They have been frequently used to study property changes caused by structure modifications. These analyses are expanded in this thesis by studying the effect of chemical transformations on the ionization state and ligand efficiency, both measures of great importance in drug design. Additionally, novel matched molecular pairs based on retrosynthetic rules are developed to increase their utility for prospective use of chemical transformations in compound optimization. Further, new methods based on matched molecular pairs are described to obtain preliminary SAR information of screening hit compounds and predict the potency change caused by a chemical transformation. Visualizations of chemical space are introduced to aid compound optimization efforts. First, principal component plots are used to rationalize a matched molecular pair based multi-objective compound optimization procedure. Then, star coordinate and parallel coordinate plots are introduced to analyze drug-like subspaces, where compounds with favorable property values can be found. 
Finally, a novel network-based visualization of high-dimensional property space is developed. In conclusion, the applications developed in this thesis expand the methodological spectrum of computer-aided compound optimization.

    Multi-faceted Structure-Activity Relationship Analysis Using Graphical Representations

    A core focus in medicinal chemistry is the interpretation of structure-activity relationships (SARs) of small molecules. SAR analysis is typically carried out on a case-by-case basis for compound sets that share activity against a given target. Although SAR investigations are not a priori dependent on computational approaches, limitations imposed by the steady rise in activity information have necessitated the use of such methodologies. Moreover, understanding SARs in multi-target space is extremely difficult. Conceptually different computational approaches are reported in this thesis for graphical SAR analysis in single- as well as multi-target space. Activity landscape models are often used to describe the underlying SAR characteristics of compound sets. Theoretical activity landscapes that are reminiscent of topological maps intuitively represent distributions of pair-wise similarity and potency difference information as three-dimensional surfaces. These models make various SAR features easy to identify. Therefore, such landscapes for actual data sets are generated and compared with graph-based representations. Existing graphical data structures are adapted to include mechanism of action information for receptor ligands to facilitate simultaneous SAR and mechanism-related analyses with the objective of identifying structural modifications responsible for switching molecular mechanisms of action. Typically, SAR analysis focuses on systematic pair-wise relationships of compound similarity and potency differences. Therefore, an approach is reported to calculate SAR feature probabilities on the basis of these pair-wise relationships for individual compounds in a ligand set. The consequent expansion of feature categories improves the analysis of local SAR environments. Graphical representations are designed to avoid a dependence on preconceived SAR models. Such representations are suitable for systematic large-scale SAR exploration. 
Methods for the navigation of SARs in multi-target space using simple and interpretable data structures are introduced. In summary, multi-faceted SAR analysis aided by computational means forms the primary objective of this dissertation

    Efficient Learning Machines

    Computer scienc

    Machine Learning Approaches for Improving Prediction Performance of Structure-Activity Relationship Models

    In silico bioactivity prediction studies are designed to complement in vivo and in vitro efforts to assess the activity and properties of small molecules. In silico methods such as Quantitative Structure-Activity/Property Relationship (QSAR) modeling are used to correlate the structure of a molecule with its biological properties in drug design and toxicological studies. In this body of work, I started with two in-depth reviews of the application of machine learning based approaches and feature reduction methods to QSAR, and then investigated solutions to three common challenges faced in machine learning based QSAR studies. First, to improve the prediction accuracy of learning from imbalanced data, the Synthetic Minority Over-sampling Technique (SMOTE) and Edited Nearest Neighbor (ENN) algorithms, combined with bagging as an ensemble strategy, were evaluated. Friedman’s aligned ranks test and the subsequent Bergmann-Hommel post hoc test showed that this method significantly outperformed other conventional methods. SMOTEENN with bagging became less effective when the imbalance ratio (IR) exceeded a certain threshold (e.g., >40). Second, the ability to separate the few active compounds from the vast number of inactive ones is of great importance in computational toxicology. Deep neural networks (DNN) and random forest (RF), representing deep and shallow learning algorithms, respectively, were chosen to carry out structure-activity relationship-based chemical toxicity prediction. Results suggest that DNN significantly outperformed RF (p < 0.001, ANOVA) by 22-27% for four metrics (precision, recall, F-measure, and AUPRC) and by 11% for another (AUROC). Lastly, current features used for QSAR based machine learning are often very sparse and limited by the logic and mathematical processes used to compute them. Transformer embedding features (TEF) were developed as new continuous vector descriptors/features using the latent space embedding from a multi-head self-attention. 
The significance of TEF as new descriptors was evaluated by applying them to tasks such as predictive modeling, clustering, and similarity search. An accuracy of 84% on the Ames mutagenicity test indicates that these new features correlate with biological activity. Overall, the findings in this study can be applied to improve the performance of machine learning based QSAR efforts for enhanced drug discovery and toxicology assessments.

    Flood Forecasting Using Machine Learning Methods

    This book is a printed edition of the Special Issue Flood Forecasting Using Machine Learning Methods that was published in Wate

    Computational Intelligence for Modeling, Control, Optimization, Forecasting and Diagnostics in Photovoltaic Applications

    This book is a Special Issue Reprint edited by Prof. Massimo Vitelli and Dr. Luigi Costanzo. It contains original research articles covering, but not limited to, the following topics: maximum power point tracking techniques; forecasting techniques; sizing and optimization of PV components and systems; PV modeling; reconfiguration algorithms; fault diagnosis; mismatching detection; decision processes for grid operators

    Inductive Pattern Formation

    With the extended computational limits of algorithmic recursion, scientific investigation is transitioning away from computationally decidable problems and beginning to address computationally undecidable complexity. The analysis of deductive inference in structure-property models is yielding to the synthesis of inductive inference in process-structure simulations. Process-structure modeling has examined external order parameters of inductive pattern formation, but investigation of the internal order parameters of self-organization has been hampered by the lack of a mathematical formalism with the ability to quantitatively define a specific configuration of points. This investigation addressed this issue of quantitative synthesis. Local space was developed by the Poincaré inflation of a set of points to construct neighborhood intersections, defining topological distance and introducing situated Boolean topology as a local replacement for point-set topology. Parallel development of the local semi-metric topological space, the local semi-metric probability space, and the local metric space of a set of points provides a triangulation of connectivity measures to define the quantitative architectural identity of a configuration and structure-independent axes of a structural configuration space. The recursive sequence of intersections constructs a probabilistic discrete spacetime model of interacting fields to define the internal order parameters of self-organization, with order parameters external to the configuration modeled by adjusting the morphological parameters of individual neighborhoods and the interplay of excitatory and inhibitory point sets. The evolutionary trajectory of a configuration maps the development of specific hierarchical structure that is emergent from a specific set of initial conditions, with nested boundaries signaling the nonlinear properties of local causative configurations. 
This exploration of architectural configuration space concluded with initial process-structure-property models of deductive and inductive inference spaces. In the computationally undecidable problem of human niche construction, an adaptive-inductive pattern formation model with predictive control organized the bipartite recursion between an information structure and its physical expression as hierarchical ensembles of artificial neural network-like structures. The union of architectural identity and bipartite recursion generates a predictive structural model of an evolutionary design process, offering an alternative to the limitations of cognitive descriptive modeling. The low computational complexity of these models enables them to be embedded in physical constructions to create the artificial life forms of a real-time autonomously adaptive human habitat.

    Machine Learning with Metaheuristic Algorithms for Sustainable Water Resources Management

    The main aim of this book is to present various implementations of ML methods and metaheuristic algorithms that improve the modelling and prediction of hydrological and water resources phenomena of vital importance in water resource management.

    Plasma–liquid interactions: a review and roadmap

    Plasma–liquid interactions represent a growing interdisciplinary area of research involving plasma science, fluid dynamics, heat and mass transfer, photolysis, multiphase chemistry and aerosol science. This review provides an assessment of the state-of-the-art of this multidisciplinary area and identifies the key research challenges. The developments in diagnostics, modeling and further extensions of cross section and reaction rate databases that are necessary to address these challenges are discussed. The review focusses on non-equilibrium plasmas
