16 research outputs found

    Visual and computational analysis of structure-activity relationships in high-throughput screening data

    Novel analytic methods are required to assimilate the large volumes of structural and bioassay data generated by combinatorial chemistry and high-throughput screening programmes in the pharmaceutical and agrochemical industries. This paper reviews recent work in visualisation and data mining that can be used to develop structure-activity relationships from such chemical/biological datasets.

    First-principles molecular structure search with a genetic algorithm

    The identification of low-energy conformers for a given molecule is a fundamental problem in computational chemistry and cheminformatics. We assess here a conformer search that employs a genetic algorithm for sampling the low-energy segment of the conformation space of molecules. The algorithm is designed to work with first-principles methods, facilitated by the incorporation of local optimization and the blacklisting of conformers to prevent repeated evaluations of very similar solutions. The aim of the search is not only to find the global minimum, but to predict all conformers within an energy window above the global minimum. The performance of the search strategy is (i) evaluated on a reference data set extracted from a database of amino acid dipeptide conformers obtained by an extensive combined force-field and first-principles search, and (ii) compared with the performance of a systematic search and a random conformer generator for the example of a drug-like ligand with 43 atoms, 8 rotatable bonds and 1 cis/trans bond.
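
    As a rough, hedged illustration of the kind of search described above, the Python sketch below runs a genetic algorithm over torsion angles with blacklisting of near-duplicate conformers. The energy() function is a toy surrogate and every parameter value is an illustrative assumption; the actual method evaluates candidates with first-principles calculations and a local optimization step, both omitted here.

```python
import math
import random

N_TORSIONS = 8            # e.g. the 8 rotatable bonds of the ligand in the example above
POP_SIZE = 30
GENERATIONS = 50
BLACKLIST_TOL = 15.0      # degrees; conformers closer than this count as duplicates

def energy(torsions):
    # Toy surrogate energy; a real run would call a first-principles code and
    # locally optimize each candidate before scoring it.
    return sum(1.0 - math.cos(math.radians(t)) for t in torsions)

def similar(a, b, tol=BLACKLIST_TOL):
    # Blacklisting criterion: every torsion within tol degrees (periodic difference).
    return all(min(abs(x - y), 360.0 - abs(x - y)) < tol for x, y in zip(a, b))

def random_conformer():
    return [random.uniform(0.0, 360.0) for _ in range(N_TORSIONS)]

def crossover(p1, p2):
    cut = random.randint(1, N_TORSIONS - 1)
    return p1[:cut] + p2[cut:]

def mutate(torsions, rate=0.2):
    return [random.uniform(0.0, 360.0) if random.random() < rate else t for t in torsions]

blacklist = []                                   # every conformer evaluated so far
population = [random_conformer() for _ in range(POP_SIZE)]

for _ in range(GENERATIONS):
    population.sort(key=energy)
    blacklist.extend(population)
    parents = population[: POP_SIZE // 2]
    children, attempts = [], 0
    while len(children) < POP_SIZE - len(parents) and attempts < 10 * POP_SIZE:
        attempts += 1
        child = mutate(crossover(*random.sample(parents, 2)))
        # Reject children too close to anything already evaluated (blacklisting).
        if not any(similar(child, seen) for seen in blacklist):
            children.append(child)
    population = parents + children

# Report all distinct conformers within an energy window above the global minimum.
best = min(energy(c) for c in blacklist)
window = []
for c in sorted(blacklist, key=energy):
    if energy(c) - best < 0.5 and not any(similar(c, w) for w in window):
        window.append(c)
print(f"{len(window)} distinct conformers within 0.5 of the global minimum (toy energy)")
```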

    Calculation of substructural analysis weights using a genetic algorithm

    This paper describes a genetic algorithm for the calculation of substructural analysis weights for use in ligand-based virtual screening. The algorithm is simple in concept and effective in operation, with simulated virtual screening experiments using the MDDR and WOMBAT datasets showing it to be superior to substructural analysis weights based on a naive Bayesian classifier.
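
    As a rough sketch of the idea (not the paper's implementation), the code below evolves one weight per fingerprint bit with a simple genetic algorithm and scores each weight vector by how many known actives it places near the top of the ranked database. The synthetic fingerprints, the fitness definition and all parameters are assumptions for illustration, not the MDDR/WOMBAT setup used in the paper.

```python
import random

random.seed(0)
N_BITS, N_MOLS, N_ACTIVE = 64, 100, 10

# Synthetic binary fingerprints; the first N_ACTIVE molecules are treated as "active".
fingerprints = [[random.randint(0, 1) for _ in range(N_BITS)] for _ in range(N_MOLS)]
is_active = [i < N_ACTIVE for i in range(N_MOLS)]

def score(weights, fp):
    # Sum the weights of the bits that are set in the fingerprint.
    return sum(w for w, bit in zip(weights, fp) if bit)

def fitness(weights):
    # Number of actives retrieved in the top 10% of the ranked database.
    ranked = sorted(range(N_MOLS), key=lambda i: score(weights, fingerprints[i]), reverse=True)
    return sum(is_active[i] for i in ranked[: N_MOLS // 10])

def mutate(weights, rate=0.1):
    return [w + random.gauss(0, 0.3) if random.random() < rate else w for w in weights]

population = [[random.uniform(-1.0, 1.0) for _ in range(N_BITS)] for _ in range(30)]
for _ in range(40):
    population.sort(key=fitness, reverse=True)
    parents = population[:15]
    population = parents + [mutate(random.choice(parents)) for _ in range(15)]

best = max(population, key=fitness)
print("actives retrieved in the top 10%:", fitness(best))
# On random data the GA simply overfits the training actives, which is enough to show
# the mechanics; a real experiment would score held-out actives on MDDR/WOMBAT-scale data.
```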

    Soft Computing, Artificial Intelligence, Fuzzy Logic & Genetic Algorithm in Bioinformatics

    Soft computing is creating several possibilities in bioinformatics, especially by generating low-cost, low-precision (approximate), good solutions. Bioinformatics is an interdisciplinary research area at the interface between the biological and computational sciences. It deals with algorithms, databases and information systems, web technologies, artificial intelligence and soft computing, information and computation theory, structural biology, software engineering, data mining, image processing, modeling and simulation, discrete mathematics, control and system theory, circuit theory, and statistics. Despite the large number of techniques specifically dedicated to bioinformatics problems, and many successful applications, we are only at the beginning of a process of massively integrating the aspects and experiences of the different core subjects such as biology, medicine, computer science, engineering, and mathematics. Recently, the use of soft computing tools for solving bioinformatics problems has been gaining the attention of researchers because of their ability to handle imprecision and uncertainty in large and complex search spaces. The paper focuses on the soft computing paradigm in bioinformatics, with particular emphasis on integrative research.

    Ligand-based virtual screening using a genetic algorithm with data fusion

    Substructural analysis provides a simple and effective way of ranking the 2D fingerprints representing the molecules in a database on the basis of weights that denote a substructural fragment’s contribution to the overall activity or inactivity of a molecule. A substructural analysis method has been described recently that is based on the use of a genetic algorithm (GA), with the resulting sets of weights proving to be more effective for ligand-based virtual screening than existing approaches. However, the inherently non-deterministic nature of a GA means that different runs are likely to result in different sets of weights and hence in variations in the effectiveness of screening. This paper describes the use of data fusion to combine the rankings generated in multiple GA runs, and demonstrates that the resulting fused rankings are, on average, markedly superior to individual GA runs and in some cases can even exceed the performance of the very best individual GA run.
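
    A minimal sketch of the fusion step, assuming a standard sum-of-ranks rule (the paper's exact fusion rule is not specified here): each GA run supplies a full ranking of the same molecules, and molecules are re-ranked by their summed positions, so molecules ranked consistently well across runs rise to the top of the fused list.

```python
from collections import defaultdict

def fuse_by_sum_of_ranks(rankings):
    """rankings: list of lists, each a full ranking of the same molecule IDs."""
    total_rank = defaultdict(int)
    for ranking in rankings:
        for position, mol_id in enumerate(ranking):
            total_rank[mol_id] += position          # lower summed rank = better
    return sorted(total_rank, key=total_rank.get)

# Three toy "GA runs" that rank five molecules differently.
run_a = ["m3", "m1", "m4", "m2", "m5"]
run_b = ["m1", "m3", "m2", "m4", "m5"]
run_c = ["m3", "m4", "m1", "m5", "m2"]
print(fuse_by_sum_of_ranks([run_a, run_b, run_c]))   # ['m3', 'm1', 'm4', 'm2', 'm5']
```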

    Advances in De Novo Drug Design: From Conventional to Machine Learning Methods

    De novo drug design is a computational approach that generates novel molecular structures from atomic building blocks with no a priori relationships. Conventional methods include structure-based and ligand-based design, which depend on the properties of the active site of a biological target or its known active binders, respectively. Artificial intelligence, including machine learning, is an emerging field that has positively impacted the drug discovery process. Deep reinforcement learning is a subdivision of machine learning that combines artificial neural networks with reinforcement-learning architectures. This method has successfully been employed to develop novel de novo drug design approaches using a variety of artificial neural networks including recurrent neural networks, convolutional neural networks, generative adversarial networks, and autoencoders. This review article summarizes advances in de novo drug design, from conventional growth algorithms to advanced machine-learning methodologies, and highlights hot topics for further development.

    Statistical Methods for Bioinformatics: Estimation of Copy Number and Detection of Gene Interactions

    Identification of copy number aberrations in the human genome has been an important area in cancer research. In the first part of my thesis, I propose a new model for determining genomic copy numbers using high-density single nucleotide polymorphism genotyping microarrays. The method is based on a Bayesian spatial normal mixture model with an unknown number of components corresponding to true copy numbers. A reversible jump Markov chain Monte Carlo algorithm is used to implement the model and perform posterior inference. The second part of the thesis describes a new method for the detection of gene-gene interactions using gene expression data extracted from microarray experiments. The method is based on a two-step genetic algorithm, with the first step detecting main effects and the second step looking for interacting gene pairs. The performance of both algorithms is examined on simulated data and real cancer data and compared with popular existing algorithms. Conclusions are given and possible extensions are discussed.
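
    The two-step idea can be sketched schematically as follows: a first pass scores individual genes for main effects, and a second, GA-driven pass searches pairs of the remaining genes for interactions. In this sketch a plain filter stands in for the first GA step, and the synthetic data, the correlation-based fitness functions and every threshold are illustrative assumptions rather than the thesis's actual model.

```python
import random

random.seed(1)
N_GENES, N_SAMPLES = 20, 100
expr = [[random.gauss(0, 1) for _ in range(N_GENES)] for _ in range(N_SAMPLES)]
# Outcome driven by gene 0 (main effect) and by the product of genes 3 and 7 (interaction).
y = [row[0] + row[3] * row[7] + random.gauss(0, 0.5) for row in expr]

def corr(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((z - mb) ** 2 for z in b) ** 0.5
    return cov / (sa * sb)

def main_effect(g):
    return abs(corr([row[g] for row in expr], y))

def interaction(pair):
    g1, g2 = pair
    product = [row[g1] * row[g2] for row in expr]
    return abs(corr(product, y)) - max(main_effect(g1), main_effect(g2))

# Step 1: keep genes with a clear main effect (a plain filter stands in for the first GA).
main_genes = [g for g in range(N_GENES) if main_effect(g) > 0.3]

# Step 2: a small GA over pairs of the remaining genes, looking for interactions.
others = [g for g in range(N_GENES) if g not in main_genes]

def mutate_pair(pair):
    keep = random.choice(pair)
    return (keep, random.choice([g for g in others if g != keep]))

def random_pair():
    return tuple(random.sample(others, 2))

population = [random_pair() for _ in range(30)]
for _ in range(60):
    population.sort(key=interaction, reverse=True)
    survivors = population[:10]
    population = survivors + [mutate_pair(p) for p in survivors] + [random_pair() for _ in range(10)]

print("main-effect genes:", main_genes)                            # typically includes gene 0
print("top interacting pair:", max(population, key=interaction))   # with enough search, genes 3 and 7
```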