First-principles molecular structure search with a genetic algorithm
The identification of low-energy conformers for a given molecule is a
fundamental problem in computational chemistry and cheminformatics. We assess
here a conformer search that employs a genetic algorithm for sampling the
low-energy segment of the conformation space of molecules. The algorithm is
designed to work with first-principles methods, facilitated by the
incorporation of local optimization and blacklisting conformers to prevent
repeated evaluations of very similar solutions. The aim of the search is not
only to find the global minimum, but to predict all conformers within an energy
window above the global minimum. The performance of the search strategy is: (i)
evaluated for a reference data set extracted from a database with amino acid
dipeptide conformers obtained by an extensive combined force field and
first-principles search and (ii) compared to the performance of a systematic
search and a random conformer generator for the example of a drug-like ligand
with 43 atoms, 8 rotatable bonds and 1 cis/trans bond.
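The search loop described in this abstract (genetic operators, local optimization of each candidate, a blacklist of already-evaluated geometries, and an energy window above the global minimum) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy `energy` function stands in for a first-principles calculation, the coordinate-descent `local_optimize` stands in for a real geometry relaxation, and all names, tolerances and parameters are hypothetical rather than taken from the paper.

```python
import math
import random

TWO_PI = 2.0 * math.pi

def energy(angles):
    """Toy torsional energy; a hypothetical stand-in for a first-principles call."""
    return sum(1.0 + math.cos(3.0 * a) + 0.5 * math.cos(a - 1.0) for a in angles)

def local_optimize(angles, step=0.05, iters=200):
    """Crude coordinate-wise descent; stands in for a real geometry relaxation."""
    best, best_e = list(angles), energy(angles)
    for _ in range(iters):
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] = (trial[i] + delta) % TWO_PI
                e = energy(trial)
                if e < best_e - 1e-12:
                    best, best_e, improved = trial, e, True
        if not improved:
            break
    return best, best_e

def angular_distance(a, b):
    """Smallest absolute difference between two angles on a circle."""
    d = abs(a - b) % TWO_PI
    return min(d, TWO_PI - d)

def is_blacklisted(angles, blacklist, tol=0.3):
    """True if every torsion lies within tol of an already-evaluated geometry."""
    return any(
        all(angular_distance(x, y) < tol for x, y in zip(angles, seen))
        for seen in blacklist
    )

def ga_search(n_torsions=3, pop_size=6, generations=30, window=1.0, seed=0):
    rng = random.Random(seed)
    blacklist = []  # relaxed geometries that have already been evaluated
    pool = []       # (energy, torsions) of distinct relaxed conformers

    def random_candidate():
        return [rng.uniform(0.0, TWO_PI) for _ in range(n_torsions)]

    def evaluate(candidate):
        # Skip candidates that are very similar to something already evaluated.
        if is_blacklisted(candidate, blacklist):
            return
        relaxed, e = local_optimize(candidate)
        if not is_blacklisted(relaxed, blacklist):
            pool.append((e, relaxed))
        blacklist.append(relaxed)

    # Initial population from random torsions (bounded number of attempts).
    for _ in range(5 * pop_size):
        if len(pool) >= pop_size:
            break
        evaluate(random_candidate())

    for _ in range(generations):
        pool.sort(key=lambda item: item[0])
        parents = pool[:pop_size]
        if len(parents) < 2:
            evaluate(random_candidate())
            continue
        (_, a), (_, b) = rng.sample(parents, 2)
        cut = rng.randrange(1, n_torsions)      # one-point crossover
        child = a[:cut] + b[cut:]
        if rng.random() < 0.3:                   # mutation: reset one torsion
            child[rng.randrange(n_torsions)] = rng.uniform(0.0, TWO_PI)
        evaluate(child)

    # Report every conformer within the energy window above the best one found.
    pool.sort(key=lambda item: item[0])
    e_min = pool[0][0]
    return [(e, angles) for e, angles in pool if e - e_min <= window]

if __name__ == "__main__":
    for e, angles in ga_search():
        print(round(e, 3), [round(a, 2) for a in angles])
```

The blacklist is checked twice: once before relaxation, to avoid paying for a near-duplicate starting point, and once after, because distinct starting points can relax into the same minimum.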
Adaptive Genetic Algorithm for Crystal Structure Prediction
We present a genetic algorithm (GA) for structural search that combines the
speed of structure exploration by classical potentials with the accuracy of
density functional theory (DFT) calculations in an adaptive and iterative way.
This strategy increases the efficiency of the DFT-based GA by several orders of
magnitude. This gain allows a considerable increase in the size and complexity of
systems that can be studied by first principles. The method's performance is
illustrated by successful structure identifications of complex binary and
ternary inter-metallic compounds with 36 and 54 atoms per cell, respectively.
The discovery of a multi-TPa Mg-silicate phase with a unit cell containing up to
56 atoms is also reported. Such a phase is likely to be an essential component of
terrestrial exoplanetary mantles.
Comment: 14 pages, 4 figures
Fast, accurate, and transferable many-body interatomic potentials by symbolic regression
The length and time scales of atomistic simulations are limited by the
computational cost of the methods used to predict material properties. In
recent years there has been great progress in the use of machine learning
algorithms to develop fast and accurate interatomic potential models, but it
remains a challenge to develop models that generalize well and are fast enough
to be used at extreme time and length scales. To address this challenge, we
have developed a machine learning algorithm based on symbolic regression in the
form of genetic programming that is capable of discovering accurate,
computationally efficient many-body potential models. The key to our approach is
to explore a hypothesis space of models based on fundamental physical
principles and select models within this hypothesis space based on their
accuracy, speed, and simplicity. The focus on simplicity reduces the risk of
overfitting the training data and increases the chances of discovering a model
that generalizes well. Our algorithm was validated by rediscovering an exact
Lennard-Jones potential and a Sutton-Chen embedded atom method potential from
training data generated using these models. By using training data generated
from density functional theory calculations, we found potential models for
elemental copper that are simple, as fast as embedded atom models, and capable
of accurately predicting properties outside of their training set. Our approach
requires relatively small sets of training data, making it possible to generate
training data using highly accurate methods at a reasonable computational cost.
We present our approach, the forms of the discovered models, and assessments of
their transferability, accuracy and speed.
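For reference, the two validation targets named in this abstract have the following standard textbook forms. The symbols are the conventional ones and are an assumption here, not notation taken from the paper: $\varepsilon$ is an energy scale, $\sigma$ and $a$ are length scales, $r_{ij}$ is the distance between atoms $i$ and $j$, and $c$, $n$, $m$ are the Sutton-Chen parameters.

```latex
% Lennard-Jones pair potential
V_{\mathrm{LJ}}(r) = 4\varepsilon \left[ \left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6} \right]

% Sutton-Chen embedded-atom-type potential
E_{\mathrm{SC}} = \varepsilon \left[ \frac{1}{2} \sum_{i} \sum_{j \neq i} \left(\frac{a}{r_{ij}}\right)^{n} - c \sum_{i} \sqrt{\rho_i} \right],
\qquad
\rho_i = \sum_{j \neq i} \left(\frac{a}{r_{ij}}\right)^{m}
```

The square root of the local density $\rho_i$ makes the Sutton-Chen energy a many-body function rather than a sum of pair terms.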
Combining Bayesian Approaches and Evolutionary Techniques for the Inference of Breast Cancer Networks
Gene and protein networks are very important to model complex large-scale
systems in molecular biology. Inferring or reverse-engineering such networks can
be defined as the process of identifying gene/protein interactions from
experimental data through computational analysis. However, this task is
typically complicated by the very large number of unknowns relative to a rather
small sample size. Furthermore, when the goal is to study causal relationships
within the network, tools capable of overcoming the limitations of correlation
networks are required. In this work, we make use of Bayesian Graphical Models
to attack this problem and, specifically, we perform a comparative study of
different state-of-the-art heuristics, analyzing their performance in inferring
the structure of the Bayesian Network from breast cancer data.
Chemoinformatics Research at the University of Sheffield: A History and Citation Analysis
This paper reviews the work of the Chemoinformatics Research Group in the Department of Information Studies at the University of Sheffield, focusing particularly on the work carried out in the period 1985-2002. Four major research areas are discussed, involving the development of methods for: substructure searching in databases of three-dimensional structures, including both rigid and flexible molecules; the representation and searching of the Markush structures that occur in chemical patents; similarity searching in databases of both two-dimensional and three-dimensional structures; and compound selection and the design of combinatorial libraries. An analysis of citations to 321 publications from the Group shows that it attracted a total of 3725 residual citations during the period 1980-2002. These citations appeared in 411 different journals, and involved 910 different citing organizations from 54 different countries, thus demonstrating the widespread impact of the Group's work.
Towards Understanding the Origin of Genetic Languages
Molecular biology is a nanotechnology that works--it has worked for billions
of years and in an amazing variety of circumstances. At its core is a system
for acquiring, processing and communicating information that is universal, from
viruses and bacteria to human beings. Advances in genetics and experience in
designing computers have taken us to a stage where we can understand the
optimisation principles at the root of this system, from the availability of
basic building blocks to the execution of tasks. The languages of DNA and
proteins are argued to be the optimal solutions to the information processing
tasks they carry out. The analysis also suggests simpler predecessors to these
languages, and provides fascinating clues about their origin. Obviously, a
comprehensive unraveling of the puzzle of life would have a lot to say about
what we may design or convert ourselves into.
Comment: (v1) 33 pages, contributed chapter to "Quantum Aspects of Life",
edited by D. Abbott, P. Davies and A. Pati, (v2) published version with some
editin