Complex macrocycle exploration: parallel, heuristic, and constraint-based conformer generation using ForceGen.
ForceGen is a template-free, non-stochastic approach for 2D to 3D structure generation and conformational elaboration for small molecules, including both non-macrocycles and macrocycles. For conformational search of non-macrocycles, ForceGen is both faster and more accurate than the best of all tested methods on a very large, independently curated benchmark of 2859 PDB ligands. In this study, the primary results are on macrocycles, including results for 431 unique examples from four separate benchmarks. These include complex peptide and peptide-like cases that can form networks of internal hydrogen bonds. By making use of new physical movements ("flips" of near-linear sub-cycles and explicit formation of hydrogen bonds), ForceGen exhibited statistically significantly better performance for overall RMS deviation from experimental coordinates than all other approaches. The algorithmic approach offers natural parallelization across multiple computing-cores. On a modest multi-core workstation, for all but the most complex macrocycles, median wall-clock times were generally under a minute in fast search mode and under 2 min using thorough search. On the most complex cases (roughly cyclic decapeptides and larger) explicit exploration of likely hydrogen bonding networks yielded marked improvements, but with calculation times increasing to several minutes and in some cases to roughly an hour for fast search. In complex cases, utilization of NMR data to constrain conformational search produces accurate conformational ensembles representative of solution state macrocycle behavior. On macrocycles of typical complexity (up to 21 rotatable macrocyclic and exocyclic bonds), design-focused macrocycle optimization can be practically supported by computational chemistry at interactive time-scales, with conformational ensemble accuracy equaling what is seen with non-macrocyclic ligands. 
For more complex macrocycles, inclusion of sparse biophysical data is a helpful adjunct to computation.
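The headline metric above, RMS deviation from experimental coordinates, presupposes an optimal rigid-body superposition of each generated conformer onto the experimental pose. A minimal sketch of that superposition step (the standard Kabsch algorithm; the coordinates below are made-up illustrative points, not ForceGen output):

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal
    rigid superposition (Kabsch algorithm)."""
    P = P - P.mean(axis=0)                   # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                              # 3x3 covariance matrix
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T                       # optimal rotation
    P_rot = P @ R.T
    return float(np.sqrt(((P_rot - Q) ** 2).sum() / len(P)))

# Rotating a structure by 90 degrees about z should give RMSD ~ 0.
coords = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0],
                   [1.5, 1.5, 0.0], [0.0, 1.5, 1.0]])
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
print(round(kabsch_rmsd(coords @ Rz.T, coords), 6))  # → 0.0
```

Ensemble accuracy numbers like those quoted above are typically reported as the minimum of such an RMSD over all conformers in the generated ensemble.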
Kinetic model construction using chemoinformatics
Kinetic models of chemical processes not only provide an alternative to costly experiments; they also have the potential to accelerate the pace of innovation in developing new chemical processes or in improving existing ones. Kinetic models are most powerful when they reflect the underlying chemistry by incorporating elementary pathways between individual molecules. The downside of this high level of detail is that the complexity and size of the models also steadily increase, such that the models eventually become too difficult to be manually constructed. Instead, computers are programmed to automate the construction of these models, and make use of graph theory to translate chemical entities such as molecules and reactions into computer-understandable representations.
This work studies the use of automated methods to construct kinetic models. More particularly, the need to account for the three-dimensional arrangement of atoms in molecules and reactions of kinetic models is investigated and illustrated by two case studies. First of all, the thermal rearrangement of two monoterpenoids, cis- and trans-2-pinanol, is studied. A kinetic model that accounts for the differences in reactivity and selectivity of both pinanol diastereomers is proposed. Secondly, a kinetic model for the pyrolysis of the fuel “JP-10” is constructed and highlights the use of state-of-the-art techniques for the automated estimation of thermochemistry of polycyclic molecules.
A new code is developed for the automated construction of kinetic models and takes advantage of the advances made in the field of chemoinformatics to tackle fundamental issues of previous approaches. Novel algorithms are developed for three important aspects of automated construction of kinetic models: the estimation of the symmetry of molecules and reactions, the incorporation of stereochemistry in kinetic models, and the estimation of thermochemical and kinetic data using scalable structure-property methods. Finally, the application of the code is illustrated by the automated construction of a kinetic model for alkylsulfide pyrolysis.
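To illustrate the graph-theoretical representation referred to above: a molecule can be stored as an element-labeled graph, and its symmetry number relates to the count of graph automorphisms. A deliberately brute-force toy (O(n!), usable only for very small molecules; production codes use canonical-labeling algorithms rather than this enumeration):

```python
from itertools import permutations

def count_automorphisms(labels, edges):
    """Count label- and adjacency-preserving permutations of a
    small molecular graph (brute force over all n! mappings)."""
    n = len(labels)
    edge_set = {frozenset(e) for e in edges}
    count = 0
    for perm in permutations(range(n)):
        # An automorphism must map atoms to atoms of the same element...
        if any(labels[i] != labels[perm[i]] for i in range(n)):
            continue
        # ...and must map the bond set onto itself.
        mapped = {frozenset((perm[a], perm[b])) for a, b in edges}
        if mapped == edge_set:
            count += 1
    return count

# Methane: carbon 0 bonded to hydrogens 1-4; the four hydrogens are
# interchangeable, giving 4! = 24 graph automorphisms.
methane_labels = ["C", "H", "H", "H", "H"]
methane_edges = [(0, 1), (0, 2), (0, 3), (0, 4)]
print(count_automorphisms(methane_labels, methane_edges))  # → 24
```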
Preferential attachment during the evolution of a potential energy landscape
It has previously been shown that the network of connected minima on a potential energy landscape is scale-free, and that this reflects a power-law distribution for the areas of the basins of attraction surrounding the minima. Here, we set out to understand more about the physical origins of these puzzling properties by examining how the potential energy landscape of a 13-atom cluster evolves with the range of the potential. In particular, on decreasing the range of the potential the number of stationary points increases, and thus the landscape becomes rougher and the network gets larger. Thus, we are able to follow the evolution of the potential energy landscape from one with just a single minimum to a complex landscape with many minima and a scale-free pattern of connections. We find that during this growth process, new edges in the network of connected minima preferentially attach to more highly-connected minima, thus leading to the scale-free character. Furthermore, minima that appear when the range of the potential is shorter and the network is larger have smaller basins of attraction. As there are many of these smaller basins because the network grows exponentially, the observed growth process thus also gives rise to a power-law distribution for the hyperareas of the basins.
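The growth rule described, new edges attaching preferentially to highly-connected nodes, is the classic preferential-attachment mechanism known to produce scale-free networks. A toy sketch of such growth (a Barabási–Albert-style model with assumed parameters, not the landscape-exploration code itself):

```python
import random

def preferential_attachment_graph(n, m, seed=0):
    """Grow a graph to n nodes; each new node attaches m edges,
    choosing targets with probability proportional to degree."""
    rng = random.Random(seed)
    # Start from a small complete core of m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node id appears in `targets` once per incident edge, so
    # uniform sampling from it is degree-proportional sampling.
    targets = [v for e in edges for v in e]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(targets))  # set() avoids multi-edges
        for t in chosen:
            edges.append((new, t))
            targets.extend((new, t))
    return edges

edges = preferential_attachment_graph(200, 2)
degree = {}
for a, b in edges:
    degree[a] = degree.get(a, 0) + 1
    degree[b] = degree.get(b, 0) + 1
# Hubs emerge: the maximum degree typically far exceeds the mean (~4).
print(len(edges), max(degree.values()))
```

With n = 200 and m = 2 the graph has 3 + 197 × 2 = 397 edges; repeating the growth shows the heavy-tailed degree distribution characteristic of preferential attachment.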
Two essays in computational optimization: computing the Clar number in fullerene graphs and distributing the errors in iterative interior point methods
Fullerenes are cage-like hollow carbon molecule graphs of pseudospherical symmetry consisting only of pentagonal and hexagonal faces. They have been objects of interest for chemists and mathematicians due to their widespread applications in various fields, including electronic and optical engineering, medical science and biotechnology. A fullerene molecule Γ_n of n atoms has a multiplicity of isomers which increases as N_iso ∼ O(n^9); for instance, Γ_180 has 79,538,751 isomers. The Fries and Clar numbers are stability predictors of a fullerene molecule. These numbers can be computed by solving a (possibly NP-hard) combinatorial optimization problem. We propose several ILP formulations of this problem, each yielding a solution algorithm that provides the exact value of the Fries and Clar numbers, and we compare the performance of the algorithms derived from the proposed ILP formulations. One of these algorithms is used to find the Clar isomers, i.e., those for which the Clar number is maximum among all isomers of a given size. We repeated this computational experiment for all sizes up to 204 atoms; in the course of the study, a total of 2,649,413,774 isomers were analyzed.
The second essay concerns the development of an iterative primal-dual infeasible path-following (PDIPF) interior point (IP) algorithm for the separable convex quadratic minimum cost network flow problem. In each iteration of the PDIPF algorithm, the main computational effort is solving the underlying Newton search direction system. We concentrated on solving the corresponding linear system iteratively and inexactly. We assumed that all the involved inequalities can be satisfied inexactly, and to this purpose we focused on different approaches for distributing the error generated by iterative linear solvers such that convergence of the PDIPF algorithm is guaranteed. As a result, we established theoretical bases that open the path to further interesting practical investigation.
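For intuition about the combinatorial problem behind the first essay: the Clar number is the maximum number of vertex-disjoint hexagons (aromatic sextets) that can be selected so that the remaining carbon atoms still admit a perfect matching. A tiny brute-force illustration on a benzenoid rather than a fullerene (not the proposed ILP algorithms, which scale far better):

```python
from itertools import combinations

def has_perfect_matching(nodes, edge_set):
    """Backtracking perfect-matching test for small graphs."""
    nodes = sorted(nodes)
    if not nodes:
        return True
    if len(nodes) % 2:
        return False
    v, rest = nodes[0], nodes[1:]
    for u in rest:
        if frozenset((v, u)) in edge_set:
            if has_perfect_matching([w for w in rest if w != u], edge_set):
                return True
    return False

def clar_number(n_atoms, edges, hexagons):
    """Largest k disjoint sextets leaving a perfectly matchable rest."""
    edge_set = {frozenset(e) for e in edges}
    for k in range(len(hexagons), 0, -1):
        for combo in combinations(hexagons, k):
            atoms = [a for hx in combo for a in hx]
            if len(set(atoms)) < len(atoms):
                continue  # sextets must not share atoms
            rest = [a for a in range(n_atoms) if a not in set(atoms)]
            if has_perfect_matching(rest, edge_set):
                return k
    return 0

# Naphthalene: two fused hexagons sharing the bond (3, 4).
naph_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
              (4, 6), (6, 7), (7, 8), (8, 9), (9, 3)]
naph_hexagons = [[0, 1, 2, 3, 4, 5], [3, 4, 6, 7, 8, 9]]
print(clar_number(10, naph_edges, naph_hexagons))  # → 1
```

The two hexagons of naphthalene share atoms, so at most one can be a sextet; removing it leaves the four remaining atoms perfectly matchable, giving a Clar number of 1. An ILP formulation encodes the same disjointness and matching constraints with binary variables, which is what makes exact computation feasible on fullerene-sized graphs.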
Impact of noise on inverse design: The case of NMR spectra matching
Despite its fundamental importance and widespread use for assessing reaction success in organic chemistry, deducing chemical structures from nuclear magnetic resonance (NMR) measurements has remained largely manual and time consuming. To keep up with the accelerated pace of automated synthesis in self-driving laboratory settings, robust computational algorithms are needed to rapidly perform structure elucidations. We analyse the effectiveness of solving the NMR spectra matching task encountered in this inverse structure elucidation problem by systematically constraining the chemical search space, and correspondingly reducing the ambiguity of the matching task. Numerical evidence collected for the twenty most common stoichiometries in the QM9-NMR database indicates systematic trends of more permissible machine learning prediction errors in constrained search spaces. Results suggest that compounds with multiple heteroatoms are harder to characterize than others. Extending QM9 by 10 times more constitutional isomers with 3D structures generated by Surge, ETKDG and CREST, we used ML models of chemical shifts trained on the QM9-NMR data to test the spectra matching algorithms. Combining both types of chemical shifts in the matching process suggests that machine learning prediction errors twice as large are permissible compared with matching based on one type of shift alone. Performance curves demonstrate that reducing ambiguity and search space can decrease machine learning training data needs by orders of magnitude.
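The matching task itself reduces to ranking candidate structures by the distance between their predicted shift lists and the query spectrum. A minimal sketch (sorted-shift RMSE as the assumed distance criterion, with made-up shift values and isomer names, not QM9-NMR data):

```python
def spectrum_distance(shifts_a, shifts_b):
    """RMSE between two equal-length shift lists after sorting,
    a simple assignment-free spectrum comparison."""
    a, b = sorted(shifts_a), sorted(shifts_b)
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5

def best_match(query, candidates):
    """Return the candidate whose predicted spectrum is closest
    to the query spectrum."""
    return min(candidates,
               key=lambda name: spectrum_distance(query, candidates[name]))

# Hypothetical predicted chemical shift lists (ppm) for three isomers.
candidates = {
    "isomer_A": [12.1, 24.6, 128.3, 130.0],
    "isomer_B": [14.0, 22.9, 125.7, 131.2],
    "isomer_C": [18.5, 30.2, 120.1, 140.8],
}
query = [12.3, 24.4, 128.5, 129.8]  # noisy "measured" spectrum
print(best_match(query, candidates))  # → isomer_A
```

In this picture, "permissible prediction error" is the largest noise level at which the true structure still wins this ranking; shrinking the candidate space (or combining shift types) spaces the spectra further apart and tolerates larger errors.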
Software platform virtualization in chemistry research and university teaching
Background: Modern chemistry laboratories operate with a wide range of software applications under different operating systems, such as Windows, Linux or Mac OS X. Instead of installing software on different computers, it is possible to install those applications on a single computer using virtual machine software. Software platform virtualization allows a single host operating system to execute multiple guest operating systems on the same computer. We apply and discuss the use of virtual machines in chemistry research and teaching laboratories.
Results: Virtual machines are commonly used for cheminformatics software development and testing. Benchmarking multiple chemistry software packages, we have confirmed that the computational speed penalty for using virtual machines is low, around 5% to 10%. Software virtualization in a teaching environment allows faster deployment and easy use of commercial and open source software in hands-on computer teaching labs.
Conclusion: Software virtualization in chemistry, mass spectrometry and cheminformatics is needed for software testing and for development of software for different operating systems. In order to obtain maximum performance, the virtualization software should be multi-core enabled and allow the use of multiprocessor configurations in the virtual machine environment. Server consolidation, by running multiple tasks and operating systems on a single physical machine, can lead to lower maintenance and hardware costs, especially in small research labs. The use of virtual machines can prevent software virus infections and security breaches when used as a sandbox system for internet access and software testing. Complex software setups can be created with virtual machines and are easily deployed later to multiple computers for hands-on teaching classes. We discuss the popularity of bioinformatics compared to cheminformatics, as well as the missing cheminformatics education at universities worldwide.