21,324 research outputs found
Resolving transition metal chemical space: feature selection for machine learning and structure-property relationships
Machine learning (ML) of quantum mechanical properties shows promise for
accelerating chemical discovery. For transition metal chemistry where accurate
calculations are computationally costly and available training data sets are
small, the molecular representation becomes a critical ingredient in ML model
predictive accuracy. We introduce a series of revised autocorrelation functions
(RACs) that encode relationships between the heuristic atomic properties (e.g.,
size, connectivity, and electronegativity) on a molecular graph. We alter the
starting point, scope, and nature of the quantities evaluated in standard ACs
to make these RACs amenable to inorganic chemistry. On an organic molecule set,
we first demonstrate superior standard AC performance to other
presently-available topological descriptors for ML model training, with mean
unsigned errors (MUEs) for atomization energies on set-aside test molecules as
low as 6 kcal/mol. For inorganic chemistry, our RACs yield 1 kcal/mol ML MUEs
on set-aside test molecules in spin-state splitting in comparison to 15-20x
higher errors from feature sets that encode whole-molecule structural
information. Systematic feature selection methods including univariate
filtering, recursive feature elimination, and direct optimization (e.g., random
forest and LASSO) are compared. Random-forest- or LASSO-selected subsets 4-5x
smaller than RAC-155 produce sub- to 1-kcal/mol spin-splitting MUEs, with good
transferability to metal-ligand bond length prediction (0.004-0.005 Å MUE) and
redox potential on a smaller data set (0.2-0.3 eV MUE). Evaluation of feature
selection results across property sets reveals the relative importance of
local, electronic descriptors (e.g., electronegativity, atomic number) in
spin-splitting and distal, steric effects in redox potential and bond lengths. Comment: 43 double-spaced pages, 11 figures, 4 tables
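As an illustration of the descriptors this abstract builds on, the following is a minimal sketch of a standard graph autocorrelation, assuming the usual definition as a sum of property products over atom pairs separated by a fixed number of bonds. It does not implement the revised variants (RACs) the authors introduce; the function names and the toy molecule are hypothetical.

```python
from collections import deque


def graph_distances(adjacency, start):
    """Breadth-first search: bond-path distance from `start` to every reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def autocorrelation(adjacency, prop, depth):
    """Sum prop[i] * prop[j] over ordered atom pairs exactly `depth` bonds apart."""
    total = 0.0
    for i in adjacency:
        for j, d in graph_distances(adjacency, i).items():
            if d == depth:
                total += prop[i] * prop[j]
    return total


# Toy input (an assumption for illustration): a three-atom chain C0-C1-O2,
# with Pauling electronegativity as the atomic property.
adjacency = {0: [1], 1: [0, 2], 2: [1]}
electronegativity = {0: 2.55, 1: 2.55, 2: 3.44}

# One descriptor per bond depth 0, 1, 2.
ac = [autocorrelation(adjacency, electronegativity, d) for d in range(3)]
```

Each depth yields one feature per atomic property, so the descriptor length is fixed by the number of properties and depths rather than by molecule size.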
The pharmacophore kernel for virtual screening with support vector machines
We introduce a family of positive definite kernels specifically optimized for
the manipulation of 3D structures of molecules with kernel methods. The kernels
are based on the comparison of the three-points pharmacophores present in the
3D structures of molecules, a set of molecular features known to be
particularly relevant for virtual screening applications. We present a
computationally demanding exact implementation of these kernels, as well as
fast approximations related to the classical fingerprint-based approaches.
Experimental results suggest that this new approach outperforms
state-of-the-art algorithms based on the 2D structure of molecules for the
detection of inhibitors of several drug targets.
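The fingerprint-style idea mentioned in this abstract can be sketched roughly as follows. This is not the authors' kernel: it assumes that each molecule is given as a list of labeled 3D feature points, that distances are discretized into fixed-width bins, and that similarity is the dot product of triplet counts. All names and the bin width are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations, permutations


def triplet_key(points, bin_width):
    """Canonical key for three (label, xyz) points: labels plus binned side lengths.

    The minimum over all six orderings makes the key independent of input order.
    """
    best = None
    for p in permutations(points):
        labels = tuple(label for label, _ in p)
        sides = tuple(
            int(math.dist(p[i][1], p[(i + 1) % 3][1]) // bin_width) for i in range(3)
        )
        key = (labels, sides)
        if best is None or key < best:
            best = key
    return best


def pharmacophore_fingerprint(features, bin_width=1.0):
    """Count every three-point pharmacophore (triplet of features) in a molecule."""
    return Counter(triplet_key(t, bin_width) for t in combinations(features, 3))


def fingerprint_kernel(fp_a, fp_b):
    """Dot product of two triplet-count vectors (a positive semidefinite kernel)."""
    return sum(count * fp_b[key] for key, count in fp_a.items())


# Toy molecule (an assumption for illustration): four labeled feature points.
mol_a = [
    ("donor", (0, 0, 0)),
    ("acceptor", (3, 0, 0)),
    ("aromatic", (0, 4, 0)),
    ("donor", (3, 4, 0)),
]
fp_a = pharmacophore_fingerprint(mol_a)
```

Because the key uses only labels and inter-point distances, the fingerprint is unchanged by translating or reordering the feature points; hard binning does make it sensitive to distances that fall near a bin edge.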
Computation of protein geometry and its applications: Packing and function prediction
This chapter discusses geometric models of biomolecules and geometric
constructs, including the union of balls model, the weighted Voronoi diagram,
the weighted Delaunay triangulation, and the alpha shapes. These geometric
constructs enable fast and analytical computation of shapes of biomolecules
(including features such as voids and pockets) and metric properties (such as
area and volume). The algorithms of Delaunay triangulation, computation of
voids and pockets, as well as volume/area computation are also described. In
addition, applications in packing analysis of protein structures and protein
function prediction are also discussed. Comment: 32 pages, 9 figures
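For the smallest instance of the union of balls model, the volume can be written in closed form by inclusion-exclusion. The sketch below handles only two balls, using the standard sphere-sphere intersection (lens) volume; it does not reproduce the alpha-shape machinery the chapter describes for many atoms, and the function names are hypothetical.

```python
import math


def ball_volume(r):
    """Volume of a ball of radius r."""
    return 4.0 / 3.0 * math.pi * r ** 3


def lens_volume(r1, r2, d):
    """Volume of the intersection of two balls whose centers are a distance d apart."""
    if d >= r1 + r2:
        return 0.0  # disjoint or touching: no overlap
    if d <= abs(r1 - r2):
        return ball_volume(min(r1, r2))  # smaller ball lies inside the larger one
    return (
        math.pi
        * (r1 + r2 - d) ** 2
        * (d * d + 2 * d * (r1 + r2) - 3 * (r1 - r2) ** 2)
        / (12 * d)
    )


def union_volume(r1, r2, d):
    """Inclusion-exclusion for two balls: V1 + V2 - V(intersection)."""
    return ball_volume(r1) + ball_volume(r2) - lens_volume(r1, r2, d)
```

With more than two overlapping balls, higher-order intersection terms appear, which is where constructs such as the weighted Delaunay triangulation and alpha shapes become useful for deciding which terms are needed.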
Silicon Atomic Quantum Dots Enable Beyond-CMOS Electronics
We review our recent efforts in building atom-scale quantum-dot cellular
automata circuits on a silicon surface. Our building block consists of a silicon
dangling bond on a H-Si(001) surface, which has been shown to act as a quantum
dot. First the fabrication, experimental imaging, and charging character of the
dangling bond are discussed. We then show how precise assemblies of such dots
can be created to form artificial molecules. Such complex structures can be
used as systems with custom optical properties, circuit elements for
quantum-dot cellular automata, and quantum computing. Considerations on
macro-to-atom connections are discussed. Comment: 28 pages, 19 figures
Dynamic load balancing for the distributed mining of molecular structures
In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of
methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the
past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially
render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to
discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, whereby no
reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic
partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated
load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer
Institute’s HIV-screening data set, where we were able to show close-to-linear speedup in a network of workstations. The proposed
approach also allows for dynamic resource aggregation in a non-dedicated computational environment. These features make it suitable
for large-scale, multi-domain, heterogeneous environments, such as computational grids.
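The receiver-initiated idea can be illustrated with a small single-process simulation in which idle workers ask a peer for work. This is a sketch under stated assumptions, not the authors' algorithm: the irregular search tree, the random choice of peer, and the policy of donating half of the pending nodes are all hypothetical.

```python
import random
from collections import deque


def expand(node):
    """Hypothetical irregular search tree: a (depth, seed) node yields 0-3 children."""
    depth, seed = node
    if depth >= 6:
        return []
    rng = random.Random(seed)
    return [(depth + 1, rng.randrange(1 << 30)) for _ in range(rng.randint(0, 3))]


def run(num_workers, root, seed=0):
    """Process the whole tree; return how many nodes each worker handled."""
    rng = random.Random(seed)
    queues = [deque() for _ in range(num_workers)]
    queues[0].append(root)  # all work starts on worker 0
    processed = [0] * num_workers
    while any(queues):
        for w in range(num_workers):
            if not queues[w]:
                # Receiver-initiated: the idle worker asks a random peer for work.
                donor = rng.randrange(num_workers)
                if donor != w and len(queues[donor]) > 1:
                    # The donor hands over half of its pending nodes, oldest first.
                    for _ in range(len(queues[donor]) // 2):
                        queues[w].append(queues[donor].popleft())
                continue
            node = queues[w].pop()  # depth-first on the worker's own queue
            processed[w] += 1
            queues[w].extend(expand(node))
    return processed


counts = run(4, (0, 12345))
```

Because requests originate from idle workers, busy workers pay no balancing cost until someone actually needs work, which suits search trees whose size cannot be predicted in advance.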
Molecular Dynamics Simulation of Macromolecules Using Graphics Processing Unit
Molecular dynamics (MD) simulation is a powerful computational tool to study
the behavior of macromolecular systems. However, many simulations in this field are
limited in spatial or temporal scale by the available computational resources.
In recent years, graphics processing units (GPUs) have provided unprecedented
computational power for scientific applications. Many MD algorithms suit
the multithreaded nature of GPUs. In this paper, MD algorithms for macromolecular
systems that run entirely on GPU are presented. Compared to the MD simulation
with free software GROMACS on a single CPU core, our codes achieve about 10
times speed-up on a single GPU. For validation, we have performed MD
simulations of polymer crystallization on GPU, and the results observed
perfectly agree with computations on CPU. Therefore, our single GPU codes have
already provided an inexpensive alternative for macromolecular simulations on
traditional CPU clusters and they can also be used as a basis to develop
parallel GPU programs to further speed up the computations. Comment: 21 pages, 16 figures
The surface accessibility of α-bungarotoxin monitored by a novel paramagnetic probe
The surface accessibility of α-bungarotoxin has been investigated by using Gd2L7, a newly designed paramagnetic NMR probe. Signal attenuations induced by Gd2L7 on α-bungarotoxin CαH peaks of 1H-13C HSQC spectra have been analyzed and compared with the ones previously obtained in the presence of GdDTPA-BMA. In spite of the different molecular size and shape, for the two probes a common pathway of approach to the α-bungarotoxin surface can be observed with an equally enhanced access of both GdDTPA-BMA and Gd2L7 towards the protein surface side where the binding site is located. Molecular dynamics simulations suggest that protein backbone flexibility and surface hydration contribute to the observed preferential approach of both gadolinium complexes specifically to the part of the α-bungarotoxin surface which is involved in the interaction with its physiological target, the nicotinic acetylcholine receptor.