Alchemical and structural distribution based representation for improved QML
We introduce a representation of any atom in any chemical environment for the
generation of efficient quantum machine learning (QML) models of common
electronic ground-state properties. The representation is based on scaled
distribution functions explicitly accounting for elemental and structural
degrees of freedom. Resulting QML models afford very favorable learning curves
for properties of out-of-sample systems including organic molecules,
non-covalently bonded protein side-chains, (H₂O)₄₀-clusters, as well as
diverse crystals. The elemental components help to lower the learning curves,
and, through interpolation across the periodic table, even enable "alchemical
extrapolation" to covalent bonding between elements not part of training, as
evinced for single, double, and triple bonds among main-group elements.
Identifying Crystal Structures Beyond Known Prototypes from X-ray Powder Diffraction Spectra
The large amount of powder diffraction data for which the corresponding
crystal structures have not yet been identified suggests the existence of
numerous undiscovered, physically relevant crystal structure prototypes. In
this paper, we present a scheme to resolve powder diffraction data into crystal
structures with precise atomic coordinates by screening the space of all
possible atomic arrangements, i.e., structural prototypes, including those not
previously observed, using a pre-trained machine learning (ML) model. This
involves: (i) enumerating all possible symmetry-confined ways in which a given
composition can be accommodated in a given space group, (ii) ranking the
element-assigned prototype representations by energies predicted with the Wren
ML model [Sci. Adv. 8, eabn4117 (2022)], (iii) assigning and perturbing atoms
along the degrees of freedom allowed by the Wyckoff positions to match the
experimental diffraction data, and (iv) validating the thermodynamic stability of
the material using density-functional theory (DFT). An advantage of the
presented method is that it does not rely on a database of previously observed
prototypes and is therefore capable of finding crystal structures with
entirely new symmetric arrangements of atoms. We demonstrate the workflow on
unidentified XRD spectra from the ICDD database and identify a number of stable
structures, where a majority turns out to be derivable from known prototypes,
but at least two are found to not be part of our prior structural data sets.
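Steps (ii) and (iii) of the workflow above amount to a rank-then-match loop. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the `Candidate` record, the `top_k` cutoff, and the cosine-similarity score are all hypothetical, and the energies and patterns are placeholder numbers standing in for ML-predicted energies and simulated diffraction patterns.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Candidate:
    """Hypothetical record for one element-assigned prototype."""
    label: str
    predicted_energy: float  # stand-in for an ML-predicted energy
    pattern: np.ndarray      # stand-in for a simulated diffraction pattern


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity of two intensity vectors sampled on the same angular grid."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def rank_and_match(candidates, experimental, top_k=2):
    """Keep the top_k lowest-energy candidates, then pick the best pattern match."""
    shortlist = sorted(candidates, key=lambda c: c.predicted_energy)[:top_k]
    return max(shortlist, key=lambda c: cosine_similarity(c.pattern, experimental))


# Placeholder data (assumed for illustration, not taken from the paper).
experimental = np.array([0.0, 1.0, 0.2, 0.0, 0.6])
candidates = [
    Candidate("A", -1.20, np.array([0.0, 0.9, 0.3, 0.0, 0.5])),
    Candidate("B", -1.25, np.array([1.0, 0.0, 0.0, 0.8, 0.0])),
    Candidate("C", -0.40, np.array([0.0, 1.0, 0.2, 0.0, 0.6])),
]
best = rank_and_match(candidates, experimental, top_k=2)
print(best.label)
```

In this toy data, candidate C matches the pattern exactly but is dropped by the energy cutoff, which shows why the choice of `top_k` matters in any such screening scheme.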
FCHL revisited: Faster and more accurate quantum machine learning
We introduce the FCHL19 representation for atomic environments in molecules
or condensed-phase systems. Machine learning models based on FCHL19 are able to
yield predictions of atomic forces and energies of query compounds with
chemical accuracy on the scale of milliseconds. FCHL19 is a revision of our
previous work [Faber et al. 2018] where the representation is discretized and
the individual features are rigorously optimized using Monte Carlo
optimization. Combined with a Gaussian kernel function that incorporates
elemental screening, chemical accuracy is reached for energy learning on the
QM7b and QM9 datasets after training for minutes and hours, respectively. The
model also shows good performance for non-bonded interactions in the condensed
phase for a set of water clusters, with a binding-energy MAE of less than
0.1 kcal/mol/molecule after training on 3,200 samples. For force learning on
the MD17 dataset, our optimized model similarly displays state-of-the-art
accuracy with a regressor based on Gaussian process regression. When the
revised FCHL19 representation is combined with the operator quantum machine
learning regressor, forces and energies can be predicted in only a few
milliseconds per atom. The model presented herein is fast and lightweight
enough for use in general chemistry problems as well as molecular dynamics
simulations.
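To make the regression step concrete, the following is a minimal sketch of kernel ridge regression with a plain Gaussian kernel. It is not FCHL19: the elemental screening and the representation itself are omitted, the 1D toy data are assumed purely for illustration, and the function names and the `sigma` and `reg` values are hypothetical choices.

```python
import numpy as np


def gaussian_kernel(A, B, sigma):
    """K[i, j] = exp(-||A[i] - B[j]||^2 / (2 sigma^2))."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))


def krr_fit(X_train, y_train, sigma, reg=1e-8):
    """Solve (K + reg * I) alpha = y for the regression weights alpha."""
    K = gaussian_kernel(X_train, X_train, sigma)
    return np.linalg.solve(K + reg * np.eye(len(X_train)), y_train)


def krr_predict(X_query, X_train, alpha, sigma):
    """Predict as a kernel-weighted sum over the training points."""
    return gaussian_kernel(X_query, X_train, sigma) @ alpha


# Toy 1D data (not molecular): learn y = sin(x) from 50 random samples.
rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 2.0 * np.pi, size=(50, 1))
y_train = np.sin(X_train[:, 0])
alpha = krr_fit(X_train, y_train, sigma=1.0)

X_test = np.linspace(0.5, 5.5, 20)[:, None]
y_pred = krr_predict(X_test, X_train, alpha, sigma=1.0)
mae = np.mean(np.abs(y_pred - np.sin(X_test[:, 0])))
print(f"test MAE: {mae:.2e}")
```

In a real model, the rows of `X_train` would be representation vectors of atomic environments rather than scalars, and `sigma` and `reg` would be chosen by cross-validation.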
Operators in Machine Learning: Response Properties in Chemical Space
The role of response operators is well established in quantum mechanics. We investigate their use for universal quantum machine learning models of response properties in molecules. After introducing a theoretical basis, we present and discuss numerical evidence based on measuring the potential energy's response with respect to atomic displacement and to electric fields. Prediction errors for corresponding properties, atomic forces and dipole moments, improve in a systematic fashion with training set size and reach high accuracy for small training sets. Prediction of normal modes and IR spectra of some small molecules demonstrates the usefulness of this approach for chemistry.
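For reference, the two responses named above are first derivatives of the potential energy. The relations below are the standard textbook definitions rather than expressions quoted from the paper; the symbols $E$ (potential energy), $\mathbf{R}_I$ (position of atom $I$), and $\boldsymbol{\varepsilon}$ (external electric field) are notation assumed here.

```latex
\mathbf{F}_I = -\frac{\partial E}{\partial \mathbf{R}_I},
\qquad
\boldsymbol{\mu} = -\frac{\partial E}{\partial \boldsymbol{\varepsilon}}
```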
Prediction Errors of Molecular Machine Learning Models Lower than Hybrid DFT Error
We investigate the impact of choosing regressors and molecular representations for the construction of fast machine learning (ML) models of 13 electronic ground-state properties of organic molecules. The performance of each regressor/representation/property combination is assessed using learning curves which report out-of-sample errors as a function of training set size with up to ∼118k distinct molecules. Molecular structures and properties at the hybrid density functional theory (DFT) level of theory come from the QM9 database [Ramakrishnan et al. Sci. Data 2014, 1, 140022] and include enthalpies and free energies of atomization, HOMO/LUMO energies and gap, dipole moment, polarizability, zero point vibrational energy, heat capacity, and the highest fundamental vibrational frequency. Various molecular representations have been studied (Coulomb matrix, bag of bonds, BAML and ECFP4, molecular graphs (MG)), as well as newly developed distribution based variants including histograms of distances (HD), angles (HDA/MARAD), and dihedrals (HDAD). Regressors include linear models (Bayesian ridge regression (BR) and linear regression with elastic net regularization (EN)), random forest (RF), kernel ridge regression (KRR), and two types of neural networks, graph convolutions (GC) and gated graph networks (GG). Out-of-sample errors are strongly dependent on the choice of representation, regressor, and molecular property. Electronic properties are typically best accounted for by MG and GC, while energetic properties are better described by HDAD and KRR. The specific combinations with the lowest out-of-sample errors in the ∼118k training set size limit are (free) energies and enthalpies of atomization (HDAD/KRR), HOMO/LUMO eigenvalue and gap (MG/GC), dipole moment (MG/GC), static polarizability (MG/GG), zero point vibrational energy (HDAD/KRR), heat capacity at room temperature (HDAD/KRR), and highest fundamental vibrational frequency (BAML/RF).
We present numerical evidence that ML model predictions deviate from DFT (B3LYP) less than DFT (B3LYP) deviates from experiment for all properties. Furthermore, out-of-sample prediction errors with respect to hybrid DFT reference are on par with, or close to, chemical accuracy. The results suggest that ML models could be more accurate than hybrid DFT if explicitly electron correlated quantum (or experimental) data were available.
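Learning curves of this kind are commonly summarized by a power-law fit, error ≈ a·N^(−b), which is a straight line on a log-log plot. The sketch below assumes that functional form and uses synthetic numbers; neither the form nor the values are taken from the paper, and the name `fit_power_law` is hypothetical.

```python
import numpy as np


def fit_power_law(train_sizes, errors):
    """Fit error ~ a * N**(-b) by linear regression in log-log space."""
    slope, intercept = np.polyfit(np.log(train_sizes), np.log(errors), deg=1)
    return np.exp(intercept), -slope


# Synthetic learning curve (illustrative numbers, not results from the paper).
train_sizes = np.array([100, 1_000, 10_000, 100_000], dtype=float)
errors = 10.0 * train_sizes ** -0.5

a, b = fit_power_law(train_sizes, errors)
print(f"prefactor a = {a:.3f}, decay exponent b = {b:.3f}")
```

A larger exponent `b` means the error falls faster with added training data, while a smaller prefactor `a` shifts the whole curve down.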
Improved reference genome uncovers novel sex-linked regions in the Guppy (Poecilia reticulata)
This is the author accepted manuscript. The final version is available on open access from Oxford University Press via the DOI in this record.
Data availability:
Population genomics data are available on ENA: Study PRJEB10680
PCR-free data are available on ENA: Study PRJEB36450
Genome assembly is available on ENA ID: PRJEB36704; ERP119926
All scripts and pipelines are available on GitHub: https://github.com/bfrasercommits/guppy_genome
Theory predicts that the sexes can achieve greater fitness if loci with sexually antagonistic polymorphisms become linked to the sex determining loci, and this can favour the spread of reduced recombination around sex determining regions. Given that sex-linked regions are frequently repetitive and highly heterozygous, few complete Y chromosome assemblies are available to test these ideas. The guppy system (Poecilia reticulata) has long been invoked as an example of sex chromosome formation resulting from sexual conflict. Early genetics studies revealed that male colour patterning genes are mostly but not entirely Y-linked, and that X-linkage may be most common in low predation populations. More recent population genomic studies of guppies have reached varying conclusions about the size and placement of the Y-linked region. However, this previous work used a reference genome assembled from short-read sequences from a female guppy. Here, we present a new guppy reference genome assembly from a male, using long-read PacBio single-molecule real-time sequencing (SMRT) and chromosome contact information. Our new assembly sequences across repeat- and GC-rich regions and thus closes gaps and corrects mis-assemblies found in the short-read female-derived guppy genome. Using this improved reference genome, we then employed broad population sampling to detect sex differences across the genome. We identified two small regions that showed consistent male-specific signals. Moreover, our results help reconcile the contradictory conclusions put forth by past population genomic studies of the guppy sex chromosome. Our results are consistent with a small Y-specific region and rare recombination in male guppies.
Max Planck Society; European Research Council (ERC); Natural Environment Research Council (NERC)
SV2 Mediates Entry of Tetanus Neurotoxin into Central Neurons
Tetanus neurotoxin causes the disease tetanus, which is characterized by rigid paralysis. The toxin acts by inhibiting the release of neurotransmitters from inhibitory neurons in the spinal cord that innervate motor neurons and is unique among the clostridial neurotoxins due to its ability to shuttle from the periphery to the central nervous system. Tetanus neurotoxin is thought to interact with a high affinity receptor complex that is composed of lipid and protein components; however, the identity of the protein receptor remains elusive. In the current study, we demonstrate that toxin binding to dissociated hippocampal and spinal cord neurons is greatly enhanced by driving synaptic vesicle exocytosis. Moreover, tetanus neurotoxin entry and subsequent cleavage of synaptobrevin II, the substrate for this toxin, were also dependent on synaptic vesicle recycling. Next, we identified the potential synaptic vesicle binding protein for the toxin and found that it corresponded to SV2; tetanus neurotoxin was unable to cleave synaptobrevin II in SV2 knockout neurons. Toxin entry into knockout neurons was rescued by infecting with viruses that express SV2A or SV2B. Tetanus toxin elicited the hyperexcitability in dissociated spinal cord neurons, due to preferential loss of inhibitory transmission, that is characteristic of the disease. Surprisingly, in dissociated cortical cultures, low concentrations of the toxin preferentially acted on excitatory neurons. Further examination of the distribution of SV2A and SV2B in both spinal cord and cortical neurons revealed that SV2B is to a large extent localized to excitatory terminals, while SV2A is localized to inhibitory terminals. Therefore, the distinct effects of tetanus toxin on cortical and spinal cord neurons are not due to differential expression of SV2 isoforms. In summary, the findings reported here indicate that SV2A and SV2B mediate binding and entry of tetanus neurotoxin into central neurons.
Minimum joint space width and tibial cartilage morphology in the knees of healthy individuals: A cross-sectional study
Background: The clinical use of minimum joint space width (mJSW) and cartilage volume and thickness has been limited to the longitudinal measurement of disease progression (i.e. change over time) rather than the diagnosis of OA in which values are compared to a standard. This is primarily due to lack of establishment of normative values of joint space width and cartilage morphometry as has been done with bone density values in diagnosing osteoporosis. Thus, the purpose of this pilot study is to estimate reference values of medial joint space width and cartilage morphometry in healthy individuals of all ages using standard radiography and peripheral magnetic resonance imaging.
Design: For this cross-sectional study, healthy volunteers underwent a fixed-flexion knee X-ray and a peripheral MR (pMR) scan of the same knee using a 1T machine (ONI OrthOne™, Wilmington, MA). Radiographs were digitized and analyzed for medial mJSW using an automated algorithm. Only knees scoring ≤1 on the Kellgren-Lawrence scale (no radiographic evidence of knee OA) were included in the analyses. All 3D SPGRE fat-sat sagittal pMR scans were analyzed for medial tibial cartilage morphometry using a proprietary software program (Chondrometrics GmbH).
Results: Of 119 healthy participants, 73 were female and 47 were male; mean (SD) age 38.2 (13.2) years, mean BMI 25.0 (4.4) kg/m². Minimum JSW values were calculated for each sex and decade of life. Analyses revealed mJSW did not significantly decrease with increasing decade (p > 0.05) in either sex. Females had a mean (SD) medial mJSW of 4.8 (0.7) mm, compared with a larger value of 5.7 (0.8) mm in males. Cartilage morphometry results showed similar trends with mean (SD) tibial cartilage volume and thickness in females of 1.50 (0.19) μL/mm² and 1.45 (0.19) mm, respectively, and 1.77 (0.24) μL/mm² and 1.71 (0.24) mm, respectively, in males.
Conclusion: These data suggest that medial mJSW values do not decrease with aging in healthy individuals but remain fairly constant throughout the lifespan with "healthy" values of 4.8 mm for females and 5.7 mm for males. Similar trends were seen for cartilage morphology. Results suggest there may be no need to differentiate a t-score and a z-score in OA diagnosis because cartilage thickness and JSW remain constant throughout life in the absence of OA.
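As a worked example of comparing a measurement to a standard, the sketch below computes a standardized score from the sex-specific mJSW means and SDs reported above. The function name and the 3.4 mm example measurement are hypothetical, and the study itself proposes no diagnostic threshold, so this illustrates only the arithmetic.

```python
# Reported medial mJSW reference values from the abstract: (mean, SD) in mm.
REFERENCE_MJSW_MM = {"female": (4.8, 0.7), "male": (5.7, 0.8)}


def standardized_score(mjsw_mm: float, sex: str) -> float:
    """Number of SDs a measurement lies from the healthy reference mean."""
    mean, sd = REFERENCE_MJSW_MM[sex]
    return (mjsw_mm - mean) / sd


# Hypothetical measurement: 3.4 mm in a female knee.
score = standardized_score(3.4, "female")
print(f"{score:.1f} SD from the reference mean")
```

Because the reported values do not vary significantly by decade, a single reference per sex serves here for both the young-adult comparison (t-score) and the age-matched comparison (z-score).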