Search CORE

50 research outputs found

Alchemical and structural distribution based representation for improved QML

Author: Christensen Anders S.
Faber Felix A.
Huang Bing
von Lilienfeld O. Anatole
Publication venue: 'AIP Publishing'
Publication date: 22/12/2017

We introduce a representation of any atom in any chemical environment for the generation of efficient quantum machine learning (QML) models of common electronic ground-state properties. The representation is based on scaled distribution functions explicitly accounting for elemental and structural degrees of freedom. Resulting QML models afford very favorable learning curves for properties of out-of-sample systems including organic molecules, non-covalently bonded protein side-chains, (H$_2$O)$_{40}$-clusters, as well as diverse crystals. The elemental components help to lower the learning curves, and, through interpolation across the periodic table, even enable "alchemical extrapolation" to covalent bonding between elements not part of training, as evinced for single, double, and triple bonds among main-group elements.

arXiv.org e-Print Archive

Crossref

edoc

Identifying Crystal Structures Beyond Known Prototypes from X-ray Powder Diffraction Spectra

Author: Armiento Rickard
Faber Felix A.
Goodall Rhys E. A.
Parackal Abhijith S.
Publication date: 15/11/2023

The large amount of powder diffraction data for which the corresponding crystal structures have not yet been identified suggests the existence of numerous undiscovered, physically relevant crystal structure prototypes. In this paper, we present a scheme to resolve powder diffraction data into crystal structures with precise atomic coordinates by screening the space of all possible atomic arrangements, i.e., structural prototypes, including those not previously observed, using a pre-trained machine learning (ML) model. This involves: (i) enumerating all possible symmetry-confined ways in which a given composition can be accommodated in a given space group, (ii) ranking the element-assigned prototype representations using energies predicted by the Wren ML model [Sci. Adv. 8, eabn4117 (2022)], (iii) assigning and perturbing atoms along the degrees of freedom allowed by the Wyckoff positions to match the experimental diffraction data, and (iv) validating the thermodynamic stability of the material using density-functional theory (DFT). An advantage of the presented method is that it does not rely on a database of previously observed prototypes and is therefore capable of finding crystal structures with entirely new symmetric arrangements of atoms. We demonstrate the workflow on unidentified XRD spectra from the ICDD database and identify a number of stable structures, the majority of which turn out to be derivable from known prototypes, while at least two are not part of our prior structural data sets.

arXiv.org e-Print Archive

FCHL revisited: Faster and more accurate quantum machine learning

Author: von Lilienfeld O. Anatole
Bratholm Lars A.
Christensen Anders S.
Faber Felix A.
Publication venue: 'AIP Publishing'
Publication date: 21/01/2020

We introduce the FCHL19 representation for atomic environments in molecules or condensed-phase systems. Machine learning models based on FCHL19 are able to yield predictions of atomic forces and energies of query compounds with chemical accuracy on the scale of milliseconds. FCHL19 is a revision of our previous work [Faber et al. 2018] where the representation is discretized and the individual features are rigorously optimized using Monte Carlo optimization. Combined with a Gaussian kernel function that incorporates elemental screening, chemical accuracy is reached for energy learning on the QM7b and QM9 datasets after training for minutes and hours, respectively. The model also shows good performance for non-bonded interactions in the condensed phase for a set of water clusters with an MAE binding energy error of less than 0.1 kcal/mol/molecule after training on 3,200 samples. For force learning on the MD17 dataset, our optimized model similarly displays state-of-the-art accuracy with a regressor based on Gaussian process regression. When the revised FCHL19 representation is combined with the operator quantum machine learning regressor, forces and energies can be predicted in only a few milliseconds per atom. The model presented herein is fast and lightweight enough for use in general chemistry problems as well as molecular dynamics simulations

arXiv.org e-Print Archive

Crossref

Explore Bristol Research

Operators in Machine Learning: Response Properties in Chemical Space

Author: Christensen Anders S.
Faber Felix A.
von Lilienfeld O. Anatole
Publication venue: 'AIP Publishing'
Publication date: 01/01/2018

The role of response operators is well established in quantum mechanics. We investigate their use for universal quantum machine learning models of response properties in molecules. After introducing a theoretical basis, we present and discuss numerical evidence based on measuring the potential energy's response with respect to atomic displacement and to electric fields. Prediction errors for corresponding properties, atomic forces and dipole moments, improve in a systematic fashion with training set size and reach high accuracy for small training sets. Prediction of normal modes and IR-spectra of some small molecules demonstrates the usefulness of this approach for chemistry

edoc

Prediction Errors of Molecular Machine Learning Models Lower than Hybrid DFT Error

Author: Dahl George E.
Faber Felix A.
Gilmer Justin
Huang Bing
Hutchison Luke
Kearnes Steven
Riley Patrick F.
Schoenholz Samuel S.
Vinyals Oriol
von Lilienfeld O. Anatole
Publication venue: 'American Chemical Society (ACS)'
Publication date: 01/01/2017

We investigate the impact of choosing regressors and molecular representations for the construction of fast machine learning (ML) models of 13 electronic ground-state properties of organic molecules. The performance of each regressor/representation/property combination is assessed using learning curves which report out-of-sample errors as a function of training set size with up to ∼118k distinct molecules. Molecular structures and properties at the hybrid density functional theory (DFT) level of theory come from the QM9 database [Ramakrishnan et al. Sci. Data 2014, 1, 140022] and include enthalpies and free energies of atomization, HOMO/LUMO energies and gap, dipole moment, polarizability, zero point vibrational energy, heat capacity, and the highest fundamental vibrational frequency. Various molecular representations have been studied (Coulomb matrix, bag of bonds, BAML and ECFP4, molecular graphs (MG)), as well as newly developed distribution based variants including histograms of distances (HD), angles (HDA/MARAD), and dihedrals (HDAD). Regressors include linear models (Bayesian ridge regression (BR) and linear regression with elastic net regularization (EN)), random forest (RF), kernel ridge regression (KRR), and two types of neural networks, graph convolutions (GC) and gated graph networks (GG). Out-of-sample errors are strongly dependent on the choice of representation, regressor, and molecular property. Electronic properties are typically best accounted for by MG and GC, while energetic properties are better described by HDAD and KRR. The specific combinations with the lowest out-of-sample errors in the ∼118k training set size limit are (free) energies and enthalpies of atomization (HDAD/KRR), HOMO/LUMO eigenvalue and gap (MG/GC), dipole moment (MG/GC), static polarizability (MG/GG), zero point vibrational energy (HDAD/KRR), heat capacity at room temperature (HDAD/KRR), and highest fundamental vibrational frequency (BAML/RF). We present numerical evidence that ML model predictions deviate from DFT (B3LYP) less than DFT (B3LYP) deviates from experiment for all properties. Furthermore, out-of-sample prediction errors with respect to hybrid DFT reference are on par with, or close to, chemical accuracy. The results suggest that ML models could be more accurate than hybrid DFT if explicitly electron correlated quantum (or experimental) data were available.

arXiv.org e-Print Archive

Crossref

edoc

FigShare

Improved reference genome uncovers novel sex-linked regions in the Guppy (Poecilia reticulata)

Author: Bonnie A Fraser
Cameron J Weadick
Chang Liu
Christine Dreyer
Deborah Charlesworth
Detlef Weigel
Felix Bemm
James R Whiting
Josephine R Paris
Margarete Hoffmann
Maria Costantini
Paul J Parsons
Roberta Bergero
Verena A Kottler
Publication venue: 'Oxford University Press (OUP)'
Publication date: 03/09/2020

This is the author accepted manuscript. The final version is available on open access from Oxford University Press via the DOI in this record.

Data availability: Population genomics data are available on ENA (Study PRJEB10680). PCR-free data are available on ENA (Study PRJEB36450). The genome assembly is available on ENA (ID PRJEB36704; ERP119926). All scripts and pipelines are available on GitHub: https://github.com/bfrasercommits/guppy_genome

Theory predicts that the sexes can achieve greater fitness if loci with sexually antagonistic polymorphisms become linked to the sex determining loci, and this can favour the spread of reduced recombination around sex determining regions. Given that sex-linked regions are frequently repetitive and highly heterozygous, few complete Y chromosome assemblies are available to test these ideas. The guppy system (Poecilia reticulata) has long been invoked as an example of sex chromosome formation resulting from sexual conflict. Early genetics studies revealed that male colour patterning genes are mostly but not entirely Y-linked, and that X-linkage may be most common in low predation populations. More recent population genomic studies of guppies have reached varying conclusions about the size and placement of the Y-linked region. However, this previous work used a reference genome assembled from short-read sequences from a female guppy. Here, we present a new guppy reference genome assembly from a male, using long-read PacBio single-molecule real-time sequencing (SMRT) and chromosome contact information. Our new assembly sequences across repeat- and GC-rich regions and thus closes gaps and corrects mis-assemblies found in the short-read female-derived guppy genome. Using this improved reference genome, we then employed broad population sampling to detect sex differences across the genome. We identified two small regions that showed consistent male-specific signals. Moreover, our results help reconcile the contradictory conclusions put forth by past population genomic studies of the guppy sex chromosome. Our results are consistent with a small Y-specific region and rare recombination in male guppies.

Funding: Max Planck Society; European Research Council (ERC); Natural Environment Research Council (NERC)

Crossref

Edinburgh Research Explorer

Open Research Exeter

MPG.PuRe

SV2 Mediates Entry of Tetanus Neurotoxin into Central Neurons

Tetanus neurotoxin causes the disease tetanus, which is characterized by rigid paralysis. The toxin acts by inhibiting the release of neurotransmitters from inhibitory neurons in the spinal cord that innervate motor neurons and is unique among the clostridial neurotoxins due to its ability to shuttle from the periphery to the central nervous system. Tetanus neurotoxin is thought to interact with a high affinity receptor complex that is composed of lipid and protein components; however, the identity of the protein receptor remains elusive. In the current study, we demonstrate that toxin binding to dissociated hippocampal and spinal cord neurons is greatly enhanced by driving synaptic vesicle exocytosis. Moreover, tetanus neurotoxin entry and subsequent cleavage of synaptobrevin II, the substrate for this toxin, were also dependent on synaptic vesicle recycling. Next, we identified the potential synaptic vesicle binding protein for the toxin and found that it corresponded to SV2; tetanus neurotoxin was unable to cleave synaptobrevin II in SV2 knockout neurons. Toxin entry into knockout neurons was rescued by infecting with viruses that express SV2A or SV2B. Tetanus toxin elicited the hyperexcitability in dissociated spinal cord neurons, due to preferential loss of inhibitory transmission, that is characteristic of the disease. Surprisingly, in dissociated cortical cultures, low concentrations of the toxin preferentially acted on excitatory neurons. Further examination of the distribution of SV2A and SV2B in both spinal cord and cortical neurons revealed that SV2B is to a large extent localized to excitatory terminals, while SV2A is localized to inhibitory terminals. Therefore, the distinct effects of tetanus toxin on cortical and spinal cord neurons are not due to differential expression of SV2 isoforms. In summary, the findings reported here indicate that SV2A and SV2B mediate binding and entry of tetanus neurotoxin into central neurons.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Minimum joint space width and tibial cartilage morphology in the knees of healthy individuals: A cross-sectional study

Background: The clinical use of minimum joint space width (mJSW) and cartilage volume and thickness has been limited to the longitudinal measurement of disease progression (i.e. change over time) rather than the diagnosis of OA, in which values are compared to a standard. This is primarily because normative values of joint space width and cartilage morphometry have not been established, as has been done with bone density values in diagnosing osteoporosis. Thus, the purpose of this pilot study is to estimate reference values of medial joint space width and cartilage morphometry in healthy individuals of all ages using standard radiography and peripheral magnetic resonance imaging.

Design: For this cross-sectional study, healthy volunteers underwent a fixed-flexion knee X-ray and a peripheral MR (pMR) scan of the same knee using a 1T machine (ONI OrthOne™, Wilmington, MA). Radiographs were digitized and analyzed for medial mJSW using an automated algorithm. Only knees scoring ≤1 on the Kellgren-Lawrence scale (no radiographic evidence of knee OA) were included in the analyses. All 3D SPGRE fat-sat sagittal pMR scans were analyzed for medial tibial cartilage morphometry using a proprietary software program (Chondrometrics GmbH).

Results: Of 119 healthy participants, 73 were female and 47 were male; mean (SD) age 38.2 (13.2) years, mean BMI 25.0 (4.4) kg/m². Minimum JSW values were calculated for each sex and decade of life. Analyses revealed mJSW did not significantly decrease with increasing decade (p > 0.05) in either sex. Females had a mean (SD) medial mJSW of 4.8 (0.7) mm, compared with a correspondingly larger value of 5.7 (0.8) mm in males. Cartilage morphometry results showed similar trends, with mean (SD) tibial cartilage volume and thickness in females of 1.50 (0.19) μL/mm² and 1.45 (0.19) mm, respectively, and 1.77 (0.24) μL/mm² and 1.71 (0.24) mm, respectively, in males.

Conclusion: These data suggest that medial mJSW values do not decrease with aging in healthy individuals but remain fairly constant throughout the lifespan, with "healthy" values of 4.8 mm for females and 5.7 mm for males. Similar trends were seen for cartilage morphology. Results suggest there may be no need to differentiate a t-score and a z-score in OA diagnosis because cartilage thickness and JSW remain constant throughout life in the absence of OA.

Crossref

Directory of Open Access Journals

PubMed Central