1,173 research outputs found

    Crystal Structure Search with Random Relaxations Using Graph Networks

    Materials design enables technologies critical to humanity, including combating climate change with solar cells and batteries. Many properties of a material are determined by its atomic crystal structure. However, predicting the atomic crystal structure for a given material's chemical formula is a long-standing grand challenge that remains a barrier in materials design. We investigate a data-driven approach to accelerating ab initio random structure search (AIRSS), a state-of-the-art method for crystal structure search. We build a novel dataset of random structure relaxations of Li-Si battery anode materials using high-throughput density functional theory calculations. We train graph neural networks to simulate relaxations of random structures. Our model is able to find an experimentally verified structure of Li15Si4 that it was not trained on, and has the potential for orders-of-magnitude speedups over AIRSS when searching large unit cells and searching over multiple chemical stoichiometries. Surprisingly, we find that data augmentation by adding Gaussian noise improves both the accuracy and the out-of-domain generalization of our models. Comment: Removed citations from the abstract; paper content is unchanged.
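The Gaussian-noise augmentation mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes each training example is a list of Cartesian atomic positions paired with a target (for instance, the next or relaxed structure), and that noise is added only to the input positions while the target is left unchanged. The helper names and the default `sigma` (in the same length unit as the positions) are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Position = Tuple[float, float, float]


def add_gaussian_noise(positions: Sequence[Position], sigma: float,
                       rng: random.Random) -> List[Position]:
    """Return a copy of the positions with i.i.d. N(0, sigma^2) noise on every coordinate."""
    return [tuple(c + rng.gauss(0.0, sigma) for c in p) for p in positions]


def augment_relaxation_frames(frames, sigma: float = 0.01, copies: int = 2, seed: int = 0):
    """Hypothetical helper: keep each (positions, target) pair and append `copies`
    noisy variants whose input positions are perturbed while the target is unchanged."""
    rng = random.Random(seed)
    augmented = []
    for positions, target in frames:
        augmented.append((list(positions), target))
        for _ in range(copies):
            augmented.append((add_gaussian_noise(positions, sigma, rng), target))
    return augmented
```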

    Computational Ligand Descriptors for Catalyst Design

    Ligands, especially phosphines and carbenes, can play a key role in modifying and controlling homogeneous organometallic catalysts, and they often provide a convenient way to fine-tune the performance of known catalysts. The measurable outcomes of such catalyst modifications (yields, rates, selectivity) can be set into context by relating them to steric and electronic descriptors of ligand properties, and such models can guide the discovery, optimization, and design of catalysts. In this review we survey calculated ligand descriptors, with a particular focus on homogeneous organometallic catalysis. We set out and compare a range of approaches to calculating steric and electronic parameters, and we have collected descriptors for several representative ligand sets, including 30 monodentate phosphorus(III) donor ligands, 23 bidentate P,P-donor ligands, and 30 carbenes, to provide practitioners with a useful resource for analysis. In addition, we review several case studies in which such descriptors are applied, covering both maps and models, illustrating how descriptor-led studies of catalysis can inform experiments and highlighting good practice for model comparison and evaluation.
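A descriptor-based model of the kind described above can be as simple as a multivariate linear regression of a measured outcome on ligand descriptors. The sketch below is illustrative only: it assumes one steric and one electronic descriptor per ligand and fits ordinary least squares through the normal equations in pure Python. The function names are hypothetical, and the review's own models and descriptor sets may differ.

```python
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_descriptor_model(descriptors: Sequence[Sequence[float]],
                         outcomes: Sequence[float]) -> List[float]:
    """Ordinary least squares for outcome ~ b0 + b1*d1 + b2*d2 + ...
    Returns [b0, b1, b2, ...] from the normal equations (X^T X) b = X^T y."""
    rows = [[1.0, *d] for d in descriptors]  # prepend an intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcomes)) for i in range(p)]
    return solve_linear(xtx, xty)


def predict(coefs: Sequence[float], descriptor: Sequence[float]) -> float:
    """Evaluate the fitted linear model for one ligand."""
    return coefs[0] + sum(c * d for c, d in zip(coefs[1:], descriptor))
```

With more descriptors than ligands, or strongly correlated descriptors, the normal equations become ill-conditioned, which is one reason careful model comparison and validation matter.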

    Gaussian Process Regression for Materials and Molecules

    We provide an introduction to Gaussian process regression (GPR) machine-learning methods in computational materials science and chemistry. The focus of the present review is on the regression of atomistic properties: in particular, on the construction of interatomic potentials, or force fields, in the Gaussian Approximation Potential (GAP) framework; beyond this, we also discuss the fitting of arbitrary scalar, vectorial, and tensorial quantities. Methodological aspects of reference data generation, representation, and regression, as well as the question of how a data-driven model may be validated, are reviewed and critically discussed. A survey of applications to a variety of research questions in chemistry and materials science illustrates the rapid growth in the field. A vision is outlined for the development of the methodology in the years to come.
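To make the regression step concrete, here is a minimal GPR sketch in pure Python. It assumes a zero-mean prior, a squared-exponential kernel with fixed, hand-chosen hyperparameters (`length`, `variance`), and a small observation-noise variance (`noise`, written σn² below). It is not the GAP framework, which typically uses dedicated descriptors of atomic environments and sparse approximations. The posterior mean at a test input x* is k*^T (K + σn² I)^-1 y and the posterior variance is k(x*, x*) - k*^T (K + σn² I)^-1 k*.

```python
import math
from typing import List, Sequence


def se_kernel(x1: Sequence[float], x2: Sequence[float],
              length: float = 1.0, variance: float = 1.0) -> float:
    """Squared-exponential covariance between two input vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return variance * math.exp(-0.5 * d2 / length ** 2)


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Dense Gaussian elimination with partial pivoting (fine for small n)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def gp_predict(x_train, y_train, x_test, length=1.0, variance=1.0, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP at each test input.
    `noise` is the observation-noise variance added to the diagonal of K."""
    k = [[se_kernel(xi, xj, length, variance) + (noise if i == j else 0.0)
          for j, xj in enumerate(x_train)] for i, xi in enumerate(x_train)]
    alpha = _solve(k, list(y_train))  # (K + noise*I)^-1 y
    means, variances = [], []
    for xs in x_test:
        k_star = [se_kernel(xs, xi, length, variance) for xi in x_train]
        means.append(sum(ks * a for ks, a in zip(k_star, alpha)))
        v = _solve(k, k_star)  # (K + noise*I)^-1 k_*
        variances.append(se_kernel(xs, xs, length, variance)
                         - sum(ks * vi for ks, vi in zip(k_star, v)))
    return means, variances
```

Far from the training data the mean falls back to the prior (zero) and the variance to the prior `variance`, which is the built-in uncertainty estimate that makes GPR attractive for validation. A practical implementation would factorize K + σn² I once (e.g. by Cholesky) instead of re-solving per test point.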

    Predicting NMR parameters from the molecular structure

    Visible and near infrared spectroscopy in soil science

    This chapter reviews the state of soil visible–near infrared (vis–NIR) spectroscopy. Our intention is for the review to serve as a source of up-to-date information on the past and current role of vis–NIR spectroscopy in soil science. It should also provide a critical discussion of issues surrounding the use of vis–NIR for soil analysis and of future directions. To this end, we describe the fundamentals of visible and infrared diffuse reflectance spectroscopy and of spectroscopic multivariate calibrations. We review the past and current role of vis–NIR spectroscopy in soil analysis, focusing on important soil attributes such as soil organic matter (SOM), minerals, texture, nutrients, water, pH, and heavy metals. We then discuss the performance and generalization capacity of vis–NIR calibrations, with particular attention to sample pre-treatments, co-variations in data sets, and mathematical data preprocessing. Field analyses and strategies for the practical use of vis–NIR are considered. We conclude that the technique is useful for measuring soil water and mineral composition and for deriving robust calibrations for SOM and clay content. Many studies show that we can also predict properties such as pH and nutrients, although the robustness of these predictions may be questioned. For future work we recommend that research should focus on: (i) moving forward with more theoretical calibrations, (ii) better understanding of the complexity of soil and the physical basis for soil reflection, and (iii) applications and the use of spectra for soil mapping and monitoring, and for making inferences about soil quality, fertility, and function. To do this, research in soil spectroscopy needs to be more collaborative and strategic. The development of the Global Soil Spectral Library might be a step in the right direction.
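As an example of the mathematical data preprocessing mentioned above, the standard normal variate (SNV) transform, a widely used scatter correction for diffuse reflectance spectra, centres each spectrum on its own mean and scales it by its own standard deviation. The sketch below assumes spectra are plain lists of reflectance or absorbance values on a common wavelength grid and uses the sample standard deviation; the chapter does not prescribe this particular transform, and the function names are hypothetical.

```python
import statistics
from typing import List, Sequence


def snv(spectrum: Sequence[float]) -> List[float]:
    """Standard normal variate: centre on the spectrum's own mean, scale by its own std."""
    mean = statistics.fmean(spectrum)
    sd = statistics.stdev(spectrum)  # sample standard deviation (n - 1)
    if sd == 0.0:
        raise ValueError("SNV is undefined for a flat spectrum")
    return [(x - mean) / sd for x in spectrum]


def snv_all(spectra: Sequence[Sequence[float]]) -> List[List[float]]:
    """Apply SNV row by row (one spectrum per row) before building a calibration."""
    return [snv(s) for s in spectra]
```

Because SNV removes any per-spectrum offset and positive gain factor, two spectra that differ only by a baseline shift and a multiplicative scaling become identical after the transform.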

    Automatic learning for the classification of chemical reactions and in statistical thermodynamics

    This Thesis describes the application of automatic learning methods to a) the classification of organic and metabolic reactions, and b) the mapping of Potential Energy Surfaces (PES). The classification of reactions was approached with two distinct methodologies: a representation of chemical reactions based on NMR data, and a representation of chemical reactions from the reaction equation based on the physico-chemical and topological features of chemical bonds. NMR-based classification of photochemical and enzymatic reactions. Photochemical and metabolic reactions were classified by Kohonen Self-Organizing Maps (Kohonen SOMs) and Random Forests (RFs), taking as input the difference between the 1H NMR spectra of the products and the reactants. Such a representation can be applied to the automatic analysis of changes in the 1H NMR spectrum of a mixture and to their interpretation in terms of the chemical reactions taking place. Examples of possible applications are the monitoring of reaction processes, evaluation of the stability of chemicals, and the interpretation of metabonomic data. A Kohonen SOM trained with a data set of metabolic reactions catalysed by transferases correctly classified 75% of an independent test set in terms of the EC number subclass. Random Forests improved the correct predictions to 79%. With photochemical reactions classified into 7 groups, an independent test set was classified with 86-93% accuracy. The data set of photochemical reactions was also used to simulate mixtures with two reactions occurring simultaneously. Kohonen SOMs and Feed-Forward Neural Networks (FFNNs) were trained to classify the reactions occurring in a mixture based on the 1H NMR spectra of the products and reactants. Kohonen SOMs correctly assigned 53-63% of the mixtures in a test set. Counter-Propagation Neural Networks (CPNNs) gave similar results.
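The NMR-based representation and SOM classification described above can be sketched as follows. This is a generic minimal Kohonen SOM, not the Thesis software: it assumes the 1H NMR spectra are already binned on a common chemical-shift grid, and the grid size, random initialization, linearly decaying learning rate, and Gaussian neighbourhood are illustrative choices. All function names are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Node = Tuple[int, int]


def difference_spectrum(products: Sequence[float], reactants: Sequence[float]) -> List[float]:
    """Reaction descriptor: binned 1H NMR spectrum of products minus that of reactants.
    Both spectra are assumed to be binned on the same chemical-shift grid."""
    if len(products) != len(reactants):
        raise ValueError("spectra must share the same bins")
    return [p - r for p, r in zip(products, reactants)]


def winner(weights: Dict[Node, List[float]], x: Sequence[float]) -> Node:
    """Best-matching unit: the neuron whose weight vector is closest (Euclidean) to x."""
    return min(weights, key=lambda n: sum((w - v) ** 2 for w, v in zip(weights[n], x)))


def train_som(data: List[List[float]], rows: int, cols: int, epochs: int = 100,
              lr0: float = 0.5, seed: int = 0) -> Dict[Node, List[float]]:
    """Minimal online Kohonen SOM with a Gaussian neighbourhood on a rows x cols grid.
    Learning rate and neighbourhood radius shrink linearly over training."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(i, j): [rng.uniform(-1.0, 1.0) for _ in range(dim)]
               for i in range(rows) for j in range(cols)}
    radius0 = max(rows, cols) / 2.0
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total
            lr = lr0 * (1.0 - frac)
            radius = max(radius0 * (1.0 - frac), 0.5)
            bi, bj = winner(weights, x)
            for (i, j), w in weights.items():
                h = math.exp(-((i - bi) ** 2 + (j - bj) ** 2) / (2.0 * radius ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights
```

For classification, each neuron can be labelled with the majority class of the training reactions it wins, and a new reaction then takes the label of its winning neuron.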
The use of supervised learning techniques improved the results: 77% of the mixtures were correctly assigned with an ensemble of ten FFNNs, and 80% with Random Forests. This study was performed with NMR data simulated from the molecular structure by the SPINUS program. In the design of one test set, simulated data were combined with experimental data. The results support the proposal of linking databases of chemical reactions to experimental or simulated NMR data for the automatic classification of reactions and mixtures of reactions. Genome-scale classification of enzymatic reactions from their reaction equation. The MOLMAP descriptor relies on a Kohonen SOM that defines types of bonds on the basis of their physico-chemical and topological properties. The MOLMAP descriptor of a molecule represents the types of bonds available in that molecule. The MOLMAP descriptor of a reaction is defined as the difference between the MOLMAPs of the products and the reactants, and numerically encodes the pattern of bonds that are broken, changed, and made during a chemical reaction. The automatic perception of chemical similarities between metabolic reactions is required for a variety of applications, ranging from the computer validation of classification systems and the genome-scale reconstruction (or comparison) of metabolic pathways to the classification of enzymatic mechanisms. Catalytic functions of proteins are generally described by EC numbers, which are simultaneously employed as identifiers of reactions, enzymes, and enzyme genes, thus linking metabolic and genomic information. Different methods should be available to automatically compare metabolic reactions and to automatically assign EC numbers to reactions not yet officially classified.
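The reaction-level MOLMAP definition above (products minus reactants) can be illustrated with a simplified sketch. It assumes a Kohonen SOM has already been trained on bond descriptor vectors (its neuron weights are given as a dictionary), and it represents each molecule by plain hit counts of the winning neuron for each of its bonds. The published MOLMAP descriptor may weight winning and neighbouring neurons differently, so treat this as an approximation; all names are hypothetical.

```python
from typing import Dict, Sequence, Tuple

Neuron = Tuple[int, int]
Bond = Sequence[float]  # physico-chemical/topological bond descriptor vector


def winning_neuron(weights: Dict[Neuron, Sequence[float]], bond: Bond) -> Neuron:
    """Neuron of a pre-trained bond SOM whose weights are closest to the bond descriptor."""
    return min(weights, key=lambda n: sum((w - b) ** 2 for w, b in zip(weights[n], bond)))


def molmap(weights: Dict[Neuron, Sequence[float]], bonds: Sequence[Bond]) -> Dict[Neuron, float]:
    """Simplified molecular MOLMAP: how many of the molecule's bonds hit each neuron."""
    counts = {n: 0.0 for n in weights}
    for bond in bonds:
        counts[winning_neuron(weights, bond)] += 1.0
    return counts


def reaction_molmap(weights, reactants, products) -> Dict[Neuron, float]:
    """Reaction MOLMAP = sum of product MOLMAPs - sum of reactant MOLMAPs.
    Negative entries roughly mark bond types broken, positive entries bond types made."""
    total = {n: 0.0 for n in weights}
    for sign, molecules in ((1.0, products), (-1.0, reactants)):
        for bonds in molecules:
            for n, c in molmap(weights, bonds).items():
                total[n] += sign * c
    return total
```

Bonds that are unchanged by the reaction cancel out, so the descriptor is dominated by the reaction centre.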
In this study, the genome-scale data set of enzymatic reactions available in the KEGG database was encoded by MOLMAP descriptors and submitted to Kohonen SOMs to compare the resulting map with the official EC number classification, to explore the possibility of predicting EC numbers from the reaction equation, and to assess the internal consistency of the EC classification at the class level. A general agreement with the EC classification was observed, i.e. a relationship between the similarity of MOLMAPs and the similarity of EC numbers. At the same time, MOLMAPs were able to discriminate between EC sub-subclasses. EC numbers could be assigned at the class, subclass, and sub-subclass levels with accuracies of up to 92%, 80%, and 70% for independent test sets. The correspondence between the chemical similarity of metabolic reactions and their MOLMAP descriptors was applied to identify a number of reactions mapped into the same neuron but belonging to different EC classes, which demonstrated the ability of the MOLMAP/SOM approach to verify the internal consistency of classifications in databases of metabolic reactions. RFs were also used to assign the four levels of the EC hierarchy from the reaction equation. EC numbers were correctly assigned in 95%, 90%, 85%, and 86% of the cases (for independent test sets) at the class, subclass, sub-subclass, and full EC number level, respectively. Experiments on classifying reactions from the main reactants and products were performed with RFs: EC numbers were assigned at the class, subclass, and sub-subclass level with accuracies of 78%, 74%, and 63%, respectively. In the course of the experiments with metabolic reactions, we suggested that the MOLMAP/SOM concept could be extended to the representation of other levels of metabolic information such as metabolic pathways.
Following the MOLMAP idea, the pattern of neurons activated by the reactions of a metabolic pathway is a representation of the reactions involved in that pathway, i.e. a descriptor of the metabolic pathway. This reasoning enabled the comparison of different pathways, the automatic classification of pathways, and a classification of organisms based on their biochemical machinery. The three levels of classification (from bonds to metabolic pathways) made it possible to map and perceive chemical similarities between metabolic pathways, even for pathways of different types of metabolism and pathways that do not share similarities in terms of EC numbers. Mapping of PES by neural networks (NNs). In a first series of experiments, ensembles of Feed-Forward NNs (EnsFFNNs) and Associative Neural Networks (ASNNs) were trained to reproduce PES represented by the Lennard-Jones (LJ) analytical potential function. The accuracy of the method was assessed by comparing the results of molecular dynamics simulations (thermal, structural, and dynamic properties) obtained from the NN-based PES and from the LJ function. The results indicated that, for LJ-type potentials, NNs can be trained to generate accurate PES for use in molecular simulations. EnsFFNNs and ASNNs gave better results than single FFNNs. The NN models show a remarkable ability to interpolate between distant curves and to accurately reproduce potentials for use in molecular simulations. The purpose of the first study was to systematically analyse the accuracy of different NNs. Our main motivation, however, is reflected in the next study: the mapping of multidimensional PES by NNs to simulate, by Molecular Dynamics or Monte Carlo, the adsorption and self-assembly of solvated organic molecules on noble-metal electrodes. Indeed, for such complex and heterogeneous systems, developing suitable analytical functions that fit quantum mechanical interaction energies is a non-trivial or even impossible task.
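The reference function the networks were trained to reproduce is the Lennard-Jones potential, V(r) = 4ε[(σ/r)^12 - (σ/r)^6], whose minimum lies at r = 2^(1/6)σ with depth -ε. The sketch below evaluates it on a grid to produce (r, V) training pairs and combines an ensemble by simply averaging member outputs; the averaging rule is an assumption about how the ensemble is combined, and the member "models" in the usage are stand-in callables rather than trained networks. Function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def lennard_jones(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """V(r) = 4*epsilon*[(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_training_set(r_min: float, r_max: float, n: int,
                    epsilon: float = 1.0, sigma: float = 1.0) -> List[Tuple[float, float]]:
    """Evenly spaced (r, V(r)) pairs that a network could be trained to reproduce."""
    step = (r_max - r_min) / (n - 1)
    return [(r_min + i * step, lennard_jones(r_min + i * step, epsilon, sigma))
            for i in range(n)]


def ensemble_predict(models: Sequence[Callable[[float], float]], r: float) -> float:
    """Ensemble output as the plain average of the member models' predictions."""
    return sum(m(r) for m in models) / len(models)
```

Averaging independently trained members tends to cancel their individual fitting errors, which is consistent with the observation that ensembles outperformed single FFNNs.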
The data consisted of energy values from Density Functional Theory (DFT) calculations at different distances, for several molecular orientations and three electrode adsorption sites. The results indicate that NNs require a data set large enough to cover the diversity of possible interaction sites, distances, and orientations. NNs trained with such data sets can perform as well as or even better than analytical functions. Therefore, they can be used in molecular simulations, particularly for the ethanol/Au(111) interface, which is the case studied in the present Thesis. Once properly trained, the networks can produce any required number of energy points for accurate interpolations.

    Investigating summer thermal stratification in Lake Ontario

    Summer thermal stratification in Lake Ontario is simulated using the 3D hydrodynamic model Environmental Fluid Dynamics Code (EFDC). Summer temperature differences establish strong vertical density gradients (thermocline) between the epilimnion and hypolimnion. Capturing the stratification and thermocline formation has been a challenge in modeling the Great Lakes. Departing from EFDC's original Mellor-Yamada (1982) vertical mixing scheme, we have implemented a one-dimensional vertical model that uses different eddy diffusivity formulations above and below the thermocline (Vincon-Leite, 1991; Vincon-Leite et al., 2014). The model is forced with hourly meteorological data from weather stations around the lake and with flow data for the Niagara and St. Lawrence rivers, and the lake bathymetry is interpolated on a 2-km grid. The model has 20 vertical layers following sigma vertical coordinates. Sensitivity of the model to the vertical layer spacing is thoroughly investigated. The model has been calibrated for appropriate solar radiation coefficients and horizontal mixing coefficients. Overall, the newly implemented diffusivity algorithm shows some success in capturing the thermal stratification, with RMSE values between 2 and 3 °C. Calibration of the vertical mixing coefficients is under investigation to better capture the thermal stratification.
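Sigma coordinates keep the number of vertical layers fixed (20 here) and stretch them over the local water depth, so layers are thinner in shallow areas and thicker in deep basins. The sketch below assumes a simple convention (depth measured positive downward from the surface, free-surface elevation ignored) and adds a hypothetical `stretch` exponent as one way to vary layer spacing in a sensitivity test; EFDC's own sigma convention and spacing options may differ. It also shows the RMSE metric used to score simulated against observed temperatures.

```python
import math
from typing import List, Sequence


def sigma_interfaces(depth: float, n_layers: int = 20, stretch: float = 1.0) -> List[float]:
    """Depths (positive downward) of the n_layers + 1 layer interfaces in a water column.
    stretch = 1 gives uniform sigma spacing; stretch > 1 packs thinner layers near the surface."""
    return [depth * (k / n_layers) ** stretch for k in range(n_layers + 1)]


def layer_centres(interfaces: Sequence[float]) -> List[float]:
    """Mid-depth of each layer between consecutive interfaces."""
    return [0.5 * (top + bottom) for top, bottom in zip(interfaces, interfaces[1:])]


def rmse(simulated: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-square error, e.g. simulated vs observed water temperature (deg C)."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(simulated, observed)) / len(observed))
```

With uniform spacing, a 100 m column gets 5 m layers while a 10 m column gets 0.5 m layers, which is why the resolution available to represent a thermocline varies across the lake and why layer spacing is worth a sensitivity study.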

    New Advances in Fast Methods of 2D NMR Experiments

    Although nuclear magnetic resonance (NMR) spectroscopy is a potent analytical tool for identification, quantification, and structural elucidation, it suffers from inherently low sensitivity. This chapter focuses on recently reported methods that enable quick acquisition of NMR spectra, as well as new methods for faster, more efficient, and more informative two-dimensional (2D) NMR. Fast and efficient data acquisition has risen in response to an increasing need to investigate chemical and biological processes in real time. Several new techniques have been successfully introduced. One example is band-selective optimized-flip-angle short-transient (SOFAST) NMR, which has opened the door to studying the kinetics of biological processes such as the phosphorylation of proteins. The fast recording of NMR spectra allows researchers to investigate time-sensitive molecules that have limited stability under experimental conditions. The increasing awareness that molecular structures are dynamic, rather than static, has pushed some researchers to find alternatives to the standard, time-consuming methods of acquiring 15N relaxation observables.
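The "optimized flip angle" in SOFAST rests on a general result for rapidly repeated scans: for a recycle time TR and longitudinal relaxation time T1, the steady-state signal per scan is maximised at the Ernst angle, cos α = exp(-TR/T1). The sketch below computes that angle and the corresponding relative signal. It ignores the band-selective pulses and relaxation-enhancement effects that SOFAST also exploits, so it illustrates the principle rather than the pulse sequence; the function names are hypothetical.

```python
import math


def ernst_angle(tr: float, t1: float) -> float:
    """Flip angle (degrees) maximising steady-state signal per scan: cos(a) = exp(-TR/T1).
    tr (recycle time) and t1 must be in the same time unit."""
    return math.degrees(math.acos(math.exp(-tr / t1)))


def steady_state_signal(flip_deg: float, tr: float, t1: float) -> float:
    """Relative steady-state signal sin(a)(1 - E1)/(1 - E1 cos(a)), E1 = exp(-TR/T1),
    assuming transverse magnetisation has fully decayed before each pulse."""
    a = math.radians(flip_deg)
    e1 = math.exp(-tr / t1)
    return math.sin(a) * (1.0 - e1) / (1.0 - e1 * math.cos(a))
```

Shorter recycle times push the optimum toward smaller flip angles, which is why fast-pulsing experiments pair short delays with reduced excitation angles instead of 90° pulses.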

    Knowledge-based prediction of chemical shift and recognition of protein native structure

    We designed and implemented a suite of programs that accurately and automatically predicts the chemical shifts of protein C-alpha nuclei on the simple basis of the protein sequence and a low-resolution C-alpha trace conformation. We applied this knowledge-based prediction approach to a group of C-alpha structures generated by computational modeling methods, and successfully identified the native structure by comparing the predicted shifts with unassigned observed NMR data. The automatic prediction begins with the construction of a knowledge-based protein structural profile library, which aims to capture the most significant structural features affecting chemical shifts, even from a highly coarse-grained C-alpha model. The library is populated by more than 5000 non-homologous proteins with publicly accessible structures from the Protein Data Bank, and by more than 1.5 million chemical shifts pre-calculated by SHIFTX, a widely used NMR prediction program. Fed with minimal sequential and structural information, the program predicts chemical shifts that are highly consistent with experimentally observed data from the NMR spectroscopy database BioMagResBank (BMRB). Overall, the proposed program achieves a correlation coefficient of 0.937 and an RMSD of 1.702 ppm against observed chemical shifts. These results are slightly lower than those achieved by the benchmark program SHIFTX, which uses semi-empirical hypersurfaces and semi-classical equations: on the same test sets, SHIFTX achieved a correlation coefficient of 0.945 and an RMSD of 1.599 ppm against experimental observations. On the other hand, SHIFTX, like most other predictive methods, requires high-resolution protein structures with three-dimensional all-atom coordinates; its prediction accuracy is highly compromised unless it is fed with all-atom high-resolution structures, which are normally exceedingly difficult to obtain.
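The two figures of merit quoted above, the correlation coefficient and the RMSD in ppm, can be computed from paired predicted and observed C-alpha shifts as in this sketch. It assumes the shifts are already matched residue by residue; the function names are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Pearson correlation coefficient between predicted and observed shifts."""
    n = len(pred)
    mp, mo = sum(pred) / n, sum(obs) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    return cov / (sp * so)


def rmsd(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Root-mean-square deviation, in the same unit as the shifts (ppm)."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))
```

The two metrics are complementary: the correlation coefficient ignores a constant offset or scale error, while the RMSD penalizes it directly.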
Using a Monte Carlo-based optimization matching system, we compared the predicted C-alpha chemical shifts with unassigned NMR data from BMRB and successfully identified the native fold topology from the resemblance between the two sets of chemical shifts. In summary, the proposed program is one of the few methods capable of predicting accurate chemical shifts even from low-resolution C-alpha protein structures, which are far more accessible and readily obtained by currently available protein modeling methods. Based on the understanding that similar patterns of chemical shifts reflect resemblance between two structures, we showed that the prediction-recognition approach not only fundamentally improves NMR-assisted computational protein modeling but is also effective in accelerating traditional protein structure determination and validation by NMR.
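The recognition step above, matching predicted shifts to unassigned observed shifts and picking the candidate structure that fits best, can be sketched with simulated annealing over one-to-one matchings. This is an illustration under stated assumptions, not the authors' matching system: it assumes equal numbers of predicted and observed shifts, uses the sum of squared differences as the cost, pairwise swap moves, and a linear cooling schedule. All names are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def assignment_cost(pred: Sequence[float], obs: Sequence[float], perm: Sequence[int]) -> float:
    """Sum of squared differences when predicted shift i is matched to observed shift perm[i]."""
    return sum((p - obs[j]) ** 2 for p, j in zip(pred, perm))


def mc_assign(pred, obs, steps=20000, t0=1.0, seed=0) -> Tuple[List[int], float]:
    """Simulated-annealing search over one-to-one matchings using pairwise swap moves.
    Returns the best matching found and its RMSD (ppm)."""
    rng = random.Random(seed)
    n = len(pred)
    perm = list(range(n))
    rng.shuffle(perm)
    cost = assignment_cost(pred, obs, perm)
    best_perm, best_cost = perm[:], cost
    for step in range(steps):
        temp = t0 * (1.0 - step / steps) + 1e-9  # linear cooling schedule
        i, j = rng.sample(range(n), 2)
        perm[i], perm[j] = perm[j], perm[i]
        new_cost = assignment_cost(pred, obs, perm)
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost  # accept the move (Metropolis criterion)
            if cost < best_cost:
                best_perm, best_cost = perm[:], cost
        else:
            perm[i], perm[j] = perm[j], perm[i]  # reject: undo the swap
    return best_perm, math.sqrt(best_cost / n)


def rank_candidates(candidates: Dict[str, Sequence[float]], obs: Sequence[float]):
    """Score each candidate structure's predicted shifts against the unassigned observed
    shifts; the candidate with the lowest matched RMSD is taken as the most native-like."""
    scores = {name: mc_assign(pred, obs)[1] for name, pred in candidates.items()}
    return min(scores, key=scores.get), scores
```

A real implementation would also exploit residue-type and sequential constraints to prune the matching space; without them, any candidate whose shift distribution happens to resemble the observed one can score well.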