
    Roadmap on Machine learning in electronic structure

    In recent years, we have been witnessing a paradigm shift in computational materials science. Traditional methods, mostly developed in the second half of the 20th century, are being complemented, extended, and sometimes even completely replaced by faster, simpler, and often more accurate approaches. These new approaches, which we collectively label as machine learning, have their origins in the fields of informatics and artificial intelligence, but are making rapid inroads into all other branches of science. With this in mind, this Roadmap article, consisting of multiple contributions from experts across the field, discusses the use of machine learning in materials science and shares perspectives on current and future challenges in problems as diverse as the prediction of materials properties, the construction of force fields, the development of exchange-correlation functionals for density functional theory, the solution of the many-body problem, and more. In spite of the already numerous and exciting success stories, we are just at the beginning of a long path that will reshape materials science for the many challenges of the 21st century.

    Machine Learning static RPA response properties for accelerating GW calculations

    In this thesis, I explore the possibility of constructing machine-learning models of the interacting density-density response function (DDRF) and quantities derived from it. Accurate models of the DDRF are a crucial ingredient for enabling GW quasiparticle calculations of more complex systems. Model DDRFs bypass the expensive calculation and inversion of the dielectric matrix, which is the origin of the poor scaling of the GW method with the number of atoms. The thesis is organized as follows:
    • Chapter 2 systematically reviews common descriptors used for machine-learning physical quantities. The key ideas behind the construction of such descriptors are discussed. First, I introduce several descriptors that systematically incorporate symmetry transformations that leave the target quantity invariant. These descriptors can be used for learning quantities such as the ground-state energy, atomization energies, and scalar polarizabilities. Next, I discuss several descriptors and models that are equivariant under transformations of the molecular structure. These descriptors are ideal for learning quantities that transform in a defined way under the action of a transformation, such as vectors, tensors, and functions, including the DDRF.
    • In Chapter 3, I introduce the key electronic structure methods employed throughout the thesis. I start by introducing density functional theory, followed by a detailed introduction to the GW method and the DDRF.
    • In Chapter 4, I develop a machine-learning model of an invariant quantity derived from the random phase approximation (RPA) DDRF: the scalar polarizability. In this chapter, I calculate the DDRF of 110 hydrogenated silicon clusters. The results of these calculations are then used to train a model of the scalar polarizability based on the SOAP descriptor [16]. The resulting model is then used to predict the scalar polarizability of clusters with up to 3000 silicon atoms while converging to the correct bulk limit of the silicon scalar polarizability. The findings of this chapter indicate that the scalar polarizability, even though derived from the non-local DDRF, can be accurately predicted from structural descriptors that only encode the local environment of each atom. These results indicate that the response of a non-metallic system to an external potential, described by the DDRF, may also be approximated as a sum of localized atomic contributions, which forms the motivation for the following two chapters.
    • In Chapter 5, I develop an approximation to the DDRF of the silicon clusters based on a projection onto atom-centred auxiliary density-fitting basis sets. The results of this chapter indicate that the plane-wave DDRF can be efficiently represented in a small localized basis, significantly reducing its size. At the end of this chapter, I develop a simple neural-network model of the DDRF in this localized basis, highlighting the necessity of using an equivariant descriptor and motivating the next chapter's developments.
    • In Chapter 6, I develop a new approximation to the DDRF, which allows a decomposition into atomic contributions. I further introduce the neighbourhood density matrix (NDM), a non-local extension of the SOAP descriptor, which transforms under rotations in the same way as the atomic contributions to the DDRF. The developed method is then applied to the silicon clusters from the previous chapters. Using the NDM, I develop a neural-network model capable of accurately predicting the atomic contributions to the DDRF. These atomic contributions are transformed into a plane-wave basis and summed to obtain the DDRF of a silicon cluster. The predicted DDRFs are then used in GW calculations, which show that the model DDRFs accurately reproduce the quasiparticle energy corrections obtained within the atomic decomposition of the DDRF.
    This methodology can be used to construct arbitrarily complex model DDRFs based purely on the structural properties of clusters and nanoparticles, paving the way towards GW calculations of complex systems such as disordered materials, liquids, interfaces, and nanoparticles.
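The central idea above, that a global response quantity can be written as a sum of per-atom contributions predicted from local descriptors, can be illustrated with a minimal sketch. All data here are synthetic and the linear per-atom model is a deliberate simplification (the thesis uses neural networks on SOAP/NDM descriptors); only the decomposition structure is the point.

```python
import numpy as np

# Minimal sketch (synthetic data): each structure i has n_i atoms, each atom a
# local descriptor d of length n_feat. We assume the global target decomposes
# as target = sum_a (w · d_a), i.e. a sum of per-atom contributions.
rng = np.random.default_rng(1)
n_feat = 4
w_true = rng.normal(size=n_feat)  # hidden "true" per-atom weights

# 30 structures with between 5 and 19 atoms each.
structures = [rng.normal(size=(rng.integers(5, 20), n_feat)) for _ in range(30)]
targets = np.array([d.sum(axis=0) @ w_true for d in structures])

# Because the per-atom model is linear, summing descriptors over atoms first
# gives the same global prediction, so w can be fit by ordinary least squares
# on the atom-summed descriptors of each structure.
D = np.stack([d.sum(axis=0) for d in structures])
w_fit, *_ = np.linalg.lstsq(D, targets, rcond=None)

# Per-atom contributions for a new structure are then w_fit @ d_a, and the
# global prediction is their sum.
```

A nonlinear per-atom model (as in the thesis) keeps the same sum structure but requires fitting through the sum rather than this closed-form shortcut.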

    Big-Data Science in Porous Materials: Materials Genomics and Machine Learning

    By combining metal nodes with organic linkers, we can potentially synthesize millions of possible metal-organic frameworks (MOFs). At present, we have libraries of over ten thousand synthesized materials and millions of in-silico predicted materials. Having so many materials opens many exciting avenues to tailor-make a material that is optimal for a given application. However, from an experimental and computational point of view, we simply have too many materials to screen using brute-force techniques. In this review, we show that having so many materials allows us to use big-data methods as a powerful technique to study these materials and to discover complex correlations. The first part of the review gives an introduction to the principles of big-data science. We emphasize the importance of data collection, methods to augment small datasets, and how to select appropriate training sets. An important part of this review is the survey of the different approaches used to represent these materials in feature space. The review also includes a general overview of the different ML techniques, but as most applications in porous materials use supervised ML, our review focuses on the different approaches for supervised ML. In particular, we review the different methods to optimize the ML process and how to quantify the performance of the different methods. In the second part, we review how the different approaches of ML have been applied to porous materials. In particular, we discuss applications in the fields of gas storage and separation, the stability of these materials, their electronic properties, and their synthesis. The range of applications illustrates the large variety of topics that can be studied with big-data science. Given the increasing interest of the scientific community in ML, we expect this list to expand rapidly in the coming years.
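The supervised-ML workflow described here (training-set selection, fitting, and quantifying performance) can be sketched in a few lines. Everything below is synthetic: the "pore-geometry features" and target are invented stand-ins, and a plain linear least-squares model replaces whatever regressor a real screening study would use; only the split-fit-evaluate pattern is the point.

```python
import numpy as np

# Synthetic dataset: 200 hypothetical materials, 3 made-up features each,
# and a target that is linear in the features plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                       # e.g. pore-geometry features
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.1, 200)

# 80/20 hold-out split: fit on the training set only, evaluate on unseen data.
idx = rng.permutation(200)
train, test = idx[:160], idx[160:]

# Fit a linear model with an intercept by least squares.
A_train = np.column_stack([X[train], np.ones(len(train))])
coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)

# Quantify performance on the held-out set with the mean absolute error (MAE).
pred = np.column_stack([X[test], np.ones(len(test))]) @ coef
mae = np.mean(np.abs(pred - y[test]))
```

Reporting the error on held-out data, rather than on the training set, is what makes the performance number meaningful for screening new materials.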

    Machine learning activation energies of chemical reactions

    Get PDF
    Application of machine learning (ML) to the prediction of reaction activation barriers is a new and exciting field for these algorithms. The works covered here are specifically those in which ML is trained to predict the activation energies of homogeneous chemical reactions, where the activation energy is given by the energy difference between the reactants and the transition state of a reaction. Particular attention is paid to works that have applied ML to directly predict reaction activation energies, the limitations that may be found in these studies, and where comparisons of different types of chemical features for ML models have been made. Also explored are models that have been able to obtain high predictive accuracies, but with reduced datasets, using the Gaussian process regression ML model. In these studies, the chemical reactions for which activation barriers are modeled include those involving small organic molecules, aromatic rings, and organometallic catalysts. Also provided are brief explanations of some of the most popular types of ML models used in chemistry, as a beginner's guide for those unfamiliar with them.
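The simplest learnable relation between a reaction and its barrier is a linear correlation between activation energy and reaction energy (a Bronsted-Evans-Polanyi-type trend), which makes a useful baseline before trying richer chemical features. The sketch below fits such a relation to synthetic data; the coefficients and noise level are invented for illustration, not taken from any of the studies reviewed.

```python
import numpy as np

# Synthetic data: 50 hypothetical reactions with reaction energies dE (eV)
# whose barriers Ea follow a linear BEP-type trend plus noise.
rng = np.random.default_rng(0)
dE = rng.uniform(-2.0, 2.0, 50)
Ea = 0.8 + 0.5 * dE + rng.normal(0, 0.05, 50)

# Least-squares fit of Ea ≈ alpha * dE + beta, the simplest "ML" baseline:
# a single chemical feature (the reaction energy) and a linear model.
A = np.column_stack([dE, np.ones_like(dE)])
(alpha, beta), *_ = np.linalg.lstsq(A, Ea, rcond=None)
```

Any more sophisticated model (kernel methods, neural networks on molecular descriptors) has to beat this one-feature baseline to justify its extra complexity.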

    Gaussian Process Regression for Materials and Molecules.

    We provide an introduction to Gaussian process regression (GPR) machine-learning methods in computational materials science and chemistry. The focus of the present review is on the regression of atomistic properties: in particular, on the construction of interatomic potentials, or force fields, in the Gaussian Approximation Potential (GAP) framework; beyond this, we also discuss the fitting of arbitrary scalar, vectorial, and tensorial quantities. Methodological aspects of reference data generation, representation, and regression, as well as the question of how a data-driven model may be validated, are reviewed and critically discussed. A survey of applications to a variety of research questions in chemistry and materials science illustrates the rapid growth in the field. A vision is outlined for the development of the methodology in the years to come.
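The core of GPR fits in a short function: a kernel defines the covariance between inputs, and the posterior mean and variance at new points follow from standard linear algebra. The sketch below is a minimal from-scratch implementation with a squared-exponential kernel; the descriptors, hyperparameters, and data are placeholders, not the GAP machinery the review describes.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential (RBF) covariance between two sets of row-vector inputs."""
    d2 = (np.sum(X1**2, axis=1)[:, None]
          + np.sum(X2**2, axis=1)[None, :]
          - 2.0 * X1 @ X2.T)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gpr_predict(X_train, y_train, X_test, noise=1e-6, **kernel_args):
    """GPR posterior mean and variance at X_test, given noisy observations y_train."""
    K = rbf_kernel(X_train, X_train, **kernel_args) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_test, X_train, **kernel_args)
    # Cholesky factorization of the regularized kernel matrix.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s @ alpha
    # Predictive variance: prior variance minus the explained part.
    v = np.linalg.solve(L, K_s.T)
    var = rbf_kernel(X_test, X_test, **kernel_args).diagonal() - np.sum(v**2, axis=0)
    return mean, var
```

The predictive variance is what distinguishes GPR from plain kernel regression: it grows away from the training data, which is what makes validation and active learning of potentials possible.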

    Environment matters : the impact of urea and macromolecular crowding on proteins

    This work aims to analytically understand the impact of two diametrically opposite environments on protein structure and dynamics and to compare them to the most common solvent on Earth: water. The first environment is a popular denaturing solution (8 M urea), which has served for years in protein-science laboratories to investigate protein stability; still, many open questions regarding its mechanism of action remain. The second environment instead moves towards a more physiological representation of proteins. The cell interior is, in fact, a crowded solution populated predominantly by proteins, but studies of protein structure and dynamics in such conditions have so far led to confusing or even contradictory observations. The lack of a consensus view on both phenomena possibly derives from biases in the systems under study. This work is an attempt at a comparative study using the most general systems: a diverse spectrum of protein folds, different stages along the reaction path (early stages or end-points), and/or different protein force fields. Our main objective was to derive common patterns and general rules valid at the proteome level, focusing on three major aspects of proteins: their structure, their dynamics, and their interactions with solvent molecules. Molecular dynamics simulation appeared as the most suitable tool because of its ability to i) analyze proteins at a broad range of resolutions; ii) access the direct, time-resolved dynamics of the system; and iii) dissect the specific interactions that arise in the new settings. Specifically, the case of urea-induced unfolding requires a system for which it is possible to clearly identify folded and unfolded states; globular proteins are then the most suitable ones. We extracted general rules on the folded/unfolded transition by independently studying the two end-points of the folding/unfolding reaction.
We simulated the urea-induced unfolded state of a model protein, ubiquitin, to understand the energetics stabilizing unfolded structures in urea. We found that unfolded ubiquitin in 8 M urea is fully extended and flexible and efficiently captures urea molecules into its first solvation shell. Dispersion, rather than electrostatics, appears to be the main energetic contribution explaining the stabilization of the unfolded state. We then simulated the early stages of urea-induced unfolding on a large dataset of folded proteins representing the major folds of globular proteins, aiming also to investigate the kinetic role of urea in triggering protein unfolding. We found that partially unfolded proteins expose the apolar residues buried in the protein interior, mainly via cavitation. As in the unfolded state, dispersion interactions drive urea accumulation in the solvation shell, but here urea molecules take advantage of microscopic unfolding events to penetrate the protein interior. Macromolecular crowding, instead, is a phenomenon that universally affects all proteins. We simulated a system that included as crowding agents proteins with different conformational landscapes (a globular protein, an intrinsically disordered protein, and a molten globule) arranged to reach cell-like concentrations. We conclude that the universal effect of crowding, valid for all protein types, is exerted via aspecific interactions and favours open and moderately extended conformations with higher secondary-structure content. This phenomenon counterbalances volume exclusion, which prevails at higher crowding concentrations. The impact of crowding is proportional to the degree of disorder of the protein: for folded proteins, crowding favours structural rearrangements, while unfolded structures experience a stronger stabilization and a higher secondary-structure content.
The synthetic crowder PEG does not reproduce any of these effects, raising concerns about its use in studies of cell-like environments.

