Analyzing machine learning models to accelerate generation of fundamental materials insights
Machine learning for materials science envisions the acceleration of basic science research through automated identification of key data relationships to augment human interpretation and gain scientific understanding. A primary role of scientists is extraction of fundamental knowledge from data, and we demonstrate that this extraction can be accelerated using neural networks via analysis of the trained data model itself rather than its application as a prediction tool. Convolutional neural networks excel at modeling complex data relationships in multi-dimensional parameter spaces, such as that mapped by a combinatorial materials science experiment. Measuring a performance metric in a given materials space provides direct information about (locally) optimal materials but not the underlying materials science that gives rise to the variation in performance. By building a model that predicts performance (in this case photoelectrochemical power generation of a solar fuels photoanode) from materials parameters (in this case composition and Raman signal), subsequent analysis of gradients in the trained model reveals key data relationships that are not readily identified by human inspection or traditional statistical analyses. Human interpretation of these key relationships produces the desired fundamental understanding, demonstrating a framework in which machine learning accelerates data interpretation by leveraging the expertise of the human scientist. We also demonstrate the use of neural network gradient analysis to automate prediction of the directions in parameter space, such as the addition of specific alloying elements, that may increase performance by moving beyond the confines of existing data
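The gradient-analysis idea above can be illustrated with a minimal sketch: rank input features of a trained model by the magnitude of the output's gradient with respect to each input. Everything here (the toy linear model, feature count, and weights) is an illustrative assumption, not the authors' photoanode model.

```python
import numpy as np

def input_gradient(model, x, eps=1e-5):
    """Central-difference gradient of a scalar model output w.r.t. inputs."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        grad[i] = (model(x + dx) - model(x - dx)) / (2 * eps)
    return grad

# Toy "trained" model: performance depends most strongly on feature 2.
weights = np.array([0.1, -0.3, 2.0, 0.05])
model = lambda x: float(weights @ x)

x0 = np.ones(4)
g = input_gradient(model, x0)
ranking = np.argsort(-np.abs(g))  # most influential feature first
```

For a real neural network one would use automatic differentiation rather than finite differences, but the interpretation step is the same: large gradient components flag the parameters (e.g., alloying elements) whose variation most affects predicted performance.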
Graph neural networks for materials science and chemistry
Machine learning plays an increasingly important role in many areas of chemistry and materials science, being used to predict materials properties, accelerate simulations, design new structures, and predict synthesis routes of new materials. Graph neural networks (GNNs) are one of the fastest-growing classes of machine learning models. They are of particular relevance for chemistry and materials science, as they work directly on a graph or structural representation of molecules and materials and therefore have full access to all relevant information required to characterize materials. In this Review, we provide an overview of the basic principles of GNNs, widely used datasets, and state-of-the-art architectures, followed by a discussion of a wide range of recent applications of GNNs in chemistry and materials science, and conclude with a roadmap for the further development and application of GNNs
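The core operation shared by most GNN architectures is a message-passing step: each node updates its features by aggregating those of its neighbors. A minimal sketch with a mean aggregator on a tiny path graph (the aggregator choice and all values are illustrative, not a specific published model):

```python
import numpy as np

def message_passing_step(h, neighbors):
    """Update each node's features with the mean of its neighbors' features."""
    return np.array([h[list(nbrs)].mean(axis=0) for nbrs in neighbors])

# Tiny path graph 0 - 1 - 2 with scalar node features.
h = np.array([[1.0], [2.0], [3.0]])
neighbors = [[1], [0, 2], [1]]
h_next = message_passing_step(h, neighbors)
```

Real architectures add learned weight matrices, nonlinearities, and edge features to this step, and stack several such layers before a readout over all nodes.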
Optimized Crystallographic Graph Generation for Material Science
Graph neural networks are widely used in machine learning applied to chemistry, and in particular for materials science discovery. For crystalline materials, however, generating a graph-based representation from geometrical information for neural networks is not a trivial task. The periodicity of crystalline materials requires efficient implementations to be processed in real time in a massively parallel environment. With the aim of training graph-based generative models for new material discovery, we propose an efficient tool to generate cutoff graphs and k-nearest-neighbour graphs of periodic structures with GPU optimization. We provide pyMatGraph, a PyTorch-compatible framework to generate graphs in real time during the training of a neural network architecture. Our tool can update the graph of a structure, enabling generative models to update the geometry and process the updated graph during forward propagation on the GPU. Our code is publicly available at https://github.com/aklipf/mat-graph
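The non-trivial part of graph generation for crystals is periodicity: neighbors can sit in translated copies of the unit cell, so edges must be searched over lattice-shifted images. A hedged CPU-side sketch of a cutoff graph over the 27 adjacent images (pyMatGraph's actual GPU implementation will differ; function and variable names here are assumptions):

```python
import numpy as np
from itertools import product

def cutoff_graph(frac_coords, lattice, cutoff):
    """Edges (i, j, shift) between atoms i and periodic images of j within `cutoff`."""
    cart = frac_coords @ lattice
    edges = []
    for i, j in product(range(len(cart)), repeat=2):
        for shift in product((-1, 0, 1), repeat=3):
            if i == j and shift == (0, 0, 0):
                continue  # skip the trivial self-loop in the home cell
            d = np.linalg.norm(cart[j] + np.array(shift) @ lattice - cart[i])
            if d <= cutoff:
                edges.append((i, j, shift))
    return edges

# Simple cubic lattice, one atom per cell: 6 nearest periodic images at distance 1.
edges = cutoff_graph(np.array([[0.0, 0.0, 0.0]]), np.eye(3), 1.1)
```

Storing the lattice shift with each edge is what lets a generative model move atoms and rebuild distances without regenerating the whole neighbor list.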
Putting Chemical Knowledge to Work in Machine Learning for Reactivity
Machine learning has long been used to study chemical reactivity in fields such as physical organic chemistry, chemometrics and cheminformatics. Recent advances in computer science have resulted in deep neural networks that can learn directly from the molecular structure. Neural networks are a good choice when large amounts of data are available. However, many datasets in chemistry are small, and models utilizing chemical knowledge are required for good performance. Adding chemical knowledge can be achieved either by adding more information about the molecules or by adjusting the model architecture itself. The current method of choice for adding more information is descriptors based on computed quantum-chemical properties. Exciting new research directions show that it is possible to augment deep learning with such descriptors for better performance in the low-data regime. To modify the models, differentiable programming enables seamless merging of neural networks with mathematical models from chemistry and physics. The resulting methods are also more data-efficient and make better predictions for molecules that are different from the initial dataset on which they were trained. Application of these chemistry-informed machine learning methods promises to accelerate research in fields such as drug design, materials design, catalysis and reactivity
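One common shape for the "merge a physical model with a neural network" idea is a hybrid: a physics-based baseline plus a small learned residual correction. A minimal sketch, assuming an Arrhenius-like rate law as the physics term; the correction weights and descriptors are placeholders for a trained component, not any specific published method:

```python
import numpy as np

def physics_baseline(ea, temp, a=1.0, r=8.314):
    """Arrhenius rate law: k = A * exp(-Ea / (R * T))."""
    return a * np.exp(-ea / (r * temp))

def hybrid_rate(ea, temp, correction_weights, descriptors):
    """Physics term plus a linear learned residual on extra descriptors."""
    return physics_baseline(ea, temp) + correction_weights @ descriptors

# Example call with illustrative (untrained) correction weights.
k = hybrid_rate(50e3, 298.0, np.array([0.01, -0.02]), np.array([1.0, 0.5]))
```

Because both terms are differentiable, gradients flow through the physics baseline and the residual alike, which is what lets differentiable programming train such hybrids end to end on small datasets.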
DiSCoMaT: Distantly Supervised Composition Extraction from Tables in Materials Science Articles
A crucial component in curating a knowledge base (KB) for a scientific domain (e.g., materials science, foods & nutrition, fuels) is information extraction from tables in the domain's published research articles. To facilitate research in this direction, we define a novel NLP task of extracting compositions of materials (e.g., glasses) from tables in materials science papers. The task involves solving several challenges in concert: tables that mention compositions have highly varying structures; text in captions and the full paper needs to be incorporated along with the data in tables; and regular languages for numbers, chemical compounds and composition expressions must be integrated into the model. We release a training dataset comprising 4,408 distantly supervised tables, along with 1,475 manually annotated dev and test tables. We also present a strong baseline, DiSCoMaT, that combines multiple graph neural networks with several task-specific regular expressions, features, and constraints. We show that DiSCoMaT outperforms recent table processing architectures by significant margins.

Comment: Accepted long paper at ACL 2023 (https://2023.aclweb.org/program/accepted_main_conference/)
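The "regular languages for numbers, chemical compounds and composition expressions" ingredient can be illustrated in isolation. A hedged sketch (not the DiSCoMaT model) of a regular expression for mol%-style glass composition strings; the pattern and function name are illustrative assumptions:

```python
import re

# Matches an amount followed by a compound, e.g. "70SiO2" -> ("70", "SiO2").
COMPOSITION = re.compile(r"(\d+(?:\.\d+)?)\s*([A-Z][a-zA-Z0-9]*)")

def parse_composition(text):
    """Map each compound in a composition string to its stated fraction."""
    return {compound: float(amount)
            for amount, compound in COMPOSITION.findall(text)}

parsed = parse_composition("70SiO2-20Na2O-10CaO")
```

In the full task such patterns only handle the well-formed cases; the graph neural networks are needed to resolve table structure and link amounts to compounds when the layout varies.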
Material Informatics through Neural Networks on Ab-Initio Electron Charge Densities: the Role of Transfer Learning
In this work, the dynamic realms of materials science and computer science advancements meet the critical challenge of identifying efficient descriptors capable of capturing the essential features of physical systems. This task has remained formidable, with solutions often involving ad hoc scalar and vectorial sets of materials properties, making optimization and transferability challenging. We extract representations directly from ab-initio differential electron charge density profiles using neural networks, highlighting the pivotal role of transfer learning in this task. First, we demonstrate significant improvements in regressing a specific defected-materials property, relative to training a deep network from scratch, both in terms of predictions and their reproducibility, by considering various pre-trained models and selecting the optimal one after fine-tuning. The remarkable performance obtained confirms the transferability of existing pre-trained convolutional neural networks (CNNs) to physics-domain data very different from their original training data. Second, we demonstrate a saturation in the regression capabilities of computer vision models for properties of an extensive variety of undefected systems, and show how it can be overcome with the help of large language model (LLM) transformers, with as little text information as composition names. Finally, we show the insufficiency of open models, such as GPT-4, in achieving tasks and performance analogous to the proposed domain-specific ones. The work offers a promising avenue for enhancing the effectiveness of descriptor identification in complex physical systems, shedding light on the power of transfer learning to easily adapt and combine available models, with different modalities, to the physics domain, while also opening space for a benchmark of LLM capabilities in this domain
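The transfer-learning recipe described above reduces to: freeze a pre-trained feature extractor and fit only a new regression head on the target property. A minimal sketch where a fixed random ReLU projection stands in for frozen CNN features (all names, shapes, and the toy target are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
frozen_w = rng.normal(size=(8, 4))           # "pre-trained" weights, never updated

def extract_features(x):
    """Frozen feature extractor standing in for a pre-trained CNN backbone."""
    return np.maximum(x @ frozen_w, 0.0)

x = rng.normal(size=(64, 8))
y = x[:, 0] * 2.0 + 1.0                       # toy target materials property

feats = np.hstack([extract_features(x), np.ones((len(x), 1))])  # add bias column
head, *_ = np.linalg.lstsq(feats, y, rcond=None)  # fit only the new head
pred = feats @ head
```

Fitting only the head is cheap and stable on small datasets; full fine-tuning, as in the paper, additionally updates the backbone once the head has converged.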
Quality by design approach for tablet formulations containing spray coated ramipril by using artificial intelligence techniques
Different software programs based on mathematical models have been developed to aid the product development process. Recent developments in mathematics and computer science have resulted in new programs based on artificial neural network (ANN) techniques. These programs have been used to develop and formulate pharmaceutical products. In this study, intelligent software was used to predict the relationship between the materials that were used in tablet formulation and the tablet specifications, and to determine highly detailed information about the interactions between the formulation parameters and the specifications. The input data were generated from historical data and the results obtained from analyzing tablets produced by different formulations. The relative significance of inputs on various outputs such as assay, dissolution in 30 min and crushing strength was investigated using ANNs, neuro-fuzzy logic and genetic programming (FormRules, INForm ANN and GEP). This study indicated that ANN and GEP can be used effectively for optimizing formulations and that GEP can be evaluated statistically because of the openness of its equations. Additionally, FormRules was very helpful for teasing out the relationships between the inputs (formulation variables) and the outputs
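The "relative significance of inputs" idea can be sketched with permutation importance, a generic model-agnostic technique: shuffle one input column and measure how much the model's error grows. This is an illustrative stand-in; FormRules, INForm ANN and GEP use their own internal significance measures:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(200, 3))
y = 3.0 * x[:, 0] + 0.1 * x[:, 1]             # input 0 dominates the output

w, *_ = np.linalg.lstsq(x, y, rcond=None)     # stand-in fitted model

def mse(w, x, y):
    return float(np.mean((x @ w - y) ** 2))

def permutation_importance(x, y, w, col):
    """Error increase when one input's relationship to the output is broken."""
    xp = x.copy()
    xp[:, col] = rng.permutation(xp[:, col])
    return mse(w, xp, y) - mse(w, x, y)

scores = [permutation_importance(x, y, w, c) for c in range(3)]
```

A high score for an input (here, the first one) flags a formulation variable whose value strongly drives the predicted tablet specification.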