
8 research outputs found


Data Mining Chemistry and Crystal Structure

Author: Yang Lusann Wren
Publication venue: 'Harvard University Botany Libraries'
Publication date: 06/06/2014

The availability of large amounts of data generated by high-throughput computing and experimentation has generated interest in the application of machine learning techniques to materials science. Machine learning of materials behavior requires the use of feature vectors that capture the compositional or structural information that influences a target property. We present methods for assessing the similarity of compositions, substructures, and crystal structures. Similarity measures are important for the classification and clustering of data points, allowing for the organization of data and the prediction of materials properties.

Engineering and Applied Science

Harvard University - DASH

Proposed definition of crystal substructure and substructural similarity

Author: Ceder Gerbrand
Dacek Stephen Thomas
Yang Lusann
Publication venue: 'American Physical Society (APS)'
Publication date: 05/08/2014

There is a clear need for a practical and mathematically rigorous description of local structure in inorganic compounds so that structures and chemistries can be easily compared across large data sets. Here a method for decomposing crystal structures into substructures is given, and a similarity function between those substructures is defined. The similarity function is based on both geometric and chemical similarity. This construction allows for large-scale data mining of substructural properties, and the analysis of substructures and void spaces within crystal structures. The method is validated via the prediction of Li-ion intercalation sites for the oxides. Tested on databases of known Li-ion-containing oxides, the method reproduces all Li-ion sites in an oxide with a maximum of 4 incorrect guesses 80% of the time.

National Science Foundation (U.S.) (SI2-SSI Collaborative Research program Award OCI-1147503); United States. Dept. of Energy. Office of Basic Energy Sciences (Grant EDCBEE

DSpace@MIT

Crossref

Crystal Structure Search with Random Relaxations Using Graph Networks

Author: Cheon Gowoon
Cubuk Ekin D.
McCloskey Kevin
Reed Evan J.
Yang Lusann
Publication date: 07/12/2020

Materials design enables technologies critical to humanity, including combating climate change with solar cells and batteries. Many properties of a material are determined by its atomic crystal structure. However, prediction of the atomic crystal structure for a given material's chemical formula is a long-standing grand challenge that remains a barrier in materials design. We investigate a data-driven approach to accelerating ab initio random structure search (AIRSS), a state-of-the-art method for crystal structure search. We build a novel dataset of random structure relaxations of Li-Si battery anode materials using high-throughput density functional theory calculations. We train graph neural networks to simulate relaxations of random structures. Our model is able to find an experimentally verified structure of Li15Si4 it was not trained on, and has potential for orders-of-magnitude speedup over AIRSS when searching large unit cells and searching over multiple chemical stoichiometries. Surprisingly, we find that data augmentation by adding Gaussian noise improves both the accuracy and out-of-domain generalization of our models.

Comment: Removed citations from the abstract, paper content is unchange

arXiv.org e-Print Archive

Switching Stability: An Examination of the Control Algorithm Used by Team Caltech in the DARPA Grand Challenge 2005

Author: Yang Lusann Wren
Publication date: 01/01/2006

Dataspace

Data-mined similarity function between material compositions

Author: Ceder Gerbrand
Yang Lusann
Publication venue: 'American Physical Society (APS)'
Publication date: 01/11/2013

A new method for assessing the similarity of material compositions is described. A similarity measure is important for the classification and clustering of compositions. The similarity of the material compositions is calculated utilizing a data-mined ionic substitutional similarity based upon the probability with which two ions will substitute for each other within the same structure prototype. The method is validated via the prediction of crystal structure prototypes for oxides from the Inorganic Crystal Structure Database, selecting the correct prototype from a list of known prototypes within five guesses 75% of the time. It performs particularly well on the quaternary oxides, selecting the correct prototype from a list of known prototypes on the first guess 65% of the time.

United States. Dept. of Energy (Contract DE-FG02-96ER45571); United States. Office of Naval Research (Contract N00014-11-1-0212); National Science Foundation (U.S.) (Cyber-enabled Discovery and Innovation Contract ECCS-0941043

DSpace@MIT

Crossref

Proposed definition of crystal substructure and substructural similarity

Author: G. Bergerhoff
G. Voronoi
Gerbrand Ceder
Lusann Yang
R. de Gelder
Stephen Dacek
T. Hastie
Publication venue: 'American Physical Society (APS)'

Crossref

Discovery of complex oxides via automated experiments and data science

Author: Armstrong Zan
Berndl Marc
Coram Marc
Gregoire John M.
Haber Joel
Kan Kevin
Richter Matthias H.
Riley Patrick
Roat Christopher
Wagner Nicholas
Yang Lusann
Yang Samuel J.
Zhou Lan
Publication venue: CaltechDATA
Publication date: 10/09/2021

This dataset is licensed under the Creative Commons Attribution 4.0 license (CC-BY-4.0). See https://creativecommons.org/licenses/by/4.0/ for more information.

If using this dataset, please cite https://doi.org/10.1073/pnas.2106042118

We've released data from 6 print sessions, comprising 173 plates, 131 quaternary oxide systems, 6,918,024 individual composition samples, and 376,752 distinct compositions. While the tenfold reproductions within each plate are well controlled, uncontrolled variables (printhead age, etc.) may lead to poorer consistency between print sessions.

The data exists in four directories and one metadata file. Each directory contains one type of data, with one *.csv file per printed plate.

i. The data in ten_replicas/ consists of optical transmission data, with one row per printed patch on a plate. The column headers are:
- ExpID: An integer experiment ID for the printed patch on the plate.
- row, col: The row and the column coordinates of the printed patch in the microscope image.
- signal_#: The measurement of ɑ, the optical transmission spectrum of the printed patch, at a given wavelength. # ranges from 0 to 8, inclusive, indicating transmission spectra at the following wavelengths: 375, 395, 455, 530, 590, 617, 660, 735, & 850 nm.
- plate: The integer plate identifier.
- line: An integer identifier of the composition gradient that was printed.
- line_experiment_id: An integer identifier of the composition sample along the composition gradient.
- replica: An integer identifier of the replica # of the printed line.
- metal: Each plate will have up to six metal column headers, where the possible metals include: ['Ce', 'Co', 'Cu', 'Fe', 'In', 'Mg', 'Ni', 'Sn', 'Ta', 'Y']. The metal columns sum to 1, indicating the ratios of metals printed.

ii. The data in aggregated_replicas/ consists of optical transmission data, with one row per tenfold aggregated patch on a plate. The column headers are:
- signal_#: The measurement of ɑ, the optical transmission spectrum of the printed patch, at a given wavelength. # ranges from 0 to 8, inclusive, indicating transmission spectra at the following wavelengths: 375, 395, 455, 530, 590, 617, 660, 735, & 850 nm.
- plate: The integer plate identifier.
- line: An integer identifier of the composition gradient that was printed.
- line_experiment_id: An integer identifier of the composition sample along the composition gradient.
- metal: Each plate will have up to six metal column headers, where the possible metals include: ['Ce', 'Co', 'Cu', 'Fe', 'In', 'Mg', 'Ni', 'Sn', 'Ta', 'Y']. The metal columns sum to 1, indicating the ratios of metals printed.

iii. The data in mixture/ represents the outcome of a probabilistic model that a given composition can be explained by a mixture of at most 3 binary signals. There is one row per composition. The column headers are:
- log_prob: The log of the probability that this composition is explainable by at most 3 binary signals.
- metal: Each plate will have up to six metal column headers, where the possible metals include: ['Ce', 'Co', 'Cu', 'Fe', 'In', 'Mg', 'Ni', 'Sn', 'Ta', 'Y']. The metal columns sum to 1, indicating the ratios of metals in the composition.

iv. The data in phase_fits/ represents the outcome of a phase fitting model. There is one row per phase diagram. This data is meant to be read using the example colab. The column headers are:
- residual: Float, the residual of the phase fit.
- signal_type: This is either 'signal' or 'sigma', indicating the type of the phase fit (see paper).
- discretization: The integer number of intervals we discretized the phase space into.
- n_points: The number of internal points in the phase diagram. This is an integer between 1 and 5, inclusive.
- metal_0, metal_1, metal_2: Three strings identifying the constituent metals of the phase diagram.
- point_#_pos_0, point_#_pos_1: The coordinates of a phase point. # ranges between 0 and 7, inclusive. point_#_pos_0 gives the float amount of metal_0, and point_#_pos_1 gives the float amount of metal_1. The float amount of metal_2 can be inferred via 1 - (point_#_pos_0 + point_#_pos_1).
- point_#_fitted_channel_X: The fitted optical absorption spectra of point_#. # is an integer between 0 and 7, inclusive. X is an integer between 0 and 8, inclusive, indicating the wavelength of the light absorbed.

The files are publicly available for access via:
- the gsutil CLI tool at https://cloud.google.com/storage/docs/gsutil
- the tf.io.gfile APIs at https://www.tensorflow.org/api_docs/python/tf/io/gfile/GFile
- HTTP API: http://storage.googleapis.com/gresearch/metal-oxide-spectroscopy/path/to/file

This file, the README, is available at: http://storage.googleapis.com/gresearch/metal-oxide-spectroscopy/README.txt

The metadata file is available at: http://storage.googleapis.com/gresearch/metal-oxide-spectroscopy/metadata.csv, which lists all the plates available for download.

The plate data for each of the four data types listed above can be found at: http://storage.googleapis.com/gresearch/metal-oxide-spectroscopy/data_type_subdir/plate.cs
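
The column layout described in the README above could be read along the following lines. This is a minimal sketch, not the dataset authors' code (the README points to an example colab for that). The helper names parse_row and third_metal_fraction are hypothetical. The sample row is made up for illustration and is not real data, and it assumes each plate file is a plain comma-separated file with a header row.

```python
import csv
import io

# Wavelengths (nm) for signal_0 .. signal_8, as listed in the README.
WAVELENGTHS_NM = [375, 395, 455, 530, 590, 617, 660, 735, 850]
# Possible metal column headers, as listed in the README.
METALS = ['Ce', 'Co', 'Cu', 'Fe', 'In', 'Mg', 'Ni', 'Sn', 'Ta', 'Y']


def parse_row(row):
    """Split one CSV row (a dict of strings) into spectrum and composition.

    Hypothetical helper: maps each signal_# column to its wavelength and
    collects whichever metal columns are present in the row.
    """
    spectrum = {nm: float(row[f"signal_{i}"]) for i, nm in enumerate(WAVELENGTHS_NM)}
    composition = {m: float(row[m]) for m in METALS if m in row}
    return spectrum, composition


def third_metal_fraction(pos_0, pos_1):
    """Infer the metal_2 amount of a phase point: 1 - (pos_0 + pos_1)."""
    return 1.0 - (pos_0 + pos_1)


# Made-up sample in the aggregated_replicas/ column layout (illustrative values only).
SAMPLE_CSV = (
    "signal_0,signal_1,signal_2,signal_3,signal_4,signal_5,signal_6,signal_7,signal_8,"
    "plate,line,line_experiment_id,Co,Fe,Ni\n"
    "0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1,2,3,0.5,0.25,0.25\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
spectrum, composition = parse_row(rows[0])
```

Keying the spectrum by wavelength rather than by channel index keeps downstream code independent of the signal_# numbering. Checking that the metal fractions sum to 1 is a cheap sanity test on each row.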

PubMed Central

CaltechDATA (California Institute of Technology Research Data Repository)