High-throughput data generation methods and machine learning (ML) algorithms
have given rise to a new era of computational materials science by learning
relationships among composition, structure, and properties and by exploiting
such relations for design. However, to build these connections, materials data
must be translated into a numerical form, called a representation, that can be
processed by a machine learning model. Datasets in materials science vary in
format (ranging from images to spectra), size, and fidelity. Predictive models
vary in scope and in the properties of interest. Here, we review context-dependent
strategies for constructing representations that enable the use of materials as
inputs or outputs of machine learning models. Furthermore, we discuss how
modern ML techniques can learn representations from data and transfer chemical
and physical information between tasks. Finally, we outline high-impact
questions that have not been fully resolved and thus require further
investigation.

Comment: 20 pages, 5 figures. To appear in Annual Review of Materials Research.