Learning Rules for Materials Properties and Functions
In materials science and engineering, one is typically searching for materials that exhibit exceptional performance for a certain function, and the number of these materials is extremely small. Thus, statistically speaking, we are interested in the identification of *rare phenomena*, and scientific discovery typically resembles the proverbial hunt for the needle in a haystack.
Subjectively Interesting Subgroup Discovery on Real-valued Targets
Deriving insights from high-dimensional data is one of the core problems in data mining. The difficulty mainly stems from the fact that there are exponentially many variable combinations to potentially consider, and infinitely many if we consider weighted combinations, even linear ones. Hence, an obvious question is whether we can automate the search for interesting patterns and visualizations. In this paper, we consider the setting where a user wants to learn as efficiently as possible about real-valued attributes, for example, to understand the distribution of crime rates in different geographic areas in terms of other (numerical, ordinal and/or categorical) variables that describe the areas. We introduce a method to find subgroups in the data that are maximally informative (in the formal information-theoretic sense) with respect to a single real-valued target attribute or a set of them. The subgroup descriptions are given in terms of a succinct set of other attributes of arbitrary type. The approach is based on the Subjective Interestingness framework FORSIED, which enables the use of prior knowledge when finding the most informative non-redundant patterns; hence, the method also supports iterative data mining.
Comment: 12 pages, 10 figures, 2 tables, conference submission
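To make "informative with respect to prior knowledge" concrete, the following is a minimal sketch of a subjective-interestingness score, assuming a deliberately simplified setup that is not the paper's exact model: each target value is independently Gaussian under the user's background belief, the information content of a subgroup is the negative log-density of its observed mean under that belief, and the description length is a hypothetical linear cost in the number of conditions. Note that a negative log-density can be negative for continuous variables, so this score is only meaningful for comparing subgroups of similar size.

```python
import math
from statistics import fmean


def information_content(subgroup_targets, prior_mean, prior_var):
    """Negative log-density of the subgroup mean under an assumed background model.

    Assumption (simplified): each target value is i.i.d. N(prior_mean, prior_var),
    so the mean of n values is N(prior_mean, prior_var / n).
    """
    n = len(subgroup_targets)
    observed_mean = fmean(subgroup_targets)
    var_of_mean = prior_var / n
    return (0.5 * math.log(2 * math.pi * var_of_mean)
            + (observed_mean - prior_mean) ** 2 / (2 * var_of_mean))


def description_length(num_conditions, alpha=1.0, beta=1.0):
    """Hypothetical cost model: fixed cost alpha plus beta per condition in the description."""
    return alpha + beta * num_conditions


def subjective_interestingness(subgroup_targets, num_conditions, prior_mean, prior_var):
    """Information content divided by description length: surprising and short is best."""
    ic = information_content(subgroup_targets, prior_mean, prior_var)
    return ic / description_length(num_conditions)


# A subgroup whose mean deviates strongly from the prior belief is more interesting
# than one that merely confirms it, and longer descriptions are penalized.
surprising = subjective_interestingness([3.0, 3.0, 3.0, 3.0], 1, 0.0, 1.0)
expected = subjective_interestingness([0.0, 0.0, 0.0, 0.0], 1, 0.0, 1.0)
print(surprising > expected)
```

In an iterative setting, the background model would be updated after each pattern is shown to the user so that the same information is not rewarded twice; that update step is omitted here.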
Identifying outstanding transition-metal-alloy heterogeneous catalysts for the oxygen reduction and evolution reactions via subgroup discovery
In order to estimate the reactivity of a large number of potentially complex heterogeneous catalysts while searching for novel and more efficient materials, physical as well as data-centric models have been developed for a faster evaluation of adsorption energies compared to first-principles calculations. However, global models designed to describe as many materials as possible might overlook the very few compounds that have the appropriate adsorption properties to be suitable for a given catalytic process. Here, the subgroup-discovery (SGD) local artificial-intelligence approach is used to identify the key descriptive parameters and constraints on their values, the so-called SG rules, which specifically describe transition-metal surfaces with outstanding adsorption properties for the oxygen reduction and evolution reactions. We start from a data set of 95 oxygen adsorption energy values evaluated by density-functional-theory calculations for several monometallic surfaces, along with 16 atomic, bulk and surface properties as candidate descriptive parameters. From this data set, SGD identifies constraints on the most relevant parameters describing materials and adsorption sites that (i) result in O adsorption energies within the Sabatier-optimal range required for the oxygen reduction reaction and (ii) present the largest deviations from the linear scaling relations between O and OH adsorption energies, which limit the performance in the oxygen evolution reaction. The SG rules not only reflect the local underlying physicochemical phenomena that result in the desired adsorption properties but also guide the challenging design of alloy catalysts.
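The idea of an SG rule as a conjunction of threshold constraints on descriptive parameters can be illustrated with a minimal exhaustive search. This sketch assumes a binary target ("is the adsorption energy inside a desired window [lo, hi]?") scored with weighted relative accuracy (coverage times the lift of the hit rate over the whole data set), a standard subgroup-discovery quality function; it is not necessarily the quality function used in the study. The window bounds, feature names and toy data are hypothetical.

```python
from itertools import combinations


def wracc(mask, hits):
    """Weighted relative accuracy: coverage * (hit rate in subgroup - overall hit rate)."""
    n = len(hits)
    n_sub = sum(mask)
    if n_sub == 0:
        return 0.0
    p_all = sum(hits) / n
    p_sub = sum(h for m, h in zip(mask, hits) if m) / n_sub
    return (n_sub / n) * (p_sub - p_all)


def candidate_conditions(rows, feature_names):
    """Atomic constraints 'feature <= t' and 'feature >= t', with thresholds from observed values."""
    conditions = []
    for j, name in enumerate(feature_names):
        for t in sorted({row[j] for row in rows}):
            conditions.append((name, "<=", t, lambda row, j=j, t=t: row[j] <= t))
            conditions.append((name, ">=", t, lambda row, j=j, t=t: row[j] >= t))
    return conditions


def best_subgroup(rows, targets, feature_names, lo, hi, max_conditions=2):
    """Exhaustively score conjunctions of up to max_conditions constraints."""
    hits = [lo <= y <= hi for y in targets]
    conditions = candidate_conditions(rows, feature_names)
    best_quality, best_rule = float("-inf"), ()
    for size in range(1, max_conditions + 1):
        for combo in combinations(conditions, size):
            mask = [all(cond[3](row) for cond in combo) for row in rows]
            quality = wracc(mask, hits)
            if quality > best_quality:
                best_quality = quality
                best_rule = tuple(cond[:3] for cond in combo)
    return best_quality, best_rule


# Toy data: one hypothetical descriptor "d"; targets inside [0, 1] only for d in {0.3, 0.4}.
rows = [(0.1,), (0.2,), (0.3,), (0.4,), (0.5,), (0.6,)]
targets = [-1.0, -0.5, 0.2, 0.3, 1.5, 2.0]
print(best_subgroup(rows, targets, ["d"], 0.0, 1.0))
```

Exhaustive enumeration is only viable for a handful of descriptors and thresholds; practical SGD implementations rely on pruned search instead.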
Ab initio data-analytics study of carbon-dioxide activation on semiconductor oxide surfaces
The excessive emissions of carbon dioxide (CO2) into the atmosphere threaten to shift the CO2 cycle planet-wide and induce unpredictable climate changes. Using artificial intelligence (AI) trained on high-throughput, first-principles-based data for a broad family of oxides, we develop a strategy for a rational design of catalytic materials for converting CO2 to fuels and other useful chemicals. We demonstrate that an electron transfer to the π-antibonding orbital of the adsorbed molecule and the associated bending of the initially linear molecule, previously proposed as the indicator of activation, are insufficient to account for the good catalytic performance of experimentally characterized oxide surfaces. Instead, our AI model identifies the common feature of these surfaces in the binding of a molecular O atom to a surface cation, which results in a strong elongation and therefore weakening of one molecular C-O bond. This finding suggests using the C-O bond elongation as an indicator of CO2 activation. Based on these findings, we propose a set of new promising oxide-based catalysts for CO2 conversion, and a recipe to find more
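Using C-O bond elongation as an activation indicator amounts to a simple screening computation once adsorbed geometries are available. The sketch below assumes a reference bond length of about 1.16 Å for free, linear CO2 and ranks surfaces by the larger relative stretch of the two C-O bonds; the surface labels and bond lengths in the usage line are made-up inputs, not results from the study.

```python
# Approximate C=O bond length of gas-phase CO2 in angstrom (assumed reference value).
GAS_PHASE_CO_BOND = 1.16


def max_relative_elongation(bond_lengths, reference=GAS_PHASE_CO_BOND):
    """Largest relative stretch of the C-O bonds of an adsorbed CO2 molecule."""
    return max((d - reference) / reference for d in bond_lengths)


def rank_by_activation(surfaces, reference=GAS_PHASE_CO_BOND):
    """Sort (label, bond_lengths) pairs from most to least activated by the elongation indicator."""
    return sorted(
        surfaces,
        key=lambda item: max_relative_elongation(item[1], reference),
        reverse=True,
    )


# Hypothetical surfaces with the two C-O bond lengths of adsorbed CO2 (angstrom).
candidates = [("A", (1.17, 1.18)), ("B", (1.16, 1.30)), ("C", (1.20, 1.21))]
print([label for label, _ in rank_by_activation(candidates)])
```

Taking the maximum over the two bonds reflects the observation that activation shows up as a strong elongation of one C-O bond rather than a symmetric stretch.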
Efficiently Discovering Locally Exceptional yet Globally Representative Subgroups
Subgroup discovery is a local pattern mining technique to find interpretable descriptions of sub-populations that stand out on a given target variable, that is, sub-populations that are exceptional with regard to the global distribution. In this paper we argue that in many applications, such as scientific discovery, subgroups are only useful if they are additionally representative of the global distribution with regard to a control variable, that is, if the distribution of this control variable within the subgroup is the same, or almost the same, as over the whole data. We formalise this objective function and give an efficient algorithm to compute its tight optimistic estimator for the case of a numeric target and a binary control variable. This enables us to use the branch-and-bound framework to efficiently discover the top-k subgroups that are both exceptional and representative. Experimental evaluation on a wide range of datasets shows that with this algorithm we discover meaningful representative patterns and are up to orders of magnitude faster in terms of both node evaluations and time.
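The role of a tight optimistic estimator in branch-and-bound can be shown with a simpler objective than the one in this paper. This sketch assumes the plain "impact" objective q(S) = |S|/n * (mean_S - mean) = (1/n) * sum over S of (y_i - mean), without the control-variable representativeness term. For this objective the best value reachable by any refinement of S is obtained by keeping only the members with positive deviation, which gives a tight bound. Subgroups are conjunctions of boolean features, expanded in a fixed feature order so each conjunction is visited once.

```python
import heapq


def impact(indices, y, mu):
    """q(S) = (1/n) * sum over S of (y_i - mu): coverage times mean shift."""
    return sum(y[i] - mu for i in indices) / len(y)


def optimistic_estimate(indices, y, mu):
    """Tight bound on impact over all subsets of S: keep only positive deviations."""
    return sum(max(0.0, y[i] - mu) for i in indices) / len(y)


def top_k_subgroups(X, y, k):
    """Branch-and-bound search for the k best conjunctions of boolean features.

    X is a list of rows of booleans; a subgroup description is a tuple of
    feature indices that must all be True.
    """
    n, d = len(y), len(X[0])
    mu = sum(y) / n
    top = []  # min-heap of (quality, description); top[0] is the current k-th best

    def kth_best():
        return top[0][0] if len(top) == k else float("-inf")

    stack = [((), list(range(n)))]
    while stack:
        description, extension = stack.pop()
        if description:
            q = impact(extension, y, mu)
            if len(top) < k:
                heapq.heappush(top, (q, description))
            elif q > top[0][0]:
                heapq.heapreplace(top, (q, description))
        start = description[-1] + 1 if description else 0
        for j in range(start, d):
            child = [i for i in extension if X[i][j]]
            # Prune: no refinement of this child can beat the current k-th best.
            if child and optimistic_estimate(child, y, mu) > kth_best():
                stack.append((description + (j,), child))
    return sorted(top, reverse=True)


X = [[True, False, True], [True, True, False], [False, True, True],
     [True, True, True], [False, False, False], [True, False, False]]
y = [5.0, 1.0, 2.0, 6.0, 0.0, 4.0]
print(top_k_subgroups(X, y, 2))
```

Extending this to the representativeness-corrected objective requires replacing both `impact` and `optimistic_estimate`; deriving a tight estimator for that objective is the contribution of the paper and is not reproduced here.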
Identifying Consistent Statements about Numerical Data with Dispersion-Corrected Subgroup Discovery
Existing algorithms for subgroup discovery with numerical targets do not optimize the error or target variable dispersion of the groups they find. This often leads to unreliable or inconsistent statements about the data, rendering practical applications, especially in scientific domains, futile. Therefore, we extend the optimistic estimator framework for optimal subgroup discovery to a new class of objective functions: we show how tight estimators can be computed efficiently for all functions that are determined by subgroup size (non-decreasing dependence), the subgroup median value, and a dispersion measure around the median (non-increasing dependence). In the important special case when dispersion is measured using the average absolute deviation from the median, this novel approach yields a linear time algorithm. Empirical evaluation on a wide range of datasets shows that, when used within branch-and-bound search, this approach is highly efficient and indeed discovers subgroups with much smaller errors.
Comment: significance of empirical results tested; additional illustrations; table of used notation
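A member of the objective class described above can be written down directly. The sketch below uses a hypothetical instance, f(S) = (|S|/n) * max(0, med(S) - med(P) - amd(S)), where amd is the average absolute deviation from the median; it is non-decreasing in subgroup size, non-decreasing in the subgroup median and non-increasing in dispersion, but it is not claimed to be the exact function used in the paper. The code only evaluates the objective; the tight optimistic estimator is the paper's contribution and is not shown.

```python
from statistics import median


def amd(values):
    """Average absolute deviation from the median."""
    m = median(values)
    return sum(abs(v - m) for v in values) / len(values)


def dispersion_corrected_score(subgroup, population):
    """Hypothetical dispersion-corrected objective.

    Coverage times the median shift, reduced by the subgroup's dispersion
    and clipped at zero so the coverage factor never rewards bad subgroups.
    """
    shift = median(subgroup) - median(population) - amd(subgroup)
    return (len(subgroup) / len(population)) * max(0.0, shift)


population = list(range(10))  # median 4.5
tight = [8, 8, 8, 8]          # median 8, no dispersion
spread = [2, 8, 8, 14]        # same median, large dispersion
print(dispersion_corrected_score(tight, population) > dispersion_corrected_score(spread, population))
```

Two subgroups with the same size and median are thus separated by how consistently their members follow the median, which is the kind of reliability the uncorrected median-shift objectives ignore.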
Roadmap on machine learning in electronic structure
In recent years, we have been witnessing a paradigm shift in computational materials science. In fact, traditional methods, mostly developed in the second half of the XXth century, are being complemented, extended, and sometimes even completely replaced by faster, simpler, and often more accurate approaches. The new approaches, which we collectively label machine learning, have their origins in the fields of informatics and artificial intelligence, but are making rapid inroads in all other branches of science. With this in mind, this Roadmap article, consisting of multiple contributions from experts across the field, discusses the use of machine learning in materials science and shares perspectives on current and future challenges in problems as diverse as the prediction of materials properties, the construction of force fields, the development of exchange-correlation functionals for density-functional theory, the solution of the many-body problem, and more. In spite of the already numerous and exciting success stories, we are just at the beginning of a long path that will reshape materials science for the many challenges of the XXIst century.