11 research outputs found

    Learning Rules for Materials Properties and Functions

    In materials science and engineering, one is typically searching for materials that exhibit exceptional performance for a certain function, and the number of these materials is extremely small. Thus, statistically speaking, we are interested in the identification of *rare phenomena*, and the scientific discovery typically resembles the proverbial hunt for the needle in a haystack

    Subjectively Interesting Subgroup Discovery on Real-valued Targets

    Deriving insights from high-dimensional data is one of the core problems in data mining. The difficulty mainly stems from the fact that there are exponentially many variable combinations to potentially consider, and there are infinitely many if we consider weighted combinations, even for linear combinations. Hence, an obvious question is whether we can automate the search for interesting patterns and visualizations. In this paper, we consider the setting where a user wants to learn as efficiently as possible about real-valued attributes. For example, a user may want to understand the distribution of crime rates in different geographic areas in terms of other (numerical, ordinal and/or categorical) variables that describe the areas. We introduce a method to find subgroups in the data that are maximally informative (in the formal information-theoretic sense) with respect to a single real-valued target attribute or a set of them. The subgroup descriptions are in terms of a succinct set of arbitrarily-typed other attributes. The approach is based on the Subjective Interestingness framework FORSIED to enable the use of prior knowledge when finding most informative non-redundant patterns, and hence the method also supports iterative data mining.

    Identifying outstanding transition-metal-alloy heterogeneous catalysts for the oxygen reduction and evolution reactions via subgroup discovery

    In order to estimate the reactivity of a large number of potentially complex heterogeneous catalysts while searching for novel and more efficient materials, physical as well as data-centric models have been developed for a faster evaluation of adsorption energies compared to first-principles calculations. However, global models designed to describe as many materials as possible might overlook the very few compounds that have the appropriate adsorption properties to be suitable for a given catalytic process. Here, the subgroup-discovery (SGD) local artificial-intelligence approach is used to identify the key descriptive parameters and constraints on their values, the so-called SG rules, which particularly describe transition-metal surfaces with outstanding adsorption properties for the oxygen reduction and evolution reactions. We start from a data set of 95 oxygen adsorption energy values evaluated by density-functional-theory calculations for several monometallic surfaces along with 16 atomic, bulk and surface properties as candidate descriptive parameters. From this data set, SGD identifies constraints on the most relevant parameters describing materials and adsorption sites that (i) result in O adsorption energies within the Sabatier-optimal range required for the oxygen reduction reaction and (ii) present the largest deviations from the linear scaling relations between O and OH adsorption energies, which limit the performance in the oxygen evolution reaction. The SG rules not only reflect the local underlying physicochemical phenomena that result in the desired adsorption properties but also guide the challenging design of alloy catalysts.

    Ab initio data-analytics study of carbon-dioxide activation on semiconductor oxide surfaces

    The excessive emissions of carbon dioxide (CO2) into the atmosphere threaten to shift the CO2 cycle planet-wide and induce unpredictable climate changes. Using artificial intelligence (AI) trained on high-throughput first-principles-based data for a broad family of oxides, we develop a strategy for a rational design of catalytic materials for converting CO2 to fuels and other useful chemicals. We demonstrate that an electron transfer to the π*-antibonding orbital of the adsorbed molecule and the associated bending of the initially linear molecule, previously proposed as the indicator of activation, are insufficient to account for the good catalytic performance of experimentally characterized oxide surfaces. Instead, our AI model identifies the common feature of these surfaces in the binding of a molecular O atom to a surface cation, which results in a strong elongation and therefore weakening of one molecular C-O bond. This finding suggests using the C-O bond elongation as an indicator of CO2 activation. Based on these findings, we propose a set of new promising oxide-based catalysts for CO2 conversion, and a recipe to find more.

    Efficiently Discovering Locally Exceptional yet Globally Representative Subgroups

    Subgroup discovery is a local pattern mining technique to find interpretable descriptions of sub-populations that stand out on a given target variable. That is, these sub-populations are exceptional with regard to the global distribution. In this paper we argue that in many applications, such as scientific discovery, subgroups are only useful if they are additionally representative of the global distribution with regard to a control variable. That is, when the distribution of this control variable is the same, or almost the same, as over the whole data. We formalise this objective function and give an efficient algorithm to compute its tight optimistic estimator for the case of a numeric target and a binary control variable. This enables us to use the branch-and-bound framework to efficiently discover the top-k subgroups that are both exceptional as well as representative. Experimental evaluation on a wide range of datasets shows that with this algorithm we discover meaningful representative patterns and are up to orders of magnitude faster in terms of node evaluations as well as time.

    Identifying Consistent Statements about Numerical Data with Dispersion-Corrected Subgroup Discovery

    Existing algorithms for subgroup discovery with numerical targets do not optimize the error or target variable dispersion of the groups they find. This often leads to unreliable or inconsistent statements about the data, rendering practical applications, especially in scientific domains, futile. Therefore, we here extend the optimistic estimator framework for optimal subgroup discovery to a new class of objective functions: we show how tight estimators can be computed efficiently for all functions that are determined by subgroup size (non-decreasing dependence), the subgroup median value, and a dispersion measure around the median (non-increasing dependence). In the important special case when dispersion is measured using the average absolute deviation from the median, this novel approach yields a linear time algorithm. Empirical evaluation on a wide range of datasets shows that, when used within branch-and-bound search, this approach is highly efficient and indeed discovers subgroups with much smaller errors.
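
As a rough illustration of the class of objective functions described in the abstract above, the following Python sketch scores a subgroup from its size, its median, and its average absolute deviation from the median. The specific product form and the names `mean_abs_dev_from_median` and `dispersion_corrected_score` are assumptions made for illustration only; they are not taken from the paper, and the sketch does not implement the optimistic estimator or the branch-and-bound search.

```python
from statistics import median


def mean_abs_dev_from_median(values):
    """Average absolute deviation of the values from their median."""
    m = median(values)
    return sum(abs(v - m) for v in values) / len(values)


def dispersion_corrected_score(subgroup, population):
    """Hypothetical objective (an assumption, not the paper's formula).

    The score is non-decreasing in subgroup size and subgroup median,
    and non-increasing in the dispersion around the subgroup median.
    """
    coverage = len(subgroup) / len(population)
    # Reward only subgroups whose median exceeds the population median.
    median_shift = max(0.0, median(subgroup) - median(population))
    pop_disp = mean_abs_dev_from_median(population)
    if pop_disp == 0:
        return 0.0
    # Fraction by which the subgroup reduces dispersion, clipped at zero.
    dispersion_gain = max(0.0, 1.0 - mean_abs_dev_from_median(subgroup) / pop_disp)
    return coverage * median_shift * dispersion_gain


if __name__ == "__main__":
    population = [1, 2, 3, 4, 10, 11, 12, 13]
    tight = [10, 11, 12, 13]   # high median, low dispersion
    loose = [1, 11, 12, 13]    # same median, high dispersion
    print(dispersion_corrected_score(tight, population))
    print(dispersion_corrected_score(loose, population))
```

Both example subgroups have the same size and median, so under this assumed objective the one with the smaller spread around its median receives the higher score.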

    Learning subjectively interesting data representations

    Roadmap on machine learning in electronic structure

    In recent years, we have been witnessing a paradigm shift in computational materials science. In fact, traditional methods, mostly developed in the second half of the XXth century, are being complemented, extended, and sometimes even completely replaced by faster, simpler, and often more accurate approaches. The new approaches, which we collectively label as machine learning, have their origins in the fields of informatics and artificial intelligence, but are making rapid inroads in all other branches of science. With this in mind, this Roadmap article, consisting of multiple contributions from experts across the field, discusses the use of machine learning in materials science, and shares perspectives on current and future challenges in problems as diverse as the prediction of materials properties, the construction of force-fields, the development of exchange-correlation functionals for density-functional theory, the solution of the many-body problem, and more. In spite of the already numerous and exciting success stories, we are just at the beginning of a long path that will reshape materials science for the many challenges of the XXIth century.