Big-Data-Driven Materials Science and its FAIR Data Infrastructure
This chapter addresses the fourth paradigm of materials research -- big-data
driven materials science. Its concepts and state-of-the-art are described, and
its challenges and chances are discussed. For furthering the field, Open Data
and an all-embracing sharing, an efficient data infrastructure, and the rich
ecosystem of computer codes used in the community are of critical importance.
For shaping this fourth paradigm and contributing to the development or
discovery of improved and novel materials, data must be what is now called FAIR
-- Findable, Accessible, Interoperable and Re-purposable/Re-usable. This sets
the stage for advances of methods from artificial intelligence that operate on
large data sets to find trends and patterns that cannot be obtained from
individual calculations and not even directly from high-throughput studies.
Recent progress is reviewed and demonstrated, and the chapter is concluded by a
forward-looking perspective, addressing important not yet solved challenges.
Comment: submitted to the Handbook of Materials Modeling (eds. S. Yip and W.
Andreoni), Springer 2018/201
Exceptional spatio-temporal behavior mining through Bayesian non-parametric modeling
Collective social media provides a vast amount of geo-tagged social posts, which contain various records on spatio-temporal behavior. Modeling spatio-temporal behavior on collective social media is an important task for applications like tourism recommendation, location prediction and urban planning. Properly accomplishing this task requires a model that allows for diverse behavioral patterns on each of the three aspects: spatial location, time, and text. In this paper, we address the following question: how to find representative subgroups of social posts, for which the spatio-temporal behavioral patterns are substantially different from the behavioral patterns in the whole dataset? Selection and evaluation are the two challenging problems for finding the exceptional subgroups. To address these problems, we propose BNPM: a Bayesian non-parametric model, to model spatio-temporal behavior and infer the exceptionality of social posts in subgroups. By training BNPM on a large number of randomly sampled subgroups, we can get the global distribution of behavioral patterns. For each given subgroup of social posts, its posterior distribution can be inferred by BNPM. By comparing the posterior distribution with the global distribution, we can quantify the exceptionality of each given subgroup. The exceptionality scores are used to guide the search process within the exceptional model mining framework to automatically discover the exceptional subgroups. Various experiments are conducted to evaluate the effectiveness and efficiency of our method. On four real-world datasets our method discovers subgroups coinciding with events, subgroups distinguishing professionals from tourists, and subgroups whose consistent exceptionality can only be truly appreciated by combining exceptional spatio-temporal and exceptional textual behavior
Ab initio data-analytics study of carbon-dioxide activation on semiconductor oxide surfaces
The excessive emissions of carbon dioxide (CO₂) into the atmosphere
threaten to shift the CO₂ cycle planet-wide and induce unpredictable climate
changes. Using artificial intelligence (AI) trained on high-throughput
first-principles-based data for a broad family of oxides, we develop a strategy for a
rational design of catalytic materials for converting CO₂ to fuels and other
useful chemicals. We demonstrate that an electron transfer to the
π*-antibonding orbital of the adsorbed molecule and the associated bending
of the initially linear molecule, previously proposed as the indicator of
activation, are insufficient to account for the good catalytic performance of
experimentally characterized oxide surfaces. Instead, our AI model identifies
the common feature of these surfaces in the binding of a molecular O atom to a
surface cation, which results in a strong elongation and therefore weakening of
one molecular C-O bond. This finding suggests using the C-O bond elongation as
an indicator of CO₂ activation. Based on these findings, we propose a set of
new promising oxide-based catalysts for CO₂ conversion, and a recipe to find
more
Higher-Order DeepTrails: Unified Approach to *Trails
Analyzing, understanding, and describing human behavior is advantageous in
different settings, such as web browsing or traffic navigation. Understanding
human behavior naturally helps to improve and optimize the underlying
infrastructure or user interfaces. Typically, human navigation is represented
by sequences of transitions between states. Previous work suggests using
hypotheses, which represent different intuitions about the navigation, to analyze
these transitions. To mathematically grasp this setting, first-order Markov
chains are used to capture the behavior, which allows different kinds of graph
comparisons to be applied but comes with the inherent drawback of
losing information about higher-order dependencies within the sequences. To
this end, we propose to analyze entire sequences using autoregressive language
models, as they are traditionally used to model higher-order dependencies in
sequences. We show that our approach can be easily adapted to model different
settings introduced in previous work, namely HypTrails, MixedTrails and even
SubTrails, while at the same time bringing unique advantages: (1) modeling
higher-order dependencies between state transitions, (2) being able to
identify shortcomings in proposed hypotheses, and (3) naturally introducing a
unified approach to model all settings. To show the expressiveness of our
approach, we evaluate it on different synthetic datasets and conclude
with an exemplary analysis of a real-world dataset, examining the behavior of
users who interact with voice assistants
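The drawback of first-order Markov chains mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `first_order_transitions` and the toy trails are hypothetical, chosen only to show how a first-order model averages away a dependency on the state before the current one.

```python
from collections import Counter, defaultdict


def first_order_transitions(sequences):
    """Estimate first-order transition probabilities P(next | current)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count every observed transition between consecutive states.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    # Normalize the counts of each state into a probability distribution.
    return {
        state: {nxt: c / sum(followers.values()) for nxt, c in followers.items()}
        for state, followers in counts.items()
    }


# Hypothetical toy trails: what follows "B" is fully determined by the
# state visited before "B" (A -> B -> C, but D -> B -> E).
trails = [["A", "B", "C"], ["D", "B", "E"]] * 5
probs = first_order_transitions(trails)
print(probs["B"])
```

In these toy trails the successor of "B" is deterministic once the preceding state is known, yet the first-order estimate assigns equal probability to "C" and "E". A model that conditions on longer histories would keep that information.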
Mining subjectively interesting attributed subgraphs
Community detection in graphs, data clustering, and local pattern mining
are three mature fields of data mining and machine learning.
In recent years, attributed subgraph mining has been emerging as a new
powerful data mining task at the intersection of these areas.
Given a graph and a set of attributes for each vertex,
attributed subgraph mining aims to find cohesive subgraphs
for which (a subset of) the attributes take exceptional values in some sense.
While research on this task can borrow from the three above-mentioned fields,
the principled integration of graph and attribute data poses two challenges:
the definition of a pattern language that is intuitive and lends itself to efficient search strategies,
and the formalization of the interestingness of such patterns.
We propose an integrated solution to both of these challenges.
The proposed pattern language improves upon prior work in being both highly flexible and intuitive.
We show how an effective and principled algorithm can enumerate patterns of this language.
The proposed approach for quantifying interestingness of patterns of this language
is rooted in information theory, and is able to account for prior knowledge on the data.
Prior work typically quantifies interestingness based on the cohesion of the subgraph
and the exceptionality of its attributes separately,
combining these in a parameterized trade-off.
Instead, in our proposal this trade-off is implicitly handled in a principled, parameter-free manner.
Extensive empirical results confirm the proposed pattern syntax is intuitive,
and the interestingness measure aligns well with actual subjective interestingness
DEvIANT: Discovering Significant Exceptional (Dis-)Agreement Within Groups
We strive to find contexts (i.e., subgroups of entities) under which exceptional (dis-)agreement occurs among a group of individuals, in any type of data featuring individuals (e.g., parliamentarians, customers) performing observable actions (e.g., votes, ratings) on entities (e.g., legislative procedures, movies). To this end, we introduce the problem of discovering statistically significant exceptional contextual intra-group agreement patterns. To handle the sparsity inherent to voting and rating data, we use Krippendorff's Alpha measure for assessing the agreement among individuals. We devise a branch-and-bound algorithm, named DEvIANT, to discover such patterns. DEvIANT exploits both closure operators and tight optimistic estimates. We derive analytic approximations for the confidence intervals (CIs) associated with patterns for a computationally efficient significance assessment. We prove that these approximate CIs are nested along specialization of patterns. This allows pruning properties to be incorporated in DEvIANT to quickly discard non-significant patterns. An empirical study on several datasets demonstrates the efficiency and the usefulness of DEvIANT. Technical Report Associated with the ECML/PKDD 2019 Paper entitled: "DEvIANT: Discovering Significant Exceptional (Dis-)Agreement Within Groups"
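As a rough illustration of the agreement measure named in this abstract, the sketch below computes Krippendorff's Alpha with the standard coincidence-matrix formula. It assumes nominal (categorical) ratings, which the abstract does not specify, and it does not reproduce DEvIANT or its confidence intervals. The function name `krippendorff_alpha_nominal` and the example votes are hypothetical.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's Alpha for nominal data.

    `units` is a list of lists; each inner list holds the ratings given to
    one entity, with missing ratings simply left out (assumed input layout).
    """
    coincidences = Counter()
    for ratings in units:
        m = len(ratings)
        if m < 2:
            continue  # a single rating carries no agreement information
        # Each ordered pair of ratings from different raters adds 1/(m-1).
        for a, b in permutations(ratings, 2):
            coincidences[(a, b)] += 1 / (m - 1)
    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b] for a in totals for b in totals if a != b)
    if expected == 0:
        raise ValueError("alpha is undefined when only one category is observed")
    return 1 - (n - 1) * observed / expected


# Hypothetical votes on three entities; the last entity has one missing vote.
votes = [["yes", "yes", "yes"], ["no", "no", "no"], ["yes", "no"]]
print(krippendorff_alpha_nominal(votes))
```

Because entities with fewer than two ratings are skipped and every remaining entity is weighted by its own number of ratings, the measure tolerates the sparse rating matrices the abstract refers to.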
Identifying outstanding transition-metal-alloy heterogeneous catalysts for the oxygen reduction and evolution reactions via subgroup discovery
In order to estimate the reactivity of a large number of potentially complex
heterogeneous catalysts while searching for novel and more efficient materials,
physical as well as data-centric models have been developed for a faster
evaluation of adsorption energies compared to first-principles calculations.
However, global models designed to describe as many materials as possible might
overlook the very few compounds that have the appropriate adsorption properties
to be suitable for a given catalytic process. Here, the subgroup-discovery
(SGD) local artificial-intelligence approach is used to identify the key
descriptive parameters and constraints on their values, the so-called SG rules,
which particularly describe transition-metal surfaces with outstanding
adsorption properties for the oxygen reduction and evolution reactions. We
start from a data set of 95 oxygen adsorption energy values evaluated by
density-functional-theory calculations for several monometallic surfaces along
with 16 atomic, bulk and surface properties as candidate descriptive
parameters. From this data set, SGD identifies constraints on the most relevant
parameters describing materials and adsorption sites that (i) result in O
adsorption energies within the Sabatier-optimal range required for the oxygen
reduction reaction and (ii) present the largest deviations from the linear
scaling relations between O and OH adsorption energies, which limit the
performance in the oxygen evolution reaction. The SG rules not only reflect the
local underlying physicochemical phenomena that result in the desired
adsorption properties but also guide the challenging design of alloy catalysts
- …