Subjectively Interesting Subgroup Discovery on Real-valued Targets
Deriving insights from high-dimensional data is one of the core problems in
data mining. The difficulty mainly stems from the fact that there are
exponentially many variable combinations to potentially consider, and there are
infinitely many if we consider weighted combinations, even for linear
combinations. Hence, an obvious question is whether we can automate the search
for interesting patterns and visualizations. In this paper, we consider the
setting where a user wants to learn as efficiently as possible about
real-valued attributes. For example, a user may want to understand the distribution of crime
rates in different geographic areas in terms of other (numerical, ordinal
and/or categorical) variables that describe the areas. We introduce a method to
find subgroups in the data that are maximally informative (in the formal
information-theoretic sense) with respect to a single real-valued target attribute or a set of real-valued
target attributes. The subgroup descriptions are in terms of a succinct set of
arbitrarily-typed other attributes. The approach is based on the Subjective
Interestingness framework FORSIED to enable the use of prior knowledge when
finding most informative non-redundant patterns, and hence the method also
supports iterative data mining. Comment: 12 pages, 10 figures, 2 tables, conference submission
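The idea of scoring subgroups by how informative they are about a real-valued target can be illustrated with a minimal sketch. It is not the paper's method: the Gaussian background model fitted to all targets, the restriction to single-condition descriptions, and the names `information_content`, `best_single_condition`, `rows` and `crime_rate` are assumptions made for illustration, and the toy data is invented.

```python
import math
from statistics import mean, pvariance


def information_content(targets, members):
    """Negative log-density (in nats) of the subgroup's mean target value
    under a Gaussian background model fitted to all targets (an assumption)."""
    mu = mean(targets)
    var = pvariance(targets)
    n = len(members)
    sub_mean = mean(targets[i] for i in members)
    var_of_mean = var / n  # variance of the mean of n independent draws
    return (0.5 * math.log(2 * math.pi * var_of_mean)
            + (sub_mean - mu) ** 2 / (2 * var_of_mean))


def best_single_condition(rows, targets):
    """Score every description `attribute == value` and return the best one."""
    best = None
    for attr in rows[0]:
        for value in {row[attr] for row in rows}:
            members = [i for i, row in enumerate(rows) if row[attr] == value]
            if len(members) == len(rows):
                continue  # covers every row, so it tells the user nothing new
            # Every description has one condition, so description length is constant.
            score = information_content(targets, members)
            if best is None or score > best[0]:
                best = (score, attr, value)
    return best


# Invented toy data: areas described by categorical attributes.
rows = [
    {"region": "north", "urban": "yes"},
    {"region": "north", "urban": "no"},
    {"region": "south", "urban": "yes"},
    {"region": "south", "urban": "no"},
    {"region": "east", "urban": "yes"},
    {"region": "east", "urban": "no"},
]
crime_rate = [9.0, 8.5, 2.0, 2.5, 3.0, 2.0]

score, attr, value = best_single_condition(rows, crime_rate)
print(attr, value, round(score, 3))
```

On this toy data the highest-scoring description is `region == north`, the subgroup whose mean crime rate lies furthest from the overall mean. A fuller treatment would also update the background model after each pattern is shown, so that later patterns are not redundant with earlier ones.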
A tool for subjective and interactive visual data exploration
We present SIDE, a tool for Subjective and Interactive Visual Data Exploration, which lets users explore high-dimensional data via subjectively informative 2D data visualizations. Many existing visual analytics tools are either restricted to specific problems and domains or aim to find visualizations that align with the user’s beliefs about the data. In contrast, our generic tool computes data visualizations that are surprising given a user’s current understanding of the data. The user’s belief state is represented as a set of projection tiles. This user-awareness offers users an efficient way to interactively explore yet-unknown features of complex high-dimensional datasets.
Learning what matters - Sampling interesting patterns
In the field of exploratory data mining, local structure in data can be
described by patterns and discovered by mining algorithms. Although many
solutions have been proposed to address the redundancy problems in pattern
mining, most of them either provide succinct pattern sets or take the interests
of the user into account, but not both. Consequently, the analyst has to invest
substantial effort in identifying those patterns that are relevant to her
specific interests and goals. To address this problem, we propose a novel
approach that combines pattern sampling with interactive data mining. In
particular, we introduce the LetSIP algorithm, which builds upon recent
advances in 1) weighted sampling in SAT and 2) learning to rank in interactive
pattern mining. Specifically, it exploits user feedback to directly learn the
parameters of the sampling distribution that represents the user's interests.
We compare the performance of the proposed algorithm to the state-of-the-art in
interactive pattern mining by emulating the interests of a user. The resulting
system allows efficient and interleaved learning and sampling, and thus
user-specific anytime data exploration. Finally, LetSIP demonstrates favourable
trade-offs concerning both quality-diversity and exploitation-exploration when
compared to existing methods. Comment: PAKDD 2017, extended version
Construction and abstraction: contrasting methods of supporting model building in learning science
No description supplied
Big-Data-Driven Materials Science and its FAIR Data Infrastructure
This chapter addresses the fourth paradigm of materials research -- big-data-driven
materials science. Its concepts and state-of-the-art are described, and
its challenges and chances are discussed. For furthering the field, Open Data
and an all-embracing sharing, an efficient data infrastructure, and the rich
ecosystem of computer codes used in the community are of critical importance.
For shaping this fourth paradigm and contributing to the development or
discovery of improved and novel materials, data must be what is now called FAIR
-- Findable, Accessible, Interoperable and Re-purposable/Re-usable. This sets
the stage for advances of methods from artificial intelligence that operate on
large data sets to find trends and patterns that cannot be obtained from
individual calculations and not even directly from high-throughput studies.
Recent progress is reviewed and demonstrated, and the chapter is concluded by a
forward-looking perspective, addressing important unsolved challenges. Comment: submitted to the Handbook of Materials Modeling (eds. S. Yip and W.
Andreoni), Springer 2018/201