Subjectively Interesting Subgroup Discovery on Real-valued Targets
Deriving insights from high-dimensional data is one of the core problems in
data mining. The difficulty mainly stems from the fact that there are
exponentially many variable combinations to potentially consider, and there are
infinitely many if we consider weighted combinations, even for linear
combinations. Hence, an obvious question is whether we can automate the search
for interesting patterns and visualizations. In this paper, we consider the
setting where a user wants to learn as efficiently as possible about
real-valued attributes. For example, a user may want to understand the
distribution of crime rates in different geographic areas in terms of other
(numerical, ordinal and/or categorical) variables that describe the areas. We
introduce a method to
find subgroups in the data that are maximally informative (in the formal
information-theoretic sense) with respect to one or more real-valued
target attributes. The subgroup descriptions are in terms of a succinct set of
other attributes of arbitrary type. The approach is based on the subjective
interestingness framework FORSIED, which enables the use of prior knowledge
when finding the most informative non-redundant patterns, and hence the method
also supports iterative data mining.
Comment: 12 pages, 10 figures, 2 tables, conference submission
A New Algorithm for Exploratory Projection Pursuit
In this paper, we propose a new algorithm for exploratory projection pursuit.
The basis of the algorithm is the insight that previous approaches used fairly
narrow definitions of interestingness and non-interestingness. We argue that
allowing these definitions to depend on the problem and data at hand is a more
natural approach in an exploratory technique. This also gives our technique
much wider applicability than existing approaches in the literature.
Complementing this insight, we propose a class of projection indices based on
the spatial distribution function that can make use of such information.
Finally, with the help of real datasets, we demonstrate how a range of
multivariate exploratory tasks can be addressed with our algorithm. The
examples further demonstrate that the proposed indices are quite capable of
focussing on the interesting structure in the data, even when this structure is
otherwise hard to detect or arises from very subtle patterns.
Comment: 29 pages, 8 figures
Analysis of monotonicity properties of some rule interestingness measures
One of the crucial problems in the field of knowledge discovery is the development of good interestingness measures for evaluating discovered patterns. In this paper, we consider quantitative, objective interestingness measures for "if..., then..." association rules. We focus on three popular interestingness measures, namely the rule interest function of Piatetsky-Shapiro, the gain measure of Fukuda et al., and the dependency factor used by Pawlak. We verify whether they satisfy the valuable property M of monotonic dependency on the number of objects that satisfy or do not satisfy the premise or the conclusion of a rule, and the property of hypothesis symmetry (HS). Moreover, analytically and through experiments, we show an interesting relationship between those measures and two other commonly used measures, rule support and anti-support.
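As a rough illustration of the three measures named in this abstract, the sketch below computes them from the four cells of a rule's contingency table. The abstract itself gives no formulas, so the definitions used here are the commonly cited ones expressed through support counts and should be read as assumptions. The cell names `a`, `b`, `c`, `d` and the threshold `theta` are introduced here for illustration only.

```python
from fractions import Fraction

# Contingency counts for a rule "if premise, then conclusion"
# (names are illustrative, not taken from the abstract):
#   a = objects satisfying both premise and conclusion   (rule support)
#   b = objects satisfying the conclusion but not the premise
#   c = objects satisfying the premise but not the conclusion (anti-support)
#   d = objects satisfying neither

def rule_interest(a, b, c, d):
    """Rule interest (Piatetsky-Shapiro): support minus its expected value
    under independence of premise and conclusion."""
    n = a + b + c + d
    return Fraction(a) - Fraction((a + c) * (a + b), n)

def gain(a, b, c, d, theta=Fraction(1, 2)):
    """Gain (Fukuda et al.): support minus theta times premise support.
    theta is a user-chosen threshold, assumed to lie in (0, 1)."""
    return Fraction(a) - theta * (a + c)

def dependency_factor(a, b, c, d):
    """Dependency factor (Pawlak): normalized difference between the rule's
    confidence and the relative frequency of the conclusion."""
    n = a + b + c + d
    confidence = Fraction(a, a + c)
    prior = Fraction(a + b, n)
    return (confidence - prior) / (confidence + prior)

# Example on made-up counts: 100 objects in total.
counts = (30, 20, 10, 40)
print(rule_interest(*counts))      # 10
print(gain(*counts))               # 10
print(dependency_factor(*counts))  # 1/5
```

Exact rational arithmetic via `Fraction` avoids floating-point noise, which can matter when values for neighbouring counts are compared, as in a check of monotonic dependency on `a`, `b`, `c`, or `d`.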