On Objective Measures of Rule Surprisingness
Most of the literature argues that surprisingness is an inherently subjective aspect of the discovered knowledge, which cannot be measured in objective terms. This paper departs from this view and has a twofold goal: (1) showing that it is indeed possible to define objective (rather than subjective) measures of discovered rule surprisingness; (2) proposing new ideas and methods for defining objective rule surprisingness measures.
Categorization of interestingness measures for knowledge extraction
Finding interesting association rules is an important and active research field in data mining. The algorithms of the Apriori family are based on two rule extraction measures, support and confidence. Although these two measures have the virtue of being algorithmically fast, they generate a prohibitive number of rules, most of which are redundant and irrelevant. It is therefore necessary to use further measures that filter out uninteresting rules. Many synthesis studies have since examined interestingness measures from several points of view. Different reported studies have been carried out to identify "good" properties of rule extraction measures, and these properties have been assessed on 61 measures. The purpose of this paper is twofold: first, to extend the number of measures and properties studied, in addition to formalizing the properties proposed in the literature; second, in the light of this formal study, to categorize the studied measures. This paper thus identifies categories of measures in order to help users efficiently select one or more appropriate measures during the knowledge extraction process. Evaluating the properties on the 61 measures enabled us to identify 7 classes of measures, which we obtained using two different clustering techniques. Comment: 34 pages, 4 figures
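To illustrate the two Apriori measures named above, the following minimal Python sketch computes support and confidence for a single rule over a toy transaction list, together with lift as one example of a "further measure" that can filter rules. The transaction data, the function names, and the choice of lift are assumptions made for this illustration; they are not taken from the paper.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]

def support(itemset: Itemset, transactions: List[Itemset]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent: Itemset, consequent: Itemset, transactions: List[Itemset]) -> float:
    """supp(A | C) / supp(A): estimated P(C given A) for the rule A -> C.
    Assumes the antecedent occurs in at least one transaction."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

def lift(antecedent: Itemset, consequent: Itemset, transactions: List[Itemset]) -> float:
    """Confidence divided by supp(C); 1.0 means A and C occur independently."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

# Toy market-basket data (hypothetical).
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
rule_a, rule_c = frozenset({"bread"}), frozenset({"milk"})

print(support(rule_a | rule_c, transactions))              # → 0.5
print(round(confidence(rule_a, rule_c, transactions), 3))  # → 0.667
print(round(lift(rule_a, rule_c, transactions), 3))        # → 0.889
```

In this toy data the rule bread → milk has 67% confidence, yet its lift is below 1: milk is slightly less likely when bread is present than overall. Support and confidence alone would let such a rule through; an additional measure like lift can filter it out.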
Learning what matters - Sampling interesting patterns
In the field of exploratory data mining, local structure in data can be described by patterns and discovered by mining algorithms. Although many solutions have been proposed to address the redundancy problems in pattern mining, most of them either provide succinct pattern sets or take the interests of the user into account, but not both. Consequently, the analyst has to invest substantial effort in identifying those patterns that are relevant to her specific interests and goals. To address this problem, we propose a novel approach that combines pattern sampling with interactive data mining. In particular, we introduce the LetSIP algorithm, which builds upon recent advances in 1) weighted sampling in SAT and 2) learning to rank in interactive pattern mining. Specifically, it exploits user feedback to directly learn the parameters of the sampling distribution that represents the user's interests. We compare the performance of the proposed algorithm to the state of the art in interactive pattern mining by emulating the interests of a user. The resulting system allows efficient and interleaved learning and sampling, thus enabling user-specific anytime data exploration. Finally, LetSIP demonstrates favourable trade-offs concerning both quality-diversity and exploitation-exploration when compared to existing methods. Comment: PAKDD 2017, extended version
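The interleaving of learning and sampling described above can be sketched in miniature. The following Python example is not LetSIP: it replaces weighted SAT sampling with direct sampling from a small, explicit list of candidate patterns, each described by a hypothetical feature vector, and it uses a perceptron-style pairwise update as a stand-in for the learning-to-rank component. The emulated user simply prefers whichever pattern has the higher hidden utility, loosely mirroring the emulated user interests mentioned in the abstract. All function names and parameters are assumptions made for this illustration.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = Sequence[float]

def score(w: Vector, x: Vector) -> float:
    """Linear utility of a pattern's feature vector under weights w."""
    return sum(wi * xi for wi, xi in zip(w, x))

def update_from_preference(w: Vector, preferred: Vector, other: Vector, lr: float = 0.1) -> List[float]:
    """Perceptron-style pairwise update: if the model does not already rank
    `preferred` above `other`, move w in the direction (preferred - other)."""
    if score(w, preferred) <= score(w, other):
        return [wi + lr * (p - o) for wi, p, o in zip(w, preferred, other)]
    return list(w)

def sample_index(patterns: List[Vector], w: Vector, rng: random.Random) -> int:
    """Draw a pattern index with probability proportional to exp(score)."""
    weights = [math.exp(score(w, x)) for x in patterns]
    return rng.choices(range(len(patterns)), weights=weights, k=1)[0]

def interactive_loop(patterns: List[Vector], true_utility: Callable[[Vector], float],
                     rounds: int, rng: random.Random, lr: float = 0.1) -> List[float]:
    """Sample two patterns, ask the emulated user which one it prefers,
    and update the sampling weights from that feedback."""
    w: List[float] = [0.0] * len(patterns[0])
    for _ in range(rounds):
        i, j = sample_index(patterns, w, rng), sample_index(patterns, w, rng)
        ui, uj = true_utility(patterns[i]), true_utility(patterns[j])
        if i == j or ui == uj:
            continue  # no usable preference this round
        preferred, other = (patterns[i], patterns[j]) if ui > uj else (patterns[j], patterns[i])
        w = update_from_preference(w, preferred, other, lr)
    return w

# Hypothetical 2-feature descriptions of three candidate patterns.
patterns = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
# Emulated user: cares only about the first feature.
w = interactive_loop(patterns, true_utility=lambda x: x[0], rounds=50, rng=random.Random(0))
print(w)
```

In this toy run, updates stop as soon as the learned weights rank all three patterns consistently with the emulated utility. From then on the sampler favours the highest-scoring pattern while still occasionally drawing the others, a simple form of the exploitation-exploration balance the abstract refers to.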
Mining Characteristic Patterns for Comparative Music Corpus Analysis
A core issue of computational pattern mining is the identification of interesting patterns. When mining music corpora organized into classes of songs, patterns may be of interest because they are characteristic, describing prevalent properties of classes, or because they are discriminant, capturing distinctive properties of classes. Existing work in computational music corpus analysis has focused on discovering discriminant patterns. This paper studies characteristic patterns, investigating the behavior of different pattern interestingness measures in balancing coverage and discriminability of classes in top-k pattern mining and in individual top-ranked patterns. Characteristic pattern mining is applied to the collection of Native American music by Frances Densmore, and the discovered patterns are shown to be supported by Densmore’s own analyses.
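The distinction between characteristic and discriminant patterns can be made concrete with two simple scores. The sketch below assumes, hypothetically, that each song is encoded as a set of feature values and that a pattern is a subset of those values. Coverage within a class stands in for characteristic interest, and the difference in coverage between the target class and the remaining songs stands in for discriminant interest. Both scores are illustrative stand-ins, not the interestingness measures evaluated in the paper.

```python
from typing import Callable, FrozenSet, List

Song = FrozenSet[str]     # hypothetical encoding: a song as a set of feature values
Pattern = FrozenSet[str]  # a pattern occurs in a song if all its values are present

def coverage(pattern: Pattern, songs: List[Song]) -> float:
    """Fraction of songs that contain the pattern (characteristic score)."""
    if not songs:
        return 0.0
    return sum(1 for s in songs if pattern <= s) / len(songs)

def support_difference(pattern: Pattern, target: List[Song], rest: List[Song]) -> float:
    """Coverage in the target class minus coverage in the other songs (discriminant score)."""
    return coverage(pattern, target) - coverage(pattern, rest)

def top_k(patterns: List[Pattern], score: Callable[[Pattern], float], k: int) -> List[Pattern]:
    """Return the k highest-scoring patterns (ties keep their input order)."""
    return sorted(patterns, key=score, reverse=True)[:k]

# Toy corpus with two classes of songs (hypothetical data).
class_a = [frozenset({"x", "y"}), frozenset({"x"}), frozenset({"x", "y"}), frozenset({"x"})]
class_b = [frozenset({"x"}), frozenset({"x", "z"})]
p_x, p_y = frozenset({"x"}), frozenset({"y"})
candidates = [p_x, p_y]

print(top_k(candidates, lambda p: coverage(p, class_a), 1))                      # → [frozenset({'x'})]
print(top_k(candidates, lambda p: support_difference(p, class_a, class_b), 1))  # → [frozenset({'y'})]
```

The pattern {x} covers every song in class A and is therefore maximally characteristic, but because it covers class B equally its discriminability is zero. The pattern {y} covers only half of class A yet is the better discriminator. Ranking the same candidates by the two scores yields different top-k lists, which is the tension between coverage and discriminability that the abstract describes.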
Towards Role Based Hypothesis Evaluation for Health Data Mining
Data mining researchers have long been concerned with the application of tools to facilitate and improve data analysis on large, complex data sets. The current challenge is to make data mining and knowledge discovery systems applicable to a wider range of domains, among them health. Early work was performed over transactional, retail-based data sets, but finding previously unknown knowledge in the ever-increasing amounts of data collected from the health domain is an emerging area of interest and specialisation. The problem is finding a solution that is flexible enough to allow generalised application, yet specific enough to provide functionality that caters for the nuances of each role within the domain. The need for a more granular approach to problem solving in other areas of information technology has resulted in the use of role-based solutions. This paper discusses the progress to date in developing a role-oriented solution to the problem of providing for the diverse requirements of health domain data miners, and in defining the foundation for determining what constitutes an interesting discovery in an area as complex as health.
Measuring Interestingness – Perspectives on Anomaly Detection
We live in a data deluge. Our ability to gather, distribute, and store information has grown immensely over the past two decades. With this overabundance of data, the core knowledge discovery problem no longer lies in gathering data but in retrieving relevant data efficiently. While the most common approach is to use rule interestingness to filter the results of the association rule generation process, a study of the literature suggests that interestingness is difficult to define quantitatively and is best summarized as “a record or pattern is interesting if it suggests a change in an established model.” In this paper we elaborate on the term interestingness and the surrounding taxonomy of interestingness measures, anomalies, novelty and surprisingness. We review and summarize the current state of the literature surrounding interestingness and associated approaches. Keywords: Interestingness, anomaly detection, rare-class mining, Interestingness measures, outliers, surprisingness, novelty