Measuring Interestingness – Perspectives on Anomaly Detection
We live in a data deluge. Our ability to gather, distribute, and store information has grown immensely over the past two decades. With this overabundance of data, the core knowledge discovery problem is no longer gathering the data, but rather retrieving relevant data efficiently. While the most common approach is to use rule interestingness to filter the results of the association rule generation process, a study of the literature suggests that interestingness is difficult to define quantitatively and is best summarized as, “a record or pattern is interesting if it suggests a change in an established model.” In this paper we elaborate on the term interestingness and the surrounding taxonomy of interestingness measures, anomalies, novelty and surprisingness. We review and summarize the current state of the literature on interestingness and associated approaches. Keywords: Interestingness, anomaly detection, rare-class mining, Interestingness measures, outliers, surprisingness, novelt
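As a rough illustration of filtering association rules by interestingness, the sketch below computes three standard objective measures (support, confidence, and lift) and keeps only rules that pass minimum thresholds. The toy transactions, the function names, and the threshold values are assumptions made for this example; they are not taken from the paper, which surveys many more measures.

```python
from typing import FrozenSet, List, Tuple

Itemset = FrozenSet[str]
Rule = Tuple[Itemset, Itemset]  # (antecedent, consequent)


def support(itemset: Itemset, transactions: List[Itemset]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(rule: Rule, transactions: List[Itemset]) -> float:
    """Estimate of P(consequent | antecedent) from the transactions."""
    antecedent, consequent = rule
    s_a = support(antecedent, transactions)
    if s_a == 0.0:
        return 0.0
    return support(antecedent | consequent, transactions) / s_a


def lift(rule: Rule, transactions: List[Itemset]) -> float:
    """Confidence divided by the consequent's support (1.0 means independence)."""
    s_c = support(rule[1], transactions)
    if s_c == 0.0:
        return 0.0
    return confidence(rule, transactions) / s_c


def filter_rules(rules: List[Rule], transactions: List[Itemset],
                 min_conf: float, min_lift: float) -> List[Rule]:
    """Keep rules whose confidence and lift both reach the (assumed) thresholds."""
    return [r for r in rules
            if confidence(r, transactions) >= min_conf
            and lift(r, transactions) >= min_lift]


# Hypothetical toy data, for illustration only.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
    frozenset({"bread", "milk"}),
]
bread_milk: Rule = (frozenset({"bread"}), frozenset({"milk"}))
butter_bread: Rule = (frozenset({"butter"}), frozenset({"bread"}))

kept = filter_rules([bread_milk, butter_bread], transactions,
                    min_conf=0.7, min_lift=1.0)
```

On this toy data the rule butter -> bread is kept, while bread -> milk is dropped because its lift falls below 1.0. Measures of this kind are purely statistical; they do not capture the "change in an established model" notion quoted in the abstract.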
Modeling interestingness of streaming classification rules as a classification problem
Inducing classification rules on domains from which information is gathered at regular periods generally produces so many rules that selecting the interesting ones among all discovered rules becomes an important task. At each period, new classification rules are induced from the newly gathered information. These rules therefore stream through time and are called streaming classification rules. In this paper, an interactive classification rules' interestingness learning algorithm (ICRIL) is developed to automatically label the classification rules as either "interesting" or "uninteresting" with limited user interaction. In our study, VFFP (Voting Fuzzified Feature Projections), a feature projection based incremental classification algorithm, is also developed in the framework of ICRIL. The concept description learned by VFFP is the interestingness concept of streaming classification rules. © Springer-Verlag Berlin Heidelberg 2006
Learning Interestingness of Streaming Classification Rules
Inducing classification rules on domains from which information is gathered at regular periods generally produces so many rules that selecting the interesting ones among all discovered rules becomes an important task. At each period, new classification rules are induced from the newly gathered information. These rules therefore stream through time and are called streaming classification rules. In this paper, an interactive rule interestingness-learning algorithm (IRIL) is developed to automatically label the classification rules as either "interesting" or "uninteresting" with limited user interaction. In our study, VFP (Voting Feature Projections), a feature projection based incremental classification learning algorithm, is also developed in the framework of IRIL. The concept description learned by the VFP algorithm constitutes a novel approach for interestingness analysis of streaming classification rules. © Springer-Verlag 2004
A novel hybrid approach for interestingness analysis of classification rules
Data mining is the efficient discovery of patterns in large databases, and classification rules are perhaps the most important type of pattern in data mining applications. However, the number of such classification rules is generally so large that selecting the interesting ones among all discovered rules becomes an important task. In this paper, factors related to the interestingness of a rule are investigated and some new factors are proposed. Following this, an interactive rule interestingness-learning algorithm (IRIL) is developed to automatically label the classification rules as either "interesting" or "uninteresting" with limited user participation. In our study, VFP (Voting Feature Projections), a feature projection based incremental classification learning algorithm, is also developed in the framework of IRIL. The concept description learned by the VFP algorithm constitutes a novel hybrid approach for interestingness analysis of classification rules. © Springer-Verlag Berlin Heidelberg 2007
Interestingness of Discovered Association Rules in terms of Neighborhood-Based Unexpectedness
One of the central problems in knowledge discovery is the development of good measures of interestingness of discovered patterns. With such measures, a user needs to manually examine only the more interesting rules, instead of each of a large number of mined rules. Previous proposals of such measures include rule templates, minimal rule cover, actionability, and unexpectedness in the statistical sense or against user beliefs. In this paper we will introduce neighborhood-based interestingness by considering unexpectedness in terms of neighborhood-based parameters. We first present some novel notions of distance between rules and of neighborhoods of rules. The neighborhood-based interestingness of a rule is then defined in terms of the pattern of the fluctuation of confidences or the density of mined rules in some of its neighborhoods. Such interestingness can also be defined for sets of rules (e.g. plateaus and ridges) when their neighborhoods have certain properties. We can rank the in..
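The idea of scoring a rule against its neighborhood can be sketched as follows. This is only a minimal illustration under stated assumptions: the distance function (a count of items in the symmetric differences of antecedents and consequents), the radius, and the scoring by absolute deviation from the mean neighbor confidence are hypothetical stand-ins chosen for the example, not the definitions used in the paper.

```python
from dataclasses import dataclass
from statistics import mean
from typing import FrozenSet, List


@dataclass(frozen=True)
class Rule:
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]
    confidence: float


def rule_distance(a: Rule, b: Rule) -> int:
    """Hypothetical distance: number of items on which the two rules differ."""
    return len(a.antecedent ^ b.antecedent) + len(a.consequent ^ b.consequent)


def neighborhood(rule: Rule, rules: List[Rule], radius: int) -> List[Rule]:
    """All other mined rules within `radius` of `rule`."""
    return [r for r in rules if r != rule and rule_distance(rule, r) <= radius]


def unexpectedness(rule: Rule, rules: List[Rule], radius: int) -> float:
    """Assumed score: how far the rule's confidence lies from the mean
    confidence of its neighbors (0.0 if it has no neighbors)."""
    neighbors = neighborhood(rule, rules, radius)
    if not neighbors:
        return 0.0
    return abs(rule.confidence - mean(r.confidence for r in neighbors))


# Hypothetical mined rules, for illustration only.
r1 = Rule(frozenset({"a"}), frozenset({"x"}), 0.9)
r2 = Rule(frozenset({"a", "b"}), frozenset({"x"}), 0.5)
r3 = Rule(frozenset({"a", "c"}), frozenset({"x"}), 0.5)
r4 = Rule(frozenset({"d", "e", "f"}), frozenset({"y"}), 0.1)
mined = [r1, r2, r3, r4]

# Rank rules from most to least unexpected within radius 1.
ranked = sorted(mined, key=lambda r: unexpectedness(r, mined, radius=1),
                reverse=True)
```

Here `r1` ranks first because its confidence stands well above that of its two close neighbors, while `r4` has no neighbors within the radius and scores 0.0. A density-based variant would instead compare the number of neighbors against what is typical for neighborhoods of that size.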
Rough Set Based Rule Evaluations and Their Applications
Knowledge discovery is an important process in data analysis, data mining and machine learning. Typically, knowledge is presented in the form of rules. However, knowledge discovery systems often generate a huge number of rules. One of the challenges we face is how to automatically discover interesting and meaningful knowledge from such discovered rules, since it is infeasible for human beings to select important and interesting rules manually. Our focus is therefore on providing measures that evaluate the quality of rules in order to facilitate the understanding of data mining results. In this thesis, we present a series of rule evaluation techniques for the purpose of facilitating the knowledge understanding process. These evaluation techniques help not only to reduce the number of rules, but also to extract higher quality rules. Empirical studies on both artificial data sets and real world data sets demonstrate how such techniques can contribute to practical systems such as ones for medical diagnosis and web personalization. In the first part of this thesis, we discuss several rule evaluation techniques that are proposed for rule postprocessing. We show how properly defined rule templates can be used as a rule evaluation approach. We propose two rough set based measures, a Rule Importance Measure and a Rules-As-Attributes Measure, to rank the important and interesting rules. In the second part of this thesis, we show how data preprocessing can help with rule evaluation. Because well preprocessed data is essential for important rule generation, we propose a new approach for processing missing attribute values to enhance the generated rules. In the third part of this thesis, a rough set based rule evaluation system is demonstrated to show the effectiveness of the measures proposed in this thesis. Furthermore, a new user-centric web personalization system is used as a case study to demonstrate how the proposed evaluation measures can be used in an actual application.
Wissensextraktion zum datengetriebenen Qualitätsmanagement (Knowledge Extraction for Data-Driven Quality Management)
This dissertation presents two new data mining techniques and methods for extracting quality-relevant knowledge and evaluates them with applications from the automotive industry. Depending on the underlying data types, these novel techniques make it possible to identify significant relationships between quality-describing and quality-measuring data that are recorded for a product along a company's value creation process.