Measuring Interestingness – Perspectives on Anomaly Detection
We live in a data deluge. Our ability to gather, distribute, and store information has grown immensely over the past two decades. With this overabundance of data, the core knowledge discovery problem is no longer gathering the data, but retrieving the relevant data efficiently. While the most common approach is to use rule interestingness to filter the results of the association rule generation process, a study of the literature suggests that interestingness is difficult to define quantitatively and is best summarized as, “a record or pattern is interesting if it suggests a change in an established model.” In this paper we elaborate on the term interestingness and on the surrounding taxonomy of interestingness measures, anomalies, novelty and surprisingness. We review and summarize the current state of the literature on interestingness and associated approaches. Keywords: Interestingness, anomaly detection, rare-class mining, Interestingness measures, outliers, surprisingness, novelt
Modeling interestingness of streaming association rules as a benefit maximizing classification problem
Ankara: The Department of Computer Engineering and the Institute of Engineering and Science of Bilkent University, 2009. Thesis (Ph.D.) -- Bilkent University, 2009. Includes bibliographical references (leaves 87-94). In a typical application of association rule learning from market basket data,
a set of transactions for a fixed period of time is used as input to rule learning
algorithms. For example, the well-known Apriori algorithm can be applied to
learn a set of association rules from such a transaction set. However, learning
association rules from a set of transactions is not a one-time only process. For
example, a market manager may perform the association rule learning process
once every month over the set of transactions collected through the previous
month. For this reason, we will consider the problem where transaction sets
are input to the system as a stream of packages. The sets of transactions may
come in varying sizes and in varying periods. Once a set of transactions arrives,
the association rule learning algorithm is run on the last set of transactions,
resulting in a new set of association rules. Therefore, the set of association
rules learned will accumulate and increase in number over time, making the
mining of interesting ones out of this enlarging set of association rules impractical
for human experts. We refer to this sequence of rules as “association rule
set stream” or “streaming association rules” and the main motivation behind
this research is to develop a technique to overcome the interesting rule selection
problem. A successful association rule mining system should select and
present only the interesting rules to the domain experts. However, definition
of interestingness of association rules on a given domain usually differs from
one expert to the other and also over time for a given expert. In this thesis, we
propose a post-processing method to learn a subjective model for the interestingness
concept description of the streaming association rules. The uniqueness
of the proposed method is its ability to formulate the interestingness issue of
association rules as a benefit-maximizing classification problem and obtain a
different interestingness model for each user. In this new classification scheme,
the determining features are the selective objective interestingness factors, including
the rule’s content itself, related to the interestingness of the association
rules; and the target feature is the interestingness label of those rules. The proposed
method works incrementally and employs user interactivity at a certain
level. It is evaluated on a real supermarket dataset. The results show that the
model can successfully select the interesting ones. Aydın, Tolga. Ph.D.
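The abstract above describes objective interestingness factors computed for rules that arrive in successive transaction packages. As a minimal sketch, assuming the factors include the standard support, confidence, and lift measures (the abstract does not list the factors actually used), the per-package computation could look like this. The function name `rule_measures` and the sample data are hypothetical.

```python
from typing import Dict, FrozenSet, List


def rule_measures(
    transactions: List[FrozenSet[str]],
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
) -> Dict[str, float]:
    """Support, confidence and lift of `antecedent -> consequent`
    over one package of transactions (hypothetical helper)."""
    n = len(transactions)
    if n == 0:
        return {"support": 0.0, "confidence": 0.0, "lift": 0.0}

    # Count transactions containing the antecedent, the consequent, and both.
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_c = sum(1 for t in transactions if consequent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)

    support = n_both / n
    confidence = n_both / n_a if n_a else 0.0
    # Lift compares the rule's confidence with the consequent's base rate.
    lift = confidence / (n_c / n) if n_c else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


# One package of the stream; each arriving package would be scored the same way.
package = [
    frozenset({"bread", "milk"}),
    frozenset({"bread"}),
    frozenset({"milk"}),
    frozenset({"bread", "milk", "eggs"}),
]
measures = rule_measures(package, frozenset({"bread"}), frozenset({"milk"}))
```

Values like these could serve as the determining features of a rule, with the expert's interesting/uninteresting label as the target feature.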
Learning Interestingness of Streaming Classification Rules
Inducing classification rules on domains from which information is gathered at regular periods generally produces so many rules that selecting the interesting ones among all discovered rules becomes an important task. At each period, new classification rules are induced from the newly gathered information. These rules therefore stream through time and are called streaming classification rules. In this paper, an interactive rule interestingness-learning algorithm (IRIL) is developed to automatically label the classification rules as either "interesting" or "uninteresting" with limited user interaction. In our study, VFP (Voting Feature Projections), a feature projection based incremental classification learning algorithm, is also developed in the framework of IRIL. The concept description learned by the VFP algorithm constitutes a novel approach for interestingness analysis of streaming classification rules. © Springer-Verlag 2004
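The interactive labeling idea in the abstract above can be illustrated with a small sketch. This is not the IRIL or VFP algorithm, whose details the abstract does not give. It is a simplified stand-in in which each feature value votes for the labels it has been seen with, and the user is asked only when the vote is not certain enough. The names `FeatureVoter` and `label_rule` and the `min_certainty` threshold are assumptions.

```python
from collections import defaultdict
from typing import Callable, Dict, Tuple


class FeatureVoter:
    """Incremental per-feature voting classifier (illustrative stand-in)."""

    def __init__(self) -> None:
        # (feature index, feature value) -> label -> count
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, features: Tuple[str, ...], label: str) -> None:
        for i, value in enumerate(features):
            self.counts[(i, value)][label] += 1

    def votes(self, features: Tuple[str, ...]) -> Dict[str, float]:
        totals: Dict[str, float] = defaultdict(float)
        for i, value in enumerate(features):
            seen = self.counts.get((i, value))
            if not seen:
                continue  # this feature value has no training evidence yet
            n = sum(seen.values())
            for label, count in seen.items():
                totals[label] += count / n  # each feature casts one normalized vote
        return dict(totals)


def label_rule(
    voter: FeatureVoter,
    features: Tuple[str, ...],
    ask_user: Callable[[Tuple[str, ...]], str],
    min_certainty: float = 0.75,
) -> Tuple[str, bool]:
    """Return (label, asked_user). The user is consulted only when uncertain."""
    votes = voter.votes(features)
    total = sum(votes.values())
    if total > 0:
        best = max(votes, key=votes.get)
        if votes[best] / total >= min_certainty:
            return best, False
    label = ask_user(features)
    voter.update(features, label)  # learn incrementally from the answer
    return label, True


voter = FeatureVoter()
first = label_rule(voter, ("high_confidence", "short_rule"), lambda f: "interesting")
second = label_rule(voter, ("high_confidence", "short_rule"), lambda f: "uninteresting")
```

In this sketch the first rule has to be labeled by the user, while the second, identical rule is labeled automatically from the stored votes.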
A novel hybrid approach for interestingness analysis of classification rules
Data mining is the efficient discovery of patterns in large databases, and classification rules are perhaps the most important type of patterns in data mining applications. However, the number of such classification rules is generally so large that selecting the interesting ones among all discovered rules becomes an important task. In this paper, factors related to the interestingness of a rule are investigated and some new factors are proposed. Following this, an interactive rule interestingness-learning algorithm (IRIL) is developed to automatically label the classification rules as either "interesting" or "uninteresting" with limited user participation. In our study, VFP (Voting Feature Projections), a feature projection based incremental classification learning algorithm, is also developed in the framework of IRIL. The concept description learned by the VFP algorithm constitutes a novel hybrid approach for interestingness analysis of classification rules. © Springer-Verlag Berlin Heidelberg 2007
Modeling interestingness of streaming classification rules as a classification problem
Inducing classification rules on domains from which information is gathered at regular periods generally produces so many rules that selecting the interesting ones among all discovered rules becomes an important task. At each period, new classification rules are induced from the newly gathered information. These rules therefore stream through time and are called streaming classification rules. In this paper, an interactive classification rules' interestingness learning algorithm (ICRIL) is developed to automatically label the classification rules as either "interesting" or "uninteresting" with limited user interaction. In our study, VFFP (Voting Fuzzified Feature Projections), a feature projection based incremental classification algorithm, is also developed in the framework of ICRIL. The concept description learned by the VFFP is the interestingness concept of streaming classification rules. © Springer-Verlag Berlin Heidelberg 2006
Interactive visual exploration of association rules with rule-focusing methodology
Because data mining algorithms can produce enormous numbers of rules, knowledge post-processing is a difficult stage in an association rule discovery process. In order to find relevant knowledge for decision making, the user (a decision maker specialized in the data studied) needs to rummage through the rules. To assist him/her in this task, we here propose the rule-focusing methodology, an interactive methodology for the visual post-processing of association rules. It allows the user to explore large sets of rules freely by focusing his/her attention on limited subsets. This new approach relies on rule interestingness measures, on a visual representation, and on interactive navigation among the rules. We have implemented the rule-focusing methodology in a prototype system called ARVis. It exploits the user's focus to guide the generation of the rules by means of a specific constraint-based rule-mining algorithm
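The notion of focusing on a limited subset of rules can be sketched as a simple filter. This is only an illustration under stated assumptions, not the ARVis implementation: the `Rule` type, the `focus` function, the use of confidence as the interestingness measure, and the `limit` parameter are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Rule:
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]
    confidence: float  # assumed interestingness measure for this sketch


def focus(
    rules: List[Rule],
    item: str,
    min_confidence: float = 0.0,
    limit: int = 10,
) -> List[Rule]:
    """Keep only rules that mention `item` and meet the threshold,
    ranked by confidence, so the user inspects a small subset."""
    subset = [
        r
        for r in rules
        if item in (r.antecedent | r.consequent) and r.confidence >= min_confidence
    ]
    return sorted(subset, key=lambda r: r.confidence, reverse=True)[:limit]


rules = [
    Rule(frozenset({"bread"}), frozenset({"milk"}), 0.67),
    Rule(frozenset({"eggs"}), frozenset({"bread"}), 0.90),
    Rule(frozenset({"tea"}), frozenset({"sugar"}), 0.95),
    Rule(frozenset({"bread"}), frozenset({"jam"}), 0.30),
]
focused = focus(rules, "bread", min_confidence=0.5)
```

Changing `item` or the threshold moves the focus to a neighboring subset, which loosely corresponds to interactive navigation among the rules.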
Semantically-guided evolutionary knowledge discovery from texts
This thesis proposes a new approach for structured knowledge discovery from texts
which considers the mining process itself, the evaluation of this knowledge by the
model, and the human assessment of the quality of the outcome. This is achieved by integrating Natural-Language technology and Genetic Algorithms to produce explanatory novel hypotheses. Natural-Language techniques are
specifically used to extract genre-based information from text documents. Additional
semantic and rhetorical information for generating training data and for feeding a semistructured Latent Semantic Analysis process is also captured. The discovery process is modeled by a semantically-guided Genetic Algorithm
which uses training data to guide the search and optimization process. A number of
novel criteria to evaluate the quality of the new knowledge are proposed. Consequently,
new genetic operations suitable for text mining are designed, and techniques for Evolutionary Multi-Objective Optimization are adapted for the model to trade off between
different criteria in the hypotheses. Domain experts were used in an experiment to assess the quality of the hypotheses
produced by the model so as to establish their effectiveness in terms of novel and
interesting knowledge. The assessment showed encouraging results for the discovered
knowledge and for the correlation between the model and the human opinions