255 research outputs found
Redundancy, Deduction Schemes, and Minimum-Size Bases for Association Rules
Association rules are among the most widely employed data analysis methods in
the field of Data Mining. An association rule is a form of partial implication
between two sets of binary variables. In the most common approach, association
rules are parameterized by a lower bound on their confidence, which is the
empirical conditional probability of their consequent given the antecedent,
and/or by some other parameter bounds such as "support" or deviation from
independence. We study here notions of redundancy among association rules from
a fundamental perspective. We see each transaction in a dataset as an
interpretation (or model) in the propositional logic sense, and consider
existing notions of redundancy, that is, of logical entailment, among
association rules, of the form "any dataset in which this first rule holds must
obey also that second rule, therefore the second is redundant". We discuss
several existing alternative definitions of redundancy between association
rules and provide new characterizations and relationships among them. We show
that the main alternatives we discuss correspond actually to just two variants,
which differ in the treatment of full-confidence implications. For each of
these two notions of redundancy, we provide a sound and complete deduction
calculus, and we show how to construct complete bases (that is,
axiomatizations) of absolutely minimum size in terms of the number of rules. We
explore finally an approach to redundancy with respect to several association
rules, and fully characterize its simplest case of two partial premises.Comment: LMCS accepted pape
Revisiting Numerical Pattern Mining with Formal Concept Analysis
In this paper, we investigate the problem of mining numerical data in the
framework of Formal Concept Analysis. The usual way is to use a scaling
procedure --transforming numerical attributes into binary ones-- leading either
to a loss of information or of efficiency, in particular w.r.t. the volume of
extracted patterns. By contrast, we propose to directly work on numerical data
in a more precise and efficient way, and we prove it. For that, the notions of
closed patterns, generators and equivalent classes are revisited in the
numerical context. Moreover, two original algorithms are proposed and used in
an evaluation involving real-world data, showing the predominance of the
present approach
A genetic algorithm coupled with tree-based pruning for mining closed association rules
Due to the voluminous amount of itemsets that are generated, the association rules extracted from these itemsets contain redundancy, and designing an effective approach to address this issue is of paramount importance. Although multiple algorithms were proposed in recent years for mining closed association rules most of them underperform in terms of run time or memory. Another issue that remains challenging is the nature of the dataset. While some of the existing algorithms perform well on dense datasets others perform well on sparse datasets. This paper aims to handle these drawbacks by using a genetic algorithm for mining closed association rules. Recent studies have shown that genetic algorithms perform better than conventional algorithms due to their bitwise operations of crossover and mutation. Bitwise operations are predominantly faster than conventional approaches and bits consume lesser memory thereby improving the overall performance of the algorithm. To address the redundancy in the mined association rules a tree-based pruning algorithm has been designed here. This works on the principle of minimal antecedent and maximal consequent. Experiments have shown that the proposed approach works well on both dense and sparse datasets while surpassing existing techniques with regard to run time and memory
The Coron System
Coron is a domain and platform independent, multi-purposed data mining
toolkit, which incorporates not only a rich collection of data mining
algorithms, but also allows a number of auxiliary operations. To the best of
our knowledge, a data mining toolkit designed specifically for itemset
extraction and association rule generation like Coron does not exist elsewhere.
Coron also provides support for preparing and filtering data, and for
interpreting the extracted units of knowledge
Closed sets based discovery of small covers for association rules (extended version)
International audienceIn this paper, we address the problem of the usefulness of the set of discovered association rules. This problem is important since real-life databases yield most of the time several thousands of rules with high confidence. We propose new algorithms based on Galois closed sets to reduce the extraction to small covers (or bases) for exact and approximate rules, adapted from lattice theory and data analysis domain. Once frequent closed itemsets – which constitute a generating set for both frequent itemsets and association rules – have been discovered, no additional database pass is needed to derive these bases. Experiments conducted on real-life databases show that these algorithms are efficient and valuable in practice
- …