Size of random Galois lattices and number of frequent itemsets
19 pages. We compute the mean and the variance of the size of the Galois lattice built from a random matrix with i.i.d. Bernoulli(p) entries. Then, observing that closed frequent itemsets are in bijection with winning coalitions, we compute the mean and the variance of the number of closed frequent itemsets. This can be of interest for mining association rules.
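To make these quantities concrete, here is a small Python sketch that is not taken from the paper. It counts formal concepts (closed itemsets) of a 0/1 matrix by brute force and checks the exact mean over all small matrices against a closed-form sum. For every candidate extent of size a and intent of size b, the sum adds the probability p^(ab) (1 - p^a)^(n - b) (1 - p^b)^(m - a) that the pair is a concept. This is a standard derivation assumed here, not necessarily the paper's presentation. The `min_support` parameter (an absolute row count, our assumption) restricts the count to closed frequent itemsets. All function names are hypothetical.

```python
from itertools import combinations, product
from math import comb


def count_closed(matrix, min_support=0):
    """Brute-force count of formal concepts (closed itemsets) of a 0/1 matrix.

    Rows are transactions, columns are items. Every column subset is closed via
    rows -> common columns; distinct closures are the concepts. Only concepts
    whose extent has at least `min_support` rows are counted, so the default
    min_support=0 gives the size of the whole Galois lattice.
    """
    m, n = len(matrix), len(matrix[0])
    closures = {}
    for r in range(n + 1):
        for cols in combinations(range(n), r):
            rows = [i for i in range(m) if all(matrix[i][j] for j in cols)]
            # Columns shared by all selected rows (all columns if no row is left).
            intent = frozenset(j for j in range(n) if all(matrix[i][j] for i in rows))
            closures[intent] = len(rows)  # the closure has the same extent as cols
    return sum(1 for support in closures.values() if support >= min_support)


def expected_closed(m, n, p, min_support=0):
    """Closed-form mean of count_closed for an m x n i.i.d. Bernoulli(p) matrix.

    (A, B) with |A| = a rows and |B| = b columns is a concept iff A x B is all
    ones, no other column is all ones on A, and no other row is all ones on B.
    These events involve disjoint cells, hence are independent.
    """
    return sum(
        comb(m, a) * comb(n, b) * p ** (a * b)
        * (1 - p ** a) ** (n - b) * (1 - p ** b) ** (m - a)
        for a in range(min_support, m + 1)
        for b in range(n + 1)
    )


def exact_mean(m, n, p, min_support=0):
    """Exact mean of count_closed by enumerating all 2^(m*n) matrices."""
    total = 0.0
    for bits in product((0, 1), repeat=m * n):
        ones = sum(bits)
        weight = p ** ones * (1 - p) ** (m * n - ones)
        matrix = [bits[i * n:(i + 1) * n] for i in range(m)]
        total += weight * count_closed(matrix, min_support)
    return total
```

Enumeration is only feasible for tiny matrices. Its role here is to confirm that the closed form counts the same objects as the brute-force definition.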
A Model-Based Frequency Constraint for Mining Associations from Transaction Data
Mining frequent itemsets is a popular method for finding associated items in
databases. For this method, support, the co-occurrence frequency of the items
which form an association, is used as the primary indicator of the
association's significance. A single user-specified support threshold is used
to decide whether associations should be investigated further. Support has some
known problems with rare items, favors shorter itemsets and sometimes produces
misleading associations.
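As a minimal illustration of the single-threshold setting described above, the Python sketch below computes the support of an itemset as the fraction of transactions that contain it. It then keeps every itemset that reaches one global `min_support`. The toy transactions and function names are hypothetical. The brute-force enumeration is for illustration only; practical miners such as Apriori prune the search instead.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_itemsets(transactions, min_support, max_size=3):
    """All itemsets (up to max_size items) whose support reaches min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, max_size + 1):
        for candidate in combinations(items, k):
            s = support(candidate, transactions)
            if s >= min_support:
                result[frozenset(candidate)] = s
    return result


# Hypothetical toy transaction database.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"caviar", "bread"},
]
frequent = frequent_itemsets(transactions, min_support=0.4)
```

With `min_support=0.4`, the rare item `caviar` (support 0.2) and every itemset containing it are discarded. This is the kind of rare-item problem a single global threshold runs into.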
In this paper we develop a novel model-based frequency constraint as an
alternative to a single, user-specified minimum support. The constraint
utilizes knowledge of the process generating transaction data by applying a
simple stochastic mixture model (the NB model), which accounts for the typically
highly skewed item frequency distribution of transaction data. A user-specified
precision threshold is used together with the model to find local frequency
thresholds for groups of itemsets. Based on the constraint we develop the
notion of NB-frequent itemsets and adapt a mining algorithm to find all
NB-frequent itemsets in a database. In experiments with publicly available
transaction databases we show that the new constraint provides improvements
over a single minimum support threshold and that the precision threshold is
more robust and easier for the user to set and interpret.
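The following Python sketch shows, in simplified form, how a negative binomial (NB) count model and a precision threshold can yield a frequency threshold for one group of candidates. Several elements are assumptions made for illustration:

* The model is an NB with shape `k` and mean `mu`. These would normally be estimated from the data; here they are simply given.
* Precision at count r is defined as 1 minus the expected number of candidates reaching r under the model, divided by the observed number.
* The function returns the smallest r that meets the target.

This is a reading made for illustration, not the paper's exact estimator or algorithm, and all names are hypothetical.

```python
from math import exp, lgamma, log


def nb_pmf(r, k, mu):
    """P(R = r) for a negative binomial with shape k > 0 and mean mu > 0."""
    return exp(
        lgamma(k + r) - lgamma(k) - lgamma(r + 1)
        + k * log(k / (k + mu))
        + r * log(mu / (k + mu))
    )


def nb_tail(r, k, mu):
    """P(R >= r) under the same negative binomial."""
    return 1.0 - sum(nb_pmf(i, k, mu) for i in range(r))


def local_threshold(counts, k, mu, precision):
    """Smallest count r whose estimated precision reaches the target.

    counts: observed co-occurrence counts for the candidates in one group.
    Estimated precision at r = 1 - (expected # of counts >= r under the NB
    model) / (observed # of counts >= r).
    """
    n = len(counts)
    for r in range(1, max(counts) + 1):
        observed = sum(1 for c in counts if c >= r)  # >= 1 since r <= max(counts)
        expected = n * nb_tail(r, k, mu)
        if 1.0 - expected / observed >= precision:
            return r
    return None  # no threshold reaches the requested precision


# Geometric special case (k = 1, mu = 1): P(R >= r) = 0.5 ** r.
counts = [0] * 90 + [1] * 5 + [10] * 5
threshold = local_threshold(counts, k=1.0, mu=1.0, precision=0.95)
```

In this toy case, the model predicts about 0.2 of the 100 candidates to reach a count of 9, against 5 observed. The estimated precision is therefore about 0.96 at r = 9 but only about 0.92 at r = 8, so the returned threshold is 9.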
Revisiting Numerical Pattern Mining with Formal Concept Analysis
In this paper, we investigate the problem of mining numerical data in the
framework of Formal Concept Analysis. The usual way is to apply a scaling
procedure (transforming numerical attributes into binary ones), which leads to a
loss of either information or efficiency, in particular w.r.t. the volume of
extracted patterns. By contrast, we propose to work directly on numerical data
in a more precise and efficient way, and we prove that this is the case. To this
end, the notions of closed patterns, generators and equivalence classes are
revisited in the numerical context. Moreover, two original algorithms are
proposed and used in an evaluation involving real-world data, showing the
superiority of the present approach.
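One common way to handle numerical data directly in FCA is with interval patterns. In this representation, a set of objects is described by the smallest interval that covers its values on each attribute. The Python sketch below assumes that representation and does not claim to reproduce the paper's two algorithms. It computes the description of a set of objects, the extent of an interval pattern, and the closure obtained by composing the two. The toy data and function names are hypothetical.

```python
def description(objects, data):
    """Smallest interval (min, max) per attribute covering the given objects."""
    rows = [data[g] for g in objects]
    if not rows:
        raise ValueError("description of an empty object set is not defined here")
    return tuple((min(col), max(col)) for col in zip(*rows))


def extent(pattern, data):
    """Objects whose value on every attribute lies inside the pattern's interval."""
    return frozenset(
        g for g, values in data.items()
        if all(lo <= v <= hi for v, (lo, hi) in zip(values, pattern))
    )


def closure(pattern, data):
    """Most specific interval pattern with the same extent as `pattern`."""
    return description(extent(pattern, data), data)


# Hypothetical numerical context: objects g1..g5 with three numerical attributes.
data = {
    "g1": (5, 7, 6),
    "g2": (6, 8, 4),
    "g3": (4, 8, 5),
    "g4": (4, 9, 8),
    "g5": (5, 8, 5),
}
pattern = description({"g1", "g2"}, data)
```

Here `pattern` is ((5, 6), (7, 8), (4, 6)). Its extent also contains g5, and closing it returns the same intervals, so it is a closed pattern with extent {g1, g2, g5}. No binarization of the attributes is needed at any step.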
Redundancy, Deduction Schemes, and Minimum-Size Bases for Association Rules
Association rules are among the most widely employed data analysis methods in
the field of Data Mining. An association rule is a form of partial implication
between two sets of binary variables. In the most common approach, association
rules are parameterized by a lower bound on their confidence, which is the
empirical conditional probability of their consequent given the antecedent,
and/or by some other parameter bounds such as "support" or deviation from
independence. We study here notions of redundancy among association rules from
a fundamental perspective. We see each transaction in a dataset as an
interpretation (or model) in the propositional logic sense, and consider
existing notions of redundancy, that is, of logical entailment, among
association rules, of the form "any dataset in which this first rule holds must
also obey that second rule, therefore the second is redundant". We discuss
several existing alternative definitions of redundancy between association
rules and provide new characterizations and relationships among them. We show
that the main alternatives we discuss actually correspond to just two variants,
which differ in the treatment of full-confidence implications. For each of
these two notions of redundancy, we provide a sound and complete deduction
calculus, and we show how to construct complete bases (that is,
axiomatizations) of absolutely minimum size in terms of the number of rules.
Finally, we explore an approach to redundancy with respect to several association
rules, and fully characterize its simplest case of two partial premises.
Comment: LMCS accepted paper
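To make confidence and the idea of one rule entailing another concrete, the Python sketch below computes support and confidence over a toy dataset. It also checks a simple sufficient condition for redundancy: X0 is a subset of X1, and X1 ∪ Y1 is a subset of X0 ∪ Y0. Under that condition, supp(X1 ∪ Y1) ≥ supp(X0 ∪ Y0) and supp(X1) ≤ supp(X0). Rule X1 → Y1 therefore has at least the confidence of X0 → Y0 on any dataset where both are defined. This containment test is a well-known condition used here only for illustration; the abstract's characterizations, including the treatment of full-confidence implications, go further. The dataset and function names are hypothetical.

```python
def support(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)


def confidence(antecedent, consequent, transactions):
    """Empirical P(consequent | antecedent); None if the antecedent never occurs."""
    base = support(antecedent, transactions)
    if base == 0:
        return None
    return support(set(antecedent) | set(consequent), transactions) / base


def entailed_by_containment(rule0, rule1):
    """True if X0 <= X1 and X1 | Y1 <= X0 | Y0 for rules (X0, Y0) and (X1, Y1).

    Then conf(X1 -> Y1) >= conf(X0 -> Y0) on any dataset where both are defined,
    so rule1 holds whenever rule0 does, at any confidence threshold.
    """
    (x0, y0), (x1, y1) = rule0, rule1
    return set(x0) <= set(x1) and set(x1) | set(y1) <= set(x0) | set(y0)


# Hypothetical toy dataset and rules (antecedent, consequent).
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a", "b", "c", "d"}, {"b"}]
rule0 = ({"a"}, {"b", "c"})
rule1 = ({"a", "b"}, {"c"})
```

On this toy data, `a -> b, c` has confidence 0.5 and `a, b -> c` has confidence 2/3, consistent with the inequality. The converse check fails, since {a, b} is not a subset of {a}.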