Revisiting Numerical Pattern Mining with Formal Concept Analysis
In this paper, we investigate the problem of mining numerical data in the
framework of Formal Concept Analysis. The usual way is to use a scaling
procedure (transforming numerical attributes into binary ones), leading either
to a loss of information or of efficiency, in particular w.r.t. the volume of
extracted patterns. By contrast, we propose to work directly on numerical
data, and we prove that this is more precise and efficient. To this end, the
notions of closed patterns, generators and equivalence classes are revisited in the
numerical context. Moreover, two original algorithms are proposed and used in
an evaluation involving real-world data, showing the superiority of the
present approach.
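The trade-off attributed to scaling above can be illustrated with a small sketch. It is not the paper's method: the function name `scale_to_binary`, the threshold-based scheme and the toy values are assumptions chosen only to show that a coarse scale merges distinct objects, while a fine scale multiplies the number of binary attributes.

```python
from typing import Dict, List


def scale_to_binary(
    values: Dict[str, float], thresholds: List[float], name: str = "x"
) -> Dict[str, Dict[str, bool]]:
    """Turn one numerical attribute into binary attributes
    'name <= t' and 'name >= t' for each threshold t (hypothetical scheme)."""
    table: Dict[str, Dict[str, bool]] = {}
    for obj, v in values.items():
        row: Dict[str, bool] = {}
        for t in thresholds:
            row[f"{name} <= {t}"] = v <= t
            row[f"{name} >= {t}"] = v >= t
        table[obj] = row
    return table


# Toy data: three objects with one numerical attribute (made up for illustration).
values = {"g1": 5.0, "g2": 6.0, "g3": 9.0}

# Coarse scale: one threshold, two binary attributes.
coarse = scale_to_binary(values, thresholds=[7.0])
# Fine scale: one threshold per observed value, six binary attributes.
fine = scale_to_binary(values, thresholds=[5.0, 6.0, 9.0])

# With the coarse scale g1 and g2 become indistinguishable (information loss);
# the fine scale separates them at the cost of more attributes.
print(coarse["g1"] == coarse["g2"], fine["g1"] == fine["g2"], len(fine["g1"]))
```

In this toy case the coarse scale cannot tell g1 from g2, whereas the fine scale keeps them apart but triples the number of binary attributes that a pattern miner has to handle.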
A Model-Based Frequency Constraint for Mining Associations from Transaction Data
Mining frequent itemsets is a popular method for finding associated items in
databases. For this method, support, the co-occurrence frequency of the items
which form an association, is used as the primary indicator of the
association's significance. A single user-specified support threshold is used
to decide if associations should be further investigated. Support has some
known problems with rare items, favors shorter itemsets and sometimes produces
misleading associations.
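As a minimal sketch of the support measure described above, the following brute-force example computes support and applies a single minimum support threshold. The function names, the toy transactions and the threshold value are assumptions for illustration, and the exhaustive enumeration stands in for a real mining algorithm such as Apriori.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set


def support(itemset: Iterable[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions that contain every item of the itemset."""
    items = frozenset(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def frequent_itemsets(
    transactions: List[Set[str]], min_support: float, max_size: int = 3
) -> Dict[FrozenSet[str], float]:
    """Enumerate all itemsets up to max_size and keep those whose support
    reaches min_support (brute force, suitable only for toy data)."""
    all_items = sorted(set().union(*transactions))
    result: Dict[FrozenSet[str], float] = {}
    for k in range(1, max_size + 1):
        for candidate in combinations(all_items, k):
            s = support(candidate, transactions)
            if s >= min_support:
                result[frozenset(candidate)] = s
    return result


# Toy transaction database (made up for illustration).
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "butter"},
    {"bread", "butter"},
    {"milk"},
    {"caviar", "champagne"},
]

freq = frequent_itemsets(transactions, min_support=0.4)
for itemset, s in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

In this toy database caviar and champagne always occur together, yet the pair falls below the single threshold because both items are rare, which is the kind of problem with rare items mentioned above.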
In this paper we develop a novel model-based frequency constraint as an
alternative to a single, user-specified minimum support. The constraint
utilizes knowledge of the process generating transaction data by applying a
simple stochastic mixture model (the NB model) which allows for transaction
data's typically highly skewed item frequency distribution. A user-specified
precision threshold is used together with the model to find local frequency
thresholds for groups of itemsets. Based on the constraint we develop the
notion of NB-frequent itemsets and adapt a mining algorithm to find all
NB-frequent itemsets in a database. In experiments with publicly available
transaction databases we show that the new constraint provides improvements
over a single minimum support threshold and that the precision threshold is
more robust and easier to set and interpret by the user.