An Efficient Algorithm for Mining Frequent Sequence with Constraint Programming
The main advantage of Constraint Programming (CP) approaches for sequential
pattern mining (SPM) is their modularity, which includes the ability to add new
constraints (regular expressions, length restrictions, etc). The current best
CP approach for SPM uses a global constraint (module) that computes the
projected database and enforces the minimum frequency; it does this with a
filtering algorithm similar to the PrefixSpan method. However, the resulting
system is not as scalable as some of the most advanced mining systems like
Zaki's cSPADE. We show how, using techniques from both data mining and CP, one
can use a generic constraint solver and yet outperform existing specialized
systems. This is mainly due to two improvements in the module that computes the
projected frequencies: first, computing the projected database can be sped up
by pre-computing the positions at which a symbol can become unsupported by a
sequence, thereby avoiding a scan of the full sequence each time; and second, by
taking inspiration from the trailing used in CP solvers to devise a
backtracking-aware data structure that allows fast incremental storing and
restoring of the projected database. Detailed experiments show how this
approach outperforms existing CP as well as specialized systems for SPM, and
that the gain in efficiency translates directly into increased efficiency for
other settings such as mining with regular expressions.
Comment: frequent sequence mining, constraint programming
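The two improvements named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `last_positions` and `TrailedProjection` are hypothetical, and the sketch assumes that "the positions at which a symbol can become unsupported" means the last occurrence of each symbol in a sequence, and that the backtracking-aware structure is a single append-only list with a trail of block offsets.

```python
from typing import Dict, List, Sequence, Tuple


def last_positions(seq: Sequence[str]) -> Dict[str, int]:
    """Map each symbol to the index of its last occurrence in seq."""
    return {sym: i for i, sym in enumerate(seq)}


class TrailedProjection:
    """Projected database kept in one growing list, restored by truncation.

    Hypothetical illustration: each entry is (sequence id, position just
    after the matched prefix). Each projection appends a new block, and the
    trail remembers where that block starts.
    """

    def __init__(self, db: List[Sequence[str]]) -> None:
        self.db = db
        self.last = [last_positions(s) for s in db]  # pre-computed once
        self.entries: List[Tuple[int, int]] = [(sid, 0) for sid in range(len(db))]
        self.trail: List[int] = [0]  # start offset of the current block

    def current(self) -> List[Tuple[int, int]]:
        """Entries of the projected database for the current prefix."""
        return self.entries[self.trail[-1]:]

    def extend(self, sym: str) -> int:
        """Project on sym, push a new block, and return its support."""
        start = len(self.entries)
        for sid, pos in self.current():
            # If the last occurrence of sym is before pos, the sequence no
            # longer supports sym, so it is skipped without being scanned.
            if self.last[sid].get(sym, -1) < pos:
                continue
            nxt = self.db[sid].index(sym, pos) + 1
            self.entries.append((sid, nxt))
        self.trail.append(start)
        return len(self.entries) - start

    def backtrack(self) -> None:
        """Undo the most recent extend by truncating its block."""
        start = self.trail.pop()
        del self.entries[start:]
```

For example, on the toy database `["abcb", "acb", "ba"]`, extending the empty prefix by `a` keeps all three sequences, extending further by `b` keeps two, and a single `backtrack()` call restores the projection for `a` without recomputing it.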
Prefix-Projection Global Constraint for Sequential Pattern Mining
Sequential pattern mining under constraints is a challenging data mining
task. Many efficient ad hoc methods have been developed for mining sequential
patterns, but they all suffer from a lack of genericity. Recent work
has investigated Constraint Programming (CP) methods, but these are still not
effective because of their encoding. In this paper, we propose a global
constraint based on the projected-database principle, which remedies this
drawback. Experiments show that our approach clearly outperforms CP approaches
and competes well with ad hoc methods on large datasets.
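The projected-database principle mentioned above can be sketched in a few lines of Python. This is only an illustration in the spirit of PrefixSpan, not the global constraint proposed in the paper; the function names `project` and `frequent_extensions` and the parameter `min_sup` are assumptions introduced for the example.

```python
from collections import Counter
from typing import List, Sequence, Set


def project(db: List[Sequence[str]], prefix: Sequence[str]) -> List[Sequence[str]]:
    """Return, for each sequence containing prefix as a subsequence,
    the suffix that follows the first embedding of prefix."""
    projected = []
    for seq in db:
        pos = 0
        for sym in prefix:
            try:
                pos = seq.index(sym, pos) + 1  # leftmost match from pos on
            except ValueError:
                break  # prefix is not embedded in this sequence
        else:
            projected.append(seq[pos:])
    return projected


def frequent_extensions(db: List[Sequence[str]], prefix: Sequence[str],
                        min_sup: int) -> Set[str]:
    """Symbols that extend prefix while keeping support >= min_sup."""
    counts: Counter = Counter()
    for suffix in project(db, prefix):
        counts.update(set(suffix))  # count each symbol once per sequence
    return {sym for sym, c in counts.items() if c >= min_sup}
```

Only symbols that are still frequent in the projected database need to be considered when the prefix is extended, which is the pruning that a filtering algorithm built on this principle would exploit.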
Constraint-based sequence mining using constraint programming
The goal of constraint-based sequence mining is to find sequences of symbols
that are included in a large number of input sequences and that satisfy some
constraints specified by the user. Many constraints have been proposed in the
literature, but a general framework is still missing. We investigate the use of
constraint programming as a general framework for this task. We first identify
four categories of constraints that are applicable to sequence mining. We then
propose two constraint programming formulations. The first formulation
introduces a new global constraint called exists-embedding. This formulation is
the most efficient but does not support one type of constraint. To support such
constraints, we develop a second formulation that is more general but incurs
more overhead. Both formulations can use the projected database technique used
in specialised algorithms. Experiments demonstrate the flexibility towards
constraint-based settings and compare the approach to existing methods.
Comment: In Integration of AI and OR Techniques in Constraint Programming
(CPAIOR), 201
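The task described in this abstract, finding patterns that are included in many input sequences and satisfy user constraints, can be made concrete with a small Python sketch. It only checks a single candidate pattern and is not the exists-embedding global constraint itself; the names `exists_embedding`, `frequency`, `satisfies`, and `min_freq` are hypothetical, and constraints are assumed to be plain predicates on the pattern.

```python
from typing import Callable, Iterable, List, Sequence


def exists_embedding(pattern: Sequence[str], seq: Sequence[str]) -> bool:
    """True if pattern occurs in seq as a (possibly gapped) subsequence."""
    it = iter(seq)
    # `sym in it` consumes the iterator up to the first match, so the
    # symbols of pattern must be found in order.
    return all(sym in it for sym in pattern)


def frequency(pattern: Sequence[str], db: List[Sequence[str]]) -> int:
    """Number of input sequences that include pattern."""
    return sum(exists_embedding(pattern, seq) for seq in db)


def satisfies(pattern: Sequence[str], db: List[Sequence[str]], min_freq: int,
              constraints: Iterable[Callable[[Sequence[str]], bool]] = ()) -> bool:
    """Check the minimum frequency and every user-supplied predicate."""
    return frequency(pattern, db) >= min_freq and all(c(pattern) for c in constraints)
```

A length restriction, for instance, could be passed as `lambda p: len(p) <= 2`.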