A Constraint Programming Approach for Mining Sequential Patterns in a Sequence Database
Constraint-based pattern discovery is at the core of numerous data mining
tasks. Patterns are extracted with respect to a given set of constraints
(frequency, closedness, size, etc.). In the context of sequential pattern
mining, a large number of dedicated techniques have been developed for solving
particular classes of constraints. The aim of this paper is to investigate the
use of Constraint Programming (CP) to model and mine sequential patterns in a
sequence database. Our CP approach offers a natural way to combine, within a
single framework, a large set of constraints coming from various origins.
Experiments show the feasibility and the interest of our approach.
Prefix-Projection Global Constraint for Sequential Pattern Mining
Sequential pattern mining under constraints is a challenging data mining
task. Many efficient ad hoc methods have been developed for mining sequential
patterns, but they all suffer from a lack of genericity. Recent works have
investigated Constraint Programming (CP) methods, but these are not yet
effective because of their encoding. In this paper, we propose a global
constraint based on the projected databases principle which remedies this
drawback. Experiments show that our approach clearly outperforms CP approaches
and competes well with ad hoc methods on large datasets.
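The projected databases principle mentioned in this abstract can be illustrated with a minimal PrefixSpan-style sketch. It is not the paper's global constraint or its filtering algorithm: it assumes sequences of single items (no itemsets), and the names `prefix_span`, `mine`, and `min_support` are hypothetical.

```python
from collections import defaultdict


def prefix_span(db, min_support):
    """Return {pattern (tuple): support} for every frequent sequential pattern.

    Assumption: each sequence in db is a list (or string) of single items.
    """
    results = {}

    def mine(prefix, projected):
        # projected holds one (sequence index, start offset) pair for each
        # sequence that contains the current prefix.
        support = defaultdict(int)
        for sid, start in projected:
            for item in set(db[sid][start:]):
                support[item] += 1
        for item, count in sorted(support.items()):
            if count < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = count
            # Project: keep only the suffix after the first occurrence of item.
            new_projected = []
            for sid, start in projected:
                try:
                    pos = db[sid].index(item, start)
                except ValueError:
                    continue
                new_projected.append((sid, pos + 1))
            mine(pattern, new_projected)

    mine((), [(sid, 0) for sid in range(len(db))])
    return results
```

Each recursive call only looks at the suffixes that remain after the current prefix, which is why the search never rescans the parts of a sequence that the prefix has already consumed.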
Constraint-based sequence mining using constraint programming
The goal of constraint-based sequence mining is to find sequences of symbols
that are included in a large number of input sequences and that satisfy some
constraints specified by the user. Many constraints have been proposed in the
literature, but a general framework is still missing. We investigate the use of
constraint programming as a general framework for this task. We first identify
four categories of constraints that are applicable to sequence mining. We then
propose two constraint programming formulations. The first formulation
introduces a new global constraint called exists-embedding. This formulation is
the most efficient but does not support one type of constraint. To support such
constraints, we develop a second formulation that is more general but incurs
more overhead. Both formulations can use the projected database technique used
in specialised algorithms. Experiments demonstrate the flexibility towards
constraint-based settings and compare the approach to existing methods.
Comment: In Integration of AI and OR Techniques in Constraint Programming (CPAIOR), 201
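The inclusion relation described at the start of this abstract (a pattern is included in an input sequence if its symbols appear there in order, possibly with gaps) can be sketched as below. This only checks the relation and counts matching sequences; it is not the exists-embedding propagator from the paper, and the function names `has_embedding` and `support` are hypothetical.

```python
def has_embedding(pattern, sequence):
    """True if pattern is a (possibly non-contiguous) subsequence of sequence."""
    remaining = iter(sequence)
    # "symbol in remaining" advances the iterator past the first match, so
    # each pattern symbol must be found after the previous one.
    return all(symbol in remaining for symbol in pattern)


def support(pattern, db):
    """Number of sequences in db that include pattern."""
    return sum(has_embedding(pattern, sequence) for sequence in db)
```

A minimum-frequency constraint then amounts to requiring `support(pattern, db) >= threshold` for some user-chosen threshold.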
An Efficient Algorithm for Mining Frequent Sequence with Constraint Programming
The main advantage of Constraint Programming (CP) approaches for sequential
pattern mining (SPM) is their modularity, which includes the ability to add new
constraints (regular expressions, length restrictions, etc.). The current best
CP approach for SPM uses a global constraint (module) that computes the
projected database and enforces the minimum frequency; it does this with a
filtering algorithm similar to the PrefixSpan method. However, the resulting
system is not as scalable as some of the most advanced mining systems like
Zaki's cSPADE. We show how, using techniques from both data mining and CP, one
can use a generic constraint solver and yet outperform existing specialized
systems. This is mainly due to two improvements in the module that computes the
projected frequencies: first, computing the projected database can be sped up
by pre-computing the positions at which a symbol can become unsupported by a
sequence, thereby avoiding a scan of the full sequence each time; and second, by
taking inspiration from the trailing used in CP solvers to devise a
backtracking-aware data structure that allows fast incremental storing and
restoring of the projected database. Detailed experiments show how this
approach outperforms existing CP as well as specialized systems for SPM, and
that the gain in efficiency carries over directly to other settings such as
mining with regular expressions.
Comment: frequent sequence mining, constraint programmin
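The two improvements named in this abstract can be illustrated with a rough sketch, under stated assumptions. It reads "positions at which a symbol can become unsupported" as the last occurrence of each symbol in each sequence, and it models trailing as a single growing list that is truncated on backtrack. The names `last_positions`, `projected_support`, and `TrailedProjection` are hypothetical and do not come from the paper.

```python
def last_positions(db):
    """For each sequence, map every symbol to the index of its last occurrence."""
    return [{sym: i for i, sym in enumerate(seq)} for seq in db]


def projected_support(last, projected):
    """Count, per symbol, the sequences where it still occurs at or after the offset.

    Only the precomputed table is consulted; the sequences are not rescanned.
    """
    counts = {}
    for sid, start in projected:
        for sym, pos in last[sid].items():
            if pos >= start:
                counts[sym] = counts.get(sym, 0) + 1
    return counts


class TrailedProjection:
    """Projected database kept in one growing list and restored by truncation."""

    def __init__(self, db):
        self.db = db
        self.entries = [(sid, 0) for sid in range(len(db))]
        self.trail = [(0, len(db))]  # (start, size) of each stacked block

    def current(self):
        start, size = self.trail[-1]
        return self.entries[start:start + size]

    def extend(self, item):
        """Append the projection on item as a new block at the end of the list."""
        new_start = len(self.entries)
        for sid, offset in self.current():
            try:
                pos = self.db[sid].index(item, offset)
            except ValueError:
                continue
            self.entries.append((sid, pos + 1))
        self.trail.append((new_start, len(self.entries) - new_start))

    def backtrack(self):
        """Drop the most recent block, restoring the previous projection."""
        start, _ = self.trail.pop()
        del self.entries[start:]
```

Restoring a previous state costs one truncation rather than a copy of the projected database, which is the kind of incremental store and restore behaviour the abstract attributes to trailing.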
Constraint-based Sequential Pattern Mining with Decision Diagrams
Constrained sequential pattern mining aims at identifying frequent patterns
on a sequential database of items while observing constraints defined over the
item attributes. We introduce novel techniques for constraint-based sequential
pattern mining that rely on a multi-valued decision diagram representation of
the database. Specifically, our representation can accommodate multiple item
attributes and various constraint types, including a number of non-monotone
constraints. To evaluate the applicability of our approach, we develop an
MDD-based prefix-projection algorithm and compare its performance against a
typical generate-and-check variant, as well as a state-of-the-art
constraint-based sequential pattern mining algorithm. Results show that our
approach is competitive with or superior to these other methods in terms of
scalability and efficiency.
Comment: AAAI201
Using Answer Set Programming for pattern mining
Serial pattern mining consists of extracting the frequent sequential patterns
from a single sequence of itemsets. This paper explores the ability of a
declarative language, such as Answer Set Programming (ASP), to solve this task
efficiently. We propose several ASP implementations of the frequent sequential
pattern mining task: a non-incremental and an incremental resolution. The
results show that the incremental resolution is more efficient than the
non-incremental one, but both ASP programs are less efficient than dedicated
algorithms. Nonetheless, this approach can be seen as a first step toward a
generic framework for sequential pattern mining with constraints.
Comment: Intelligence Artificielle Fondamentale (2014
DSL: Discriminative Subgraph Learning via Sparse Self-Representation
The goal in network state prediction (NSP) is to classify the global state
(label) associated with features embedded in a graph. This graph structure
encoding feature relationships is the key distinctive aspect of NSP compared to
classical supervised learning. NSP arises in various applications: gene
expression samples embedded in a protein-protein interaction (PPI) network,
temporal snapshots of infrastructure or sensor networks, and fMRI coherence
network samples from multiple subjects to name a few. Instances from these
domains are typically "wide" (more features than samples), and thus, feature
sub-selection is required for robust and generalizable prediction. How to best
employ the network structure in order to learn succinct connected subgraphs
encompassing the most discriminative features becomes a central challenge in
NSP. Prior work employs connected subgraph sampling or graph smoothing within
optimization frameworks, resulting in either large variance of quality or weak
control over the connectivity of selected subgraphs.
In this work we propose an optimization framework for discriminative subgraph
learning (DSL) which simultaneously enforces (i) sparsity, (ii) connectivity
and (iii) high discriminative power of the resulting subgraphs of features. Our
optimization algorithm is a single-step solution for the NSP and the associated
feature selection problem. It is rooted in the rich literature on
maximal-margin optimization, spectral graph methods and sparse subspace
self-representation. DSL simultaneously ensures solution interpretability and
superior predictive power (up to 16% improvement in challenging instances
compared to baselines), with execution times up to an hour for large instances.
Comment: 9 page