12 research outputs found
A CP-based approach for mining sequential patterns with quantities
This paper addresses the problem of sequential pattern mining (SPM) from data represented as a set of
sequences. In this work, we are interested in sequences of items in which each item is associated with a quantity.
To the best of our knowledge, existing approaches cannot handle this kind of sequence under constraints.
On the other hand, several proposals have shown the efficiency of constraint programming (CP) for solving SPM
problems with several kinds of constraints. In this paper, we propose the global constraint QSPM, which
extends the two CP-based approaches proposed in [5] and [7]. Experiments on real-life datasets show
the efficiency of our approach, which allows many constraints to be specified, such as size, membership and
regular expression constraints.
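To make the setting concrete, here is a minimal Python sketch of support counting for sequences whose items carry quantities. It is not the QSPM global constraint from the paper, only a plain enumeration-free check. The matching rule (a pattern item `(a, q)` is matched by a sequence item `(a, q')` when `q' >= q`), the function names `occurs` and `support`, and the toy database are all assumptions made for illustration.

```python
from typing import List, Tuple

Item = Tuple[str, int]        # (item label, quantity)
Sequence = List[Item]


def occurs(pattern: Sequence, sequence: Sequence) -> bool:
    """Return True if `pattern` is a subsequence of `sequence`.

    ASSUMPTION: a pattern item (a, q) matches a sequence item (a, q')
    when the labels are equal and q' >= q. The paper may define
    quantity matching differently.
    """
    pos = 0  # index of the next pattern item still to be matched
    for label, qty in sequence:
        if pos < len(pattern):
            p_label, p_qty = pattern[pos]
            if label == p_label and qty >= p_qty:
                pos += 1
    return pos == len(pattern)


def support(pattern: Sequence, database: List[Sequence]) -> int:
    """Number of database sequences in which the pattern occurs."""
    return sum(1 for seq in database if occurs(pattern, seq))


# Hypothetical toy database of three sequences with quantities.
database = [
    [("a", 2), ("b", 1), ("c", 3)],
    [("a", 1), ("b", 2)],
    [("a", 3), ("c", 1), ("b", 4)],
]

pattern = [("a", 2), ("b", 1)]
print(support(pattern, database))
```

Under these assumptions the second sequence does not support the pattern, because its quantity of `a` is below the one the pattern requires. Size or membership constraints could be layered on top as simple predicates over `pattern`.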
A Constraint Programming Approach for Web Log Mining
Lower and upper queries for graph-mining
Canonical encoding is one of the key operations required by subgraph mining algorithms for candidate generation. Canonical encodings make it possible to query the exact number of frequent subgraphs. Existing approaches use canonical encodings with exponential time complexity. As a consequence, mining all frequent patterns in large graphs is computationally expensive. In this paper, we propose to relax the canonicity property, leading to two encodings, lower and upper encodings, with polynomial time complexity, which tightly enclose the exact set of frequent subgraphs. These two encodings allow two kinds of queries, lower and upper queries, which return respectively a subset and a superset of the frequent patterns. Lower and upper encodings have been integrated in Gaston. Experiments performed on large and dense synthetic graphs show that these two encodings are very effective compared to Gaston and gSpan, while on large real-world sparse graphs they remain very competitive.
Complete and Incomplete Approaches for Graph Mining
In this paper, we revisit approaches for graph mining in which a set of simple encodings is proposed. Complete approaches are those using an encoding that yields all the frequent subgraphs, whereas incomplete approaches do not guarantee to find all the frequent subgraphs. Our objective is also to highlight the critical points in the process of extracting the frequent subgraphs with complete and incomplete approaches. Current canonical encodings have exponential complexity, which motivates us to propose a relaxation of the canonicity of the encoding, leading to complete and incomplete encodings with linear complexity. These techniques are implemented within our graph miner GGM (Generic Graph Miner) and then evaluated on a set of graph databases, showing the behavior of both complete and incomplete approaches.
A global constraint for sequential pattern mining (Une contrainte globale pour l'extraction de motifs séquentiels).
Prefix-projection global constraint and top-k approach for sequential pattern mining
Mining Relevant Sequence Patterns with CP-based Framework
Sequential pattern mining under various constraints is a challenging data mining task. The paper provides a generic framework based on constraint programming to discover sequence patterns defined by constraints on local patterns (e.g., gap, regular expressions) or by constraints on patterns involving combinations of local patterns, such as relevant subgroups and top-k patterns. This framework enables the user to mine both kinds of patterns in a declarative way. The solving step is done by exploiting the machinery of Constraint Programming. For complex patterns involving combinations of local patterns, we improve the mining step by using dynamic CSP. Finally, we present two case studies, in biomedical information extraction and in stylistic analysis in linguistics.
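As an illustration of two of the constraint types named above (gap and top-k), the following Python sketch checks a maximum-gap constraint and ranks fixed-length patterns by support. It uses brute-force enumeration, not constraint programming or dynamic CSP, so it does not reflect how the framework in the paper solves the problem. The function names, the `max_gap` semantics (at most `max_gap` items between two consecutive matched items) and the toy database are assumptions made for illustration.

```python
from itertools import product
from typing import List, Sequence, Tuple

Pattern = Tuple[str, ...]


def occurs_with_max_gap(pattern: Pattern, sequence: Sequence[str], max_gap: int) -> bool:
    """True if `pattern` embeds in `sequence` with at most `max_gap`
    items skipped between consecutive matched items (assumed semantics)."""
    if not pattern:
        return True
    # Positions where the first pattern item can be matched.
    ends = {i for i, x in enumerate(sequence) if x == pattern[0]}
    for symbol in pattern[1:]:
        # Extend every partial embedding by one item within the gap window.
        ends = {
            j
            for i in ends
            for j in range(i + 1, min(i + max_gap + 2, len(sequence)))
            if sequence[j] == symbol
        }
        if not ends:
            return False
    return bool(ends)


def top_k(database: List[Sequence[str]], k: int, length: int, max_gap: int):
    """Brute-force top-k patterns of a fixed length, ranked by support.
    Ties are broken lexicographically so the result is deterministic."""
    alphabet = sorted({x for seq in database for x in seq})
    scored = []
    for pattern in product(alphabet, repeat=length):
        sup = sum(1 for seq in database if occurs_with_max_gap(pattern, seq, max_gap))
        scored.append((pattern, sup))
    scored.sort(key=lambda ps: (-ps[1], ps[0]))
    return scored[:k]


# Hypothetical toy database; each string is a sequence of single-character items.
database = ["abc", "acb", "ab", "bca"]
print(top_k(database, k=2, length=2, max_gap=1))
```

The enumeration is exponential in the pattern length and is only meant to make the two constraints tangible on small inputs.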