12 research outputs found

    A CP-based approach for mining sequential patterns with quantities

    No full text
    This paper addresses the problem of sequential pattern mining (SPM) from data represented as a set of sequences. In this work, we are interested in sequences of items in which each item is associated with a quantity. To the best of our knowledge, existing approaches cannot handle this kind of sequence under constraints. On the other hand, several proposals show the efficiency of constraint programming (CP) for solving the SPM problem under several kinds of constraints. In this paper, we propose the global constraint QSPM, an extension of the two CP-based approaches proposed in [5] and [7]. Experiments on real-life datasets show the efficiency of our approach, which allows many constraints to be specified, such as size, membership, and regular expression constraints.

    Interval graph mining

    No full text

    A Constraint Programming Approach for Web Log Mining

    No full text

    Lower and upper queries for graph-mining

    No full text
    Canonical encoding is one of the key operations required by subgraph mining algorithms for candidate generation. Canonical encodings make it possible to query the exact number of frequent subgraphs. Existing approaches use canonical encodings with exponential time complexity. As a consequence, mining all frequent patterns in large graphs is computationally expensive. In this paper, we propose to relax the canonicity property, leading to two encodings, lower and upper encodings, with polynomial time complexity, which tightly enclose the exact set of frequent subgraphs. These two encodings allow two kinds of queries, lower and upper queries, which return respectively a subset and a superset of the frequent patterns. Lower and upper encodings have been integrated in Gaston. Experiments performed on large and dense synthetic graphs show that these two encodings are very effective compared to Gaston and gSpan, while on large real-world sparse graphs they remain very competitive.

    Complete and Incomplete Approaches for Graph Mining

    No full text
    In this paper, we revisit approaches for graph mining where a set of simple encodings is proposed. Complete approaches are those using an encoding that yields all the frequent subgraphs, whereas incomplete approaches do not guarantee to find all the frequent subgraphs. Our objective is also to highlight the critical points in the process of extracting frequent subgraphs with complete and incomplete approaches. Current canonical encodings have exponential complexity, which motivates this paper to propose a relaxation of the canonicity of the encoding, leading to complete and incomplete encodings with linear complexity. These techniques are implemented within our graph miner GGM (Generic Graph Miner) and then evaluated on a set of graph databases, showing the behavior of both complete and incomplete approaches.

    Mining Relevant Sequence Patterns with CP-based Framework

    No full text
    Sequential pattern mining under various constraints is a challenging data mining task. The paper provides a generic framework based on constraint programming to discover sequence patterns defined by constraints on local patterns (e.g., gap, regular expressions) or by constraints on patterns involving combinations of local patterns, such as relevant subgroups and top-k patterns. This framework enables the user to mine both kinds of patterns declaratively. The solving step exploits the machinery of constraint programming. For complex patterns involving combinations of local patterns, we improve the mining step by using dynamic CSP. Finally, we present two case studies, in biomedical information extraction and in stylistic analysis in linguistics.