8 research outputs found

    A Constraint Programming Approach for Mining Sequential Patterns in a Sequence Database

    Constraint-based pattern discovery is at the core of numerous data mining tasks. Patterns are extracted with respect to a given set of constraints (frequency, closedness, size, etc.). In the context of sequential pattern mining, a large number of dedicated techniques have been developed for solving particular classes of constraints. The aim of this paper is to investigate the use of Constraint Programming (CP) to model and mine sequential patterns in a sequence database. Our CP approach offers a natural way to combine, within a single framework, a large set of constraints coming from various origins. Experiments show the feasibility and the interest of our approach.

    Constraint-based sequence mining using constraint programming

    The goal of constraint-based sequence mining is to find sequences of symbols that are included in a large number of input sequences and that satisfy some constraints specified by the user. Many constraints have been proposed in the literature, but a general framework is still missing. We investigate the use of constraint programming as a general framework for this task. We first identify four categories of constraints that are applicable to sequence mining. We then propose two constraint programming formulations. The first formulation introduces a new global constraint called exists-embedding. This formulation is the most efficient but does not support one type of constraint. To support such constraints, we develop a second formulation that is more general but incurs more overhead. Both formulations can use the projected database technique used in specialised algorithms. Experiments demonstrate the flexibility towards constraint-based settings and compare the approach to existing methods. Comment: In Integration of AI and OR Techniques in Constraint Programming (CPAIOR), 201
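To make the notion of a sequence being "included in" input sequences concrete, here is a minimal Python sketch of a subsequence (embedding) test and a support count. It is not the paper's constraint programming formulation or its exists-embedding global constraint; the function names, the toy database, and the `max_length` constraint are all hypothetical and chosen only for illustration.

```python
from typing import Sequence

def is_embedded(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """True if `pattern` occurs in `sequence` in order, gaps allowed."""
    it = iter(sequence)
    # `sym in it` consumes the iterator up to the first match,
    # so later symbols can only match later positions.
    return all(sym in it for sym in pattern)

def support(pattern: Sequence[str], database: Sequence[Sequence[str]]) -> int:
    """Number of input sequences that contain `pattern` as a subsequence."""
    return sum(is_embedded(pattern, seq) for seq in database)

def satisfies(pattern, database, min_support: int, max_length: int) -> bool:
    """Hypothetical user constraints: minimum frequency and maximum pattern size."""
    return len(pattern) <= max_length and support(pattern, database) >= min_support

# Toy database (an assumption, not data from the paper).
database = [list("abcb"), list("acb"), list("bca")]
```

With this toy database, `support(["a", "b"], database)` counts the first two sequences only, since in the third one `a` appears after `b`.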

    Exploiting incomparability in solution dominance: improving general purpose constraint-based mining

    In data mining, finding interesting patterns is a challenging task. Constraint-based mining is a well-known approach to this, and one for which constraint programming has been shown to be a well-suited and generic framework. Constraint dominance programming (CDP) has been proposed as an extension that can capture an even wider class of constraint-based mining problems, by allowing us to compare relations between patterns. In this paper we improve CDP with the ability to specify an incomparability condition. This allows us to overcome two major shortcomings of CDP: finding dominated solutions that must then be filtered out after search, and unnecessarily adding dominance blocking constraints between incomparable solutions. We demonstrate the efficacy of our approach by extending the problem specification language ESSENCE and implementing it in a solver-independent manner on top of the constraint modelling tool CONJURE. Our experiments on pattern mining tasks with both a CP solver and a SAT solver show that using the incomparability condition during search significantly improves the efficiency of dominance programming and reduces (and often eliminates entirely) the need for post-processing to filter dominated solutions.

    Hybrid ASP-based Approach to Pattern Mining

    Detecting small sets of relevant patterns from a given dataset is a central challenge in data mining. The relevance of a pattern is based on user-provided criteria; typically, all patterns that satisfy certain criteria are considered relevant. Rule-based languages like Answer Set Programming (ASP) seem well-suited for specifying such criteria in the form of constraints. Although progress has been made, on the one hand, on solving individual mining problems and, on the other hand, on developing generic mining systems, the existing methods focus either on scalability or on generality. In this paper we make steps towards combining local (frequency, size, cost) and global (various condensed representations like maximal, closed, skyline) constraints in a generic and efficient way. We present a hybrid approach for itemset, sequence and graph mining which exploits dedicated highly optimized mining systems to detect frequent patterns and then filters the results using declarative ASP. To further demonstrate the generic nature of our hybrid framework we apply it to a problem of approximately tiling a database. Experiments on real-world datasets show the effectiveness of the proposed method and computational gains for itemset, sequence and graph mining, as well as approximate tiling. Under consideration in Theory and Practice of Logic Programming (TPLP). Comment: 29 pages, 7 figures, 5 tables
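The two-stage idea described above (first collect frequent patterns, then filter them with global constraints) can be sketched for itemsets as follows. This is only an illustration: the paper performs the filtering step declaratively in ASP, whereas this sketch uses plain Python as a stand-in, and the `frequent` dictionary is assumed to come from some dedicated miner that is not shown. All names are hypothetical.

```python
from typing import Dict, FrozenSet, Set

Itemset = FrozenSet[str]

def maximal(frequent: Dict[Itemset, int]) -> Set[Itemset]:
    """Keep frequent itemsets that have no frequent strict superset."""
    return {p for p in frequent if not any(p < q for q in frequent)}

def closed(frequent: Dict[Itemset, int]) -> Set[Itemset]:
    """Keep frequent itemsets that have no strict superset with the same support."""
    return {
        p for p, sup in frequent.items()
        if not any(p < q and sup == frequent[q] for q in frequent)
    }

# Assumed output of a first-stage frequent itemset miner: itemset -> support.
frequent = {
    frozenset("a"): 3,
    frozenset("b"): 2,
    frozenset("ab"): 2,
    frozenset("c"): 1,
}
```

Here `p < q` is the strict-subset test on frozensets. The quadratic comparison is fine for a small illustration but would not scale to large result sets.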

    Mining (Soft-) Skypatterns Using Constraint Programming

    Within the pattern mining area, skypatterns make it possible to express user preferences according to a dominance relation. In this paper, we deal with the introduction of softness in the skypattern mining problem. First, we show how softness can provide convenient patterns that would be missed otherwise. Then, thanks to Constraint Programming, we propose a generic and efficient method to mine skypatterns as well as soft ones. Finally, we show the relevance and the effectiveness of our approach through experiments on UCI benchmarks and a case study in chemoinformatics for discovering toxicophores.
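As a rough illustration of mining patterns "according to a dominance relation", the sketch below keeps the patterns that are not Pareto-dominated on a tuple of measures, assuming higher is better for every measure. It is a brute-force filter over an already enumerated set, not the constraint programming method of the paper, and it does not attempt the soft variant. The pattern names and measure values are made up.

```python
from typing import Dict, Set, Tuple

Measures = Tuple[float, ...]

def dominates(a: Measures, b: Measures) -> bool:
    """`a` dominates `b`: at least as good on every measure, strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def skypatterns(patterns: Dict[str, Measures]) -> Set[str]:
    """Names of the patterns that no other pattern dominates."""
    return {
        name for name, m in patterns.items()
        if not any(dominates(other, m) for other in patterns.values())
    }

# Hypothetical patterns with measures (frequency, length).
patterns = {
    "a": (5, 1),
    "b": (4, 1),
    "ab": (3, 2),
    "abc": (3, 3),
}
```

In this toy example `b` is dominated by `a` and `ab` is dominated by `abc`, so only `a` and `abc` remain.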

    Mining (Soft-)Skypatterns using Constraint Programming

    Within the pattern mining area, skypatterns make it possible to express user preferences according to a dominance relation. In this paper, we deal with the introduction of softness in the skypattern mining problem. First, we show how softness can provide convenient patterns that would be missed otherwise. Then, thanks to Constraint Programming, we propose a generic and efficient method to mine skypatterns as well as soft ones. Finally, we show the relevance and the effectiveness of our approach through a case study in chemoinformatics.

    Extraction de motifs sous contraintes souples

    The objective of this thesis is to introduce softness into the pattern mining process in data mining. Using constraint programming, we make four main contributions: (1) a general framework for implementing soft threshold constraints in a pattern mining prototype; (2) the introduction of softness in skypatterns (Pareto-optimal patterns with respect to a set of measures) and a generic method for mining (hard) skypatterns as well as soft skypatterns; (3) the introduction of the skypattern cube and two methods for its construction, one bottom-up and mainly based on derivation rules, the other using an approximation of the set of all skypatterns of the cube, which is made feasible by soft skypatterns; (4) the introduction of the notion of optimal pattern for modeling many pattern extraction problems: skypatterns, top-k, closed patterns, ... The declarative and generic nature of our approach opens the way for the definition and discovery of new sets of patterns. These contributions have been experimentally validated on real application domains, such as the discovery of toxicophores for the first two contributions and the discovery of mutagenic components for the third.