A Constraint Programming Approach for Mining Sequential Patterns in a Sequence Database

Author: Charnois Thierry
Loudni Samir
Métivier Jean-Philippe
Publication date: 23/09/2013

Constraint-based pattern discovery is at the core of numerous data mining tasks. Patterns are extracted with respect to a given set of constraints (frequency, closedness, size, etc.). In the context of sequential pattern mining, a large number of dedicated techniques have been developed for solving particular classes of constraints. The aim of this paper is to investigate the use of Constraint Programming (CP) to model and mine sequential patterns in a sequence database. Our CP approach offers a natural way to combine, within a single framework, a large set of constraints coming from various origins. Experiments show the feasibility and the interest of our approach.

arXiv.org e-Print Archive

HAL - Normandie Université

HAL-Paris 13

Constraint-based Pattern Discovery from DNA Sequences

Author: ROVITTI ANTONIO
Publication venue: 'Pisa University Press'
Publication date: 12/12/2007

The goal of this thesis is to develop a system for discovering patterns in DNA sequences based on user-defined constraints. The problem is formally defined and the properties of some interesting constraints are studied. Based on these properties, the Teiresias algorithm is modified by adding the constraints. Using constraints helps reduce the set of extracted patterns, focusing attention on those of interest while also reducing the search space and hence the execution time. The system is tested empirically.

Electronic Thesis and Dissertation Archive - Università di Pisa

A Constraint-based Querying System for Exploratory Pattern Discovery

Author: C. LUCCHESE
F. BONCHI
F. GIANNOTTI
R. PEREGO
R. TRASARTI
S. ORLANDO
Publication date: 01/01/2009

In this article we present CONQUEST, a constraint-based querying system able to support the intrinsically exploratory (i.e., human-guided, interactive and iterative) nature of pattern discovery. Following the inductive database vision, our framework provides users with an expressive constraint-based query language, which allows the discovery process to be effectively driven toward potentially interesting patterns. Such constraints are also exploited to reduce the cost of pattern mining computation. CONQUEST is a comprehensive mining system that can access real-world relational databases from which to extract data. Through a friendly graphical user interface (GUI), the user can define complex mining queries with a few clicks. After a pre-processing step, mining queries are answered by an efficient and robust pattern mining engine which incorporates state-of-the-art data and search space reduction techniques. Resulting patterns are then presented to the user in a pattern browsing window, and possibly stored back in the underlying database as relations.

Archivio Ricerca Ca'Foscari

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Flexible constrained sampling with guarantees for pattern mining

Author: A Giacometti
A Zimmermann
C Bucilă
CP Gomes
F Bonchi
Luc De Raedt
M Berlingerio
M Boley
MA Hasan
Matthijs van Leeuwen
S Ermon
S Nijssen
T Calders
T Guns
Vladimir Dzyuba
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Pattern sampling has been proposed as a potential solution to the infamous pattern explosion. Instead of enumerating all patterns that satisfy the constraints, individual patterns are sampled proportional to a given quality measure. Several sampling algorithms have been proposed, but each of them has its limitations when it comes to 1) flexibility in terms of quality measures and constraints that can be used, and/or 2) guarantees with respect to sampling accuracy. We therefore present Flexics, the first flexible pattern sampler that supports a broad class of quality measures and constraints, while providing strong guarantees regarding sampling accuracy. To achieve this, we leverage the perspective on pattern mining as a constraint satisfaction problem and build upon the latest advances in sampling solutions in SAT as well as existing pattern mining algorithms. Furthermore, the proposed algorithm is applicable to a variety of pattern languages, which allows us to introduce and tackle the novel task of sampling sets of patterns. We introduce and empirically evaluate two variants of Flexics: 1) a generic variant that addresses the well-known itemset sampling task and the novel pattern set sampling task as well as a wide range of expressive constraints within these tasks, and 2) a specialized variant that exploits existing frequent itemset techniques to achieve substantial speed-ups. Experiments show that Flexics is both accurate and efficient, making it a useful tool for pattern-based data exploration.

Comment: Accepted for publication in Data Mining & Knowledge Discovery journal (ECML/PKDD 2017 journal track)

arXiv.org e-Print Archive

Crossref

Leiden University Scholarly Publications

ConQueSt: a Constraint-based Querying System for Exploratory Pattern Discovery

Author: Trasarti Roberto
Publication venue: 'Pisa University Press'
Publication date: 07/06/2006

The contribution of this thesis is the design and development of a Knowledge Discovery system named ConQueSt. Based on the constraint-driven Pattern Discovery paradigm, ConQueSt follows the Inductive Database vision: • mining is seen as a more complex form of querying; • the system is therefore equipped with a data mining query language and tightly coupled with a DBMS; • the patterns extracted by mining queries become first-class citizens and, following the closure principle, are materialized alongside the data in the DBMS. ConQueSt has already been successfully presented at the international workshop of the IDB community and at the prestigious IEEE International Conference on Data Engineering (ICDE 2006). In June it will be presented at the Italian database conference (SEBD 2006). Submission to a prestigious journal is currently under way.

Electronic Thesis and Dissertation Archive - Università di Pisa

Extending the state-of-the-art of constraint-based pattern discovery

Author: Claudio Lucchese
Francesco Bonchi
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2005

The constraint-based pattern discovery paradigm was introduced with the aim of providing the user with a tool to drive the discovery process towards potentially interesting patterns, with the positive side effect of achieving a more efficient computation. In this paper we review and extend the state-of-the-art of the constraints that can be pushed in a frequent pattern computation. We introduce novel data reduction techniques which are able to exploit convertible anti-monotone constraints (e.g., constraints on average or median) as well as tougher constraints (e.g., constraints on variance or standard deviation). A thorough experimental study is performed and it confirms that our framework outperforms previous algorithms for convertible constraints, and exploits the tougher ones with the same effectiveness. Finally, we highlight that the main advantage of our approach, i.e., pushing constraints by means of data reduction in a level-wise framework, is that different properties of different constraints can be exploited all together, and the total benefit is always greater than the sum of the individual benefits. This consideration leads to the definition of a general Apriori-like algorithm which is able to exploit all possible kinds of constraints studied so far.

CiteSeerX

Using Answer Set Programming for pattern mining

Author: Guyet Thomas
Moinard Yves
Quiniou René
Publication date: 11/06/2014

Serial pattern mining consists in extracting the frequent sequential patterns from a unique sequence of itemsets. This paper explores the ability of a declarative language, such as Answer Set Programming (ASP), to solve this issue efficiently. We propose several ASP implementations of the frequent sequential pattern mining task: a non-incremental and an incremental resolution. The results show that the incremental resolution is more efficient than the non-incremental one, but both ASP programs are less efficient than dedicated algorithms. Nonetheless, this approach can be seen as a first step toward a generic framework for sequential pattern mining with constraints.

Comment: Intelligence Artificielle Fondamentale (2014)

arXiv.org e-Print Archive

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1