A Constraint Programming Approach for Mining Sequential Patterns in a Sequence Database
Constraint-based pattern discovery is at the core of numerous data mining
tasks. Patterns are extracted with respect to a given set of constraints
(frequency, closedness, size, etc.). In the context of sequential pattern
mining, a large number of dedicated techniques have been developed for solving
particular classes of constraints. The aim of this paper is to investigate the
use of Constraint Programming (CP) to model and mine sequential patterns in a
sequence database. Our CP approach offers a natural way to combine, within a
single framework, a large set of constraints coming from various origins.
Experiments show the feasibility and the value of our approach.
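To make the frequency and size constraints mentioned above concrete, here is a minimal Python sketch. It is a plain level-wise enumeration, not the CP model of the paper, and the names `is_subsequence`, `support` and `mine` are hypothetical helpers chosen for illustration. A pattern is counted once per database sequence that contains it in order, with gaps allowed.

```python
def is_subsequence(pattern, sequence):
    """Return True if pattern occurs in sequence in order (gaps allowed)."""
    it = iter(sequence)
    # Each membership test consumes the iterator up to the matched item.
    return all(item in it for item in pattern)


def support(pattern, database):
    """Number of database sequences that contain the pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def mine(database, min_support, max_size):
    """Enumerate patterns meeting a frequency and a size constraint.

    Frequency is anti-monotone: an infrequent pattern cannot become
    frequent when extended, so only frequent prefixes are extended.
    """
    alphabet = sorted({item for seq in database for item in seq})
    results = {}
    frontier = [()]
    for _ in range(max_size):
        next_frontier = []
        for prefix in frontier:
            for item in alphabet:
                candidate = prefix + (item,)
                count = support(candidate, database)
                if count >= min_support:
                    results[candidate] = count
                    next_frontier.append(candidate)
        frontier = next_frontier
    return results


if __name__ == "__main__":
    toy_database = ["abc", "ac", "bc"]  # assumed toy data
    for pattern, count in mine(toy_database, min_support=2, max_size=2).items():
        print("".join(pattern), count)
```

A declarative CP formulation would state the same constraints as a model and leave the search to a solver; the sketch only shows what the constraints mean on a toy database.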
Constraint-based Pattern Discovery from DNA Sequences
The aim of this thesis is to develop a system for discovering patterns in DNA sequences based on user-defined constraints. The problem is formally defined and the properties of several interesting constraints are studied. Based on these properties, the Teiresias algorithm is modified to incorporate the constraints. Using constraints helps reduce the set of extracted patterns, focusing attention on those of interest while also shrinking the search space and hence the running time. The system is tested empirically.
A Constraint-based Querying System for Exploratory Pattern Discovery
In this article we present CONQUEST, a constraint-based querying system able to support
the intrinsically exploratory (i.e., human-guided, interactive and iterative) nature of
pattern discovery. Following the inductive database vision, our framework provides users
with an expressive constraint-based query language, which allows the discovery process
to be effectively driven toward potentially interesting patterns. Such constraints are also
exploited to reduce the cost of pattern mining computation. CONQUEST is a comprehensive
mining system that can access real-world relational databases from which to extract data.
By interacting with a friendly graphical user interface (GUI), the user can define
complex mining queries with a few clicks. After a pre-processing step, mining
queries are answered by an efficient and robust pattern mining engine that incorporates
state-of-the-art data and search space reduction techniques. The resulting patterns are
then presented to the user in a pattern browsing window, and possibly stored back in the
underlying database as relations.
Flexible constrained sampling with guarantees for pattern mining
Pattern sampling has been proposed as a potential solution to the infamous
pattern explosion. Instead of enumerating all patterns that satisfy the
constraints, individual patterns are sampled proportional to a given quality
measure. Several sampling algorithms have been proposed, but each of them has
its limitations when it comes to 1) flexibility in terms of quality measures
and constraints that can be used, and/or 2) guarantees with respect to sampling
accuracy. We therefore present Flexics, the first flexible pattern sampler that
supports a broad class of quality measures and constraints, while providing
strong guarantees regarding sampling accuracy. To achieve this, we leverage the
perspective on pattern mining as a constraint satisfaction problem and build
upon the latest advances in sampling solutions in SAT as well as existing
pattern mining algorithms. Furthermore, the proposed algorithm is applicable to
a variety of pattern languages, which allows us to introduce and tackle the
novel task of sampling sets of patterns. We introduce and empirically evaluate
two variants of Flexics: 1) a generic variant that addresses the well-known
itemset sampling task and the novel pattern set sampling task as well as a wide
range of expressive constraints within these tasks, and 2) a specialized
variant that exploits existing frequent itemset techniques to achieve
substantial speed-ups. Experiments show that Flexics is both accurate and
efficient, making it a useful tool for pattern-based data exploration.
Comment: Accepted for publication in Data Mining & Knowledge Discovery journal
(ECML/PKDD 2017 journal track)
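The idea of sampling patterns in proportion to a quality measure can be illustrated with a small Python sketch. This is not Flexics: it enumerates every itemset that satisfies a minimum-frequency constraint and then draws from that list, which is exactly the enumeration a real sampler is designed to avoid. The quality measure (frequency) and the names `frequency`, `candidates` and `sample_patterns` are assumptions made for illustration.

```python
import random
from itertools import combinations


def frequency(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in transactions)


def candidates(transactions, min_freq):
    """Map each itemset meeting the frequency constraint to its frequency."""
    items = sorted(set().union(*transactions))
    found = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            freq = frequency(itemset, transactions)
            if freq >= min_freq:
                found[itemset] = freq
    return found


def sample_patterns(transactions, min_freq, k, seed=0):
    """Draw k patterns with probability proportional to their frequency."""
    found = candidates(transactions, min_freq)
    rng = random.Random(seed)
    patterns = list(found)
    weights = [found[p] for p in patterns]
    return rng.choices(patterns, weights=weights, k=k)


if __name__ == "__main__":
    toy = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}]  # assumed toy data
    for pattern in sample_patterns(toy, min_freq=2, k=5):
        print(sorted(pattern))
```

On realistic data the candidate list is far too large to build, which is why the abstract relies on sampling techniques that do not enumerate all solutions.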
ConQueSt: a Constraint-based Querying System for Exploratory Pattern Discovery
The contribution of this thesis is the design and development of a Knowledge Discovery system named ConQueSt.
Based on the constraint-driven Pattern Discovery paradigm, ConQueSt follows the Inductive Database vision:
• mining is seen as a more complex form of querying,
• the system is therefore equipped with a data mining query language and tightly coupled with a DBMS,
• the patterns extracted by mining queries become first-class citizens and, following the closure principle, are materialized alongside the data in the DBMS.
ConQueSt has already been presented successfully at the international workshop of the IDB community and at the prestigious IEEE International Conference on Data Engineering (ICDE 2006). In June it will be presented at the Italian database conference (SEBD 2006). Submission to a prestigious journal is currently under way.
Extending the state-of-the-art of constraint-based pattern discovery
The constraint-based pattern discovery paradigm was introduced with the aim of providing the user with a tool to drive the discovery process towards potentially interesting patterns, with the positive side effect of achieving a more efficient computation. In this paper we review and extend the state-of-the-art of the constraints that can be pushed in a frequent pattern computation. We introduce novel data reduction techniques which are able to exploit convertible anti-monotone constraints (e.g., constraints on average or median) as well as tougher constraints (e.g., constraints on variance or standard deviation). A thorough experimental study confirms that our framework outperforms previous algorithms for convertible constraints, and exploits the tougher ones with the same effectiveness. Finally, we highlight that the main advantage of our approach, i.e., pushing constraints by means of data reduction in a level-wise framework, is that different properties of different constraints can be exploited all together, and the total benefit is always greater than the sum of the individual benefits. This consideration leads to the definition of a general Apriori-like algorithm which is able to exploit all possible kinds of constraints studied so far.
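A convertible anti-monotone constraint such as "average value at least v" can be illustrated with a short Python sketch. It shows only the standard prefix-ordering idea behind convertibility, not the data reduction techniques introduced in the paper, and it ignores the frequency constraint entirely. The function name `itemsets_with_min_average` and the item values are assumptions made for illustration.

```python
def itemsets_with_min_average(values, min_avg):
    """Enumerate itemsets whose average item value is at least min_avg.

    Items are visited in descending order of value. Appending an item that
    is no larger than those already chosen can never raise the average, so
    once a prefix violates the constraint every extension violates it too.
    """
    items = sorted(values, key=values.get, reverse=True)
    results = []

    def extend(prefix, total, start):
        for i in range(start, len(items)):
            item = items[i]
            new_total = total + values[item]
            new_prefix = prefix + [item]
            if new_total / len(new_prefix) < min_avg:
                # Later items have values no larger than this one,
                # so they fail as well: stop scanning this level.
                break
            results.append(new_prefix)
            extend(new_prefix, new_total, i + 1)

    extend([], 0.0, 0)
    return results


if __name__ == "__main__":
    prices = {"a": 10, "b": 6, "c": 2}  # assumed toy values
    for itemset in itemsets_with_min_average(prices, min_avg=6):
        print(itemset)
```

Without the ordering, the average constraint is neither monotone nor anti-monotone, which is why it cannot be pushed into the search as directly as a frequency threshold.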
Using Answer Set Programming for pattern mining
Serial pattern mining consists in extracting the frequent sequential patterns
from a unique sequence of itemsets. This paper explores the ability of a
declarative language, such as Answer Set Programming (ASP), to solve this issue
efficiently. We propose several ASP implementations of the frequent sequential
pattern mining task: a non-incremental and an incremental resolution. The
results show that the incremental resolution is more efficient than the
non-incremental one, but both ASP programs are less efficient than dedicated
algorithms. Nonetheless, this approach can be seen as a first step toward a
generic framework for sequential pattern mining with constraints.
Comment: Intelligence Artificielle Fondamentale (2014)