A Constraint Programming Approach for Mining Sequential Patterns in a Sequence Database
Constraint-based pattern discovery is at the core of numerous data mining
tasks. Patterns are extracted with respect to a given set of constraints
(frequency, closedness, size, etc.). In the context of sequential pattern
mining, a large number of dedicated techniques have been developed for solving
particular classes of constraints. The aim of this paper is to investigate the
use of Constraint Programming (CP) to model and mine sequential patterns in a
sequence database. Our CP approach offers a natural way to combine, within a
single framework, a large set of constraints coming from various
origins. Experiments show the feasibility and the interest of our approach.
Closing the gap: Sequence mining at scale
Frequent sequence mining is one of the fundamental building blocks in data mining. While the problem has been extensively studied, few of the available techniques are sufficiently scalable to handle datasets with billions of sequences; such large-scale datasets arise, for instance, in text mining and session analysis. In this article, we propose MG-FSM, a scalable algorithm for frequent sequence mining on MapReduce. MG-FSM can handle so-called “gap constraints”, which can be used to limit the output to a controlled set of frequent sequences. Both positional and temporal gap constraints, as well as appropriate maximality and closedness constraints, are supported. At its heart, MG-FSM partitions the input database in a way that allows us to mine each partition independently using any existing frequent sequence mining algorithm. We introduce the notion of ω-equivalency, which is a generalization of the notion of a “projected database” used by many frequent pattern mining algorithms. We also present a number of optimization techniques that minimize partition size, and therefore computational and communication costs, while still maintaining correctness. Our experimental study in the contexts of text mining and session analysis suggests that MG-FSM is significantly more efficient and scalable than alternative approaches.
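To make the idea of a positional gap constraint concrete, here is a minimal Python sketch of support counting under a maximum gap. It is not MG-FSM itself (there is no partitioning and no MapReduce); the function names and the convention that `max_gap` counts the items skipped between two consecutive matched items are assumptions made for illustration.

```python
from typing import Hashable, Sequence


def occurs_with_max_gap(pattern: Sequence[Hashable],
                        sequence: Sequence[Hashable],
                        max_gap: int) -> bool:
    """Return True if `pattern` is a subsequence of `sequence` such that
    at most `max_gap` items are skipped between consecutive matches.
    (Hypothetical helper; the gap convention is an assumption.)"""
    if not pattern:
        return True
    # Positions at which the current pattern prefix can end.
    ends = [i for i, x in enumerate(sequence) if x == pattern[0]]
    for item in pattern[1:]:
        next_ends = []
        for i, x in enumerate(sequence):
            if x != item:
                continue
            # Some earlier end position q must lie close enough before i.
            if any(0 < i - q <= max_gap + 1 for q in ends):
                next_ends.append(i)
        ends = next_ends
        if not ends:
            return False
    return True


def support(pattern: Sequence[Hashable],
            database: Sequence[Sequence[Hashable]],
            max_gap: int) -> int:
    """Count the input sequences that contain `pattern` under the gap bound."""
    return sum(occurs_with_max_gap(pattern, s, max_gap) for s in database)


database = [["a", "b", "c"], ["a", "x", "b"], ["a", "x", "y", "b"]]
print(support(["a", "b"], database, max_gap=1))
```

Tightening `max_gap` shrinks the set of sequences that support a pattern, which is how a gap constraint limits the output to a controlled set of frequent sequences.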
Sequential Pattern Mining with Multidimensional Interval Items
In real sequential pattern mining scenarios, the interval information between two item sets is very important. However, although existing algorithms can effectively mine frequent subsequences, they ignore this interval information. This paper aims to mine sequential patterns with multidimensional interval items in sequence databases. To address this problem, the paper first defines and specifies the interval event problem in the sequential pattern mining task. It then proposes an interval event items framework to handle multidimensional interval event items, and introduces the MII-Prefixspan algorithm, which adds the processing of interval event items to the mining process. With these methods, the mined sequence patterns carry richer information that is more in line with actual needs. The scheme is applied to a real website behaviour analysis task to obtain more valuable information for web optimization and to provide more valuable sequence pattern information for practical problems. This work also opens a new pathway toward more efficient sequential pattern mining tasks.
Longest Common Subsequence with Gap Constraints
We consider the longest common subsequence problem in the context of
subsequences with gap constraints. In particular, following Day et al. 2022, we
consider the setting when the distance (i.e., the gap) between two consecutive
symbols of the subsequence has to be between a lower and an upper bound (which
may depend on the position of those symbols in the subsequence or on the
symbols bordering the gap) as well as the case where the entire subsequence is
found in a bounded range (defined by a single upper bound), considered by
Kosche et al. 2022. In all these cases, we present efficient algorithms for
determining the length of the longest common constrained subsequence between
two given strings.
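As a rough illustration of the problem setting, the following Python sketch computes the length of the longest common subsequence when every gap must contain between `lo` and `hi` skipped characters in both strings. It is a naive dynamic program, not the efficient algorithms the abstract refers to. It also assumes one uniform pair of bounds for all gaps, whereas the abstract allows bounds that depend on position or on the bordering symbols. The function name is hypothetical.

```python
def lcs_with_gaps(u: str, v: str, lo: int, hi: int) -> int:
    """Length of the longest common subsequence of u and v in which, in
    both strings, the number of characters skipped between consecutive
    matched symbols lies in [lo, hi].
    Naive sketch with uniform bounds (an assumption)."""
    n, m = len(u), len(v)
    # best[i][j]: longest constrained common subsequence whose last
    # symbol is matched at u[i] and v[j]; 0 if u[i] != v[j].
    best = [[0] * m for _ in range(n)]
    answer = 0
    for i in range(n):
        for j in range(m):
            if u[i] != v[j]:
                continue
            cur = 1  # the match on its own
            # Previous match positions pi, pj with lo <= gap <= hi,
            # where gap = i - pi - 1 (and likewise for j).
            for pi in range(max(0, i - hi - 1), i - lo):
                for pj in range(max(0, j - hi - 1), j - lo):
                    if best[pi][pj]:
                        cur = max(cur, best[pi][pj] + 1)
            best[i][j] = cur
            answer = max(answer, cur)
    return answer


print(lcs_with_gaps("axbxc", "abc", lo=0, hi=1))
```

The running time of this sketch is O(|u| · |v| · (hi − lo + 1)²), which is fine for short strings but is exactly the kind of cost that more careful algorithms aim to avoid.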
Frequent Pattern mining with closeness Considerations: Current State of the art
Due to the rising importance of frequent pattern mining in data mining research, tremendous progress has been observed in areas ranging from frequent itemset mining in transaction databases to numerous research frontiers. This article discusses the current state of frequent pattern mining and potential research directions. It's a strong belief that, with considerably increasing research on frequent pattern mining in data analysis, it will provide a strong foundation for data mining methodologies and their applications, which might prove a milestone in data mining applications in the near future.