
    Sequential PAttern Mining using a Bitmap Representation


    Directed Graph based Distributed Sequential Pattern Mining Using Hadoop MapReduce

    Conventional sequential pattern mining algorithms experience scalability problems when dealing with very large data sets. In existing systems such as PrefixSpan and UDDAG, most of the time is spent generating projected databases (prefix- and suffix-projected databases) from the given sequence database. In DSPM (Distributed Sequential Pattern Mining), a directed graph is introduced to generate the prefix- and suffix-projected databases, which reduces the execution time for scanning a large database. In UDDAG, a separate UDDAG is created for each unique id to find next-level sequential patterns, so it requires additional storage for each UDDAG. In DSPM, a single directed graph is used to generate the projected databases and find patterns. To improve scanning time and address the scalability problem, we introduce a distributed sequential pattern mining algorithm on the Hadoop platform using the MapReduce programming model. We use a transformed database to reduce scanning time and a directed graph to optimize memory usage. The Mapper constructs prefix- and suffix-projected databases for each length-1 frequent item in parallel, and the Reducer combines all intermediate results into the final sequential patterns. Experimental results are compared against UDDAG for different minimum support values and different massive data sets, with and without the Hadoop platform, showing improved scaling and speed. They indicate that DSPM using Hadoop MapReduce addresses both the scaling problem and the storage problem of UDDAG. DOI: 10.17762/ijritcc2321-8169.15020
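    The mapper-side step described above, building a prefix-projected database for each length-1 frequent item, can be sketched roughly as follows. This is a minimal illustration of the general projection idea, not the DSPM or Hadoop implementation; sequences are represented as Python lists of single items, and all function names are my own:

    ```python
    from collections import Counter

    def frequent_items(db, min_sup):
        """Count each item at most once per sequence; keep items meeting min_sup."""
        counts = Counter()
        for seq in db:
            for item in set(seq):
                counts[item] += 1
        return {i for i, c in counts.items() if c >= min_sup}

    def map_projections(db, min_sup):
        """Mapper-style step: for each length-1 frequent item, build the
        prefix-projected database (the suffix of each sequence after the
        item's first occurrence)."""
        projections = {}
        for item in frequent_items(db, min_sup):
            projected = []
            for seq in db:
                if item in seq:
                    suffix = seq[seq.index(item) + 1:]
                    if suffix:
                        projected.append(suffix)
            projections[item] = projected
        return projections

    db = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]]
    print(map_projections(db, min_sup=2))
    ```

    In a real MapReduce job each mapper would emit `(item, projected_db)` pairs and the reducers would recurse on the projections; here everything runs in one process for clarity.
    
    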

    A Survey Paper on Sequence Pattern Mining with Incremental Approach

    Sequential pattern mining finds frequently occurring patterns ordered by time. The problem was first introduced by Agrawal and Srikant [1]. An example of a sequential pattern is "A customer who purchased a new Ford Explorer two years ago is likely to respond favourably to a trade-in option now". Let X be the clause "purchased a new Ford Explorer" and Y be the clause "responds favourably to a trade-in". Notice that the pattern XY above is different from the pattern YX, which states that "A customer who responded favourably to a trade-in two years ago will purchase a Ford Explorer now". The order in which X and Y appear is important, and hence XY and YX are mined as two separate patterns. Sequential pattern mining is widely applicable, since many types of data have a time component. For example, it can be used in the medical domain to help determine a correct diagnosis from the sequence of symptoms experienced; over customer data to help target repeat customers; and with web-log data to better structure a company's website for easier access to the most popular links [2].
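    The point that XY and YX are distinct patterns can be made concrete with a tiny support counter. This is a toy sketch of subsequence support (my own helper names, not from the surveyed paper), where a pattern is supported by a sequence if its items appear in order, not necessarily contiguously:

    ```python
    def supports(db, pattern):
        """Count sequences that contain `pattern` as an ordered subsequence."""
        def contains(seq, pat):
            it = iter(seq)                      # advancing iterator enforces order
            return all(p in it for p in pat)    # `in` consumes up to each match
        return sum(contains(seq, pattern) for seq in db)

    db = [["X", "Y"], ["X", "Y"], ["Y", "X"]]
    print(supports(db, ["X", "Y"]))  # 2: order X-then-Y occurs in two sequences
    print(supports(db, ["Y", "X"]))  # 1: the reversed order occurs in only one
    ```
    
    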

    Generation of Two-Voice Imitative Counterpoint from Statistical Models

    Generating new music based on rules of counterpoint has been studied in depth in music informatics. In this article, we try to go further, exploring a method for generating new music in the style of Palestrina by combining statistical generation and pattern discovery. A template piece is used for pattern discovery, and the patterns are selected and organized according to a probabilistic distribution, using horizontal viewpoints to describe melodic properties of events. Once the template is covered with patterns, two-voice counterpoint in a florid style is generated over those patterns using a first-order Markov model. The template method addresses the problems of coherence and imitation, which previous research in counterpoint music generation had not tackled. For constructing the Markov model, vertical slices of pitch and rhythm are compiled over a large corpus of dyads from Palestrina masses. The template enforces restrictions that filter the possible paths through the generation process, and a double backtracking algorithm handles cases where no solution is found at some point within a generation path. Results are evaluated by both information content and listener evaluation, and the paper concludes with a proposed relationship between musical quality and information content. Part of this research was presented at SMC 2016 in Hamburg, Germany.
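    The first-order Markov generation step mentioned above can be sketched generically: train transition counts on a corpus, then walk the table to produce a new sequence. This is a bare illustration of a first-order Markov model over symbolic events (here plain note names), not the paper's dyad-slice model, template constraints, or backtracking:

    ```python
    import random
    from collections import defaultdict

    def train_markov(sequences):
        """First-order Markov model: record every observed transition a -> b."""
        transitions = defaultdict(list)
        for seq in sequences:
            for a, b in zip(seq, seq[1:]):
                transitions[a].append(b)
        return transitions

    def generate(transitions, start, length, seed=0):
        """Random walk through the transition table; stop early at a dead end."""
        rng = random.Random(seed)
        out = [start]
        while len(out) < length and transitions[out[-1]]:
            out.append(rng.choice(transitions[out[-1]]))
        return out

    corpus = [["C", "D", "E", "D", "C"], ["E", "D", "C", "D", "E"]]
    model = train_markov(corpus)
    print(generate(model, "C", 8))
    ```

    In the paper's setting the "events" would be vertical pitch/rhythm slices and the walk would be filtered by the template's restrictions, with backtracking when the filter leaves no continuation.
    
    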

    An Efficient Algorithm for Mining Frequent Sequence with Constraint Programming

    The main advantage of Constraint Programming (CP) approaches to sequential pattern mining (SPM) is their modularity, which includes the ability to add new constraints (regular expressions, length restrictions, etc.). The current best CP approach for SPM uses a global constraint (module) that computes the projected database and enforces the minimum frequency; it does this with a filtering algorithm similar to the PrefixSpan method. However, the resulting system is not as scalable as some of the most advanced mining systems, such as Zaki's cSPADE. We show how, using techniques from both data mining and CP, one can use a generic constraint solver and yet outperform existing specialized systems. This is mainly due to two improvements in the module that computes the projected frequencies: first, computing the projected database can be sped up by pre-computing the positions at which a symbol can become unsupported by a sequence, thereby avoiding a scan of the full sequence each time; and second, by taking inspiration from the trailing used in CP solvers to devise a backtracking-aware data structure that allows fast incremental storing and restoring of the projected database. Detailed experiments show how this approach outperforms existing CP systems as well as specialized systems for SPM, and that the gain in efficiency translates directly into increased efficiency for other settings, such as mining with regular expressions. Comment: frequent sequence mining, constraint programming
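    The first improvement above, pre-computing where a symbol can no longer be supported, amounts to a last-position table per sequence. This is a minimal sketch of that idea under my own naming, not the paper's actual data structure:

    ```python
    def last_positions(db):
        """For each sequence, map each symbol to the last index where it occurs.
        A projection whose start lies past that index can no longer support
        the symbol, so no scan of the remaining sequence is needed."""
        tables = []
        for seq in db:
            last = {}
            for i, item in enumerate(seq):
                last[item] = i          # later occurrences overwrite earlier ones
            tables.append(last)
        return tables

    def still_supported(tables, sid, item, start):
        """O(1) check replacing a scan of seq[start:] for `item`."""
        return tables[sid].get(item, -1) >= start

    db = [["a", "b", "a", "c"], ["b", "c"]]
    t = last_positions(db)
    print(still_supported(t, 0, "a", 3))  # False: last "a" is at index 2
    print(still_supported(t, 0, "c", 3))  # True: "c" occurs at index 3
    ```

    The trade-off is one pass per sequence up front in exchange for constant-time support checks during every projection, which is where PrefixSpan-style filtering spends most of its time.
    
    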

    Efficient Mining of Sequential Patterns in a Sequence Database with Weight Constraint

    Sequence pattern mining is one of the essential data mining tasks, with broad applications. Many sequence mining algorithms have been developed to find the set of frequent subsequences satisfying a support threshold in a sequence database. The main problem with most of these algorithms is that they generate a huge number of sequential patterns when the support threshold is low, and that all sequence patterns are treated uniformly even though real sequential patterns differ in importance. In this paper, we propose an algorithm that aims to find more interesting sequential patterns by considering the different significance of each data element in a sequence database. Unlike conventional weighted sequential pattern mining, where the weights of items are preassigned according to priority or importance, in our approach the weights are set according to the real data, and during the mining process not only the supports but also the weights of patterns are considered. The experimental results show that the algorithm is efficient and effective in generating more interesting patterns.
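    The core idea of combining support with a data-derived weight can be sketched for the length-1 case as follows. This is a toy illustration of weighted support pruning (threshold, weight values, and function names are my own assumptions, not the paper's definitions):

    ```python
    from collections import Counter

    def weighted_frequent_items(db, weights, min_wsup):
        """Keep items whose support x weight meets the weighted threshold.
        The weights here are meant to come from the data itself (e.g.
        normalised item values), not from preassigned priorities."""
        counts = Counter()
        for seq in db:
            for item in set(seq):       # count each item once per sequence
                counts[item] += 1
        return {i: counts[i] * weights.get(i, 1.0)
                for i in counts
                if counts[i] * weights.get(i, 1.0) >= min_wsup}

    db = [["a", "b"], ["a", "c"], ["a", "b"]]
    weights = {"a": 0.5, "b": 1.0, "c": 2.0}
    print(weighted_frequent_items(db, weights, min_wsup=2.0))
    ```

    Note how "a" is the most frequent item (support 3) yet is pruned because its low weight gives it a weighted support of 1.5, while the rarer but heavier "c" survives, which is exactly the effect of treating patterns non-uniformly.
    
    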

    A Novel Approach for Scalability a Two Way Sequential Pattern Mining using UDDAG

    Traditional pattern-growth approaches to sequential pattern mining derive length-(k+1) patterns from the projected databases of length-k patterns recursively. At each level of recursion, they unidirectionally grow the length of detected patterns by one along the suffix, which requires k levels of recursion to find a length-k pattern. In this paper, a novel data structure, the UpDown Directed Acyclic Graph (UDDAG), is proposed for efficient sequential pattern mining. UDDAG allows bidirectional pattern growth along both ends of detected patterns, so a length-k pattern can be detected in ⌊log2 k⌋ + 1 levels of recursion at best, which results in fewer levels of recursion and faster pattern growth. When minSup is large enough that the average pattern length is close to 1, UDDAG and PrefixSpan have similar performance, because the problem degenerates into a frequent-item counting problem. However, UDDAG scales up much better: it often outperforms PrefixSpan by almost one order of magnitude in scalability tests, and it is also considerably faster than Spade and LapinSpam. Except for extreme cases, UDDAG uses memory comparable to PrefixSpan's and less than Spade's and LapinSpam's. Additionally, the special structure of UDDAG enables its extension toward applications involving searching in large spaces.
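    The recursion-depth claim above is easy to check numerically: prefix-only growth adds one item per level, while growing from both ends can roughly double the pattern length per level. A small sketch of the two bounds (function names are mine, just illustrating the arithmetic):

    ```python
    import math

    def levels_unidirectional(k):
        """Suffix-only growth extends the pattern by one item per level."""
        return k

    def levels_bidirectional(k):
        """Growth from both ends can roughly double the length per level,
        so a length-k pattern needs about floor(log2 k) + 1 levels at best."""
        return math.floor(math.log2(k)) + 1

    for k in (1, 4, 16, 64):
        print(k, levels_unidirectional(k), levels_bidirectional(k))
    ```

    At k = 64 this is 64 levels versus 7, which is the source of the faster pattern growth reported in the paper's scalability tests.
    
    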