6 research outputs found

    Efficient algorithms for mining clickstream patterns using pseudo-IDLists

    Sequential pattern mining is an important task in data mining. Its subproblem, clickstream pattern mining, is starting to attract more research due to the growth of the Internet and the need to analyze online customer behaviors. To date, only a few works have been dedicated to the problem of mining clickstream patterns. Although one approach is to use general algorithms for sequential pattern mining, their performance may suffer and they need more resources than a dedicated method for mining clickstreams would. In this paper, we present pseudo-IDList, a novel data structure that is more suitable for clickstream pattern mining. Based on this structure, a vertical format algorithm named CUP (Clickstream pattern mining Using Pseudo-IDList) is proposed. Furthermore, we propose a pruning heuristic named DUB (Dynamic intersection Upper Bound) to improve our proposed algorithm. Four real-life clickstream databases are used for the experiments, and the results show that our proposed methods are effective and efficient regarding runtime and memory consumption. © 2020 Elsevier B.V. Funding: Vietnam National Foundation for Science and Technology Development (NAFOSTED) [02/2019/TN
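
The abstract above refers to vertical format mining over IDLists without spelling out the mechanics. As a rough illustration only, the sketch below shows a generic vertical-format miner, not the paper's pseudo-IDList, CUP, or DUB. It assumes each clickstream is a list of single items, that a pattern is a (not necessarily contiguous) subsequence, and that an IDList is a list of `(sequence id, position)` pairs. All function names are hypothetical.

```python
from collections import defaultdict


def build_idlists(db):
    """Map each item to its (sequence id, position) occurrences."""
    idlists = defaultdict(list)
    for sid, seq in enumerate(db):
        for pos, item in enumerate(seq):
            idlists[item].append((sid, pos))
    return dict(idlists)


def support(idlist):
    """Support = number of distinct sequences that appear in the IDList."""
    return len({sid for sid, _ in idlist})


def extend(pattern_idlist, item_idlist):
    """Temporal join: keep item occurrences that come after the earliest
    end position of the pattern within the same sequence."""
    first_end = {}
    for sid, pos in pattern_idlist:
        if sid not in first_end or pos < first_end[sid]:
            first_end[sid] = pos
    return [(sid, pos) for sid, pos in item_idlist
            if sid in first_end and pos > first_end[sid]]


def mine(db, min_support):
    """Depth-first enumeration of frequent patterns and their supports."""
    frequent_items = {item: idlist
                      for item, idlist in build_idlists(db).items()
                      if support(idlist) >= min_support}
    results = {}

    def grow(pattern, idlist):
        results[pattern] = support(idlist)
        for item, item_idlist in frequent_items.items():
            joined = extend(idlist, item_idlist)
            if support(joined) >= min_support:
                grow(pattern + (item,), joined)

    for item, idlist in frequent_items.items():
        grow((item,), idlist)
    return results
```

Calling `mine(db, min_support)` on a list of click sequences returns a dictionary from pattern tuples to supports. The join never rescans the original database, which is the general appeal of the vertical format; the data structures in the paper are designed to reduce the cost of storing and joining such lists further.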

    Sequential Pattern Mining with Multidimensional Interval Items

    In real sequential pattern mining scenarios, the interval information between two item sets is very important. However, although existing algorithms can effectively mine frequent subsequence sets, they ignore the interval information. This paper aims to mine sequential patterns with multidimensional interval items in sequence databases. To address this problem, the paper defines and specifies the interval event problem in the sequential pattern mining task. Then, the interval event items framework is proposed to handle multidimensional interval event items. Moreover, the MII-Prefixspan algorithm is introduced for mining sequential patterns with multidimensional interval event items. This algorithm adds the processing of interval event items to the mining process. With these methods, the mined sequence patterns carry richer information that is more in line with actual needs. The scheme is applied to an actual website behaviour analysis task to obtain more valuable information for web optimization and to provide more valuable sequence pattern information for practical problems. This work also opens a new pathway toward more efficient sequential pattern mining tasks.

    C3Ro: An efficient mining algorithm of extended-closed contiguous robust sequential patterns in noisy data

    Sequential pattern mining has been the focus of many works, but still faces a tough challenge in the mining of large databases, in terms of both efficiency and the apprehensibility of the resulting set. To overcome these issues, the most promising direction taken by the literature relies on the use of constraints, including the well-known closedness constraint. However, such mining is not resistant to noise in data, a characteristic of most real-world data. The main research question raised in this paper is thus: how to efficiently mine an apprehensible set of sequential patterns from noisy data? To address this research question, we introduce 1) two original constraints designed for the mining of noisy data: the robustness and the extended-closedness constraints, and 2) a generic pattern mining algorithm, C3Ro, designed to mine a wide range of sequential patterns, going from closed or maximal contiguous sequential patterns to closed or maximal regular sequential patterns. C3Ro is dedicated to practitioners and is able to manage their multiple constraints. C3Ro is also the first sequential pattern mining algorithm to be as generic and parameterizable. Extensive experiments have been conducted and reveal the high efficiency of C3Ro, especially on large datasets, over well-known algorithms from the literature. Additional experiments have been conducted on a noisy real-world dataset of job offers, with the goal of mining activities. These experiments offer a more thorough insight into the C3Ro algorithm: job market experts confirm that the constraints we introduced actually have a significant positive impact on the apprehensibility of the set of mined activities.

    Mining clickstream patterns using IDLists

    To date, few works focus on the problem of mining clickstream patterns. Although general algorithms for sequential pattern mining can be used to mine clickstreams, their performance may suffer and they need more resources than necessary. In this paper, we present a novel data structure, called index-IDList, that is suitable for clickstream pattern mining. Based on this data structure, we present a vertical format algorithm named CUI (Clickstream pattern mining Using Index-IDList). The experiments are carried out on four real-life clickstream databases, and the results show that our proposed method is effective and efficient in terms of runtime and memory consumption. © 2019 IEEE. Funding: Vietnam National Foundation for Science and Technology Development (NAFOSTED); Ministry of Education, Youth and Sports of the Czech Republic within the National Sustainability Programme [LO1303 (MSMT-7778/2014)]; European Regional Development Fund under the Project CEBIA-Tech [CZ.1.05/2.1.00/03.0089]; COST (European Cooperation in Science Technology) [IC1406]; Faculty of Applied Informatics, Tomas Bata University in Zlin (ailab.fai.utb.cz

    Sequential pattern mining using IDLists

    Sequential pattern mining is a practical problem whose objective is to discover useful, informative patterns in a stored database, such as a market transaction database. It has many applications in different areas. Recently, a study proposed pseudo-IDLists, which improve mining runtime by preventing data from being duplicated during the mining process. However, the idea only works for a special type of sequential pattern, namely clickstream patterns. Directly applying the idea to sequential pattern mining is not feasible. Hence, we propose adaptations and changes to the original idea and introduce SUI (Sequential pattern mining Using IDList), a sequential pattern mining algorithm based on pseudo-IDLists. Via experiments on three test databases, we show that SUI is efficient and effective regarding runtime and memory consumption. © 2020, Springer Nature Switzerland AG

    An efficient method for mining sequential patterns with indices

    In recent years, mining informative data and discovering hidden information have become increasingly in demand. One popular means of achieving this is sequential pattern mining, which finds informative patterns stored in databases. Its applications cover different areas, and many methods have been proposed. Recently, pseudo-IDLists were proposed to improve both runtime and memory usage in the mining process. However, the idea cannot be directly used for sequential pattern mining, as it only works on clickstream patterns, a more specific type of sequential pattern. We propose adaptations and changes to the original idea to introduce SUI (Sequential pattern mining Using Indices). Comparing SUI with two other state-of-the-art algorithms on six test databases, we show that SUI is effective and efficient in both runtime and memory usage. © 2021 Elsevier B.V. IGA/CebiaTech/2022/00