3 research outputs found

    Efficient algorithms for mining clickstream patterns using pseudo-IDLists

    Sequential pattern mining is an important task in data mining. Its subproblem, clickstream pattern mining, is starting to attract more research due to the growth of the Internet and the need to analyze online customer behaviors. To date, only a few works have been dedicated to the problem of mining clickstream patterns. Although one approach is to use general algorithms for sequential pattern mining, their performance may suffer and they may need more resources than a dedicated method for mining clickstreams would. In this paper, we present pseudo-IDList, a novel data structure that is more suitable for clickstream pattern mining. Based on this structure, a vertical format algorithm named CUP (Clickstream pattern mining Using Pseudo-IDList) is proposed. Furthermore, we propose a pruning heuristic named DUB (Dynamic intersection Upper Bound) to improve our proposed algorithm. Four real-life clickstream databases are used for the experiments, and the results show that our proposed methods are effective and efficient regarding runtime and memory consumption. © 2020 Elsevier B.V. Vietnam National Foundation for Science and Technology Development (NAFOSTED) [02/2019/TN
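
As a rough illustration of what "vertical format" means here, the sketch below stores, for each page, the list of (sequence id, position) pairs where it occurs, and extends a pattern by joining two such lists. It is a generic IDList join, not the pseudo-IDList structure or the CUP algorithm from the paper, whose details the abstract does not give. The function names (`build_idlists`, `extend`, `support`) and the toy database are hypothetical, and patterns are assumed to allow gaps between clicks.

```python
from collections import defaultdict


def build_idlists(database):
    """Map each item to its occurrences as (sequence_id, position) pairs."""
    idlists = defaultdict(list)
    for sid, clicks in enumerate(database):
        for pos, item in enumerate(clicks):
            idlists[item].append((sid, pos))
    return idlists


def extend(pattern_idlist, item_idlist):
    """Join two IDLists: keep occurrences of the item that appear after
    the earliest end position of the pattern in the same sequence."""
    first_end = {}
    for sid, pos in pattern_idlist:
        if sid not in first_end or pos < first_end[sid]:
            first_end[sid] = pos
    return [
        (sid, pos)
        for sid, pos in item_idlist
        if sid in first_end and pos > first_end[sid]
    ]


def support(idlist):
    """Number of distinct sequences that contain the pattern."""
    return len({sid for sid, _ in idlist})


# Toy clickstream database (hypothetical data): one list of pages per session.
database = [
    ["a", "b", "c"],
    ["a", "c"],
    ["b", "a", "c"],
    ["c", "a"],
]
idlists = build_idlists(database)
print(support(extend(idlists["a"], idlists["c"])))  # pattern <a, c> → 3
print(support(extend(idlists["a"], idlists["b"])))  # pattern <a, b> → 1
```

The join touches only the two lists involved rather than rescanning the database, which is the usual motivation for vertical formats.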

    Sequential Pattern Mining with Multidimensional Interval Items

    In real sequential pattern mining scenarios, the interval information between two itemsets is very important. However, although existing algorithms can effectively mine frequent subsequences, they ignore the interval information. This paper aims to mine sequential patterns with multidimensional interval items in sequence databases. To address this problem, the paper defines and specifies the interval event problem in the sequential pattern mining task. Then, an interval event items framework is proposed to handle multidimensional interval event items. Moreover, the MII-Prefixspan algorithm is introduced for mining sequential patterns with multidimensional interval event items. This algorithm adds the processing of interval event items to the mining process. Through these methods, the mined sequence patterns yield richer information that is more in line with actual needs. The scheme is applied to a real website behaviour analysis task to obtain more valuable information for web optimization and to provide more valuable sequence pattern information for practical problems. This work also opens a new pathway toward more efficient sequential pattern mining.
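
One simple way to picture interval items is to turn the time gap between consecutive events into a discrete label and interleave it with the ordinary items, so that any sequential pattern miner sees the gaps as part of the sequence. The sketch below does only that. It is an assumption made for illustration, not the interval event items framework or MII-Prefixspan itself, and the bucket boundaries, labels, and function names are hypothetical.

```python
from bisect import bisect_right

# Hypothetical gap buckets in seconds: <60, 60-599, 600-3599, >=3600.
BOUNDS = [60, 600, 3600]
LABELS = ["I0", "I1", "I2", "I3"]


def interval_label(gap_seconds):
    """Return the label of the bucket that contains the given gap."""
    return LABELS[bisect_right(BOUNDS, gap_seconds)]


def with_interval_items(events):
    """Interleave interval labels between consecutive events.

    `events` is a list of (timestamp_seconds, item) pairs sorted by time.
    """
    out = []
    for i, (ts, item) in enumerate(events):
        if i > 0:
            out.append(interval_label(ts - events[i - 1][0]))
        out.append(item)
    return out


session = [(0, "home"), (30, "search"), (930, "cart")]
print(with_interval_items(session))
# → ['home', 'I0', 'search', 'I2', 'cart']
```

With this encoding, a pattern such as `search, I2, cart` distinguishes a slow path to the cart from a fast one, which is the kind of extra information the abstract refers to.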

    An efficient method for mining sequential patterns with indices

    In recent years, mining informative data and discovering hidden information have been increasingly in demand. One popular means of achieving this is sequential pattern mining, which finds informative patterns stored in databases. Its applications cover different areas and many methods have been proposed. Recently, pseudo-IDLists were proposed to improve both runtime and memory usage in the mining process. However, the idea cannot be used directly for sequential pattern mining, as it only works on clickstream patterns, a more specific type of sequential pattern. We propose adaptations and changes to the original idea to introduce SUI (Sequential pattern mining Using Indices). Comparing SUI with two other state-of-the-art algorithms on six test databases, we show that SUI is effective and efficient in terms of performance and memory usage. © 2021 Elsevier B.V. IGA/CebiaTech/2022/00