    Classification with Single Constraint Progressive Mining of Sequential Patterns

    Classification based on sequential pattern data has become an important research topic. One existing approach is Classify-By-Sequence (CBS), which classifies data using sequential patterns obtained from AprioriLike sequential pattern mining. These patterns, called Classifiable Sequential Patterns (CSP), serve as classifier rules or features for the classification task. However, the AprioriLike algorithm takes a long time to find sequential patterns, and not all of the patterns it finds are important to the user. To obtain relevant and meaningful features for classification, the user can impose a constraint on sequential pattern mining. The constraint is also expected to reduce the number of short, less meaningful sequential patterns. We therefore developed CBS_CLASS* with Single Constraint Progressive Mining of Sequential Patterns (Single Constraint PISA, or PISA*). CBS_CLASS* with PISA* was shown to classify data faster because it processes fewer sequential patterns while still conforming to the user's needs. Experimental results showed that, compared with CBS_CLASS, CBS_CLASS* reduced classification execution time by 89.8% while maintaining classification accuracy.
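    The Python sketch below roughly illustrates the idea above. It filters candidate sequential patterns with a single user constraint and uses the surviving patterns as class rules. It is not CBS, CBS_CLASS*, or PISA*. The following are hypothetical assumptions: the minimum-length constraint, the `min_sup` threshold, the support-sum scoring rule, and the helper names (`contains`, `class_rules`, `classify`, `S`). Candidate patterns are also supplied by hand instead of being mined.

```python
from collections import defaultdict


def contains(seq, pattern):
    """Return True if `pattern` is a subsequence of `seq`.

    Both are lists of frozensets (itemsets). Each pattern itemset must be a
    subset of a strictly later itemset of `seq`; greedy earliest matching
    is sufficient for this containment test.
    """
    i = 0
    for itemset in seq:
        if i < len(pattern) and pattern[i] <= itemset:
            i += 1
    return i == len(pattern)


def class_rules(db, candidates, min_sup, min_len):
    """Keep (pattern, label, support) rules that satisfy the constraint.

    Hypothetical single constraint: a pattern must contain at least
    `min_len` items in total, which discards short patterns early.
    Support is measured separately within each class.
    """
    by_class = defaultdict(list)
    for seq, label in db:
        by_class[label].append(seq)
    rules = []
    for pattern in candidates:
        if sum(len(itemset) for itemset in pattern) < min_len:
            continue  # rejected by the constraint, never counted
        for label, seqs in by_class.items():
            sup = sum(contains(s, pattern) for s in seqs) / len(seqs)
            if sup >= min_sup:
                rules.append((pattern, label, sup))
    return rules


def classify(seq, rules, default):
    """Hypothetical scoring: sum the class support of every matching rule."""
    scores = defaultdict(float)
    for pattern, label, sup in rules:
        if contains(seq, pattern):
            scores[label] += sup
    return max(scores, key=scores.get) if scores else default


def S(*itemsets):
    """Build a sequence from strings, e.g. S("a", "bc") -> [{a}, {b, c}]."""
    return [frozenset(s) for s in itemsets]


db = [(S("a", "b", "c"), "X"), (S("a", "bc"), "X"),
      (S("c", "b", "a"), "Y"), (S("c", "a"), "Y")]
candidates = [S("a"), S("a", "b"), S("c", "a"), S("b", "c")]
rules = class_rules(db, candidates, min_sup=0.5, min_len=2)
print(classify(S("a", "d", "b"), rules, default="?"))  # X
print(classify(S("c", "d", "a"), rules, default="?"))  # Y
```

    The constraint rejects a candidate before any support counting. This is a simple way to show why constrained mining can reduce work: the single-item pattern `S("a")` is never counted at all.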

    A General Model for Sequential Pattern Mining with a Progressive Database

    Frequent and Sequential Pattern Mining with Period of Interest Awareness

    In this dissertation, we addressed the problem of frequent and sequential pattern mining with period-of-interest (POI) awareness. Users are usually more interested in recent data than in old data. By taking the period of interest into consideration, we can derive the most interesting frequent patterns in the time domain from a transaction database, or the most interesting sequential patterns from a sequence database.

    We first explored a general model of mining associations in a temporal database, where the exhibition periods of items may differ from one item to another. To address this problem, we proposed an efficient algorithm, Twain (TWo end AssocIation miNer), which gives more precise frequent exhibition periods for frequent temporal itemsets. Twain not only generates frequent patterns with more precise frequent exhibition periods but also discovers more interesting frequent patterns.

    We also proposed a general model of sequential pattern mining with a progressive database, in which data may be static, inserted, or deleted. In addition, we presented a progressive algorithm, Pisa (Progressive mIning of Sequential pAtterns), that progressively discovers sequential patterns within a defined period of interest. Pisa uses a progressive sequential tree to efficiently maintain the latest data sequences, discover the complete set of up-to-date sequential patterns, and delete obsolete data and patterns accordingly.

    Finally, we examined the intrinsic scalability problem of mining progressive sequential patterns. As the number of sequences grows and the POI becomes larger, the time and space needed to mine progressive sequential patterns increase dramatically. Because of limited computing power and working space, a single processor usually struggles to scale up. We therefore designed a distributed algorithm, DPSP (Distributed Progressive Sequential Pattern mining algorithm), to handle large amounts of data. At each timestamp, DPSP deletes obsolete itemsets, updates the current candidate sequential patterns, and reports the up-to-date frequent sequential patterns within the current POI.

    Contents:
    1 Introduction
      1.1 Motivation
      1.2 Overview of the Dissertation
        1.2.1 Two-End Association Miner with Precise Frequent Exhibition Periods
        1.2.2 A General Model for Sequential Pattern Mining with a Progressive Database
        1.2.3 Distributed Progressive Sequential Pattern Mining on the Cloud
      1.3 Organization of the Dissertation
    2 Two-End Association Miner with Precise Frequent Exhibition Periods
      2.1 Introduction
      2.2 Preliminaries
        2.2.1 Problem Description
        2.2.2 Related Works
        2.2.3 AprioriIP
        2.2.4 SPF
      2.3 Twain for Precise General Temporal Association
        2.3.1 Algorithm Twain
        2.3.2 On Detailed Operations of Twain
        2.3.3 Correctness of Algorithm Twain
        2.3.4 Incremental Ability of Twain
      2.4 Performance Evaluation
        2.4.1 Simulation Model
        2.4.2 Execution Time
        2.4.3 I/O Costs and CPU Overheads
        2.4.4 Scalability
      2.5 Summary
    3 A General Model for Sequential Pattern Mining with a Progressive Database
      3.1 Introduction
      3.2 Preliminaries
        3.2.1 Problem Description
        3.2.2 Related Works
        3.2.3 Comparative Algorithms
      3.3 Progressive mIning of Sequential pAtterns
        3.3.1 PS-tree
        3.3.2 Algorithm Pisa
        3.3.3 On Maintaining the PS-tree
        3.3.4 Fast Pisa with Approximate Results
        3.3.5 Complexity Analysis
      3.4 Performance Evaluation
        3.4.1 Experiment Design
        3.4.2 Cumulative Execution Time
        3.4.3 The Effects of the Input Parameters
        3.4.4 Scalability Issue
        3.4.5 Space Issue
        3.4.6 Practicability
        3.4.7 The Benefits of Fast Version of Pisa
      3.5 Summary
    4 Distributed Progressive Sequential Pattern Mining on the Cloud
      4.1 Introduction
      4.2 Preliminaries
        4.2.1 Problem Description
        4.2.2 DDM and Cloud Computing
        4.2.3 Related Works
      4.3 Distributed Progressive Sequential Pattern Mining
        4.3.1 Candidate Computing Job
        4.3.2 Support Assembling Job
      4.4 Performance Evaluation
        4.4.1 Scalability
        4.4.2 Distributed Scheme
      4.5 Summary
    5 Conclusions
    Bibliography
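    The progressive setting described in this abstract can be sketched with a simple sliding window: new elements arrive at each timestamp, and obsolete ones leave once they fall outside the period of interest. This is not Pisa's PS-tree or DPSP. The class `POIWindow` and its methods are hypothetical, as are two simplifications: each sequence is assumed to receive at most one element per timestamp, and only two-element patterns <{x}, {y}> are counted. Supports are also recomputed by brute force on each call rather than maintained incrementally.

```python
from collections import defaultdict
from itertools import product


class POIWindow:
    """Hypothetical sliding-window store for a progressive sequence database.

    Keeps only elements whose timestamp lies in the current period of
    interest (POI), i.e. the last `poi` timestamps. Assumes at most one
    element per sequence per timestamp.
    """

    def __init__(self, poi):
        self.poi = poi
        self.seqs = defaultdict(list)  # sequence id -> [(timestamp, itemset)]

    def insert(self, ts, sid, items):
        """Append a new element (an itemset) to sequence `sid` at time `ts`."""
        self.seqs[sid].append((ts, frozenset(items)))

    def advance(self, now):
        """Delete elements older than the POI, then sequences left empty."""
        start = now - self.poi + 1
        for sid in list(self.seqs):
            kept = [(t, s) for t, s in self.seqs[sid] if t >= start]
            if kept:
                self.seqs[sid] = kept
            else:
                del self.seqs[sid]

    def frequent_pairs(self, min_sup):
        """Return {(x, y): support} for patterns <{x}, {y}>, x strictly before y.

        Each sequence counts a pattern at most once; support is the fraction
        of sequences currently alive in the window.
        """
        counts = defaultdict(int)
        for elems in self.seqs.values():
            elems = sorted(elems, key=lambda e: e[0])
            found = set()
            for i, (_, first) in enumerate(elems):
                for _, later in elems[i + 1:]:
                    found.update(product(first, later))
            for pair in found:
                counts[pair] += 1
        n = len(self.seqs)
        return {p: c / n for p, c in counts.items() if c / n >= min_sup}


w = POIWindow(poi=3)
for ts, sid, items in [(1, "s1", ["a"]), (2, "s1", ["b"]), (3, "s1", ["c"]),
                       (2, "s2", ["a"]), (3, "s2", ["c"]), (1, "s3", ["b"])]:
    w.insert(ts, sid, items)
w.advance(3)
print(w.frequent_pairs(min_sup=0.5))  # {('a', 'c'): 0.6666666666666666}
w.insert(4, "s1", ["d"])
w.insert(4, "s2", ["d"])
w.advance(4)  # POI is now timestamps 2..4, so s3 becomes obsolete
print(w.frequent_pairs(min_sup=0.6))  # {('c', 'd'): 1.0}
```

    Rescanning the whole window at every timestamp is exactly the cost that grows with the number of sequences and the POI length. According to the abstract, Pisa maintains its results with a progressive sequential tree for this reason, and DPSP distributes the work.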