
MOGA-based fuzzy data mining with taxonomy

Authors: Hong Tzung-Pei (corresponding author), Chen Chun-Hao
Publication venue: Elsevier BV

Abstract: Transactions in real-world applications usually consist of quantitative values. Some fuzzy data mining approaches have thus been proposed for deriving linguistic rules from such transactions. Since membership functions may have a critical influence on the final mining results, several genetic-fuzzy mining approaches have been proposed for mining appropriate membership functions and fuzzy association rules at the same time. Most of them, however, focus on a single level and consider only one objective function. This paper proposes a multi-objective multi-level genetic-fuzzy mining (MOMLGFM) algorithm that derives a set of non-dominated membership functions for mining multi-level fuzzy association rules. The algorithm first encodes the membership functions of each item class (category) into a chromosome according to the given taxonomy. Two objective functions are then considered: the first is the amount of knowledge mined at different levels, and the second is the suitability of the membership functions. The fitness value of each individual is evaluated using these two objective functions. After the evolutionary process terminates, decision-makers can choose among the resulting sets of membership functions to derive multi-level fuzzy association rules according to their preferences. Experimental results on simulated and real datasets show the effectiveness of the proposed algorithm.

Journal type: international; citation index: SCI; peer-reviewed: yes; format: print; country code: NL
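
The abstract says each individual's fitness is evaluated from the two objectives but does not give the procedure. The following is a minimal Python sketch of one common way to do this, Pareto ranking in the style of Fonseca and Fleming's MOGA, assuming each chromosome is summarized by a knowledge amount to be maximized (for example, the number of large itemsets over all levels) and a suitability value to be minimized. The function names, the linear rank-to-fitness mapping, and the sample values are illustrative assumptions, not the paper's exact method.

```python
from typing import List, Tuple

# Assumed summary of a chromosome: (knowledge, suitability).
#   knowledge   -> objective to MAXIMIZE (e.g. number of large itemsets over all levels)
#   suitability -> objective to MINIMIZE (assumed: smaller = better-shaped membership functions)
Objectives = Tuple[float, float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b in both objectives and strictly better in one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def moga_ranks(pop: List[Objectives]) -> List[int]:
    """Fonseca-Fleming style rank: 1 + number of individuals dominating each one."""
    return [1 + sum(dominates(other, ind) for other in pop) for ind in pop]


def pareto_front(pop: List[Objectives]) -> List[int]:
    """Indices of the non-dominated individuals (rank 1)."""
    return [i for i, r in enumerate(moga_ranks(pop)) if r == 1]


def moga_fitness(pop: List[Objectives]) -> List[float]:
    """Sort by rank, assign raw fitness n..1 linearly, then average within each rank."""
    n = len(pop)
    ranks = moga_ranks(pop)
    order = sorted(range(n), key=lambda i: ranks[i])
    raw = {idx: float(n - pos) for pos, idx in enumerate(order)}
    fitness = [0.0] * n
    for r in set(ranks):
        members = [i for i in range(n) if ranks[i] == r]
        avg = sum(raw[i] for i in members) / len(members)
        for i in members:
            fitness[i] = avg
    return fitness


if __name__ == "__main__":
    # Hypothetical (knowledge, suitability) values for five chromosomes.
    population = [(12, 3.0), (10, 2.0), (12, 2.5), (8, 1.5), (9, 2.8)]
    print(moga_ranks(population))    # [2, 1, 1, 1, 3]
    print(pareto_front(population))  # [1, 2, 3]
    print(moga_fitness(population))  # [2.0, 4.0, 4.0, 4.0, 1.0]
```

Averaging the raw fitness among individuals of equal rank gives every non-dominated chromosome the same fitness, so selection does not favor one non-dominated trade-off over another.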

Tamkang University Institutional Repository

Alternative title: 多目標為基礎的階層式遺傳模糊探勘技術 (Multi-objective-based hierarchical genetic-fuzzy mining techniques)

Author: 何吉軒

Degree: Master's thesis

Abstract: Transactions in real-world applications usually consist of quantitative values. Some fuzzy data mining approaches have thus been proposed for deriving linguistic rules from this kind of transaction. Since membership functions may have a critical influence on the final mining results, several genetic-fuzzy mining approaches have also been proposed for mining appropriate membership functions and fuzzy association rules at the same time. Most of them, however, focus on a single concept level and consider only one objective function. In view of this, this thesis proposes two approaches for mining the Pareto set (a set of non-dominated membership functions) and multi-level fuzzy association rules: a Multi-Objective Multi-Level Genetic-Fuzzy Mining Algorithm (MOMLGFM) and a Two-Stage Multi-Objective Fuzzy Mining Algorithm (TMOGFM). The first algorithm, MOMLGFM, encodes the membership functions of each item class (category) into a chromosome according to the given taxonomy. Two objective functions are then considered: the first is the amount of knowledge mined at different concept levels, and the second is the suitability of the membership functions.
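
To make the encoding step concrete, here is a minimal Python sketch under several assumptions. Each item class is given three isosceles-triangle membership functions (Low, Middle, High), each described by a center and a spread, and a chromosome concatenates these (center, spread) pairs class by class. Quantities already aggregated to one taxonomy level are fuzzified, fuzzy counts are summed over transactions, and a fuzzy region is treated as a large 1-itemset when its support reaches a minimum support. The class names, quantities, parameters, and minimum support are hypothetical, and the thesis's actual representation and level-wise procedure may differ.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: each item class owns three membership functions
# (Low, Middle, High), each stored as a (center, spread) pair. A chromosome
# is the concatenation of these pairs over all item classes.
Chromosome = Dict[str, List[Tuple[float, float]]]
REGIONS = ("Low", "Middle", "High")


def triangle(x: float, center: float, spread: float) -> float:
    """Membership of x in an isosceles triangle whose half-width is `spread`."""
    if spread <= 0:
        return 0.0
    return max(0.0, 1.0 - abs(x - center) / spread)


def fuzzy_counts(transactions: List[Dict[str, float]], chrom: Chromosome) -> Dict[str, float]:
    """Sum the membership values of every fuzzy region over all transactions."""
    counts: Dict[str, float] = {}
    for trans in transactions:
        for item_class, qty in trans.items():
            for region, (center, spread) in zip(REGIONS, chrom[item_class]):
                key = f"{item_class}.{region}"
                counts[key] = counts.get(key, 0.0) + triangle(qty, center, spread)
    return counts


def large_regions(transactions: List[Dict[str, float]], chrom: Chromosome,
                  min_support: float) -> List[str]:
    """Fuzzy regions (large fuzzy 1-itemsets) whose support reaches min_support."""
    n = len(transactions)
    counts = fuzzy_counts(transactions, chrom)
    return sorted(key for key, cnt in counts.items() if cnt / n >= min_support)


if __name__ == "__main__":
    # Hypothetical chromosome for two item classes.
    chrom = {
        "Milk": [(1.0, 2.0), (3.0, 2.0), (5.0, 2.0)],
        "Bread": [(2.0, 2.0), (4.0, 2.0), (6.0, 2.0)],
    }
    # Hypothetical quantities, already aggregated to the item-class level.
    transactions = [
        {"Milk": 1.0, "Bread": 4.0},
        {"Milk": 3.0, "Bread": 4.0},
        {"Milk": 5.0, "Bread": 2.0},
        {"Milk": 2.0, "Bread": 6.0},
    ]
    print(fuzzy_counts(transactions, chrom)["Milk.Low"])  # 1.5
    print(large_regions(transactions, chrom, 0.35))  # ['Bread.Middle', 'Milk.Low', 'Milk.Middle']
```

Counting the large itemsets found with a chromosome's membership functions is one plausible way to score the "amount of knowledge" objective; how the thesis aggregates this across levels is not stated in this record.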
The fitness value of each individual is then evaluated using these two objective functions. After the MOGA process terminates, decision-makers can use the various resulting sets of membership functions to derive multi-level fuzzy association rules according to their preferences. However, it may not be easy for users to choose an appropriate solution from the Pareto set derived by MOMLGFM. The second algorithm, TMOGFM, builds on MOMLGFM to help decision-makers choose a proper solution. In the first stage, MOMLGFM is used to derive a set of non-dominated membership functions (Pareto solutions). In the second stage, a clustering technique divides the Pareto solutions into groups according to the designed rule-oriented or utility-oriented clustering attributes, and a representative solution is found for each group. Depending on the decision-makers' preferences, these representative solutions can then be used to mine fuzzy association rules or utility fuzzy itemsets. Experimental results on simulated datasets and a real dataset show the effectiveness of MOMLGFM and TMOGFM. The advantage of MOMLGFM is that it can derive the Pareto set (a set of membership functions) and multi-level fuzzy association rules simultaneously.
The advantage of TMOGFM is that it can not only mine the Pareto set but also use the representative solution of each group to acquire multi-level fuzzy association rules and utility fuzzy itemsets.

Contents
CHAPTER 1 INTRODUCTION
1.1 Problem Definition and Motivation
1.2 Contributions
1.3 Reader's Guide
CHAPTER 2 REVIEW OF RELATED WORK
2.1 The MOGA-based Optimization Problems
2.2 Genetic-Fuzzy Mining Techniques
2.3 Binary and Fuzzy Data Mining Approaches
2.4 Review of Utility Fuzzy Itemset Mining Approaches
CHAPTER 3 MOGA-BASED FUZZY DATA MINING WITH TAXONOMY
3.1 The MOGA-based Multi-Level Fuzzy-Data Mining Framework
3.2 Components of Proposed Approach
3.2.1 Chromosome Representation
3.2.2 Initial Population
3.2.3 The Two Objective Functions
3.2.4 Fitness Assignment
3.2.5 Genetic Operators
3.3 The Proposed Mining Algorithm
3.4 An Example
CHAPTER 4 A TWO-STAGE MULTI-OBJECTIVE FUZZY MINING ALGORITHM
4.1 The Proposed two-stage Fuzzy Data Mining Framework
4.2 The Objective Functions and Clustering Attributes of the Proposed Approach
4.2.1 The Objective Functions in The First Stage
4.2.2 The Clustering Attributes in the Second Stage
4.3 A Two-Stage Multi-Objective Fuzzy Mining Algorithm
4.4 An Example
CHAPTER 5 EXPERIMENTAL RESULTS
5.1 Experimental Results for Method (Ⅰ)
5.1.1 Dataset Descriptions
5.1.2 Experimental Evaluations
5.2 Experimental Results for Method (Ⅱ)
5.2.1 Dataset Descriptions
5.2.2 The Experimental Results on simulated datasets
5.2.2.1 The Evolution of the Pareto Front
5.2.2.2 The Evaluation of the Clustering Results
5.2.2.3 Analyses of the Derived Fuzzy Rules and Utility Fuzzy Itemsets
5.2.2.4 The Execution Time of TMOGFM
5.2.3 The Experimental Results on foodmart dataset
5.2.3.1 The Evolution of the Pareto Fronts
5.2.3.2 The Evaluation of the Clustering Results
5.2.3.3 Analyses of the Derived Fuzzy Rules
CHAPTER 6 CONCLUSIONS AND FUTURE WORKS
References
APPENDIXES: ENGLISH PAPER

List of Figures
Figure 1. An example for the Pareto optimal solutions
Figure 2. The MOGA-based multi-level fuzzy-data mining framework
Figure 3. Membership functions of item class ICj
Figure 4. The ranking results of the ten chromosomes
Figure 5. The results of assign fitness of the ten chromosomes
Figure 6. The average fitness values of the ten chromosomes
Figure 7. Predefined taxonomy
Figure 8. The two-stage fuzzy data mining framework
Figure 9. The evolution of the Pareto fronts of different generation
Figure 10. The membership functions with maximum number of large 1-itemsets
Figure 11. The membership functions with minimum suitability
Figure 12. The membership functions between two extreme cases
Figure 13. The execution times of the proposed approach and previous one
Figure 14. The Pareto fronts of the proposed approach with different population sizes
Figure 15. The evolution of the Pareto fronts of different generation
Figure 16. The relationships between the average similarity of clustering results and the number of clusters
Figure 17. The clustering results of Combination I with k is equal to 6
Figure 18. The clustering results of Combination II with k is equal to 3
Figure 19. The representative chromosomes by Combination I
Figure 20. The representative chromosomes by Combination II
Figure 21. The evolution of the initial and final Pareto fronts
Figure 22. The relationships between the average similarity of clustering results and the number of clusters on foodmart dataset
Figure 23. The clustering results of Combination I with k is equal to 6
Figure 24. The clustering results of Combination II with k is equal to 3
Figure 25. The representative chromosomes by Combination I
Figure 26. The representative chromosomes by Combination II

List of Tables
Table 1. The six transactions in the example
Table 2. Encoded taxonomy
Table 3. Translated dataset
Table 4. Level-1 representation in the example
Table 5. Level-1 fuzzy set
Table 6. The counts of the fuzzy regions
Table 7. The suitability value and the numLarLevel of each chromosome
Table 8. The ranking results of all the ten chromosomes
Table 9. The fitness values of all the ten chromosomes
Table 10. The resulting fitness values of the ten chromosomes
Table 11. External Utility
Table 12. Fuzzy closed itemsets in all levels
Table 13. The UFI of all large itemsets in level2 in C1
Table 14. The UFI of all levels in C1
Table 15. Total UFI and Suitability of all chromosomes
Table 16. Normalized suitability and UFI of all chromosomes
Table 17. Clustering results
Table 18. FCLI and normalized UFI in level 1 in representative chromosome C3
Table 19. The derived fuzzy association rules with highest confidence values
Table 20. The comparison results of MOMLGFM and MLGFM
Table 21. The derived number of fuzzy rules at different levels of selected representative chromosomes on the simulated dataset
Table 22. The derived number of utility closed fuzzy itemsets at different levels of representative chromosomes on the simulated dataset
Table 23. Execution times on simulated datasets with different transaction sizes
Table 24. The derived number of fuzzy rules at different levels of selected representative chromosomes on foodmart dataset

Note: Student ID: 600411861; academic year: 10
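
The second stage of TMOGFM, as summarized in the abstract above, groups the Pareto solutions by clustering attributes and picks one representative per group. The following is a minimal Python sketch of the utility-oriented case under stated assumptions. The clustering attributes are taken to be min-max normalized suitability and total utility fuzzy itemset (UFI) value (the list of tables mentions normalized suitability and UFI, but the exact attributes and normalization are assumptions). The clustering technique is plain k-means with a deterministic farthest-first initialization, and a group's representative is the member closest to its centroid. None of these choices is confirmed by this record, and the sample values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def min_max_normalize(rows: Sequence[Sequence[float]]) -> List[Point]:
    """Scale every attribute column to [0, 1]; a constant column maps to 0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        tuple((v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi))
        for row in rows
    ]


def kmeans(points: List[Point], k: int, iters: int = 100) -> List[int]:
    """Plain k-means (assumed technique) with deterministic farthest-first initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from all centroids chosen so far.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep an empty cluster's old centroid instead of dividing by zero.
            updated.append(tuple(sum(col) / len(members) for col in zip(*members))
                           if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return labels


def representatives(points: List[Point], labels: List[int], k: int) -> List[int]:
    """Index of the member closest to each non-empty cluster's centroid."""
    reps = []
    for j in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == j]
        if not idx:
            continue
        center = tuple(sum(points[i][d] for i in idx) / len(idx) for d in range(len(points[0])))
        reps.append(min(idx, key=lambda i: math.dist(points[i], center)))
    return reps


if __name__ == "__main__":
    # Hypothetical Pareto solutions: (suitability, total UFI) per chromosome.
    pareto = [(0.0, 0.0), (1.0, 10.0), (2.0, 20.0), (8.0, 80.0), (9.0, 90.0), (10.0, 100.0)]
    pts = min_max_normalize(pareto)
    labels = kmeans(pts, k=2)
    print(labels)                              # [0, 0, 0, 1, 1, 1]
    print(representatives(pts, labels, k=2))   # [1, 4]
```

Here k is simply passed in; judging from the list of figures, the thesis instead examines how the average similarity of the clustering results changes with the number of clusters.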
