
Query Based Learning Decision Tree and its Applications in Data Mining

Author: Lo Chia-Yen
羅嘉彥
Publication date: 26/11/2007

Motivation: The decision tree (DT) is one of the most widely used classification tools in data mining. Its graphical output makes the decision process easy to follow and yields mining results that are easy to interpret. However, like most data mining algorithms, DT construction is time consuming and incurs a high computational cost on large real-world datasets, which makes it unsuitable for time-critical applications. Correcting this weakness should substantially improve the efficiency of decision tree classification.

Method: Query-based learning (QBL), proposed in our earlier work, follows the principle of teaching according to the learner's aptitude: it trains the system on the areas where it most needs further learning. For data mining problems with large amounts of data, it supplies a small but informative dataset and so speeds up mining. Real data are often large and noisy, which leads to overfitting or underfitting, and QBL can improve both learning efficiency and accuracy. It previously gave good results when applied to two types of neural networks, for clustering and for classification. This thesis therefore applies the QBL concept to DT construction and proposes a mining scheme called the Query-Based Learning Decision Tree (QBLDT) to overcome the learning problems of decision trees.

Achievement: To our knowledge, this thesis is the first study to apply the QBL concept to DT construction. It combines sampling methods with information gain to issue learning queries, with the aim of speeding up learning and improving prediction. Experimental results show that QBLDT outperforms the traditional DT construction method on the performance metrics evaluated: it learns faster and achieves better prediction results.

Table of contents:

Chapter 1 Introduction
  1.1 Preface
  1.2 Research Background and Motivation
  1.3 Thesis Organization
Chapter 2 Literature Review
  2.1 Data Mining
  2.2 Decision Trees
  2.3 Query-Based Learning
  2.4 Sampling Methods
Chapter 3 Research Framework Design
  3.1 Methodological Considerations
  3.2 Building Decision Trees with Different Sampling Methods
  3.3 Query-Based Learning Decision Tree (QBLDT)
    3.3.1 Basic Introduction
    3.3.2 Criteria for Querying Training Data
Chapter 4 Empirical Validation and Application
  4.1 Experiment and Evaluation Methods
    4.1.1 Experiment Method
    4.1.2 Evaluation Method
  4.2 Test Data
    4.2.1 Dataset 1: Nursery
    4.2.2 Dataset 2: Car Evaluation
  4.3 Experimental Results and Comparison
    4.3.1 Results and Analysis
  4.4 Preliminary Study of the Effect of Different Decision Tree Algorithms
    4.4.1 Decision Trees Using the Gain Ratio Algorithm
    4.4.2 Decision Trees Using the GINI Algorithm
Chapter 5 Conclusions and Future Work
  5.1 Discussion of the Datasets
  5.2 Discussion of the Methodology
References
Appendix: List of Related Methods
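The abstract names information gain as one ingredient of the learning queries but does not spell out the exact criterion. As background only, the standard definition of information gain for a categorical attribute can be sketched as follows. The function names and the toy data are hypothetical illustrations, not taken from the thesis, and this is not the QBLDT query procedure itself.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, labels, attribute_index):
    """Entropy reduction obtained by splitting on one categorical attribute."""
    total = len(labels)
    # Group the class labels by the value the attribute takes in each row.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute_index], []).append(label)
    # Weighted average entropy of the partitions after the split.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


# Hypothetical toy data (assumed for illustration): two attributes per row.
rows = [("high", "4"), ("high", "2"), ("low", "4"), ("low", "2")]
labels = ["acc", "acc", "unacc", "unacc"]

gain_first = information_gain(rows, labels, 0)   # attribute separates classes
gain_second = information_gain(rows, labels, 1)  # attribute is uninformative
```

In this toy data the first attribute splits the classes perfectly, so it has the larger gain; a tree builder that uses information gain would choose it for the split.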

National Taiwan University Repository