6,164 research outputs found

    A New Hierarchical Redundancy Eliminated Tree Augmented Naive Bayes Classifier for Coping with Gene Ontology-based Features

    The Tree Augmented Naive Bayes classifier is a type of probabilistic graphical model that can represent some feature dependencies. In this work, we propose a Hierarchical Redundancy Eliminated Tree Augmented Naive Bayes (HRE-TAN) algorithm, which removes hierarchical redundancy during classifier learning when the data contain hierarchically structured features. Experiments showed that HRE-TAN obtains significantly better predictive performance than the conventional Tree Augmented Naive Bayes classifier, and is more robust to imbalanced class distributions, on aging-related gene datasets with Gene Ontology terms used as features. Comment: International Conference on Machine Learning (ICML 2016) Computational Biology Workshop

    Comparative Analysis of Naive Bayes and Tree Augmented Naive Bayes Models

    Naive Bayes and Tree Augmented Naive Bayes (TAN) are probabilistic graphical models used for modeling large datasets that involve considerable uncertainty among their interdependent feature sets. Common applications of these models include image segmentation, medical diagnosis, and various other data clustering and data classification tasks. A classification problem deals with identifying the category to which a particular instance belongs, based on knowledge acquired by analyzing previous instances. The instances are described using a set of variables called attributes or features. A Naive Bayes model assumes that all the attributes of an instance are independent of each other given the class of that instance. This is a very simple representation of the system, but the independence assumptions it makes are usually unrealistic. The TAN model improves on the Naive Bayes model by adding one more level of interaction among the attributes. In the TAN model, every attribute depends on the class and on one other attribute from the feature set. Because it incorporates dependencies among the attributes, it is more realistic than a Naive Bayes model. This project analyzes the performance of the two models on various datasets. The TAN model performs better when the attributes are correlated, but its performance is almost the same as that of the Naive Bayes model when the correlations between attributes are weak.
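
The difference between the two factorizations described in the abstract above can be illustrated with a small sketch. It is not the project's implementation: the function names `fit_counts` and `predict`, the Laplace smoothing parameter `alpha`, the toy XOR-style data, and the hand-specified `parents` mapping (attribute index to the index of its single attribute parent) are all assumptions made for illustration. A real TAN learner would derive the tree from the data rather than take it as input. With an empty `parents` mapping the model reduces to Naive Bayes.

```python
from collections import Counter
import math


def fit_counts(rows, labels, parents):
    """Collect the counts needed for P(c) and P(x_i | c, x_parent(i))."""
    class_counts = Counter(labels)
    cond_counts = Counter()  # (i, x_i, c, parent_value) -> count
    ctx_counts = Counter()   # (i, c, parent_value) -> count
    values = [set() for _ in rows[0]]  # observed values per attribute
    for row, c in zip(rows, labels):
        for i, x in enumerate(row):
            p = parents.get(i)  # None means "class is the only parent"
            pv = row[p] if p is not None else None
            cond_counts[(i, x, c, pv)] += 1
            ctx_counts[(i, c, pv)] += 1
            values[i].add(x)
    return class_counts, cond_counts, ctx_counts, values


def predict(row, model, parents, alpha=1.0):
    """Return the class with the highest smoothed log joint probability."""
    class_counts, cond_counts, ctx_counts, values = model
    n = sum(class_counts.values())
    scores = {}
    for c, nc in class_counts.items():
        logp = math.log(nc / n)
        for i, x in enumerate(row):
            p = parents.get(i)
            pv = row[p] if p is not None else None
            num = cond_counts[(i, x, c, pv)] + alpha
            den = ctx_counts[(i, c, pv)] + alpha * len(values[i])
            logp += math.log(num / den)
        scores[c] = logp
    return max(scores, key=scores.get)


# Toy data (assumed): the class is the XOR of two binary attributes,
# so each attribute alone carries no information about the class.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)] * 5
labels = [a ^ b for a, b in rows]

nb_parents = {}       # Naive Bayes: attributes depend on the class only
tan_parents = {1: 0}  # TAN-style: attribute 1 also depends on attribute 0


def accuracy(parents):
    model = fit_counts(rows, labels, parents)
    hits = sum(predict(r, model, parents) == c for r, c in zip(rows, labels))
    return hits / len(rows)


print("naive Bayes:", accuracy(nb_parents))
print("TAN-style:  ", accuracy(tan_parents))
```

The toy data is deliberately extreme: the attributes are strongly dependent given the class, which is the situation in which the abstract reports an advantage for TAN. On data with weakly correlated attributes the two settings would be expected to behave similarly.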

    Evaluation of the Performance of the Markov Blanket Bayesian Classifier Algorithm

    The Markov Blanket Bayesian Classifier (MBBC) is a recently proposed algorithm for the construction of probabilistic classifiers. This paper presents an empirical comparison of the MBBC algorithm with three other Bayesian classifiers: Naive Bayes, Tree-Augmented Naive Bayes and a general Bayesian network. All of these are implemented using the K2 framework of Cooper and Herskovits. The classifiers are compared in terms of their performance (using simple accuracy measures and ROC curves) and speed, on a range of standard benchmark data sets. It is concluded that MBBC is competitive in terms of speed and accuracy with the other algorithms considered. Comment: 9 pages: Technical Report No. NUIG-IT-011002, Department of Information Technology, National University of Ireland, Galway (2002)

    On the use of Bayesian network classifiers to classify patients with peptic ulcer among upper gastrointestinal bleeding patients

    A Bayesian network classifier is a type of probabilistic graphical model capable of representing the relationships between variables in a given domain. We consider naive Bayes, tree augmented naive Bayes (TAN) and boosted augmented naive Bayes (BAN) for classifying patients with peptic ulcer disease among upper gastrointestinal bleeding patients. We compare their performance with IBk and C4.5. To identify the variables relevant to peptic ulcer disease, we use several attribute subset selection methods. The results show that blood urea nitrogen, hemoglobin and gastric malignancy are important for classification. Among the Bayesian classifiers, BAN achieves the best accuracy of 77.3 and an AUC of 0.81, followed by TAN with 72.4 and 0.76 respectively. While the accuracy of TAN improves with attribute selection, BAN and IBk perform better without it.

    Effect of missing value methods on Bayesian network classification of hepatitis data

    Missing value imputation methods are widely used to handle missing values during statistical analysis. For classification tasks, these imputation methods can affect the accuracy of Bayesian network classifiers. This paper studies the effect of missing value treatment on the prediction accuracy of four Bayesian network classifiers used to predict death in acute chronic hepatitis patients. Missing data were imputed using nine methods: replacement with the most common attribute value, support vector machine imputation (SVMI), K-nearest neighbor imputation (KNNI), fuzzy K-means clustering imputation (FKMI), K-means clustering imputation (KMI), weighted imputation with K-nearest neighbor (WKNNI), regularized expectation maximization (EM), singular value decomposition imputation (SVDI), and local least squares imputation (LLSI). The classification accuracies of the naive Bayes (NB), tree augmented naive Bayes (TAN), boosted augmented naive Bayes (BAN) and general Bayes network (GBN) classifiers were recorded. The SVMI and LLSI methods improved the classification accuracy of the classifiers. Ignoring missing values performed better than seven of the imputation methods. Among the classifiers, TAN achieved the best average classification accuracy of 86.3%, followed by BAN with 85.1%.
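
The simplest of the nine methods listed above, replacing a missing entry with the most common value of its attribute, can be sketched as follows. This is only an illustration under stated assumptions: the function name `impute_most_common`, the use of `None` as the missing-value marker, and the toy rows are hypothetical and not taken from the paper.

```python
from collections import Counter


def impute_most_common(rows, missing=None):
    """Replace each missing entry with the most frequent observed value in its column."""
    n_cols = len(rows[0])
    modes = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] != missing]
        # If a column has no observed values at all, leave it as missing.
        modes.append(Counter(observed).most_common(1)[0][0] if observed else missing)
    return [
        [modes[j] if r[j] == missing else r[j] for j in range(n_cols)]
        for r in rows
    ]


# Toy records (assumed): two categorical attributes with some gaps.
records = [
    ["yes", 1],
    [None, 1],
    ["yes", None],
    ["no", 2],
]
completed = impute_most_common(records)
print(completed)
```

The alternative the abstract calls "ignoring missing values" would instead skip the missing entries when counting, leaving the records unchanged.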

    A Study of Classification with Bayesian Belief Networks Using the Naive Bayes Classifier and the Tree Augmented Naive Bayes Classifier

    ABSTRACT: Data mining is the extraction of potentially useful information that is implicitly contained in a database. One of the data mining tasks, and the subject of this final project, is classification, in particular an emerging Bayesian technique known as Bayesian Belief Networks (BBN). A BBN is a directed acyclic graph whose nodes represent the variables of the dataset and whose arcs represent statistical dependence relations among the variables, together with a local probability distribution for each variable given the values of its parents. This final project analyzes the performance of the Naïve Bayes classifier and the Tree Augmented Naïve Bayes (TAN) classifier as BBN classification techniques that use restricted structure learning, and implements them to solve classification problems in data mining. The results show that the TAN classifier performs better than the Naïve Bayes classifier in terms of accuracy, although building the classification model takes longer. Keywords: Bayesian Belief Networks, TAN, Naive Bayes, classifier, classification

    Bayesian approach to classification of football match outcome

    Football match outcome prediction has gained popularity in recent years. It attracts many kinds of followers, from expert analysts to football team managers and others, who want to predict the result of a match before it starts. Three types of approaches have been proposed to predict a win, loss or draw and to evaluate the attributes of a football team: the statistical approach, the machine learning approach and the Bayesian approach. This paper proposes Bayesian approaches within machine learning, namely Naive Bayes (NB), Tree Augmented Naive Bayes (TAN) and General Bayesian Network (K2), to predict football match outcomes. The data used are the English Premier League match results for three seasons (2014-2015, 2015-2016 and 2016-2017), downloaded from http://www.football-data.co.uk. The experimental results showed that TAN achieved the highest predictive accuracy, 90.0% on average across the three seasons, among the Bayesian approaches considered (K2 and NB being the others). It is hoped that the results of this research can be used in future work on predicting football match outcomes.

    DATA CLASSIFICATION USING C4.5 AND TAN (TREE AUGMENTED NAIVE BAYES) ALGORITHMS

    ABSTRACT: Data mining is a series of processes for extracting added value from a collection of data in the form of previously unknown knowledge. One of the tasks in data mining, and the subject of this final project, is classification. The project uses the C4.5 classification method and one of the Bayesian network methods, Tree Augmented Naive Bayes (TAN). TAN is a directed acyclic graph whose nodes represent the variables of the data set and whose arcs represent statistical dependence relations among the variables, with a local probability distribution for each variable given the values of its parents. The C4.5+TAN algorithm combines the functionality of the two methods. This final project analyzes the classification time and accuracy, as well as the shape of the decision tree, of the combined C4.5 and TAN classifier, whose model is built using the C4.5 algorithm and conditional independence tests. Keywords: C4.5, conditional independence test, TAN, data mining, classification