    Bayesian networks for classification, clustering, and high-dimensional data visualisation

    This thesis presents new developments for a particular class of Bayesian networks in which the number of parent nodes each node can have is limited. This restriction yields structures of low complexity (few edges), which makes it possible to formulate optimal algorithms for learning Bayesian networks from data. The new developments focus on three topics: classification, clustering, and high-dimensional data visualisation (topographic map formation). For classification, a new learning algorithm is introduced which generates simple Bayesian network classifiers. This approach creates a new class of networks in a family that was previously limited mostly to two well-known models, the naive Bayesian (NB) classifier and the Tree Augmented Naive Bayes (TAN) classifier. The proposed learning algorithm enhances the NB model by adding a Bayesian monitoring system. As a result, the complexity of the resulting network is determined by the input data, yielding structures which model the data distribution more realistically and thereby improve classification performance. Research on Bayesian networks for clustering has not been as popular as for classification tasks. A new unsupervised learning algorithm is introduced for three types of Bayesian network classifiers, enabling them to carry out clustering tasks. The resulting models perform cluster assignments probabilistically, using the posterior probability of a data point belonging to each cluster. A key characteristic of the proposed clustering models, which traditional clustering techniques lack, is the ability to show the probabilistic dependencies amongst the variables for each cluster. This feature enables a better understanding of each cluster. The final part of this thesis introduces one of the first developments for Bayesian networks to perform topographic mapping. A new unsupervised learning algorithm for the NB model is presented which enables the projection of high-dimensional data into a two-dimensional space for visualisation purposes. The Bayesian network formalism of the model allows the learning algorithm to generate a density model of the input data and provides a cost function for monitoring convergence during training. The absence of these two features is a limitation of other mapping techniques, and it is overcome in this research.
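The probabilistic cluster assignment described in this abstract can be illustrated with a minimal sketch. It assumes a naive Bayes mixture over binary features with already-known parameters; the function name `cluster_posterior` and the example numbers are hypothetical and are not taken from the thesis, whose learning algorithm is not reproduced here.

```python
import math

def cluster_posterior(x, priors, bernoulli_params):
    """Return P(cluster | x) for a binary vector x under a naive Bayes mixture.

    priors[k] is P(cluster k); bernoulli_params[k][i] is P(x_i = 1 | cluster k).
    Both are assumed to be given (illustrative values, not learned here).
    """
    log_joint = []
    for prior, theta in zip(priors, bernoulli_params):
        lp = math.log(prior)
        # Naive Bayes assumption: features are independent given the cluster.
        for xi, t in zip(x, theta):
            lp += math.log(t if xi == 1 else 1.0 - t)
        log_joint.append(lp)
    # Normalise in log space for numerical stability.
    m = max(log_joint)
    weights = [math.exp(lp - m) for lp in log_joint]
    total = sum(weights)
    return [w / total for w in weights]

priors = [0.5, 0.5]
params = [[0.9, 0.8], [0.2, 0.1]]
post = cluster_posterior([1, 1], priors, params)
```

Each data point receives a full distribution over clusters rather than a hard label, which is the property the abstract highlights.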

    PAC-Bayes and Domain Adaptation

    We provide two main contributions in PAC-Bayesian theory for domain adaptation, where the objective is to learn, from a source distribution, a well-performing majority vote on a different, but related, target distribution. Firstly, we propose an improvement of our previous approach (Germain et al., 2013), which relies on a novel distribution pseudodistance based on disagreement averaging, allowing us to derive a new, tighter domain adaptation bound for the target risk. While this bound stands in the spirit of common domain adaptation works, we derive a second bound (introduced in Germain et al., 2016) that brings a new perspective on domain adaptation: an upper bound on the target risk in which the distributions' divergence, expressed as a ratio, controls the trade-off between a source error measure and the target voters' disagreement. We discuss and compare both results, from which we obtain PAC-Bayesian generalization bounds. Furthermore, from the PAC-Bayesian specialization to linear classifiers, we infer two learning algorithms, and we evaluate them on real data. Comment: Neurocomputing, Elsevier, 2019. arXiv admin note: substantial text overlap with arXiv:1503.0694
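As a rough illustration of the quantities this abstract refers to, the sketch below computes a weighted majority vote, an empirical Gibbs risk on labeled source data, and the empirical expected disagreement between voters on unlabeled target data. It is only an assumption-laden toy: the voters, weights, and samples are made up, the Gibbs risk stands in for whatever source error measure the paper actually uses, and neither bound from the paper is implemented.

```python
def majority_vote(voters, weights, x):
    """Weighted majority vote over voters that output -1 or +1."""
    score = sum(w * h(x) for h, w in zip(voters, weights))
    return 1 if score >= 0 else -1

def gibbs_risk(voters, weights, labeled):
    """Weighted average of the individual voters' error rates on labeled data."""
    return sum(
        w * sum(h(x) != y for x, y in labeled) / len(labeled)
        for h, w in zip(voters, weights)
    )

def expected_disagreement(voters, weights, unlabeled):
    """Probability that two voters drawn independently by weight disagree."""
    total = 0.0
    for h1, w1 in zip(voters, weights):
        for h2, w2 in zip(voters, weights):
            rate = sum(h1(x) != h2(x) for x in unlabeled) / len(unlabeled)
            total += w1 * w2 * rate
    return total

# Hypothetical threshold voters and toy samples.
voters = [lambda x: 1 if x > 0 else -1, lambda x: 1 if x > 1 else -1]
weights = [0.5, 0.5]
source = [(-1.0, -1), (0.5, 1), (2.0, 1)]   # labeled source sample
target = [-0.5, 0.5, 1.5, 2.5]              # unlabeled target sample

source_risk = gibbs_risk(voters, weights, source)
target_disagreement = expected_disagreement(voters, weights, target)
```

Note that the disagreement needs no target labels, which is why it is usable when only the source distribution is labeled.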

    Algorithm selection on data streams

    We explore the possibilities of meta-learning on data streams, in particular algorithm selection. In a first experiment, we calculate the characteristics of a small sample of a data stream and try to predict which classifier performs best on the entire stream. This yields promising results and interesting patterns. In a second experiment, we build a meta-classifier that predicts, based on measurable data characteristics in a window of the data stream, the best classifier for the next window. The results show that this meta-algorithm is very competitive with state-of-the-art ensembles, such as OzaBag, OzaBoost and Leveraged Bagging. The results of all experiments are made publicly available in an online experiment database, for the purpose of verifiability, reproducibility and generalizability.
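The second experiment's idea, choosing a classifier for the next window from characteristics of the current one, can be sketched as follows. Everything concrete here is an assumption made for illustration: the two trivial base predictors (`majority`, `no_change`), the single meta-feature `change_rate`, and the nearest-neighbour meta-classifier are stand-ins, not the classifiers or data characteristics used in the paper.

```python
from collections import Counter

def acc_majority(prev, cur):
    """Accuracy on cur when always predicting the most frequent label of prev."""
    top = Counter(prev).most_common(1)[0][0]
    return sum(y == top for y in cur) / len(cur)

def acc_no_change(prev, cur):
    """Accuracy on cur when always predicting the previously seen label."""
    last, hits = prev[-1], 0
    for y in cur:
        hits += (y == last)
        last = y
    return hits / len(cur)

BASE = {"majority": acc_majority, "no_change": acc_no_change}

def change_rate(window):
    """Meta-feature: fraction of consecutive label pairs that differ."""
    return sum(a != b for a, b in zip(window, window[1:])) / (len(window) - 1)

def select_per_window(stream, size):
    """For each window after the first, pick a base predictor via 1-NN on meta-features."""
    windows = [stream[i:i + size] for i in range(0, len(stream) - size + 1, size)]
    meta_examples = []  # (meta-feature of a window, best predictor on the following window)
    choices = []
    for t in range(1, len(windows)):
        prev, cur = windows[t - 1], windows[t]
        feat = change_rate(prev)
        if meta_examples:
            _, choice = min(meta_examples, key=lambda e: abs(e[0] - feat))
        else:
            choice = "majority"  # arbitrary default before any meta-data exists
        choices.append(choice)
        # After the window is observed, record which predictor was actually best.
        best = max(BASE, key=lambda name: BASE[name](prev, cur))
        meta_examples.append((feat, best))
    return choices

stream = [0, 0, 0, 0, 1, 1, 1, 1] * 3
choices = select_per_window(stream, size=4)
```

The choice for a window is made before that window's labels are used, so the selection step never sees the data it is evaluated on.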

    Evaluation of the Performance of the Markov Blanket Bayesian Classifier Algorithm

    The Markov Blanket Bayesian Classifier (MBBC) is a recently proposed algorithm for the construction of probabilistic classifiers. This paper presents an empirical comparison of the MBBC algorithm with three other Bayesian classifiers: Naive Bayes, Tree-Augmented Naive Bayes and a general Bayesian network. All of these are implemented using the K2 framework of Cooper and Herskovits. The classifiers are compared in terms of their performance (using simple accuracy measures and ROC curves) and speed, on a range of standard benchmark data sets. It is concluded that MBBC is competitive in terms of speed and accuracy with the other algorithms considered. Comment: 9 pages: Technical Report No. NUIG-IT-011002, Department of Information Technology, National University of Ireland, Galway (2002)
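For readers unfamiliar with the two performance measures named in this abstract, here is a minimal sketch of simple accuracy and of the area under the ROC curve, computed as the fraction of positive/negative pairs ranked correctly. The labels, scores, and the 0.5 decision threshold are invented for illustration and have no connection to the report's experiments.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise ranking (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores for the positive class.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Accuracy depends on the chosen threshold, whereas the AUC summarises ranking quality across all thresholds, which is one reason comparisons often report both.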