6 research outputs found

    Learning Discriminative Bayesian Networks from High-dimensional Continuous Neuroimaging Data

    Due to their causal semantics, Bayesian networks (BNs) have been widely employed to discover underlying data relationships in exploratory studies, such as brain research. Despite their success in modeling the probability distribution of variables, BNs are inherently generative models, which are not necessarily discriminative. This may cause subtle but critical network changes across populations, the very changes of investigative interest, to be overlooked. In this paper, we propose to improve the discriminative power of BN models for continuous variables from two different perspectives, yielding two general discriminative learning frameworks for Gaussian Bayesian networks (GBNs). In the first framework, we employ the Fisher kernel to bridge the generative models of GBNs and the discriminative classifiers of SVMs, and convert GBN parameter learning into Fisher kernel learning by minimizing a generalization error bound of SVMs. In the second framework, we employ the max-margin criterion and build it directly upon GBN models to explicitly optimize the classification performance of the GBNs. The advantages and disadvantages of the two frameworks are discussed and experimentally compared. Both demonstrate strong power in learning discriminative parameters of GBNs for neuroimaging-based brain network analysis, while maintaining reasonable representation capacity. The contributions of this paper also include a new Directed Acyclic Graph (DAG) constraint, with a theoretical guarantee, to ensure the graph validity of the learned GBNs.
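As a rough illustration of the Fisher-kernel bridge described above: each sample is mapped to the gradient of a generative model's log-likelihood with respect to its parameters, and those score vectors are handed to an SVM. The sketch below is a minimal, hypothetical example using a single univariate Gaussian rather than a GBN, and it approximates the Fisher kernel by a plain linear kernel on the scores (the exact kernel would whiten by the inverse Fisher information matrix); it does not reproduce the paper's GBN-specific derivation or error-bound optimization.

```python
import numpy as np
from sklearn.svm import SVC

def fisher_scores(x, mu, sigma):
    """Per-sample gradient of the Gaussian log-likelihood w.r.t. (mu, sigma)."""
    d_mu = (x - mu) / sigma**2                       # d/dmu log N(x | mu, sigma^2)
    d_sigma = ((x - mu)**2 - sigma**2) / sigma**3    # d/dsigma log N(x | mu, sigma^2)
    return np.stack([d_mu, d_sigma], axis=1)         # shape (n_samples, 2)

# hypothetical toy data: two classes drawn from shifted Gaussians
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(0.8, 1.0, 100)])
y = np.array([0] * 100 + [1] * 100)

# fit the generative model, then map every sample to its Fisher score vector
mu_hat, sigma_hat = x.mean(), x.std()
U = fisher_scores(x, mu_hat, sigma_hat)

# a linear kernel on the score vectors stands in for the Fisher kernel here
K = U @ U.T
clf = SVC(kernel="precomputed").fit(K, y)
print("training accuracy:", clf.score(K, y))
```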

    Novel approaches for hierarchical classification with case studies in protein function prediction

    A very large amount of research in the data mining, machine learning, statistical pattern recognition and related research communities has focused on flat classification problems. However, many real-world problems, such as hierarchical protein function prediction, have their classes naturally organised into hierarchies. Yet the task of hierarchical classification needs to be better defined, as researchers in one application domain are often unaware of similar efforts developed in other research areas. The first contribution of this thesis is to survey the task of hierarchical classification across different application domains and present a unifying framework for the task.

    Based on the understanding gained from this survey, there are three major approaches to hierarchical classification problems. The first approach is to use one of the many existing flat classification algorithms to predict only the leaf classes in the hierarchy. In the training phase this approach completely ignores the hierarchical class relationships, i.e. the parent-child and sibling class relationships, but in the testing phase the ancestral classes of an instance can be inferred from its predicted leaf classes. The second approach is to build a set of local models by training one flat classification algorithm for each local view of the hierarchy. The two main variations of this approach are: (a) training a local flat multi-class classifier at each non-leaf class node, where each classifier discriminates among the child classes of its associated node (illustrated in the sketch below); or (b) training a local flat binary classifier at each node of the class hierarchy, where each classifier predicts whether or not a new instance belongs to its associated class. In both variations, the testing phase uses a procedure that combines the predictions of the set of local classifiers in a coherent way, avoiding inconsistent predictions. The third approach is to use a global-model hierarchical classification algorithm, which builds one single classification model by taking into account all the hierarchical class relationships in the training phase.

    In the context of this categorization of hierarchical classification approaches, the other contributions of this thesis are as follows. The second contribution is a novel algorithm based on the local-classifier-per-parent-node approach: a selective representation approach that automatically selects the best protein representation to use at each non-leaf class node. The third contribution is a global-model hierarchical classification extension of the well-known naive Bayes algorithm. Given the good predictive performance of the global-model hierarchical-classification naive Bayes algorithm, we relax the naive Bayes assumption that attributes are independent of each other given the class by using the concept of k dependencies. Hence, we extend the flat k-Dependence Bayesian network classifier to the task of hierarchical classification, which is the fourth contribution of this thesis. Both the proposed global-model hierarchical classification naive Bayes classifier and the proposed global-model hierarchical k-Dependence Bayesian network classifier achieved predictive accuracies that were, overall, significantly higher than those obtained by their corresponding local hierarchical classification versions, across a number of datasets for the task of hierarchical protein function prediction.
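To make the local-classifier-per-parent-node approach concrete, here is a minimal sketch, not the thesis's selective-representation algorithm: it assumes a small hypothetical class hierarchy and uses scikit-learn's LogisticRegression as the flat base classifier at each non-leaf node, with a top-down prediction pass that keeps class assignments consistent with the hierarchy.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# hypothetical two-level class hierarchy (illustrative names only)
HIERARCHY = {"root": ["enzyme", "transporter"],
             "enzyme": ["hydrolase", "transferase"]}

def train_local_classifiers(X, paths):
    """Train one flat multi-class classifier per non-leaf node; each model
    discriminates only among the child classes of its node."""
    models = {}
    for node in HIERARCHY:
        rows, labels = [], []
        for i, path in enumerate(paths):
            full = ["root"] + path
            if node in full[:-1]:                    # instance passes through node
                rows.append(i)
                labels.append(full[full.index(node) + 1])
        models[node] = LogisticRegression().fit(X[rows], labels)
    return models

def predict_top_down(models, x):
    """Descend the hierarchy, letting each local model choose a child class,
    which keeps predictions consistent with parent-child relationships."""
    node, path = "root", []
    while node in models:
        node = models[node].predict(x.reshape(1, -1))[0]
        path.append(node)
    return path

# toy usage: labels are given as root-to-leaf paths (hypothetical data)
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
paths = [["enzyme", "hydrolase"], ["enzyme", "transferase"], ["transporter"]] * 2
models = train_local_classifiers(X, paths)
print(predict_top_down(models, X[0]))                # e.g. ['enzyme', 'hydrolase']
```

In variation (b) of the local approach, the per-node models would instead be binary predictors, with an extra consistency step at test time to rule out predictions that violate the hierarchy.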

    Learning Bayesian networks based on optimization approaches

    Learning accurate classifiers from pre-classified data is a very active research topic in machine learning and artificial intelligence. There are numerous classifier paradigms, among which Bayesian Networks are very effective and well known in domains with uncertainty. Bayesian Networks are widely used representation frameworks for reasoning with probabilistic information. These models use graphs to capture dependence and independence relationships between feature variables, allowing a concise representation of the knowledge as well as efficient graph-based query processing algorithms. Learning this representation involves two components: structure learning and parameter learning. The structure of the model is a directed acyclic graph in which the nodes correspond to the feature variables in the domain and the arcs (edges) show the causal relationships between them. A directed edge relates two variables so that the variable corresponding to the terminal node (child) is conditioned on the variable corresponding to the initial node (parent). Parameter learning estimates probabilities and conditional probabilities based on prior information or past experience; this set of probabilities is represented in the conditional probability tables. Once the network structure is constructed, probabilistic inference can readily be performed to predict the outcome of some variables based on the observations of others. However, structure learning is a complex problem, since the number of candidate structures grows exponentially as the number of feature variables increases.

    This thesis is devoted to the development of methods for learning the structure and parameters of Bayesian Networks. Different models based on optimization techniques are introduced to construct an optimal structure of a Bayesian Network. These models also improve the Naive Bayes structure through new algorithms that alleviate its independence assumptions. We present various models to learn the parameters of Bayesian Networks; in particular, we propose optimization models for the Naive Bayes and the Tree Augmented Naive Bayes classifiers under different objective functions. To solve the corresponding optimization problems, we develop new optimization algorithms. Local optimization methods are introduced based on a combination of the gradient and Newton methods, and the proposed methods are proved to be globally convergent with superlinear convergence rates. As a global search method we use the global optimization algorithm AGOP, implemented in the open software library GANSO, and apply the proposed local methods in combination with it.

    Therefore, the main contributions of this thesis include: (a) new algorithms for learning an optimal structure of a Bayesian Network; (b) new models for learning the parameters of Bayesian Networks with given structures; and (c) new optimization algorithms for optimizing the proposed models in (a) and (b). To validate the proposed methods, we conduct experiments across a number of real-world problems. A print version is available at: http://library.federation.edu.au/record=b1804607~S4
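For context on the Tree Augmented Naive Bayes (TAN) models mentioned above, the sketch below shows the classical Chow-Liu style construction of a TAN structure: attributes are connected by a maximum spanning tree weighted by conditional mutual information given the class. This is background, not the thesis's optimization-based formulation, and the data are hypothetical.

```python
import numpy as np
from itertools import combinations

def cond_mutual_info(xi, xj, y):
    """Empirical conditional mutual information I(Xi; Xj | Y) for discrete data."""
    cmi = 0.0
    for vy in np.unique(y):
        mask, py = (y == vy), np.mean(y == vy)
        for vi in np.unique(xi):
            for vj in np.unique(xj):
                pij = np.mean(mask & (xi == vi) & (xj == vj))   # p(xi, xj, y)
                pi = np.mean(mask & (xi == vi))                 # p(xi, y)
                pj = np.mean(mask & (xj == vj))                 # p(xj, y)
                if pij > 0:
                    cmi += pij * np.log(pij * py / (pi * pj))
    return cmi

def tan_structure(X, y):
    """Maximum spanning tree over attributes weighted by I(Xi; Xj | Y),
    directed away from attribute 0; in the final TAN model every attribute
    additionally gets the class variable as a parent."""
    d = X.shape[1]
    w = np.zeros((d, d))
    for i, j in combinations(range(d), 2):
        w[i, j] = w[j, i] = cond_mutual_info(X[:, i], X[:, j], y)
    in_tree, edges = {0}, []
    while len(in_tree) < d:                          # Prim's algorithm, maximizing
        i, j = max(((a, b) for a in in_tree for b in range(d) if b not in in_tree),
                   key=lambda e: w[e])
        edges.append((i, j))                         # attribute parent of j is i
        in_tree.add(j)
    return edges

# toy usage with discrete data: 4 binary attributes, binary class (illustrative)
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 4))
y = rng.integers(0, 2, size=200)
print(tan_structure(X, y))                           # e.g. [(0, 2), (2, 1), (1, 3)]
```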