    A hybrid algorithm for Bayesian network structure learning with application to multi-label learning

    We present a novel hybrid algorithm for Bayesian network structure learning, called H2PC. It first reconstructs the skeleton of a Bayesian network and then performs a Bayesian-scoring greedy hill-climbing search to orient the edges. The algorithm is based on divide-and-conquer constraint-based subroutines to learn the local structure around a target variable. We conduct two series of experimental comparisons of H2PC against Max-Min Hill-Climbing (MMHC), currently the most powerful state-of-the-art algorithm for Bayesian network structure learning. First, we use eight well-known Bayesian network benchmarks with various data sizes to assess the quality of the learned structures returned by the algorithms. Our extensive experiments show that H2PC outperforms MMHC in terms of goodness of fit to new data and quality of the network structure with respect to the true dependence structure of the data. Second, we investigate H2PC's ability to solve the multi-label learning problem. We provide theoretical results to characterize and graphically identify the so-called minimal label powersets that appear as irreducible factors in the joint distribution under the faithfulness condition. The multi-label learning problem is then decomposed into a series of multi-class classification problems, where each multi-class variable encodes a label powerset. H2PC is shown to compare favorably to MMHC in terms of global classification accuracy over ten multi-label data sets covering different application domains. Overall, our experiments support the conclusion that local structure learning with H2PC, in the form of local neighborhood induction, is a theoretically well-motivated and empirically effective learning framework that is well suited to multi-label learning. The source code (in R) of H2PC as well as all data sets used for the empirical tests are publicly available.
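    To make the two-phase hybrid idea concrete, here is a minimal Python sketch of a generic skeleton-then-orient learner in the spirit the abstract describes. It is not the H2PC algorithm itself: the ci_test and bic_score helpers are hypothetical stand-ins, conditioning sets are capped at size two, and acyclicity checks are omitted for brevity.

        from itertools import combinations

        def learn_skeleton(variables, data, ci_test, max_cond=2):
            # Phase 1 (constraint-based): keep an undirected edge x-y unless
            # some small conditioning set z renders x and y independent.
            edges = set(combinations(variables, 2))
            for x, y in list(edges):
                others = [v for v in variables if v not in (x, y)]
                for size in range(max_cond + 1):
                    if any(ci_test(data, x, y, z) for z in combinations(others, size)):
                        edges.discard((x, y))
                        break
            return edges

        def orient_edges(skeleton, data, bic_score):
            # Phase 2 (score-based): a drastically simplified stand-in for
            # greedy hill-climbing that keeps, for each skeleton edge, the
            # orientation yielding the higher network score.
            dag = set()
            for x, y in skeleton:
                forward = bic_score(data, dag | {(x, y)})
                backward = bic_score(data, dag | {(y, x)})
                dag.add((x, y) if forward >= backward else (y, x))
            return dag

    The point of the hybrid scheme is that the cheap constraint-based phase shrinks the search space before the expensive score-based phase runs.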

    Fair Causal Feature Selection

    Causal feature selection has recently received increasing attention in machine learning. Existing causal feature selection algorithms select a single set of causal features of the class variable as the optimal feature subset. However, a class variable usually has multiple states, and it is unfair to select the same causal features for every state of the class variable. To address this problem, we employ class-specific mutual information to evaluate the causal information carried by each state of the class attribute, and we theoretically analyze the unique relationship between each state and its causal features. Based on this, a Fair Causal Feature Selection algorithm (FairCFS) is proposed to fairly identify the causal features for each state of the class variable. Specifically, FairCFS compares class-specific mutual information values pairwise from the perspective of each state and follows a divide-and-conquer framework to find causal features. The correctness and application conditions of FairCFS are theoretically proved, and extensive experiments demonstrate its efficiency and superiority compared to state-of-the-art approaches.
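    As a rough illustration of the quantity FairCFS builds on, the sketch below computes one standard per-state decomposition of mutual information: the contribution of a single class state y, i.e. the Kullback-Leibler divergence between P(X | Y=y) and P(X). The paper's exact definition may differ; this is an assumption-laden stand-in for discrete features, and it assumes the given state actually occurs in y.

        import numpy as np

        def class_specific_mi(x, y, state):
            # I(X; Y=state) = sum_v P(X=v | Y=state) * log2(P(X=v | Y=state) / P(X=v))
            x, y = np.asarray(x), np.asarray(y)
            in_state = (y == state)
            mi = 0.0
            for v in np.unique(x):
                p_cond = np.mean(x[in_state] == v)   # P(X=v | Y=state)
                p_marg = np.mean(x == v)             # P(X=v)
                if p_cond > 0:
                    mi += p_cond * np.log2(p_cond / p_marg)
            return mi

    Comparing this score across candidate features, separately for each class state, is the kind of per-state pairwise comparison the abstract refers to.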

    Markov blanket: efficient strategy for feature subset selection method for high dimensionality microarray cancer datasets

    Feature subset selection methods are increasingly important, especially in application areas where datasets with tens or hundreds of thousands of variables (genes) are available. They help us select a small number of variables out of thousands of genes in microarray datasets for more accurate and balanced classification. Efficient gene selection reduces the computational burden of the subsequent classification task and can yield a gene subset without loss of classification performance. In classifying microarray data, the main objective of gene selection is to find the genes that retain the maximum amount of information relevant to the class while minimizing classification error. In this paper, we explain the importance of feature subset selection methods in the fields of machine learning and data mining. We then analyze microarray expression data to check whether global biological differences underlie common pathological features across different cancer datasets, and to identify genes that might anticipate the clinical behavior of the disease. Gene expression data contain large amounts of raw data that must be analyzed to obtain information useful for specific biological and medical applications. One way of finding relevant (and removing redundant) genes is to use a Bayesian network based on the Markov blanket [1].
    We present and compare the performance of different approaches to feature (gene) subset selection based on wrapper and Markov blanket models on five microarray cancer datasets. The first approach uses memetic algorithms (MAs) for feature selection. The second uses mRMR (Minimum Redundancy Maximum Relevance) feature subset selection hybridized with genetic search optimization, and afterwards compares the Markov blanket model's performance with that of the most common classical classification algorithms on the selected feature set. For the memetic algorithm, we compare two embedded approaches to feature subset selection: the Wrapper-Filter Feature Selection Algorithm (WFFSA) and the Markov Blanket Embedded Genetic Algorithm (MBEGA). The memetic algorithm relies on genetic operators (crossover, mutation) and a dedicated local search procedure. For comparison, we use two evaluation techniques for training and testing: 10-fold cross-validation and 30-round bootstrapping. The results clearly show that MBEGA often outperforms WFFSA by yielding more significant differentiation among the microarray cancer datasets.
    In the second part of this paper, we focus on mRMR feature subset selection and the Bayesian network classifier based on the Markov blanket (MB) model, both of which are useful for building a good predictor and defying the curse of dimensionality to improve prediction performance. These methods cover a wide range of concerns: providing a better definition of the objective function, feature construction, feature ranking, efficient search methods, and feature validity assessment, as well as defining the relationships among attributes to make predictions. We report performance measures for some common (classical) classification algorithms (naive Bayes, support vector machine [LibSVM], k-nearest neighbor, and AdaBoost.M1 ensembles) before and after applying mRMR. We then compare the performance of the Bayesian network classification algorithm based on the Markov blanket model with that of these common classification algorithms. The Markov blanket-based Bayesian network classifier achieves higher accuracy rates than the classical classification algorithms on the cancer microarray datasets. Bayesian networks explicitly exploit relationships among attributes to make predictions, and the Markov blanket of the class variable provides all the information necessary for predicting its value. We therefore recommend the Bayesian network based on the Markov blanket for learning and classification, as it is highly effective and efficient by feature subset selection measures.
    Master of Science (MSc) in Computational Science
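    The greedy mRMR criterion used above is easy to state in code. The sketch below is a generic illustration, not the thesis's implementation: it assumes a mutual_info(a, b) helper (for discrete data, sklearn.metrics.mutual_info_score would do) and a dict mapping feature names to value arrays.

        def mrmr_select(features, labels, mutual_info, k):
            # Greedily pick k features, each maximizing relevance to the class
            # minus average redundancy with the features already selected.
            remaining = list(features)
            selected = []
            while remaining and len(selected) < k:
                def score(f):
                    relevance = mutual_info(features[f], labels)
                    if not selected:
                        return relevance
                    redundancy = sum(mutual_info(features[f], features[s])
                                     for s in selected) / len(selected)
                    return relevance - redundancy
                best = max(remaining, key=score)
                selected.append(best)
                remaining.remove(best)
            return selected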

    Learning Patient-Specific Models From Clinical Data

    A key purpose of building a model from clinical data is to predict the outcomes of future individual patients. This work introduces a Bayesian patient-specific predictive framework for constructing predictive models that are optimized to predict well for a particular patient case. The construction of such patient-specific models is influenced by the particular history, symptoms, laboratory results, and other features of the patient case at hand. This approach is in contrast to the commonly used population-wide models that are constructed to perform well on average on all future cases. The new patient-specific method described in this research uses Bayesian network models, carries out Bayesian model averaging over a set of models to predict the outcome of interest for the patient case at hand, and employs a patient-specific heuristic to locate a set of suitable models to average over. Two versions of the method are developed that differ in the representation used for the conditional probability distributions in the Bayesian networks. One version uses a representation that captures only the so-called global structure among the variables of a Bayesian network, and the second also captures additional local structure among the variables. The patient-specific methods were experimentally evaluated on one synthetic dataset, 21 UCI datasets, and three medical datasets. Their performance was measured using five different performance measures and compared to that of several commonly used methods for constructing predictive models, including naïve Bayes, C4.5 decision trees, logistic regression, neural networks, k-nearest neighbor, and Lazy Bayesian Rules. Over all the datasets, both patient-specific methods performed better on average on all performance measures and against all the comparison algorithms. The global structure method, which performs Bayesian model averaging in conjunction with the patient-specific search heuristic, had better performance than either model selection with the patient-specific heuristic or non-patient-specific Bayesian model averaging. However, the additional learning of local structure by the local structure method did not lead to significant improvements over the use of global structure alone; specific implementation limitations of the local structure method may have limited its performance.
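    The model-averaging step at the heart of this framework can be sketched in a few lines. The model objects and their methods below are hypothetical placeholders; the point is only the weighting scheme, in which each candidate network's prediction for the patient case is weighted by that network's (unnormalized) posterior given the training data.

        import math

        def bma_predict(models, train_data, case):
            # P(outcome | case) ~ sum_m P(outcome | case, m) * P(m | data)
            log_posts = [m.log_marginal_likelihood(train_data) for m in models]
            shift = max(log_posts)                      # for numerical stability
            weights = [math.exp(lp - shift) for lp in log_posts]
            total = sum(weights)
            return sum((w / total) * m.predict_proba(case)
                       for m, w in zip(models, weights))

    A patient-specific variant would restrict the list of models to those found by a heuristic search seeded with the features of the case at hand, rather than a fixed population-wide model set.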

    Generalized Bayesian Network Classifiers


    Medical data mining using Bayesian network and DNA sequence analysis.

    Lee Kit Ying. Thesis (M.Phil.)--Chinese University of Hong Kong, 2004. Includes bibliographical references (leaves 115-117). Abstracts in English and Chinese. Table of contents:
    Chapter 1, Introduction: Project Background; Problem Specifications; Contributions; Thesis Organization.
    Chapter 2, Background: Medical Data Mining (General Information; Related Research; Characteristics and Difficulties Encountered); DNA Sequence Analysis; Hepatitis B Virus (Virus Characteristics; Important Findings on the Virus); Bayesian Network and its Classifiers (Formal Definition; Existing Learning Algorithms; Evolutionary Algorithms and Hybrid EP (HEP); Bayesian Network Classifiers; Learning Algorithms for BN Classifiers).
    Chapter 3, Bayesian Network Classifier for Clinical Data: Related Work; Proposed BN-augmented Naive Bayes Classifier (BAN) (Definition; Learning Algorithm with HEP; Modifications on HEP); Proposed General Bayesian Network with Markov Blanket (GBN) (Definition; Learning Algorithm with HEP); Findings on Bayesian Network Parameters Calculation (Situation and Errors; Proposed Solution); Performance Analysis on Proposed BN Classifier Learning Algorithms (Experimental Methodology; Benchmark Data; Clinical Data; Discussion); Summary.
    Chapter 4, Classification in DNA Analysis: Related Work; Problem Definition; Proposed Methodology Architecture (Overall Design; Important Components); Clustering; Feature Selection Algorithms (Information Gain; Other Approaches); Classification Algorithms (Naive Bayes Classifier; Decision Tree; Neural Networks; Other Approaches); Important Points on Evaluation (Errors; Independent Test); Performance Analysis on Classification of DNA Data (Experimental Methodology; Using Naive-Bayes Classifier; Using Decision Tree; Using Neural Network; Discussion); Summary.
    Chapter 5, Adaptive HEP for Learning Bayesian Network Structure: Background (Objective; Related Work - AEGA); Feasibility Study; Proposed A-HEP Algorithm (Structural Dissimilarity Comparison; Dynamic Population Size); Evaluation on Proposed Algorithm (Experimental Methodology; Comparison on Running Time; Comparison on Fitness of Final Network; Comparison on Similarity to the Original Network; Parameter Study); Applications on Medical Domain (Discussion; An Example); Summary.
    Chapter 6, Conclusion: Summary; Future Work.
    Bibliography.

    ALGORITHMS FOR CONSTRAINT-BASED LEARNING OF BAYESIAN NETWORK STRUCTURES WITH LARGE NUMBERS OF VARIABLES

    Bayesian networks (BNs) are highly practical and successful tools for modeling probabilistic knowledge. They can be constructed by an expert, learned from data, or built by a combination of the two. A popular approach to learning the structure of a BN is the constraint-based search (CBS) approach, with the PC algorithm being a prominent example. In recent years, we have been experiencing a data deluge: we have access to more data, big and small, than ever before. The exponential nature of BN algorithms, however, hinders large-scale analysis. Developments in parallel and distributed computing have made the computational power required for large-scale data processing widely available, yielding opportunities to develop parallel and distributed algorithms for BN learning and inference. In this dissertation, (1) I propose two MapReduce versions of the PC algorithm, aimed at an increasingly common case: data that is not necessarily massive in the number of records, but more and more so in the number of variables. (2) When the number of data records is small, the PC algorithm experiences problems in independence testing. Empirically, I explore a contradiction in the literature on how to resolve the case of having insufficient data when testing the independence of two variables: declare independence or declare dependence. (3) When BNs learned from data become complex in terms of graph density, they may require more parameters than we can feasibly store. I propose and evaluate five approaches to pruning a BN structure to guarantee that it will be tractable for storage and inference. I follow this up by proposing three approaches to improving the classification accuracy of a BN by modifying its structure.
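    Point (2) above hinges on a single branch inside the conditional independence tester. The sketch below shows where that branch lives, assuming a chi-square test stratified on the conditioning set; the average-count threshold and the assume_independent flag are illustrative knobs, not the dissertation's actual code.

        import pandas as pd
        from scipy.stats import chi2, chi2_contingency

        def ci_test(data, x, y, z, alpha=0.05, min_avg_count=5.0,
                    assume_independent=True):
            # Test X _||_ Y | Z by summing chi-square statistics per Z stratum.
            strata = [g for _, g in data.groupby(list(z))] if z else [data]
            stat, dof, n_cells = 0.0, 0, 0
            for g in strata:
                table = pd.crosstab(g[x], g[y])
                if table.shape[0] < 2 or table.shape[1] < 2:
                    continue
                s, _, d, _ = chi2_contingency(table, correction=False)
                stat, dof, n_cells = stat + s, dof + d, n_cells + table.size
            # Insufficient data: the tester must simply pick a side.
            if dof == 0 or len(data) / max(n_cells, 1) < min_avg_count:
                return assume_independent
            return chi2.sf(stat, dof) >= alpha   # True: independence not rejected

    Flipping assume_independent changes which edges survive the skeleton phase, which is exactly the disagreement in the literature that the dissertation examines empirically.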