109,341 research outputs found

    Learning Bayesian Networks with Incomplete Data by Augmentation

    Get PDF
    We present new algorithms for learning Bayesian networks from data with missing values using a data augmentation approach. An exact Bayesian network learning algorithm is obtained by recasting the problem into a standard Bayesian network learning problem without missing data. To the best of our knowledge, this is the first exact algorithm for this problem. As expected, the exact algorithm does not scale to large domains. We build on the exact method to create an approximate algorithm using a hill-climbing technique. This algorithm scales to large domains so long as a suitable standard structure learning method for complete data is available. We perform a wide range of experiments to demonstrate the benefits of learning Bayesian networks with such new approach

    Large-scale empirical validation of Bayesian Network structure learning algorithms with noisy data.

    Get PDF
    Numerous Bayesian Network (BN) structure learning algorithms have been proposed in the literature over the past few decades. Each publication makes an empirical or theoretical case for the algorithm proposed in that publication and results across studies are often inconsistent in their claims about which algorithm is ‘best’. This is partly because there is no agreed evaluation approach to determine their effectiveness. Moreover, each algorithm is based on a set of assumptions, such as complete data and causal sufficiency, and tend to be evaluated with data that conforms to these assumptions, however unrealistic these assumptions may be in the real world. As a result, it is widely accepted that synthetic performance overestimates real performance, although to what degree this may happen remains unknown. This paper investigates the performance of 15 state-of-the-art, well-established, or recent promising structure learning algorithms. We propose a methodology that applies the algorithms to data that incorporates synthetic noise, in an effort to better understand the performance of structure learning algorithms when applied to real data. Each algorithm is tested over multiple case studies, sample sizes, types of noise, and assessed with multiple evaluation criteria. This work involved learning approximately 10,000 graphs with a total structure learning runtime of seven months. In investigating the impact of data noise, we provide the first large scale empirical comparison of BN structure learning algorithms under different assumptions of data noise. The results suggest that traditional synthetic performance may overestimate real-world performance by anywhere between 10% and more than 50%. They also show that while score-based learning is generally superior to constraint-based learning, a higher fitting score does not necessarily imply a more accurate causal graph. The comparisons extend to other outcomes of interest, such as runtime, reliability, and resilience to noise, assessed over both small and large networks, and with both limited and big data. To facilitate comparisons with future studies, we have made all data, raw results, graphs and BN models freely available online

    Infinite Multiple Membership Relational Modeling for Complex Networks

    Get PDF
    Learning latent structure in complex networks has become an important problem fueled by many types of networked data originating from practically all fields of science. In this paper, we propose a new non-parametric Bayesian multiple-membership latent feature model for networks. Contrary to existing multiple-membership models that scale quadratically in the number of vertices the proposed model scales linearly in the number of links admitting multiple-membership analysis in large scale networks. We demonstrate a connection between the single membership relational model and multiple membership models and show on "real" size benchmark network data that accounting for multiple memberships improves the learning of latent structure as measured by link prediction while explicitly accounting for multiple membership result in a more compact representation of the latent structure of networks.Comment: 8 pages, 4 figure

    Reconstructing gene regulatory network using heterogeneous biological data

    Get PDF
    Gene regulatory network is a model of a network that describes the relationships among genes In a given condition. However, constructing gene regulatory network is a complicated task as high-throughput technologies generate large-scale of data compared to number of sample.In addition, the data involves a substantial amount of noise and false positive results that hinder the downstream analysis performance.To address these problems Bayesian network model has attracted the most attention. However, the key challenge in using Bayesian network to mode1 GRN is related to its learning structure.Bayesian network structure learning is NP-hard and computationally complex. Therefore. this research aims to address the issue related to Bayesian network structure learning by proposing a low-order conditional independence method.In addition we revised the gene regulatory relationships by integrating biological heterogeneous dataset to extract transcription factors for regulator, and target genes.The empirical results indicate that proposed method works better with biological knowledge processing with a precision of 83.3% in comparison to a network that rely on microarray only, which achieved correctness of 80.85

    Learning Large-Scale Bayesian Networks with the sparsebn Package

    Get PDF
    Learning graphical models from data is an important problem with wide applications, ranging from genomics to the social sciences. Nowadays datasets often have upwards of thousands---sometimes tens or hundreds of thousands---of variables and far fewer samples. To meet this challenge, we have developed a new R package called sparsebn for learning the structure of large, sparse graphical models with a focus on Bayesian networks. While there are many existing software packages for this task, this package focuses on the unique setting of learning large networks from high-dimensional data, possibly with interventions. As such, the methods provided place a premium on scalability and consistency in a high-dimensional setting. Furthermore, in the presence of interventions, the methods implemented here achieve the goal of learning a causal network from data. Additionally, the sparsebn package is fully compatible with existing software packages for network analysis.Comment: To appear in the Journal of Statistical Software, 39 pages, 7 figure
    • …
    corecore