Learning Bayesian Networks with Incomplete Data by Augmentation
We present new algorithms for learning Bayesian networks from data with
missing values using a data augmentation approach. An exact Bayesian network
learning algorithm is obtained by recasting the problem into a standard
Bayesian network learning problem without missing data. To the best of our
knowledge, this is the first exact algorithm for this problem. As expected, the
exact algorithm does not scale to large domains. We build on the exact method
to create an approximate algorithm using a hill-climbing technique. This
algorithm scales to large domains so long as a suitable standard structure
learning method for complete data is available. We perform a wide range of
experiments to demonstrate the benefits of learning Bayesian networks with this
new approach.
Large-scale empirical validation of Bayesian Network structure learning algorithms with noisy data.
Numerous Bayesian Network (BN) structure learning algorithms have been proposed in the literature over the past few decades. Each publication makes an empirical or theoretical case for the algorithm proposed in that publication, and results across studies are often inconsistent in their claims about which algorithm is ‘best’. This is partly because there is no agreed evaluation approach to determine their effectiveness. Moreover, each algorithm is based on a set of assumptions, such as complete data and causal sufficiency, and tends to be evaluated with data that conforms to these assumptions, however unrealistic these assumptions may be in the real world. As a result, it is widely accepted that synthetic performance overestimates real performance, although to what degree this may happen remains unknown. This paper investigates the performance of 15 state-of-the-art, well-established, or recent promising structure learning algorithms. We propose a methodology that applies the algorithms to data that incorporates synthetic noise, in an effort to better understand the performance of structure learning algorithms when applied to real data. Each algorithm is tested over multiple case studies, sample sizes, and types of noise, and is assessed with multiple evaluation criteria. This work involved learning approximately 10,000 graphs with a total structure learning runtime of seven months. In investigating the impact of data noise, we provide the first large-scale empirical comparison of BN structure learning algorithms under different assumptions of data noise. The results suggest that traditional synthetic performance may overestimate real-world performance by anywhere between 10% and more than 50%. They also show that while score-based learning is generally superior to constraint-based learning, a higher fitting score does not necessarily imply a more accurate causal graph.
The comparisons extend to other outcomes of interest, such as runtime, reliability, and resilience to noise, assessed over both small and large networks, and with both limited and big data. To facilitate comparisons with future studies, we have made all data, raw results, graphs and BN models freely available online.
Infinite Multiple Membership Relational Modeling for Complex Networks
Learning latent structure in complex networks has become an important problem
fueled by many types of networked data originating from practically all fields
of science. In this paper, we propose a new non-parametric Bayesian
multiple-membership latent feature model for networks. Contrary to existing
multiple-membership models that scale quadratically in the number of vertices,
the proposed model scales linearly in the number of links, admitting
multiple-membership analysis in large-scale networks. We demonstrate a
connection between the single-membership relational model and multiple-membership
models and show on "real" size benchmark network data that
accounting for multiple memberships improves the learning of latent structure
as measured by link prediction, while explicitly accounting for multiple
memberships results in a more compact representation of the latent structure of
networks.
Comment: 8 pages, 4 figures
Reconstructing gene regulatory network using heterogeneous biological data
A gene regulatory network (GRN) is a network model that describes the relationships among genes in a given condition. However, constructing a gene regulatory network is a complicated task, as high-throughput technologies generate large-scale data relative to the number of samples. In addition, the data involve a substantial amount of noise and false-positive results that hinder downstream analysis performance. To address these problems, the Bayesian network model has attracted the most attention. However, the key challenge in using a Bayesian network to model a GRN is related to its structure learning. Bayesian network structure learning is NP-hard and computationally complex. Therefore, this research aims to address the issue related to Bayesian network structure learning by proposing a low-order conditional independence method. In addition, we revised the gene regulatory relationships by integrating heterogeneous biological datasets to extract transcription factors as regulators, and target genes. The empirical results indicate that the proposed method works better with biological knowledge processing, with a precision of 83.3%, in comparison to a network that relies on microarray data only, which achieved a correctness of 80.85%.
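The abstract above does not describe its low-order conditional independence method in detail, so the following is only a generic sketch of the idea and not the authors' algorithm. It assumes continuous expression values, uses zero-order and first-order partial correlations, and replaces a proper statistical test with a fixed cutoff. The function names, the `threshold` value, and the toy data are all hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y given a single variable z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def low_order_skeleton(data, threshold=0.3):
    """Keep an undirected edge a-b only if a and b stay correlated
    marginally (order 0) and given every single other variable (order 1).
    The fixed threshold is an illustrative stand-in for a significance test."""
    names = sorted(data)
    r = {frozenset(p): pearson(data[p[0]], data[p[1]])
         for p in combinations(names, 2)}
    edges = set()
    for a, b in combinations(names, 2):
        r_ab = r[frozenset((a, b))]
        if abs(r_ab) < threshold:
            continue  # marginally independent: drop the edge
        separated = any(
            abs(partial_corr(r_ab, r[frozenset((a, c))], r[frozenset((b, c))]))
            < threshold
            for c in names if c not in (a, b)
        )
        if not separated:
            edges.add((a, b))
    return edges


# Toy data (hypothetical): one transcription factor drives two target genes,
# so the targets are correlated only through the shared regulator.
expr = {
    "tf":    [1, 1, 1, 1, -1, -1, -1, -1],
    "geneA": [3, 1, 3, 1, -1, -3, -1, -3],
    "geneB": [3, 3, 1, 1, -1, -1, -3, -3],
}
edges = low_order_skeleton(expr, threshold=0.3)
print(sorted(edges))
```

In this toy setting the two target genes are strongly correlated marginally, but the correlation vanishes once the regulator is conditioned on, so only the regulator-target edges survive. Restricting conditioning sets to size zero or one keeps the number of tests polynomial in the number of genes, which is the usual motivation for low-order methods when samples are scarce.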
Learning Large-Scale Bayesian Networks with the sparsebn Package
Learning graphical models from data is an important problem with wide
applications, ranging from genomics to the social sciences. Nowadays datasets
often have upwards of thousands---sometimes tens or hundreds of thousands---of
variables and far fewer samples. To meet this challenge, we have developed a
new R package called sparsebn for learning the structure of large, sparse
graphical models with a focus on Bayesian networks. While there are many
existing software packages for this task, this package focuses on the unique
setting of learning large networks from high-dimensional data, possibly with
interventions. As such, the methods provided place a premium on scalability and
consistency in a high-dimensional setting. Furthermore, in the presence of
interventions, the methods implemented here achieve the goal of learning a
causal network from data. Additionally, the sparsebn package is fully
compatible with existing software packages for network analysis.
Comment: To appear in the Journal of Statistical Software, 39 pages, 7 figures
Distributed Bayesian Computation and Self-Organized Learning in Sheets of Spiking Neurons with Local Lateral Inhibition
During the last decade, Bayesian probability theory has emerged as a framework in cognitive science and neuroscience for describing perception, reasoning and learning of mammals. However, our understanding of how probabilistic computations could be organized in the brain, and how the observed connectivity structure of cortical microcircuits supports these calculations, is rudimentary at best. In this study, we investigate statistical inference and self-organized learning in a spatially extended spiking network model that accommodates both local competitive and large-scale associative aspects of neural information processing, under a unified Bayesian account. Specifically, we show how the spiking dynamics of a recurrent network with lateral excitation and local inhibition, in response to distributed spiking input, can be understood as sampling from a variational posterior distribution of a well-defined implicit probabilistic model. This interpretation further permits a rigorous analytical treatment of experience-dependent plasticity on the network level. Using machine learning theory, we derive update rules for neuron and synapse parameters which equate with Hebbian synaptic and homeostatic intrinsic plasticity rules in a neural implementation. In computer simulations, we demonstrate that the interplay of these plasticity rules leads to the emergence of probabilistic local experts that form distributed assemblies of similarly tuned cells communicating through lateral excitatory connections. The resulting sparse distributed spike code of a well-adapted network carries compressed information on salient input features combined with prior experience on correlations among them. Our theory predicts that the emergence of such efficient representations benefits from network architectures in which the range of local inhibition matches the spatial extent of pyramidal cells that share common afferent input.