Who Learns Better Bayesian Network Structures: Accuracy and Speed of Structure Learning Algorithms
Three classes of algorithms to learn the structure of Bayesian networks from
data are common in the literature: constraint-based algorithms, which use
conditional independence tests to learn the dependence structure of the data;
score-based algorithms, which use goodness-of-fit scores as objective functions
to maximise; and hybrid algorithms that combine both approaches.
Constraint-based and score-based algorithms have been shown to learn the same
structures when conditional independence and goodness of fit are both assessed
using entropy and the topological ordering of the network is known (Cowell,
2001).
In this paper, we investigate how these three classes of algorithms perform
outside the assumptions above in terms of speed and accuracy of network
reconstruction for both discrete and Gaussian Bayesian networks. We approach
this question by recognising that structure learning is defined by the
combination of a statistical criterion and an algorithm that determines how the
criterion is applied to the data. Removing the confounding effect of different
choices for the statistical criterion, we find using both simulated and
real-world complex data that constraint-based algorithms are often less
accurate than score-based algorithms, but are seldom faster (even at large
sample sizes); and that hybrid algorithms are neither faster nor more accurate
than constraint-based algorithms. This suggests that commonly held beliefs on
structure learning in the literature are strongly influenced by the choice of
particular statistical criteria rather than just by the properties of the
algorithms themselves.
Comment: 27 pages, 8 figures
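The score-based approach contrasted in this abstract can be illustrated with greedy hill climbing over single-edge changes, scoring each node against its parent set with the decomposable BIC. The sketch below is a minimal stand-in for discrete data, not code from the paper; the helper names (`bic_local`, `hill_climb`) are illustrative.

```python
import math
import numpy as np

def bic_local(data, child, parents):
    """BIC contribution of one node given its parent set (discrete data)."""
    n = data.shape[0]
    keys = [tuple(row) for row in data[:, parents]] if parents else [()] * n
    counts = {}
    for k, x in zip(keys, data[:, child]):
        counts.setdefault(k, {})
        counts[k][x] = counts[k].get(x, 0) + 1
    log_lik = sum(c * math.log(c / sum(dist.values()))
                  for dist in counts.values() for c in dist.values())
    r = len(np.unique(data[:, child]))             # child cardinality
    penalty = 0.5 * math.log(n) * len(counts) * (r - 1)
    return log_lik - penalty

def is_acyclic(parents, d):
    seen, done = set(), set()
    def visit(v):
        if v in done:
            return True
        if v in seen:                              # back edge -> cycle
            return False
        seen.add(v)
        ok = all(visit(p) for p in parents[v])
        done.add(v)
        return ok
    return all(visit(v) for v in range(d))

def hill_climb(data):
    """Greedy search over single-edge additions/removals maximising BIC."""
    d = data.shape[1]
    parents = {v: [] for v in range(d)}
    score = {v: bic_local(data, v, parents[v]) for v in range(d)}
    while True:
        best = (0.0, None)
        for u in range(d):
            for v in range(d):
                if u == v:
                    continue
                new = ([p for p in parents[v] if p != u]
                       if u in parents[v] else parents[v] + [u])
                trial = dict(parents)
                trial[v] = new
                if not is_acyclic(trial, d):
                    continue
                delta = bic_local(data, v, new) - score[v]
                if delta > best[0] + 1e-9:
                    best = (delta, (v, new))
        if best[1] is None:
            return parents
        v, new = best[1]
        parents[v] = new
        score[v] = bic_local(data, v, new)

# toy check: x1 is a noisy copy of x0, x2 is independent
rng = np.random.default_rng(0)
x0 = rng.integers(0, 2, 2000)
x1 = np.where(rng.random(2000) < 0.9, x0, 1 - x0)
x2 = rng.integers(0, 2, 2000)
learned = hill_climb(np.column_stack([x0, x1, x2]))
edges = {(u, v) for v, ps in learned.items() for u in ps}
```

Because BIC decomposes over nodes, each candidate move is scored by the change at a single node only, which is what makes score-based search cheap per step.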
Efficient Optimization of Echo State Networks for Time Series Datasets
Echo State Networks (ESNs) are recurrent neural networks that only train
their output layer, thereby precluding the need to backpropagate gradients
through time, which leads to significant computational gains. Nevertheless, a
common issue with ESNs is determining their hyperparameters, which are crucial
in instantiating a well-performing reservoir, but are often set manually or
using heuristics. In this work, we optimize the ESN hyperparameters using Bayesian
optimization which, given a limited budget of function evaluations, outperforms
a grid search strategy. In the context of large volumes of time series data,
such as light curves in the field of astronomy, we can further reduce the
optimization cost of ESNs. In particular, we wish to avoid tuning
hyperparameters per individual time series as this is costly; instead, we want
to find ESNs with hyperparameters that perform well not just on individual time
series but rather on groups of similar time series without sacrificing
predictive performance significantly. This naturally leads to a notion of
clusters, where each cluster is represented by an ESN tuned to model a group of
time series of similar temporal behavior. We demonstrate this approach both on
synthetic datasets and real world light curves from the MACHO survey. We show
that our approach results in a significant reduction in the number of ESN
models required to model a whole dataset, while retaining predictive
performance for the series in each cluster.
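The core ESN mechanic described above (a fixed random reservoir with only the readout trained) can be sketched compactly. This is an assumed minimal formulation, not the paper's implementation; a tiny grid over the spectral radius stands in here for the Bayesian optimisation the paper uses.

```python
import numpy as np

def esn_fit_mse(u, y, n_res=100, rho=0.9, washout=50, seed=0):
    """Minimal ESN: random fixed reservoir, ridge-trained linear readout.

    Returns the one-step-ahead training MSE; `rho` is the spectral radius,
    one of the hyperparameters the abstract refers to.
    """
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-0.5, 0.5, n_res)
    W = rng.uniform(-0.5, 0.5, (n_res, n_res))
    W *= rho / np.max(np.abs(np.linalg.eigvals(W)))  # set spectral radius
    x = np.zeros(n_res)
    states = []
    for ut in u:
        x = np.tanh(W @ x + W_in * ut)               # reservoir update
        states.append(x)
    X = np.array(states[washout:])                    # drop initial transient
    t = y[washout:]
    # ridge regression: the readout is the only trained part of the network
    w_out = np.linalg.solve(X.T @ X + 1e-6 * np.eye(n_res), X.T @ t)
    return float(np.mean((X @ w_out - t) ** 2))

# one-step-ahead prediction of a sine wave; a small grid over rho
# stands in for the Bayesian-optimisation loop described in the paper
s = np.sin(np.linspace(0, 20 * np.pi, 1000))
u, y = s[:-1], s[1:]
errs = {rho: esn_fit_mse(u, y, rho=rho) for rho in (0.5, 0.9, 1.2)}
best_rho = min(errs, key=errs.get)
```

Because training reduces to one linear solve, evaluating a candidate hyperparameter setting is cheap, which is what makes a sequential search such as Bayesian optimisation practical here.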
Penalized Estimation of Directed Acyclic Graphs From Discrete Data
Bayesian networks, with structure given by a directed acyclic graph (DAG),
are a popular class of graphical models. However, learning Bayesian networks
from discrete or categorical data is particularly challenging, due to the large
parameter space and the difficulty in searching for a sparse structure. In this
article, we develop a maximum penalized likelihood method to tackle this
problem. Instead of the commonly used multinomial distribution, we model the
conditional distribution of a node given its parents by multi-logit regression,
in which an edge is parameterized by a set of coefficient vectors with dummy
variables encoding the levels of a node. To obtain a sparse DAG, a group norm
penalty is employed, and a blockwise coordinate descent algorithm is developed
to maximize the penalized likelihood subject to the acyclicity constraint of a
DAG. When interventional data are available, our method constructs a causal
network, in which a directed edge represents a causal relation. We apply our
method to various simulated and real data sets. The results show that our
method is very competitive, compared to many existing methods, in DAG
estimation from both interventional and high-dimensional observational data.
Comment: To appear in Statistics and Computing
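The blockwise updates described above hinge on a group-norm penalty that can zero out an edge's entire block of coefficient vectors at once, which is what yields a sparse DAG. A minimal sketch of the group soft-thresholding (proximal) step that standard group-penalty coordinate descent relies on; the paper's exact update rule may differ.

```python
import numpy as np

def group_soft_threshold(beta, lam):
    """Proximal operator of the group penalty lam * ||beta||_2.

    Shrinks the whole coefficient block toward zero and sets it exactly
    to zero when its norm falls below lam -- the mechanism that removes
    an edge (all its dummy-variable coefficients) in one step.
    """
    norm = np.linalg.norm(beta)
    if norm <= lam:
        return np.zeros_like(beta)
    return (1.0 - lam / norm) * beta
```

For example, a block with norm 5 and `lam = 5` is zeroed entirely, while `lam = 2.5` only shrinks it by half; this all-or-nothing behavior at the block level is why a group norm, rather than an elementwise one, is used for edge selection.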