Near-Optimal Multi-Perturbation Experimental Design for Causal Structure Learning
Causal structure learning is a key problem in many domains. Causal structures
can be learnt by performing experiments on the system of interest. We address
the largely unexplored problem of designing a batch of experiments that each
simultaneously intervene on multiple variables. While potentially more
informative than the commonly considered single-variable interventions,
selecting such interventions is algorithmically much more challenging, due to
the doubly-exponential combinatorial search space over sets of composite
interventions. In this paper, we develop efficient algorithms for optimizing
different objective functions quantifying the informativeness of a
budget-constrained batch of experiments. By establishing novel submodularity
properties of these objectives, we provide approximation guarantees for our
algorithms. Our algorithms empirically outperform both random interventions
and algorithms that select only single-variable interventions.

Comment: 10 pages, 2 figures, appendix; to be published in the 35th
Conference on Neural Information Processing Systems (NeurIPS 2021); fixed
typos and clarified wording
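The approximation guarantees described above typically come from greedy
maximization of a monotone submodular objective. As a minimal sketch (not the
paper's actual objective, and with an explicitly enumerated candidate set that
the real doubly-exponential search space would not allow), the snippet below
greedily builds a budget-constrained batch by marginal gain, using a toy
coverage utility as a stand-in for the informativeness score:

    import itertools

    def greedy_batch(candidates, utility, budget):
        # Greedy selection for a monotone submodular utility: repeatedly add
        # the candidate intervention with the largest marginal gain. This rule
        # carries the classic (1 - 1/e) approximation guarantee.
        batch, remaining = [], list(candidates)
        for _ in range(budget):
            best = max(remaining,
                       key=lambda c: utility(batch + [c]) - utility(batch))
            batch.append(best)
            remaining.remove(best)
        return batch

    # Toy stand-in for the informativeness objective: count distinct variables
    # covered by the batch (monotone submodular). Candidates are all one- and
    # two-variable interventions over four variables.
    variables = range(4)
    candidates = [frozenset(s) for r in (1, 2)
                  for s in itertools.combinations(variables, r)]
    coverage = lambda batch: len(set().union(*batch)) if batch else 0
    print(greedy_batch(candidates, coverage, budget=2))  # e.g. {0, 1}, {2, 3}

With a genuine informativeness score in place of the coverage toy, the same
greedy loop naturally prefers multi-variable interventions when their joint
marginal gain beats any single-variable option.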
Stochastic Training of Neural Networks via Successive Convex Approximations
This paper proposes a new family of algorithms for training neural networks
(NNs). These are based on recent developments in the field of non-convex
optimization, going under the general name of successive convex approximation
(SCA) techniques. The basic idea is to iteratively replace the original
(non-convex, high-dimensional) learning problem with a sequence of (strongly
convex) approximations, which are both accurate and simple to optimize.
Unlike similar approaches (e.g., quasi-Newton algorithms), the
approximations can be constructed using only first-order information of the
neural network function, in a stochastic fashion, while exploiting the overall
structure of the learning problem for a faster convergence. We discuss several
use cases, based on different choices for the loss function (e.g., squared loss
and cross-entropy loss), and for the regularization of the NN's weights. We
experiment on several medium-sized benchmark problems, and on a large-scale
dataset involving simulated physical data. The results show how the algorithm
outperforms state-of-the-art techniques, providing faster convergence to a
better minimum. Additionally, we show how the algorithm can be easily
parallelized over multiple computational units without hindering its
performance. In particular, each computational unit can optimize a tailored
surrogate function defined on a randomly assigned subset of the input
variables, whose dimension can be selected according to the available
computational power.

Comment: Preprint submitted to IEEE Transactions on Neural Networks and
Learning Systems
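To make the SCA idea concrete, here is a minimal sketch of one iteration under
illustrative assumptions: the loss is replaced by a surrogate that linearizes
it at the current weights and adds a strongly convex proximal term plus an L2
regularizer, giving a closed-form inner minimizer. The paper's surrogates,
regularizers, and stochastic sampling are richer than this; tau, lam, and the
step-size rule below are placeholder choices.

    import numpy as np

    def sca_step(w, grad, tau=1.0, lam=1e-3, gamma=0.5):
        # One SCA iteration: minimize the strongly convex surrogate
        #   U(z; w) = grad(w)^T (z - w) + (tau/2)||z - w||^2 + (lam/2)||z||^2
        # in closed form, then move a fraction gamma toward that minimizer.
        z_hat = (tau * w - grad) / (tau + lam)  # argmin_z U(z; w)
        return w + gamma * (z_hat - w)

    # Toy usage: fit a linear model with squared loss on random data. The
    # paper uses stochastic (mini-batch) gradients; a full gradient is used
    # here for brevity, with a diminishing step size gamma_t = 1 / t.
    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(100, 5)), rng.normal(size=100)
    w = np.zeros(5)
    for t in range(1, 201):
        grad = X.T @ (X @ w - y) / len(y)
        w = sca_step(w, grad, gamma=1.0 / t)
    print(np.round(w, 3))

A fixed point of this update satisfies grad + lam * w = 0, i.e., stationarity
of the regularized loss, which is what makes the linearized surrogate a valid
(if simplest possible) member of the SCA family.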