104,480 research outputs found

    Why and When Can Deep -- but Not Shallow -- Networks Avoid the Curse of Dimensionality: a Review

    The paper characterizes classes of functions for which deep learning can be exponentially better than shallow learning. Deep convolutional networks fall under these conditions as a special case, though weight sharing is not the main reason for their exponential advantage.
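    To make the kind of function class concrete (an illustrative example, not notation from the paper): results of this type concern compositional functions with a hierarchical, local structure, such as

    $$ f(x_1,\dots,x_8) = h_3\big(h_{21}(h_{11}(x_1,x_2),\,h_{12}(x_3,x_4)),\; h_{22}(h_{13}(x_5,x_6),\,h_{14}(x_7,x_8))\big), $$

    where each constituent function depends on only two variables. A deep network mirroring this binary-tree structure can approximate $f$ at a cost governed by the constituent dimensionality (here 2), while a generic shallow network pays a cost exponential in the ambient dimension (here 8).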

    Approximate Bayesian Computation in State Space Models

    A new approach to inference in state space models is proposed, based on approximate Bayesian computation (ABC). ABC avoids evaluation of the likelihood function by matching observed summary statistics with statistics computed from data simulated from the true process; exact inference is feasible only if the statistics are sufficient. With finite-sample sufficiency unattainable in the state space setting, we seek asymptotic sufficiency via the maximum likelihood estimator (MLE) of the parameters of an auxiliary model. We prove that this auxiliary model-based approach achieves Bayesian consistency, and that, in a precise limiting sense, the proximity to (asymptotic) sufficiency yielded by the MLE is replicated by the score. In multiple-parameter settings, a separate treatment of scalar parameters, based on integrated likelihood techniques, is advocated as a way of avoiding the curse of dimensionality. Some attention is given to a structure in which the state variable is driven by a continuous-time process, with exact inference typically infeasible in this case as a result of intractable transitions. The ABC method is demonstrated using the unscented Kalman filter as a fast and simple way of producing an approximation in this setting, with a stochastic volatility model for financial returns used for illustration.
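    A minimal sketch of the generic ABC rejection step this approach builds on (an illustration under stated assumptions, not the paper's algorithm: `prior_sampler`, `simulate`, and `summary_stat` are hypothetical placeholders, with `summary_stat` standing in for the auxiliary-model MLE or score):

```python
import numpy as np

def abc_rejection(observed, prior_sampler, simulate, summary_stat,
                  n_draws=10_000, tolerance=0.1):
    """Keep parameter draws whose simulated summary statistics
    land within `tolerance` of the observed ones."""
    s_obs = summary_stat(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler()          # draw parameters from the prior
        y_sim = simulate(theta)          # simulate data from the model
        s_sim = summary_stat(y_sim)      # e.g. auxiliary-model MLE or score
        if np.linalg.norm(s_sim - s_obs) < tolerance:
            accepted.append(theta)       # retained draws approximate the posterior
    return np.array(accepted)
```

    The quality of the resulting approximation hinges on how close `summary_stat` comes to sufficiency, which is exactly why the paper targets the MLE (or score) of an auxiliary model.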

    Algorithms and lower bounds for de Morgan formulas of low-communication leaf gates

    The class $\mathrm{FORMULA}[s] \circ \mathcal{G}$ consists of Boolean functions computable by size-$s$ de Morgan formulas whose leaves are any Boolean functions from a class $\mathcal{G}$. We give lower bounds and (SAT, Learning, and PRG) algorithms for $\mathrm{FORMULA}[n^{1.99}] \circ \mathcal{G}$, for classes $\mathcal{G}$ of functions with low communication complexity. Let $R^{(k)}(\mathcal{G})$ be the maximum $k$-party NOF randomized communication complexity of $\mathcal{G}$. We show: (1) The Generalized Inner Product function $GIP^k_n$ cannot be computed in $\mathrm{FORMULA}[s] \circ \mathcal{G}$ on more than a $1/2+\varepsilon$ fraction of inputs for $s = o\!\left(\frac{n^2}{\left(k \cdot 4^k \cdot R^{(k)}(\mathcal{G}) \cdot \log(n/\varepsilon) \cdot \log(1/\varepsilon)\right)^{2}}\right)$. As a corollary, we get an average-case lower bound for $GIP^k_n$ against $\mathrm{FORMULA}[n^{1.99}] \circ \mathrm{PTF}^{k-1}$. (2) There is a PRG of seed length $n/2 + O\left(\sqrt{s} \cdot R^{(2)}(\mathcal{G}) \cdot \log(s/\varepsilon) \cdot \log(1/\varepsilon)\right)$ that $\varepsilon$-fools $\mathrm{FORMULA}[s] \circ \mathcal{G}$. For $\mathrm{FORMULA}[s] \circ \mathrm{LTF}$, we get the better seed length $O\left(n^{1/2} \cdot s^{1/4} \cdot \log(n) \cdot \log(n/\varepsilon)\right)$. This gives the first non-trivial PRG (with seed length $o(n)$) for intersections of $n$ half-spaces in the regime where $\varepsilon \leq 1/n$. (3) There is a randomized $2^{n-t}$-time $\#$SAT algorithm for $\mathrm{FORMULA}[s] \circ \mathcal{G}$, where $t = \Omega\left(\frac{n}{\sqrt{s} \cdot \log^2(s) \cdot R^{(2)}(\mathcal{G})}\right)^{1/2}$. In particular, this implies a nontrivial $\#$SAT algorithm for $\mathrm{FORMULA}[n^{1.99}] \circ \mathrm{LTF}$. (4) The Minimum Circuit Size Problem is not in $\mathrm{FORMULA}[n^{1.99}] \circ \mathrm{XOR}$. On the algorithmic side, we show that $\mathrm{FORMULA}[n^{1.99}] \circ \mathrm{XOR}$ can be PAC-learned in time $2^{O(n/\log n)}$.
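    For reference, the Generalized Inner Product function appearing in result (1) is the standard $k$-party parity-of-ANDs:

    $$ GIP^k_n(x^1,\dots,x^k) \;=\; \bigoplus_{i=1}^{n} \bigwedge_{j=1}^{k} x^j_i, \qquad x^1,\dots,x^k \in \{0,1\}^n, $$

    where, in the number-on-forehead (NOF) model, the $j$-th player sees every input except $x^j$.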

    Variational Principle of Bogoliubov and Generalized Mean Fields in Many-Particle Interacting Systems

    The theory of many-particle interacting systems is reviewed from a unified standpoint based on the variational principle for the free energy. A systematic discussion is given of the approximate free energies of complex statistical systems. The analysis is centered around the variational principle of N. N. Bogoliubov for the free energy in the context of its applications to various problems of statistical mechanics and condensed matter physics. The review presents a terse discussion of selected works carried out over the past few decades on the theory of many-particle interacting systems in terms of variational inequalities. It is the purpose of this paper to discuss some of the general principles which form the mathematical background to this approach, and to establish a connection between the variational technique and other methods, such as the method of the mean (or self-consistent) field in the many-body problem, in which the effect of all the other particles on any given particle is approximated by a single averaged effect, thus reducing a many-body problem to a single-body problem. The method is illustrated by applying it to various many-particle interacting systems, such as the Ising and Heisenberg models, superconducting and superfluid systems, and strongly correlated systems. It seems likely that these technical advances in the many-body problem will be useful in suggesting new methods for treating and understanding many-particle interacting systems. This work proposes a new, general and pedagogical presentation, intended both for those who are interested in basic aspects and for those who are interested in concrete applications.
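    The variational principle in question is the Bogoliubov inequality, which in its standard form (stated here for orientation, not quoted from the review) bounds the free energy $F$ of a system with Hamiltonian $H$ by means of any trial Hamiltonian $H_0$:

    $$ F \;\leq\; F_0 + \langle H - H_0 \rangle_0 , $$

    where $F_0$ is the free energy of $H_0$ and $\langle \cdot \rangle_0$ denotes the thermal average in the equilibrium ensemble of $H_0$. Minimizing the right-hand side over the parameters of $H_0$ (for example, an effective field acting on each spin of an Ising model) yields the mean-field approximations discussed above.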

    Approximating multivariate posterior distribution functions from Monte Carlo samples for sequential Bayesian inference

    An important feature of Bayesian statistics is the opportunity to do sequential inference: the posterior distribution obtained after seeing a dataset can be used as the prior for a second inference. However, when Monte Carlo sampling methods are used for inference, we only have a set of samples from the posterior distribution. To do sequential inference, we then either have to evaluate the second posterior at only these locations and reweight the samples accordingly, or we can estimate a functional description of the posterior probability distribution from the samples and use that as the prior for the second inference. Here, we investigated to what extent we can obtain an accurate joint posterior from two datasets if the inference is done sequentially rather than jointly, under the condition that each inference step is done using Monte Carlo sampling. To test this, we evaluated the accuracy of kernel density estimates, Gaussian mixtures, vine copulas and Gaussian processes in approximating posterior distributions, and then tested whether these approximations can be used in sequential inference. In low dimensions, Gaussian processes are more accurate, whereas in higher dimensions Gaussian mixtures or vine copulas perform better. In our test cases, posterior approximations are preferable over direct sample reweighting, although joint inference is still preferable over sequential inference. Since the performance is case-specific, we provide an R package, mvdens, with a unified interface to the density approximation methods.
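    A minimal sketch of the sequential scheme, using a Gaussian mixture from scikit-learn as the functional posterior approximation (an illustration of the idea only, not the mvdens interface; `log_likelihood2` is a placeholder for the second dataset's log-likelihood):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_posterior_approximation(samples, n_components=5):
    """Stage 1: fit a density approximation to Monte Carlo samples
    from the posterior given the first dataset."""
    gm = GaussianMixture(n_components=n_components)
    gm.fit(samples)                  # samples: (n_samples, dim) array
    return gm

def log_posterior_stage2(theta, log_likelihood2, posterior_approx):
    """Stage 2: the fitted approximation serves as the log-prior
    when sampling the posterior for the second dataset."""
    log_prior = posterior_approx.score_samples(theta.reshape(1, -1))[0]
    return log_prior + log_likelihood2(theta)
```

    Any MCMC sampler can then target `log_posterior_stage2`; how well the resulting joint posterior matches the one from inference on both datasets at once is precisely what the paper evaluates across approximation methods and dimensionalities.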

    On the Quantitative Hardness of CVP

    For odd integers $p \geq 1$ (and $p = \infty$), we show that the Closest Vector Problem in the $\ell_p$ norm ($\mathrm{CVP}_p$) over rank-$n$ lattices cannot be solved in $2^{(1-\varepsilon)n}$ time for any constant $\varepsilon > 0$ unless the Strong Exponential Time Hypothesis (SETH) fails. We then extend this result to "almost all" values of $p \geq 1$, not including the even integers. This comes tantalizingly close to settling the quantitative time complexity of the important special case of $\mathrm{CVP}_2$ (i.e., CVP in the Euclidean norm), for which a $2^{n+o(n)}$-time algorithm is known. In particular, our result applies for any $p = p(n) \neq 2$ that approaches $2$ as $n \to \infty$. We also show a similar SETH-hardness result for $\mathrm{SVP}_\infty$; hardness of approximating $\mathrm{CVP}_p$ to within some constant factor under the so-called Gap-ETH assumption; and other quantitative hardness results for $\mathrm{CVP}_p$ and $\mathrm{CVPP}_p$ for any $1 \leq p < \infty$ under different assumptions.
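    For orientation, the standard definition (not part of the abstract): given a basis $B \in \mathbb{R}^{d \times n}$ of a rank-$n$ lattice $\mathcal{L}(B) = \{Bz : z \in \mathbb{Z}^n\}$ and a target $t \in \mathbb{R}^d$, $\mathrm{CVP}_p$ asks for a lattice vector attaining

    $$ \mathrm{dist}_p\big(t, \mathcal{L}(B)\big) \;=\; \min_{z \in \mathbb{Z}^n} \lVert Bz - t \rVert_p , $$

    while $\mathrm{CVPP}_p$ is the preprocessing variant in which arbitrary advice depending only on the lattice may be precomputed before the target arrives.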