
    Information-based complexity, feedback and dynamics in convex programming

    We study the intrinsic limitations of sequential convex optimization through the lens of feedback information theory. In the oracle model of optimization, an algorithm queries an oracle for noisy information about the unknown objective function, and the goal is to (approximately) minimize every function in a given class using as few queries as possible. We show that, in order for a function to be optimized, the algorithm must be able to accumulate enough information about the objective. This, in turn, limits the speed of optimization under specific assumptions on the oracle and the type of feedback. Our techniques are akin to those used in the statistical literature to obtain minimax lower bounds on the risks of estimation procedures; the notable difference is that, unlike in the case of i.i.d. data, a sequential optimization algorithm can gather observations in a controlled manner, so that the amount of information at each step is allowed to change over time. In particular, we show that optimization algorithms often obey the law of diminishing returns: the signal-to-noise ratio drops as the optimization algorithm approaches the optimum. To underscore the generality of the tools, we use our approach to derive fundamental lower bounds for a certain active learning problem. Overall, the present work connects the intuitive notions of information in optimization, experimental design, estimation, and active learning to the quantitative notion of Shannon information. Comment: final version; to appear in IEEE Transactions on Information Theory.
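To make the oracle model and the diminishing-returns effect concrete, here is a minimal Python sketch. It is not the construction used in the paper: it assumes a hypothetical one-dimensional objective f(x) = x²/2 and a first-order oracle that returns f'(x) = x corrupted by Gaussian noise of standard deviation sigma. The helper names (noisy_gradient_oracle, signal_to_noise, run_sgd) are illustrative only.

```python
import random

def noisy_gradient_oracle(x, sigma, rng):
    """Hypothetical oracle for f(x) = x**2 / 2: returns f'(x) = x plus Gaussian noise."""
    return x + rng.gauss(0.0, sigma)

def signal_to_noise(x, sigma):
    """SNR of a single oracle answer: |f'(x)| divided by the noise standard deviation."""
    return abs(x) / sigma

def run_sgd(x0, sigma, steps, seed=0):
    """Stochastic gradient descent with step size 1/(t+1); returns all iterates."""
    rng = random.Random(seed)
    x = x0
    trajectory = [x]
    for t in range(steps):
        g = noisy_gradient_oracle(x, sigma, rng)  # one query to the oracle
        x -= g / (t + 1)
        trajectory.append(x)
    return trajectory
```

Here signal_to_noise(1.0, 0.1) is 10, while signal_to_noise(0.01, 0.1) is about 0.1: each query carries less usable signal as the iterate nears the minimizer at 0. With step size 1/(t+1), the recursion reduces to x_T = -(ξ_0 + ... + ξ_{T-1})/T, so the starting point is forgotten after the first step and the error shrinks only like sigma/√T.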

    Guaranteeing Generalization via Measures of Information

    During the past decade, machine learning techniques have achieved impressive results in a number of domains. Many of these success stories have made use of deep neural networks, a class of functions of high complexity. Classical results that mathematically guarantee that a learning algorithm generalizes, i.e., performs as well on unseen data as on training data, typically rely on bounding the complexity and expressiveness of the functions that are used. As a consequence, they yield overly pessimistic results when applied to modern machine learning algorithms and fail to explain why these algorithms generalize. This discrepancy between theoretical explanations and practical success has spurred a flurry of research into new generalization guarantees. For such guarantees to apply to relevant cases such as deep neural networks, they must rely on some aspect of learning other than the complexity of the function class. One promising avenue is to use methods from information theory. Since information-theoretic quantities describe properties of data distributions and the relations between them, this approach enables generalization guarantees that depend on the properties of the learning algorithm and the data distribution. In this thesis, we first introduce a framework to derive information-theoretic guarantees for generalization. Specifically, we derive an exponential inequality that yields not only average generalization guarantees but also tail bounds for the PAC-Bayesian and single-draw scenarios. This approach leads to novel generalization guarantees and provides a unified method for deriving several known generalization bounds that were originally obtained with a variety of different proof techniques.
Furthermore, we extend this exponential-inequality approach to the recently introduced random-subset setting, in which the training data is randomly selected from a larger set of available data samples. One limitation of the proposed framework is that it can only yield generalization guarantees with a so-called slow rate with respect to the size of the training set. In light of this, we derive another exponential inequality for the random-subset setting that yields generalization guarantees with fast rates with respect to the size of the training set. We show how to evaluate the generalization guarantees obtained through this inequality, as well as their slow-rate counterparts, for overparameterized neural networks trained on MNIST and Fashion-MNIST. Numerical results illustrate that, in some settings, these bounds predict the true generalization capability fairly well, essentially matching the best available bounds in the literature.
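An average-sense, slow-rate guarantee of the kind discussed above can be illustrated with the well-known mutual-information bound of Xu and Raginsky; this is a standard bound, not the thesis's exponential inequality. Assuming the loss is σ-sub-Gaussian for every hypothesis (a loss bounded in [0, 1] is 1/2-sub-Gaussian), the expected generalization gap satisfies |E[gen]| ≤ sqrt(2σ² I(W; S) / n), with I(W; S) in nats. The function name below is hypothetical.

```python
import math

def mi_generalization_bound(mutual_info_nats, n, sigma=0.5):
    """Bound on |E[population loss - training loss]|: sqrt(2 * sigma**2 * I(W; S) / n).

    Assumes a sigma-sub-Gaussian loss (sigma = 0.5 covers losses in [0, 1]);
    mutual_info_nats is I(W; S) between learned weights and training set, in nats.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if mutual_info_nats < 0:
        raise ValueError("mutual information is non-negative")
    return math.sqrt(2.0 * sigma ** 2 * mutual_info_nats / n)
```

For example, with I(W; S) = 2 nats and n = 1000 the bound is sqrt(0.001), roughly 0.032. The 1/√n dependence is the slow rate mentioned above: quadrupling the training set only halves the bound.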

    Model Comparisons Using Information Measures

    Methodologists have criticized the use of significance tests in the behavioral sciences but have failed to provide alternative data analysis strategies that appeal to applied researchers. For comparing alternative models for data, information-theoretic measures such as Akaike's information criterion (AIC) have advantages over significance tests. Model-selection procedures based on a min(AIC) strategy, for example, are holistic rather than dependent on a series of sometimes contradictory binary (accept/reject) decisions.
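A min(AIC) comparison can be sketched in a few lines of Python, using AIC = 2k − 2 ln L̂, where k is the number of estimated parameters and L̂ is the maximized likelihood. The toy data and the two Gaussian candidate models are illustrative assumptions, not taken from the source, and the helper names are hypothetical.

```python
import math

def aic(log_likelihood, k):
    """Akaike information criterion: AIC = 2k - 2 ln(L_hat)."""
    return 2 * k - 2 * log_likelihood

def gaussian_max_loglik(data, mean):
    """Gaussian log-likelihood at a given mean, with the variance set to its MLE."""
    n = len(data)
    var = sum((x - mean) ** 2 for x in data) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)

def select_min_aic(candidates):
    """candidates: dict name -> (max log-likelihood, k). Returns (best name, all AICs)."""
    scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Toy data (illustrative only), clearly centered away from zero.
data = [2.1, 1.8, 2.4, 1.9, 2.2, 2.0, 2.3, 1.7]
sample_mean = sum(data) / len(data)
candidates = {
    "mean fixed at 0 (k=1)": (gaussian_max_loglik(data, 0.0), 1),       # only variance fitted
    "mean estimated (k=2)": (gaussian_max_loglik(data, sample_mean), 2),  # mean and variance fitted
}
best, scores = select_min_aic(candidates)
```

All candidates are scored on one scale and the smallest AIC wins, with no sequence of accept/reject tests; here the two-parameter model wins despite its extra parameter, because it fits data centered near 2 far better.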

    D3PO - Denoising, Deconvolving, and Decomposing Photon Observations

    The analysis of astronomical images is a non-trivial task. The D3PO algorithm addresses the inference problem of denoising, deconvolving, and decomposing photon observations. Its primary goal is the simultaneous but individual reconstruction of the diffuse and point-like photon flux given a single photon count image, where the fluxes are superimposed. In order to discriminate between these morphologically different signal components, a probabilistic algorithm is derived in the language of information field theory based on a hierarchical Bayesian parameter model. The signal inference exploits prior information on the spatial correlation structure of the diffuse component and the brightness distribution of the spatially uncorrelated point-like sources. A maximum a posteriori solution and a solution minimizing the Gibbs free energy of the inference problem using variational Bayesian methods are discussed. Since the derivation of the solution is not dependent on the underlying position space, the implementation of the D3PO algorithm uses the NIFTY package to ensure applicability to various spatial grids and at any resolution. The fidelity of the algorithm is validated by the analysis of simulated data, including a realistic high energy photon count image showing a 32 × 32 arcmin² observation with a spatial resolution of 0.1 arcmin. In all tests the D3PO algorithm successfully denoised, deconvolved, and decomposed the data into a diffuse and a point-like signal estimate for the respective photon flux components. Comment: 22 pages, 8 figures, 2 tables, accepted by Astronomy & Astrophysics; refereed version, 1 figure added, results unchanged, software available at http://www.mpa-garching.mpg.de/ift/d3po
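The data model behind the decomposition can be sketched in plain Python: photon counts are Poisson-distributed around the superimposed diffuse and point-like flux, blurred by the instrument response. This is a minimal one-dimensional sketch under assumed toy values; it is neither the D3PO algorithm nor the NIFTY API, omits D3PO's priors on the diffuse correlation structure and point-source brightness, and all function names are hypothetical.

```python
import math

def convolve_same(signal, kernel):
    """1D convolution with zero padding; output has the same length as signal.

    The kernel (a toy point-spread function) must have odd length so it is centered.
    """
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = i - (j - half)  # flipped index: true convolution, not correlation
            if 0 <= idx < len(signal):
                acc += weight * signal[idx]
        out.append(acc)
    return out

def expected_counts(diffuse, point, psf, exposure=1.0):
    """Expected photon counts: exposure * (PSF convolved with diffuse + point flux)."""
    total_flux = [a + b for a, b in zip(diffuse, point)]
    return [exposure * v for v in convolve_same(total_flux, psf)]

def poisson_neg_log_likelihood(counts, expected):
    """-ln P(counts | expected) for independent Poisson-distributed pixels."""
    total = 0.0
    for d, lam in zip(counts, expected):
        if lam <= 0:
            raise ValueError("expected counts must be positive")
        total += lam - d * math.log(lam) + math.lgamma(d + 1)
    return total

counts = [1, 3, 8, 3, 1]           # toy 1D photon-count "image"
diffuse = [1.0] * 5                 # smooth diffuse flux
point = [0.0, 0.0, 5.0, 0.0, 0.0]   # a single point-like source
psf = [0.25, 0.5, 0.25]             # toy point-spread function

nll_with_point = poisson_neg_log_likelihood(counts, expected_counts(diffuse, point, psf))
nll_diffuse_only = poisson_neg_log_likelihood(counts, expected_counts(diffuse, [0.0] * 5, psf))
```

The decomposition that includes the point source explains the central peak better and so attains the lower negative log-likelihood. A maximum a posteriori estimate would minimize this quantity plus the negative log-priors of both components over the diffuse and point-like fluxes jointly.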