    Nonlinear Information Bottleneck

    Information bottleneck (IB) is a technique for extracting the information in one random variable $X$ that is relevant for predicting another random variable $Y$. IB works by encoding $X$ in a compressed "bottleneck" random variable $M$ from which $Y$ can be accurately decoded. However, finding the optimal bottleneck variable involves a difficult optimization problem, which until recently has been considered for only two limited cases: discrete $X$ and $Y$ with small state spaces, and continuous $X$ and $Y$ with a Gaussian joint distribution (in which case the optimal encoding and decoding maps are linear). We propose a method for performing IB on arbitrarily distributed discrete and/or continuous $X$ and $Y$, while allowing for nonlinear encoding and decoding maps. Our approach relies on a novel non-parametric upper bound for mutual information. We describe how to implement our method using neural networks. We then show that it achieves better performance than the recently proposed "variational IB" method on several real-world datasets.
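    As orientation for how such an objective is typically set up, the sketch below shows an IB-style training loss in PyTorch with a stochastic neural encoder and decoder. It uses a standard variational surrogate for the two mutual-information terms (a Gaussian bottleneck with a KL rate penalty and a cross-entropy prediction term), not the paper's non-parametric upper bound; the module names, layer sizes, and the value of beta are illustrative assumptions.

```python
# Minimal IB-style objective: minimize CE(decoder(M), Y) + beta * KL rate term,
# i.e. a variational stand-in for maximizing I(M; Y) - beta * I(X; M).
# Generic sketch; this is not the paper's non-parametric bound.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Maps x to the mean and log-variance of a Gaussian bottleneck p(m | x)."""
    def __init__(self, x_dim, m_dim, hidden=128):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, m_dim)
        self.log_var = nn.Linear(hidden, m_dim)

    def forward(self, x):
        h = self.body(x)
        return self.mu(h), self.log_var(h)

class Decoder(nn.Module):
    """Predicts class logits for y from a bottleneck sample m."""
    def __init__(self, m_dim, n_classes, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(m_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_classes))

    def forward(self, m):
        return self.net(m)

def ib_loss(x, y, encoder, decoder, beta=1e-3):
    mu, log_var = encoder(x)
    # Reparameterized sample from the stochastic bottleneck M.
    m = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)
    # Prediction term: cross-entropy, a surrogate for maximizing I(M; Y).
    pred = F.cross_entropy(decoder(m), y)
    # Rate term: KL(p(m|x) || N(0, I)), a variational upper bound on I(X; M).
    rate = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp(), dim=1).mean()
    return pred + beta * rate
```

    Training then amounts to minimizing ib_loss over minibatches of (x, y) pairs with any standard optimizer, sweeping beta to trace out the compression/prediction trade-off.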

    Trading quantum for classical resources in quantum data compression

    We study the visible compression of a source E of pure quantum signal states, or, more formally, the minimal resources per signal required to represent arbitrarily long strings of signals with arbitrarily high fidelity, when the compressor is given the identity of the input state sequence as classical information. According to the quantum source coding theorem, the optimal quantum rate is the von Neumann entropy S(E) qubits per signal. We develop a refinement of this theorem in order to analyze the situation in which the states are coded into classical and quantum bits that are quantified separately. This leads to a trade-off curve Q(R), where Q(R) qubits per signal is the optimal quantum rate for a given classical rate of R bits per signal. Our main result is an explicit characterization of this trade-off function by a simple formula in terms of only single-signal, perfect-fidelity encodings of the source. We give a thorough discussion of many further mathematical properties of our formula, including an analysis of its behavior for group-covariant sources and a generalization to sources with continuously parameterized states. We also show that our result leads to a number of corollaries characterizing the trade-off between information gain and state disturbance for quantum sources. In addition, we indicate how our techniques also provide a solution to the so-called remote state preparation problem. Finally, we develop a probability-free version of our main result, which may be interpreted as an answer to the question: "How many classical bits does a qubit cost?" This theorem provides a type of dual to Holevo's theorem, insofar as the latter characterizes the cost of coding classical bits into qubits.
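    For reference, the quantum rate appearing in the quantum source coding theorem is the von Neumann entropy of the ensemble-average state of the source; a minimal statement is given below, where the ensemble notation {p_i, |psi_i>} is an assumption of this note rather than a quotation from the abstract.

```latex
% Ensemble-average state and von Neumann entropy of the source E = {p_i, |psi_i>}.
\rho_E = \sum_i p_i \,|\psi_i\rangle\!\langle\psi_i| ,
\qquad
S(E) = S(\rho_E) = -\operatorname{Tr}\!\left(\rho_E \log_2 \rho_E\right)
\ \text{qubits per signal.}
```

    In this notation the unassisted endpoint of the trade-off curve is Q(0) = S(E), and the curve Q(R) describes how much quantum rate can be saved as the classical rate R grows.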

    Forgetfulness of continuous Markovian quantum channels

    The notion of forgetfulness, used for discrete quantum memory channels, is slightly weakened so that it can be applied to the case of continuous channels. This is done in the context of quantum memory channels with Markovian noise. As a case study, we apply the notion of weak forgetfulness to a bosonic memory channel with additive noise. A suitable encoding and decoding unitary transformation allows us to unravel the effects of the memory, so that the channel capacities can be computed using known results from the memoryless setting.
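    For orientation, a single use of a bosonic additive (classical-noise) channel can be written as a random phase-space displacement of the input state; the standard form below is included as an assumption for concreteness and is not quoted from the paper.

```latex
% Single-use bosonic additive-noise channel: a displacement D(z) applied at random
% with probability density P(z); a is the mode's annihilation operator.
\mathcal{N}(\rho) = \int d^2 z \; P(z)\, D(z)\, \rho\, D^\dagger(z),
\qquad
D(z) = e^{\,z a^\dagger - z^{*} a} .
```

    In the Markovian-memory setting studied here, the noise acting on successive channel uses is correlated through a Markov process rather than drawn independently, and it is this correlation that the encoding and decoding unitaries are designed to unravel.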

    Minimum Rates of Approximate Sufficient Statistics

    Given a sufficient statistic for a parametric family of distributions, one can estimate the parameter without access to the data. However, the memory or code size for storing the sufficient statistic may nonetheless still be prohibitive. Indeed, for $n$ independent samples drawn from a $k$-nomial distribution with $d = k-1$ degrees of freedom, the length of the code scales as $d \log n + O(1)$. In many applications, we may not have a useful notion of sufficient statistics (e.g., when the parametric family is not an exponential family), and we also may not need to reconstruct the generating distribution exactly. By adopting a Shannon-theoretic approach in which we allow a small error in estimating the generating distribution, we construct various approximate sufficient statistics and show that the code length can be reduced to $\frac{d}{2} \log n + O(1)$. We consider errors measured according to the relative entropy and variational distance criteria. For the code constructions, we leverage Rissanen's minimum description length principle, which yields a non-vanishing error measured according to the relative entropy. For the converse parts, we use Clarke and Barron's formula for the relative entropy of a parametrized distribution and the corresponding mixture distribution. However, this method only yields a weak converse for the variational distance. We develop new techniques to achieve vanishing errors, and we also prove strong converses. The latter means that even if the code is allowed to have a non-vanishing error, its length must still be at least $\frac{d}{2} \log n$.
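    To make the gap between the two rates concrete, here is a small worked example under the abstract's scaling; the choice of k and n and the base-2 logarithm are illustrative, and the O(1) terms are dropped.

```latex
% Example: k = 3 outcomes, so d = k - 1 = 2 degrees of freedom, and n = 10^6 samples.
d \log_2 n \approx 2 \times 19.93 \approx 39.9 \ \text{bits (exact sufficient statistic)},
\qquad
\tfrac{d}{2} \log_2 n \approx 19.9 \ \text{bits (approximate sufficient statistic)}.
```

    The approximate statistic thus halves the leading-order code length, at the price of a small (relative-entropy or variational-distance) error in reconstructing the generating distribution.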

    Learning Generative Models with Sinkhorn Divergences

    The ability to compare two degenerate probability distributions (i.e., two probability distributions supported on two distinct low-dimensional manifolds living in a much higher-dimensional space) is a crucial problem arising in the estimation of generative models for high-dimensional observations, such as those arising in computer vision or natural language. It is known that optimal transport (OT) metrics can remedy this problem, since they were specifically designed as an alternative to information divergences for handling such problematic scenarios. Unfortunately, training generative machines using OT raises formidable computational and statistical challenges, because of (i) the computational burden of evaluating OT losses, (ii) the instability and lack of smoothness of these losses, and (iii) the difficulty of robustly estimating these losses and their gradients in high dimension. This paper presents the first tractable computational method to train large-scale generative models using an optimal transport loss, and tackles these three issues by relying on two key ideas: (a) entropic smoothing, which turns the original OT loss into one that can be computed using Sinkhorn fixed-point iterations; (b) algorithmic (automatic) differentiation of these iterations. These two approximations result in a robust and differentiable approximation of the OT loss with streamlined GPU execution. Entropic smoothing generates a family of losses interpolating between Wasserstein (OT) and Maximum Mean Discrepancy (MMD), thus allowing one to find a sweet spot that leverages the geometry of OT and the favorable high-dimensional sample complexity of MMD, which comes with unbiased gradient estimates. The resulting computational architecture nicely complements standard deep network generative models with a stack of extra layers implementing the loss function.
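    To illustrate the two key ideas, entropic smoothing and differentiation through the fixed-point iterations, the sketch below computes an entropic OT loss between two point clouds with log-domain Sinkhorn updates in PyTorch. The squared-Euclidean ground cost, uniform weights, and the parameters epsilon and n_iters are illustrative assumptions; the debiasing self-terms that a full Sinkhorn divergence typically subtracts are omitted.

```python
# Entropic OT loss via log-domain Sinkhorn iterations; backpropagating through
# the loop gives gradients with respect to the input points, in the spirit of
# the algorithmic differentiation described above. Illustrative sketch only.
import math
import torch

def sinkhorn_loss(x, y, epsilon=0.1, n_iters=100):
    """Entropic-OT transport cost <P, C> between uniform measures on x and y."""
    C = torch.cdist(x, y, p=2) ** 2                          # (n, m) squared-Euclidean cost
    n, m = C.shape
    log_a = torch.full((n,), -math.log(n), device=x.device)  # log of uniform weights on x
    log_b = torch.full((m,), -math.log(m), device=x.device)  # log of uniform weights on y
    f = torch.zeros(n, device=x.device)                      # dual potentials
    g = torch.zeros(m, device=x.device)
    for _ in range(n_iters):                                  # Sinkhorn fixed-point iterations
        f = -epsilon * torch.logsumexp((g[None, :] - C) / epsilon + log_b[None, :], dim=1)
        g = -epsilon * torch.logsumexp((f[:, None] - C) / epsilon + log_a[:, None], dim=0)
    # Transport plan implied by the potentials, and its transport cost.
    P = torch.exp((f[:, None] + g[None, :] - C) / epsilon + log_a[:, None] + log_b[None, :])
    return torch.sum(P * C)
```

    A generator can then be trained by evaluating sinkhorn_loss on a generated batch and a data batch and calling backward(), so the loss effectively sits as a stack of differentiable layers on top of the network.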