2,743 research outputs found

    Finding the direction of disturbance propagation in a chemical process using transfer entropy

    No full text
    Published version

    Self-Distillation for Gaussian Process Regression and Classification

    Full text link
    We propose two approaches to extend the notion of knowledge distillation to Gaussian Process Regression (GPR) and Gaussian Process Classification (GPC): data-centric and distribution-centric. The data-centric approach resembles most current distillation techniques for machine learning and refits a model on deterministic predictions from the teacher, while the distribution-centric approach reuses the full probabilistic posterior for the next iteration. By analyzing the properties of these approaches, we show that the data-centric approach for GPR closely relates to known results for self-distillation of kernel ridge regression, and that the distribution-centric approach for GPR corresponds to ordinary GPR with a very particular choice of hyperparameters. Furthermore, we demonstrate that the distribution-centric approach for GPC approximately corresponds to data duplication and a particular scaling of the covariance, and that the data-centric approach for GPC requires redefining the model from a Binomial likelihood to a continuous Bernoulli likelihood to be well-specified. To the best of our knowledge, our proposed approaches are the first to formulate knowledge distillation specifically for Gaussian Process models.
    Comment: 10 pages; code at https://github.com/Kennethborup/gaussian_process_self_distillatio
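    A minimal sketch of the data-centric loop described above, using scikit-learn's GaussianProcessRegressor rather than the authors' repository (the synthetic data, kernel choice, and number of rounds are illustrative assumptions): each round refits a GP on the previous model's mean predictions at the original training inputs, whereas the distribution-centric variant would instead carry the full posterior forward.

```python
# Illustrative sketch (not the authors' implementation): data-centric
# self-distillation for GP regression, refitting each student on the previous
# model's deterministic mean predictions at the training inputs.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(40)

kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
targets = y
for step in range(3):                       # step 0 is the teacher, students follow
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    gp.fit(X, targets)
    targets = gp.predict(X)                 # deterministic predictions for the next round

mean, std = gp.predict(X, return_std=True)  # final student's predictive distribution
```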

    Knowledge Distillation based Degradation Estimation for Blind Super-Resolution

    Full text link
    Blind image super-resolution (Blind-SR) aims to recover a high-resolution (HR) image from its corresponding low-resolution (LR) input image with unknown degradations. Most existing works design an explicit degradation estimator for each degradation to guide SR. However, it is infeasible to provide concrete labels for the many degradation combinations (e.g., blur, noise, JPEG compression) needed to supervise the degradation estimator's training. In addition, these special designs for a particular degradation, such as blur, impede the models from generalizing to different degradations. To this end, it is necessary to design an implicit degradation estimator that can extract a discriminative degradation representation for all degradations without relying on degradation ground-truth for supervision. In this paper, we propose a Knowledge Distillation based Blind-SR network (KDSR). It consists of a knowledge distillation based implicit degradation estimator network (KD-IDE) and an efficient SR network. To learn the KDSR model, we first train a teacher network, KD-IDE_T, which takes paired HR and LR patches as input and is optimized jointly with the SR network. Then, we train a student network, KD-IDE_S, which takes only LR images as input and learns to extract the same implicit degradation representation (IDR) as KD-IDE_T. In addition, to fully use the extracted IDR, we design a simple, strong, and efficient IDR-based dynamic convolution residual block (IDR-DCRB) to build the SR network. We conduct extensive experiments under classic and real-world degradation settings. The results show that KDSR achieves SOTA performance and can generalize to various degradation processes. The source code and pre-trained models will be released.
    Comment: ICLR 2023, code is available at https://github.com/Zj-BinXia/KDS
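    The teacher/student split can be pictured with a short schematic sketch (module names, sizes, and the channel-stacking of HR and LR below are illustrative assumptions, not the released KDSR code): the teacher encoder sees paired HR and LR patches, the student sees only the LR patch, and a distillation loss pulls the student's representation toward the teacher's.

```python
# Schematic sketch of the distillation step (all names and shapes illustrative).
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Tiny stand-in for an implicit degradation estimator (KD-IDE)."""
    def __init__(self, in_ch, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, dim, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(dim, dim))
    def forward(self, x):
        return self.net(x)

teacher = Encoder(in_ch=6)   # sees HR and LR patches, stacked channel-wise
student = Encoder(in_ch=3)   # sees the LR patch only

hr = torch.randn(4, 3, 48, 48)
lr = torch.randn(4, 3, 48, 48)   # assume LR is upsampled to HR size for stacking

with torch.no_grad():            # teacher is fixed while the student is trained
    idr_t = teacher(torch.cat([hr, lr], dim=1))
idr_s = student(lr)
kd_loss = nn.functional.l1_loss(idr_s, idr_t)   # align the two representations
kd_loss.backward()
```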

    Finite-Block-Length Analysis in Classical and Quantum Information Theory

    Full text link
    Coding technology is used in several information processing tasks. In particular, when noise during transmission disturbs communications, coding technology is employed to protect the information. However, there are two types of coding technology: coding in classical information theory and coding in quantum information theory. Although the physical media used to transmit information ultimately obey quantum mechanics, we need to choose the type of coding depending on the kind of information device, classical or quantum, that is being used. In both branches of information theory, there are many elegant theoretical results under the ideal assumption that an infinitely large system is available. In a realistic situation, we need to account for finite-size effects. The present paper reviews finite-size effects in classical and quantum information theory with respect to various topics, including applied aspects.
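    A representative finite-blocklength result in this area is the normal (second-order) approximation to the best achievable coding rate; the sketch below evaluates the standard formula for a binary symmetric channel. The formula is the widely used one from finite-blocklength theory, and the parameter values are illustrative rather than taken from the review.

```python
# Normal approximation for the maximal coding rate of a BSC(p) at blocklength n:
#   R(n, eps) ~ C - sqrt(V/n) * Q^{-1}(eps) + log2(n) / (2n)
# (standard finite-blocklength formula; values here are only illustrative).
import numpy as np
from scipy.stats import norm

def bsc_rate(p, n, eps):
    h2 = -p * np.log2(p) - (1 - p) * np.log2(1 - p)      # binary entropy
    C = 1 - h2                                           # capacity (bits/use)
    V = p * (1 - p) * np.log2((1 - p) / p) ** 2          # channel dispersion
    return C - np.sqrt(V / n) * norm.isf(eps) + np.log2(n) / (2 * n)

for n in (100, 1000, 10000):
    print(n, round(bsc_rate(p=0.11, n=n, eps=1e-3), 4))  # rate grows toward capacity
```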

    On reverse hypercontractivity

    Get PDF
    We study the notion of reverse hypercontractivity. We show that reverse hypercontractive inequalities are implied by standard hypercontractive inequalities as well as by the modified log-Sobolev inequality. Our proof is based on a new comparison lemma for Dirichlet forms and an extension of the Stroock-Varopoulos inequality. A consequence of our analysis is that all simple operators $L = \mathrm{Id} - \mathbb{E}$, as well as their tensors, satisfy uniform reverse hypercontractive inequalities. That is, for all $q < p < 1$ and every positive-valued function $f$, for $t \geq \log \frac{1-q}{1-p}$ we have $\| e^{-tL} f\|_{q} \geq \| f\|_{p}$. This should be contrasted with the case of hypercontractive inequalities for simple operators, where $t$ is known to depend not only on $p$ and $q$ but also on the underlying space. The new reverse hypercontractive inequalities established here imply new mixing and isoperimetric results for short random walks in product spaces, for certain card-shufflings, for Glauber dynamics in high-temperature spin systems, as well as for queueing processes. The inequalities further imply a quantitative Arrow impossibility theorem for general product distributions and inverse polynomial bounds in the number of players for the non-interactive correlation distillation problem with $m$-sided dice.
    Comment: Final revision. Incorporates referee's comments. The proof of appendix B has been corrected. A shorter version of this article will appear in GAF
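    The stated inequality for the simple operator $L = \mathrm{Id} - \mathbb{E}$ is easy to check numerically on a small example; the sketch below does so on a uniform two-point space, with arbitrarily chosen $p$, $q$ and the critical time from the statement above.

```python
# Numerical illustration of the reverse hypercontractive inequality
#   || exp(-tL) f ||_q >= || f ||_p   for q < p < 1,  t >= log((1-q)/(1-p)),
# where L = Id - E and E averages against the uniform measure on a finite space.
# (Small sanity check of the stated bound; values chosen arbitrarily.)
import numpy as np

def q_norm(g, q):
    return np.mean(g ** q) ** (1.0 / q)      # (E[g^q])^(1/q), for q != 0

f = np.array([1.0, 4.0])                     # positive function on a 2-point space
p, q = 0.5, 0.2
t = np.log((1 - q) / (1 - p))                # the critical time from the theorem

smoothed = np.exp(-t) * f + (1 - np.exp(-t)) * f.mean()   # exp(-tL) f, since L^2 = L
print(q_norm(smoothed, q), ">=", q_norm(f, p))             # ~2.35 >= 2.25
```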

    Understanding and Comparing Scalable Gaussian Process Regression for Big Data

    Full text link
    As a non-parametric Bayesian model that produces informative predictive distributions, the Gaussian process (GP) has been widely used in various fields, such as regression, classification and optimization. The cubic complexity of the standard GP, however, leads to poor scalability, which poses challenges in the era of big data. Hence, various scalable GPs have been developed in the literature to improve scalability while retaining desirable prediction accuracy. This paper investigates the methodological characteristics and performance of representative global and local scalable GPs, including sparse approximations and local aggregations, from four main perspectives: scalability, capability, controllability and robustness. The numerical experiments on two toy examples and five real-world datasets with up to 250K points offer the following findings. In terms of scalability, most of the scalable GPs have a time complexity that is linear in the training size. In terms of capability, the sparse approximations capture long-term spatial correlations, whereas the local aggregations capture local patterns but suffer from over-fitting in some scenarios. In terms of controllability, we can improve the performance of sparse approximations simply by increasing the inducing size, but this is not the case for local aggregations. In terms of robustness, local aggregations are robust to various initializations of hyperparameters due to the local attention mechanism. Finally, we highlight that a proper hybrid of global and local scalable GPs may be a promising way to improve both model capability and scalability for big data.
    Comment: 25 pages, 15 figures, preprint submitted to KB
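    A toy illustration of the two families being compared, using scikit-learn rather than the specific scalable GPs studied in the paper (all modeling choices below are illustrative assumptions): a "global" GP fit on a random subset stands in for a sparse approximation, and a crude average of per-cluster GP experts stands in for a local aggregation.

```python
# Toy contrast of global vs. local scalable GPs (not the paper's exact methods):
# a GP trained on a random subset vs. averaged per-cluster GP experts.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(2000, 1))
y = np.sin(X).ravel() + 0.2 * rng.standard_normal(len(X))
X_test = np.linspace(0, 10, 200).reshape(-1, 1)
kernel = RBF() + WhiteKernel()

# "Global" approximation: fit a single GP on a random subset of the data.
idx = rng.choice(len(X), size=200, replace=False)
global_gp = GaussianProcessRegressor(kernel=kernel).fit(X[idx], y[idx])
mu_global = global_gp.predict(X_test)

# "Local" aggregation: one GP expert per cluster, predictions averaged.
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(X)
experts = [GaussianProcessRegressor(kernel=kernel).fit(X[labels == k], y[labels == k])
           for k in range(10)]
mu_local = np.mean([e.predict(X_test) for e in experts], axis=0)
```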

    Epistemic Neural Networks

    Full text link
    Intelligence relies on an agent's knowledge of what it does not know. This capability can be assessed based on the quality of joint predictions of labels across multiple inputs. Conventional neural networks lack this capability and, since most research has focused on marginal predictions, this shortcoming has been largely overlooked. We introduce the epistemic neural network (ENN) as an interface for models that represent uncertainty as required to generate useful joint predictions. While prior approaches to uncertainty modeling such as Bayesian neural networks can be expressed as ENNs, this new interface facilitates comparison of joint predictions and the design of novel architectures and algorithms. In particular, we introduce the epinet: an architecture that can supplement any conventional neural network, including large pretrained models, and can be trained with modest incremental computation to estimate uncertainty. With an epinet, conventional neural networks outperform very large ensembles, consisting of hundreds or more particles, with orders of magnitude less computation. We demonstrate this efficacy across synthetic data, ImageNet, and some reinforcement learning tasks. As part of this effort, we open-source experiment code.
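    The epinet idea can be sketched schematically (layer sizes, wiring, and names below are illustrative assumptions, not the reference implementation): a small network is added on top of a frozen base model's features and also receives a random epistemic index z, so that drawing several values of z yields the varied joint predictions used to express uncertainty.

```python
# Schematic epinet sketch (illustrative only): a small add-on network consumes
# the base model's features together with a random "epistemic index" z;
# varying z varies the prediction, which expresses epistemic uncertainty.
import torch
import torch.nn as nn

class Epinet(nn.Module):
    def __init__(self, feat_dim, index_dim, n_classes, hidden=50):
        super().__init__()
        self.index_dim = index_dim
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + index_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_classes * index_dim))
    def forward(self, features, z):
        out = self.mlp(torch.cat([features, z.expand(features.shape[0], -1)], dim=-1))
        out = out.view(features.shape[0], -1, self.index_dim)
        return out @ z                      # contract with the index -> class logits

base = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 10))
feats = nn.Sequential(*list(base.children())[:-1])   # reuse penultimate features
epinet = Epinet(feat_dim=64, index_dim=8, n_classes=10)

x = torch.randn(32, 16)
z = torch.randn(8)                                 # one epistemic index draw
logits = base(x) + epinet(feats(x).detach(), z)    # base prediction + epinet correction
```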