2,743 research outputs found
Finding the direction of disturbance propagation in a chemical process using transfer entropy
Published version
Self-Distillation for Gaussian Process Regression and Classification
We propose two approaches to extend the notion of knowledge distillation to
Gaussian Process Regression (GPR) and Gaussian Process Classification (GPC):
data-centric and distribution-centric. The data-centric approach resembles most
current distillation techniques for machine learning, and refits a model on
deterministic predictions from the teacher, while the distribution-centric
approach re-uses the full probabilistic posterior for the next iteration. By
analyzing the properties of these approaches, we show that the data-centric
approach for GPR closely relates to known results for self-distillation of
kernel ridge regression and that the distribution-centric approach for GPR
corresponds to ordinary GPR with a very particular choice of hyperparameters.
Furthermore, we demonstrate that the distribution-centric approach for GPC
approximately corresponds to data duplication and a particular scaling of the
covariance and that the data-centric approach for GPC requires redefining the
model from a Binomial likelihood to a continuous Bernoulli likelihood to be
well-specified. To the best of our knowledge, our proposed approaches are the
first to formulate knowledge distillation specifically for Gaussian Process
models.
Comment: 10 pages; code at
https://github.com/Kennethborup/gaussian_process_self_distillation
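To make the data-centric idea concrete, here is a minimal sketch (not the authors' code; the dataset, kernel, and noise level are illustrative assumptions) of one round of data-centric self-distillation for GPR with scikit-learn: the student is simply refit on the teacher's mean predictions.

```python
# Hedged sketch of data-centric self-distillation for GPR: the student
# GP is refit on the teacher's deterministic (mean) predictions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(40)

kernel = RBF(length_scale=1.0)
teacher = GaussianProcessRegressor(kernel=kernel, alpha=1e-2).fit(X, y)

# Data-centric step: the student sees only the teacher's mean predictions
# at the training inputs (a deterministic pseudo-dataset).
y_teacher = teacher.predict(X)
student = GaussianProcessRegressor(kernel=kernel, alpha=1e-2).fit(X, y_teacher)
```

Iterating the refit step gives multi-step self-distillation; the distribution-centric variant would instead carry the full posterior forward, which the abstract relates to ordinary GPR under a particular choice of hyperparameters.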
Knowledge Distillation based Degradation Estimation for Blind Super-Resolution
Blind image super-resolution (Blind-SR) aims to recover a high-resolution
(HR) image from its corresponding low-resolution (LR) input image with unknown
degradations. Most of the existing works design an explicit degradation
estimator for each degradation to guide SR. However, it is infeasible to
provide concrete labels for every combination of degradations (e.g., blur,
noise, JPEG compression) to supervise the degradation estimator's training. In
addition, these special designs for a particular degradation, such as blur,
impede the models from generalizing to other degradations. To this
end, it is necessary to design an implicit degradation estimator that can
extract a discriminative degradation representation for all degradations
without relying on supervision from degradation ground-truth. In this paper, we
propose a Knowledge Distillation based Blind-SR network (KDSR). It consists of
a knowledge distillation based implicit degradation estimator network (KD-IDE)
and an efficient SR network. To learn the KDSR model, we first train a teacher
network KD-IDE_T, which takes paired HR and LR patches as inputs and is
optimized jointly with the SR network. Then, we further train a student network
KD-IDE_S, which only takes LR images as input and learns to extract the
same implicit degradation representation (IDR) as KD-IDE_T. In addition, to
fully use the extracted IDR, we design a simple, strong, and efficient
IDR-based dynamic convolution residual block (IDR-DCRB) to build the SR
network. We
conduct extensive experiments under classic and real-world degradation
settings. The results show that KDSR achieves SOTA performance and can
generalize to various degradation processes. The source codes and pre-trained
models will be released.
Comment: ICLR 2023; code is available at https://github.com/Zj-BinXia/KDSR
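A minimal sketch of the teacher-student distillation described above, assuming a toy convolutional encoder and an L1 distillation loss (the authors' exact architecture and losses may differ): the teacher KD-IDE_T encodes paired HR and LR patches into an IDR, and the student KD-IDE_S learns to reproduce that IDR from the LR input alone.

```python
# Hedged sketch (not the authors' code) of IDR distillation in KDSR.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, in_ch: int, idr_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, idr_dim, 3, padding=1),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # pool to a vector IDR
        )

    def forward(self, x):
        return self.net(x)

teacher = Encoder(in_ch=6)   # HR and LR patches stacked channel-wise
student = Encoder(in_ch=3)   # LR image only

def distill_step(hr, lr):
    # The HR patch is resized here only so it can be stacked with the LR
    # input; the paper's exact pairing scheme may differ.
    hr_small = nn.functional.interpolate(hr, size=lr.shape[-2:])
    with torch.no_grad():
        idr_t = teacher(torch.cat([hr_small, lr], dim=1))
    idr_s = student(lr)
    return nn.functional.l1_loss(idr_s, idr_t)  # student mimics teacher's IDR
```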
Finite-Block-Length Analysis in Classical and Quantum Information Theory
Coding technology is used in several information processing tasks. In
particular, when noise during transmission disturbs communications, coding
technology is employed to protect the information. However, there are two types
of coding technology: coding in classical information theory and coding in
quantum information theory. Although the physical media used to transmit
information ultimately obey quantum mechanics, we need to choose the type of
coding depending on the kind of information device, classical or quantum, that
is being used. In both branches of information theory, there are many elegant
theoretical results under the ideal assumption that an infinitely large system
is available. In realistic situations, we need to account for finite-size
effects. The present paper reviews finite-size effects in classical and quantum
information theory with respect to various topics, including applied aspects.
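For concreteness, one canonical result from this finite-block-length literature is the second-order normal approximation for channel coding (the notation $C$, $V$, $Q^{-1}$ is the standard one and an assumption here, not necessarily the paper's):

```latex
% Maximal number of messages M*(n, epsilon) transmissible over n channel
% uses at error probability epsilon; C is the capacity, V the channel
% dispersion, and Q^{-1} the inverse Gaussian tail function.
\[
  \log M^*(n,\epsilon) \;=\; nC \;-\; \sqrt{nV}\, Q^{-1}(\epsilon) \;+\; O(\log n)
\]
```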
On reverse hypercontractivity
We study the notion of reverse hypercontractivity. We show that reverse
hypercontractive inequalities are implied by standard hypercontractive
inequalities as well as by the modified log-Sobolev inequality. Our proof is
based on a new comparison lemma for Dirichlet forms and an extension of the
Stroock-Varopoulos inequality.
A consequence of our analysis is that {\em all} simple operators $L = Id - \E$
as well as their tensors satisfy uniform reverse hypercontractive inequalities.
That is, for all $q < p < 1$ and every positive-valued function $f$, for
$t \ge \log \frac{1-q}{1-p}$ we have $\|e^{-tL} f\|_q \ge \|f\|_p$. This should
be contrasted with the case of hypercontractive inequalities for simple
operators, where $t$ is known to depend not only on $p$ and $q$ but also on the
underlying space.
The new reverse hypercontractive inequalities established here imply new
mixing and isoperimetric results for short random walks in product spaces, for
certain card-shufflings, for Glauber dynamics in high-temperatures spin systems
as well as for queueing processes. The inequalities further imply a
quantitative Arrow impossibility theorem for general product distributions and
inverse polynomial bounds in the number of players for the non-interactive
correlation distillation problem with $k$-sided dice.
Comment: Final revision. Incorporates referee's comments. The proof of
appendix B has been corrected. A shorter version of this article will appear
in GAFA.
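For orientation, the contrast drawn in the abstract can be written side by side (a schematic statement that suppresses the conditions on $t$; see the paper for the precise hypotheses):

```latex
% Standard hypercontractivity (exponents above 1) vs. the reverse form
% (exponents below 1, positive f), for a Markov semigroup e^{-tL}:
\[
  \|e^{-tL} f\|_p \le \|f\|_q \quad (1 < q < p),
  \qquad
  \|e^{-tL} f\|_q \ge \|f\|_p \quad (q < p < 1,\ f > 0).
\]
```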
Understanding and Comparing Scalable Gaussian Process Regression for Big Data
As a non-parametric Bayesian model that produces informative predictive
distributions, the Gaussian process (GP) has been widely used in various
fields, such as regression, classification, and optimization. The cubic complexity of
standard GP however leads to poor scalability, which poses challenges in the
era of big data. Hence, various scalable GPs have been developed in the
literature in order to improve the scalability while retaining desirable
prediction accuracy. This paper is devoted to investigating the methodological
characteristics and performance of representative global and local scalable
GPs, including sparse approximations and local aggregations, from four main
perspectives: scalability, capability, controllability and robustness. The
numerical experiments on two toy examples and five real-world datasets with up
to 250K points offer the following findings. In terms of scalability, most of
the scalable GPs have a time complexity that is linear in the training size. In
terms of capability, the sparse approximations capture long-term spatial
correlations, whereas the local aggregations capture local patterns but suffer
from over-fitting in some scenarios. In terms of controllability, we can
improve the performance of sparse approximations by simply increasing the
inducing size, but this is not the case for local aggregations. In terms of robustness,
local aggregations are robust to various initializations of hyperparameters due
to the local attention mechanism. Finally, we highlight that the proper hybrid
of global and local scalable GPs may be a promising way to improve both the
model capability and scalability for big data.
Comment: 25 pages, 15 figures; preprint submitted to KBS
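To illustrate where the linear-in-n cost of sparse approximations comes from, here is a toy NumPy sketch of a subset-of-regressors-style predictive mean with m inducing points (the kernel, noise level, and inducing-point placement are illustrative assumptions, not the paper's setup): every O(n^3) solve is replaced by an O(n m^2) one with m << n.

```python
# Toy sketch of a sparse GP predictive mean with inducing points.
import numpy as np

def rbf(A, B, ls=1.0):
    # Squared-exponential kernel between row sets A (n, d) and B (m, d).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def sparse_gp_mean(X, y, Z, Xs, noise=0.1):
    """Predictive mean at Xs using m inducing points Z (m << n)."""
    Kmm = rbf(Z, Z) + 1e-6 * np.eye(len(Z))   # jitter for stability
    Kmn = rbf(Z, X)
    Ksm = rbf(Xs, Z)
    # Sigma = Kmm + noise^{-2} Kmn Knm: an m x m system, never n x n.
    Sigma = Kmm + (Kmn @ Kmn.T) / noise**2
    alpha = np.linalg.solve(Sigma, Kmn @ y) / noise**2
    return Ksm @ alpha

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (500, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(500)
Z = np.linspace(-3, 3, 20)[:, None]          # 20 inducing points
Xs = np.linspace(-3, 3, 100)[:, None]
mu = sparse_gp_mean(X, y, Z, Xs)
```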
Epistemic Neural Networks
Intelligence relies on an agent's knowledge of what it does not know. This
capability can be assessed based on the quality of joint predictions of labels
across multiple inputs. Conventional neural networks lack this capability and,
since most research has focused on marginal predictions, this shortcoming has
been largely overlooked. We introduce the epistemic neural network (ENN) as an
interface for models that represent uncertainty as required to generate useful
joint predictions. While prior approaches to uncertainty modeling such as
Bayesian neural networks can be expressed as ENNs, this new interface
facilitates comparison of joint predictions and the design of novel
architectures and algorithms. In particular, we introduce the epinet: an
architecture that can supplement any conventional neural network, including
large pretrained models, and can be trained with modest incremental computation
to estimate uncertainty. With an epinet, conventional neural networks
outperform very large ensembles, consisting of hundreds or more particles, with
orders of magnitude less computation. We demonstrate this efficacy across
synthetic data, ImageNet, and some reinforcement learning tasks. As part of
this effort we open-source experiment code.
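As a rough illustration of the epinet interface described above (a hedged sketch, not the open-sourced implementation; layer sizes and the reuse of base-network outputs as features are assumptions): a small network takes the base network's stop-gradient features together with a random epistemic index z and adds a z-dependent correction to the base output, so that varying z produces the samples needed for joint predictions.

```python
# Hedged sketch of an epinet supplementing a fixed base network.
import torch
import torch.nn as nn

class Epinet(nn.Module):
    def __init__(self, feat_dim: int, index_dim: int, out_dim: int):
        super().__init__()
        self.index_dim = index_dim
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + index_dim, 64), nn.ReLU(),
            nn.Linear(64, out_dim),
        )

    def forward(self, features, z):
        # Stop gradients into the (possibly large, pretrained) base network.
        return self.mlp(torch.cat([features.detach(), z], dim=-1))

def predict(base_net, epinet, x, n_samples=8):
    feats = base_net(x)                       # here: outputs double as features
    out = []
    for _ in range(n_samples):
        z = torch.randn(x.shape[0], epinet.index_dim)
        out.append(feats + epinet(feats, z))  # base prediction + epistemic term
    return torch.stack(out)                   # samples approximate joint preds

base = nn.Sequential(nn.Linear(16, 10))       # toy base net: 10-class logits
epi = Epinet(feat_dim=10, index_dim=8, out_dim=10)
samples = predict(base, epi, torch.randn(4, 16))  # shape (8, 4, 10)
```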