Information-Geometric Optimization Algorithms: A Unifying Picture via Invariance Principles
We present a canonical way to turn any smooth parametric family of
probability distributions on an arbitrary search space $X$ into a
continuous-time black-box optimization method on $X$, the
\emph{information-geometric optimization} (IGO) method. Invariance as a design
principle minimizes the number of arbitrary choices. The resulting \emph{IGO
flow} conducts the natural gradient ascent of an adaptive, time-dependent,
quantile-based transformation of the objective function. It makes no
assumptions on the objective function to be optimized.
The IGO method produces explicit IGO algorithms through time discretization.
It naturally recovers versions of known algorithms and offers a systematic way
to derive new ones. The cross-entropy method is recovered in a particular case,
and can be extended into a smoothed, parametrization-independent maximum
likelihood update (IGO-ML). For Gaussian distributions on $\mathbb{R}^d$, IGO
is related to natural evolution strategies (NES) and recovers a version of the
CMA-ES algorithm. For Bernoulli distributions on $\{0,1\}^d$, we recover the
PBIL algorithm. From restricted Boltzmann machines, we obtain a novel algorithm
for optimization on $\{0,1\}^d$. All these algorithms are unified under a
single information-geometric optimization framework.
Thanks to its intrinsic formulation, the IGO method achieves invariance under
reparametrization of the search space $X$, under a change of parameters of the
probability distributions, and under increasing transformations of the
objective function.
Theory strongly suggests that IGO algorithms have minimal loss in diversity
during optimization, provided the initial diversity is high. First experiments
using restricted Boltzmann machines confirm this insight. Thus IGO seems to
provide, from information theory, an elegant way to spontaneously explore
several valleys of a fitness landscape in a single run.
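The Bernoulli case admits a compact illustration. Below is a minimal sketch of one time-discretized IGO step for Bernoulli distributions on $\{0,1\}^d$ in the mean parametrization, where the natural-gradient update with quantile-based selection weights reduces to a PBIL-like convex combination; all function and parameter names are illustrative, not from the paper.
\begin{verbatim}
import numpy as np

def igo_bernoulli_step(theta, objective, n_samples=100, lr=0.1, rng=None):
    # One time-discretized IGO step for Bernoulli(theta) on {0,1}^d.
    # Candidates are ranked by objective value; a quantile-based weighting
    # keeps only the top quarter. In the mean parametrization, the
    # natural-gradient step reduces to a convex combination of theta and
    # the weighted sample mean (the PBIL-like update).
    rng = rng or np.random.default_rng()
    x = (rng.random((n_samples, theta.size)) < theta).astype(float)
    f = np.array([objective(xi) for xi in x])
    order = np.argsort(-f)                    # best candidates first
    w = np.zeros(n_samples)
    mu = n_samples // 4
    w[order[:mu]] = 1.0 / mu                  # quantile-based selection weights
    return np.clip((1 - lr) * theta + lr * (w @ x), 1e-3, 1 - 1e-3)

# Toy run on OneMax: theta drifts toward the all-ones optimum.
theta = np.full(20, 0.5)
for _ in range(300):
    theta = igo_bernoulli_step(theta, objective=lambda xi: xi.sum())
\end{verbatim}
Note that the update is invariant under increasing transformations of the objective, since only the ranking of the samples enters the weights.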
Theoretical Guarantees of Learning Ensembling Strategies with Applications to Time Series Forecasting
Ensembling is among the most popular tools in machine learning (ML) due to
its effectiveness in minimizing variance and thus improving generalization.
Most ensembling methods for black-box base learners fall under the umbrella of
"stacked generalization," namely training an ML algorithm that takes the
inferences from the base learners as input. While stacking has been widely
applied in practice, its theoretical properties are poorly understood. In this
paper, we prove a novel result, showing that choosing the best stacked
generalization from a (finite or finite-dimensional) family of stacked
generalizations based on cross-validated performance does not perform "much
worse" than the oracle best. Our result strengthens and significantly extends
the results in Van der Laan et al. (2007). Inspired by the theoretical
analysis, we further propose a particular family of stacked generalizations in
the context of probabilistic forecasting, each one with a different sensitivity
for how much the ensemble weights are allowed to vary across items, timestamps
in the forecast horizon, and quantiles. Experimental results demonstrate the
performance gain of the proposed method.
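As a concrete reading of the selection result, the sketch below picks, by cross-validated score, one member from a small family of stacked generalizations. The family here (ridge regressions on base-learner predictions, indexed by regularization strength) is a hypothetical stand-in to illustrate the principle, not the paper's forecasting-specific family.
\begin{verbatim}
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

def select_stacker(base_preds, y, alphas=(0.01, 0.1, 1.0, 10.0)):
    # base_preds: out-of-sample predictions of the black-box base learners,
    # one column per learner. Each candidate stacker is a ridge regression
    # on those predictions; the finite family is indexed by alpha. We keep
    # the member with the best cross-validated score, mirroring the
    # selection guarantee described above.
    scores = [cross_val_score(Ridge(alpha=a), base_preds, y,
                              cv=KFold(5),
                              scoring="neg_mean_squared_error").mean()
              for a in alphas]
    best = alphas[int(np.argmax(scores))]
    return Ridge(alpha=best).fit(base_preds, y)

rng = np.random.default_rng(0)
base_preds = rng.normal(size=(200, 3))   # stand-in base-learner outputs
y = base_preds @ np.array([0.5, 0.3, 0.2]) + 0.1 * rng.normal(size=200)
stacker = select_stacker(base_preds, y)
\end{verbatim}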
AnoRand: A Semi Supervised Deep Learning Anomaly Detection Method by Random Labeling
Anomaly detection, or more generally outlier detection, is one of the most
popular and challenging subjects in theoretical and applied machine learning.
The main challenge is that, in general, very little labeled data is available,
or no labels at all. In this paper, we present a new semi-supervised anomaly
detection method called \textbf{AnoRand} by combining a deep learning
architecture with random synthetic label generation. The proposed architecture
has two building blocks: (1) a noise detection (ND) block composed of a feed
forward perceptron and (2) an autoencoder (AE) block. The main idea of this new
architecture is to learn one class (e.g. the majority class in case of anomaly
detection) as well as possible by taking advantage of the ability of
autoencoders to represent data in a latent space and the ability of feed
forward perceptrons (FFP) to learn one class when the data is highly
imbalanced. First, we create synthetic anomalies by randomly perturbing
(adding noise to) a few samples (e.g. 2\%) from the training set. Second, we
use the normal and the synthetic
samples as input to our model. We compared the performance of the proposed
method to 17 state-of-the-art unsupervised anomaly detection methods on
synthetic datasets and 57 real-world datasets. Our results show that this new
method generally outperforms most of the state-of-the-art methods and has the
best performance (AUC ROC and AUC PR) on the vast majority of reference
datasets. We also tested our method in a supervised way by using the actual
labels to train the model. The results show that it performs very well
compared to most state-of-the-art supervised algorithms.
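The random synthetic labeling step lends itself to a short sketch. Assuming tabular data in a NumPy array, the helper below perturbs a small fraction of training samples with Gaussian noise and labels the perturbed copies as synthetic anomalies; names, the noise model, and its scale are illustrative assumptions, not the paper's code.
\begin{verbatim}
import numpy as np

def make_synthetic_anomalies(X, frac=0.02, noise_scale=3.0, rng=None):
    # Randomly pick a small fraction of training samples, perturb them with
    # Gaussian noise scaled by each feature's standard deviation, and label
    # the perturbed copies as synthetic anomalies (1); all original samples
    # keep the normal label (0).
    rng = rng or np.random.default_rng()
    n = X.shape[0]
    idx = rng.choice(n, size=max(1, int(frac * n)), replace=False)
    X_noisy = X[idx] + noise_scale * X.std(axis=0) * rng.normal(size=X[idx].shape)
    X_train = np.vstack([X, X_noisy])
    y_train = np.concatenate([np.zeros(n), np.ones(len(idx))])
    return X_train, y_train
\end{verbatim}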
Federated Conformal Predictors for Distributed Uncertainty Quantification
Conformal prediction is emerging as a popular paradigm for providing rigorous
uncertainty quantification in machine learning since it can be easily applied
as a post-processing step to already trained models.
In this paper, we extend conformal prediction to the federated learning (FL)
setting.
The main challenge we face is data heterogeneity across the clients -- this
violates the fundamental tenet of \emph{exchangeability} required for conformal
prediction.
We propose a weaker notion of \emph{partial exchangeability}, better suited
to the FL setting, and use it to develop the Federated Conformal Prediction
(FCP) framework.
We show FCP enjoys rigorous theoretical guarantees and excellent empirical
performance on several computer vision and medical imaging datasets.
Our results demonstrate a practical approach to incorporating meaningful
uncertainty quantification in distributed and heterogeneous environments.
We provide the code used in our experiments at
\url{https://github.com/clu5/federated-conformal}.
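For context on the post-processing step the abstract starts from, here is a minimal sketch of standard (centralized) split conformal prediction, the exchangeability-based baseline that FCP generalizes; it is not the FCP algorithm itself, and all names are illustrative.
\begin{verbatim}
import numpy as np

def split_conformal_radius(scores, alpha=0.1):
    # scores: nonconformity scores |y_i - f(x_i)| on a held-out calibration
    # set. Returns the radius q such that [f(x) - q, f(x) + q] covers a new
    # exchangeable test point with probability >= 1 - alpha.
    n = len(scores)
    level = np.ceil((n + 1) * (1 - alpha)) / n   # finite-sample correction
    return np.quantile(scores, min(level, 1.0), method="higher")

# Toy usage with stand-in calibration targets and model predictions.
rng = np.random.default_rng(0)
y_cal, y_hat = rng.normal(size=500), rng.normal(size=500)
q = split_conformal_radius(np.abs(y_cal - y_hat), alpha=0.1)
\end{verbatim}
Data heterogeneity across clients breaks exactly the exchangeability assumption this guarantee rests on, which is what motivates the weaker notion of partial exchangeability above.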
Interpreting Distributional Reinforcement Learning: A Regularization Perspective
Distributional reinforcement learning~(RL) is a class of state-of-the-art
algorithms that estimate the whole distribution of the total return rather than
only its expectation. Despite the remarkable performance of distributional RL,
a theoretical understanding of its advantages over expectation-based RL remains
elusive. In this paper, we attribute the superiority of distributional RL to
its regularization effect in terms of the value distribution information
beyond its expectation. First, by leveraging a variant of the gross
error model from robust statistics, we decompose the value distribution into its
expectation and the remaining distribution part. As such, the extra benefit of
distributional RL compared with expectation-based RL is mainly interpreted as
the impact of a \textit{risk-sensitive entropy regularization} within the
Neural Fitted Z-Iteration framework. Meanwhile, we establish a bridge between
the risk-sensitive entropy regularization of distributional RL and the vanilla
entropy in maximum entropy RL, focusing specifically on actor-critic
algorithms. This reveals that distributional RL induces a corrected reward
function and thus promotes risk-sensitive exploration against the intrinsic
uncertainty of the environment. Finally, extensive experiments corroborate the
role of the regularization effect of distributional RL and uncover the mutual
impacts of different entropy regularizations. Our research paves the way toward
better interpreting the efficacy of distributional RL algorithms, especially
through the lens of regularization.
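Schematically, a gross-error-style decomposition of the return distribution $Z^\pi$ can be written as the mixture below, a point mass at the expectation plus a remainder carrying the distribution information beyond the mean; the paper's exact variant may define the remainder differently.
\[
F_{Z^\pi}(x) \;=\; (1-\epsilon)\,\mathbb{1}\!\left\{x \ge \mathbb{E}[Z^\pi]\right\}
\;+\; \epsilon\, F_{\mu}(x), \qquad \epsilon \in (0,1],
\]
where $\mathbb{1}\{x \ge \mathbb{E}[Z^\pi]\}$ is the CDF of a Dirac mass at $\mathbb{E}[Z^\pi]$ and $F_\mu$ is the remaining part whose entropy plays the role of the risk-sensitive regularizer.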