Information-Geometric Optimization Algorithms: A Unifying Picture via Invariance Principles
We present a canonical way to turn any smooth parametric family of
probability distributions on an arbitrary search space $X$ into a
continuous-time black-box optimization method on $X$, the
\emph{information-geometric optimization} (IGO) method. Invariance as a design
principle minimizes the number of arbitrary choices. The resulting \emph{IGO
flow} conducts the natural gradient ascent of an adaptive, time-dependent,
quantile-based transformation of the objective function. It makes no
assumptions on the objective function to be optimized.
The IGO method produces explicit IGO algorithms through time discretization.
It naturally recovers versions of known algorithms and offers a systematic way
to derive new ones. The cross-entropy method is recovered in a particular case,
and can be extended into a smoothed, parametrization-independent maximum
likelihood update (IGO-ML). For Gaussian distributions on $\mathbb{R}^d$, IGO
is related to natural evolution strategies (NES) and recovers a version of the
CMA-ES algorithm. For Bernoulli distributions on $\{0,1\}^d$, we recover the
PBIL algorithm. From restricted Boltzmann machines, we obtain a novel algorithm
for optimization on $\{0,1\}^d$. All these algorithms are unified under a
single information-geometric optimization framework.
Thanks to its intrinsic formulation, the IGO method achieves invariance under
reparametrization of the search space $X$, under a change of parameters of the
probability distributions, and under increasing transformations of the
objective function.
Theory strongly suggests that IGO algorithms have minimal loss in diversity
during optimization, provided the initial diversity is high. First experiments
using restricted Boltzmann machines confirm this insight. Thus IGO seems to
provide, from information theory, an elegant way to spontaneously explore
several valleys of a fitness landscape in a single run.
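The Bernoulli case admits a compact illustration. Below is a minimal sketch of one time-discretized IGO step for Bernoulli distributions on $\{0,1\}^d$ in the mean parametrization, where the natural-gradient update with quantile-based selection weights reduces to a PBIL-like convex combination; all function and parameter names are illustrative, not from the paper.
\begin{verbatim}
import numpy as np

def igo_bernoulli_step(theta, objective, n_samples=100, lr=0.1, rng=None):
    # One time-discretized IGO step for Bernoulli(theta) on {0,1}^d.
    # Candidates are ranked by objective value; a quantile-based weighting
    # keeps only the top quarter. In the mean parametrization, the
    # natural-gradient step reduces to a convex combination of theta and
    # the weighted sample mean (the PBIL-like update).
    rng = rng or np.random.default_rng()
    x = (rng.random((n_samples, theta.size)) < theta).astype(float)
    f = np.array([objective(xi) for xi in x])
    order = np.argsort(-f)                    # best candidates first
    w = np.zeros(n_samples)
    mu = n_samples // 4
    w[order[:mu]] = 1.0 / mu                  # quantile-based selection weights
    return np.clip((1 - lr) * theta + lr * (w @ x), 1e-3, 1 - 1e-3)

# Toy run on OneMax: theta drifts toward the all-ones optimum.
theta = np.full(20, 0.5)
for _ in range(300):
    theta = igo_bernoulli_step(theta, objective=lambda xi: xi.sum())
\end{verbatim}
Note that the update is invariant under increasing transformations of the objective, since only the ranking of the samples enters the weights.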
Theoretical Guarantees of Learning Ensembling Strategies with Applications to Time Series Forecasting
Ensembling is among the most popular tools in machine learning (ML) due to
its effectiveness in minimizing variance and thus improving generalization.
Most ensembling methods for black-box base learners fall under the umbrella of
"stacked generalization," namely training an ML algorithm that takes the
inferences from the base learners as input. While stacking has been widely
applied in practice, its theoretical properties are poorly understood. In this
paper, we prove a novel result, showing that choosing the best stacked
generalization from a (finite or finite-dimensional) family of stacked
generalizations based on cross-validated performance does not perform "much
worse" than the oracle best. Our result strengthens and significantly extends
the results in Van der Laan et al. (2007). Inspired by the theoretical
analysis, we further propose a particular family of stacked generalizations in
the context of probabilistic forecasting, each one with a different sensitivity
for how much the ensemble weights are allowed to vary across items, timestamps
in the forecast horizon, and quantiles. Experimental results demonstrate the
performance gain of the proposed method.
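As a concrete reading of the selection result, the sketch below picks, by cross-validated score, one member from a small family of stacked generalizations. The family here (ridge regressions on base-learner predictions, indexed by regularization strength) is a hypothetical stand-in to illustrate the principle, not the paper's forecasting-specific family.
\begin{verbatim}
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

def select_stacker(base_preds, y, alphas=(0.01, 0.1, 1.0, 10.0)):
    # base_preds: out-of-sample predictions of the black-box base learners,
    # one column per learner. Each candidate stacker is a ridge regression
    # on those predictions; the finite family is indexed by alpha. We keep
    # the member with the best cross-validated score, mirroring the
    # selection guarantee described above.
    scores = [cross_val_score(Ridge(alpha=a), base_preds, y,
                              cv=KFold(5),
                              scoring="neg_mean_squared_error").mean()
              for a in alphas]
    best = alphas[int(np.argmax(scores))]
    return Ridge(alpha=best).fit(base_preds, y)

rng = np.random.default_rng(0)
base_preds = rng.normal(size=(200, 3))   # stand-in base-learner outputs
y = base_preds @ np.array([0.5, 0.3, 0.2]) + 0.1 * rng.normal(size=200)
stacker = select_stacker(base_preds, y)
\end{verbatim}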
AnoRand: A Semi Supervised Deep Learning Anomaly Detection Method by Random Labeling
Anomaly detection, or more generally outlier detection, is one of the most
popular and challenging subjects in theoretical and applied machine learning.
The main challenge is that, in general, very little labeled data is available,
or no labels at all. In this paper, we present a new semi-supervised anomaly
detection method called \textbf{AnoRand} by combining a deep learning
architecture with random synthetic label generation. The proposed architecture
has two building blocks: (1) a noise detection (ND) block composed of a feed
forward perceptron and (2) an autoencoder (AE) block. The main idea of this new
architecture is to learn one class (e.g. the majority class in case of anomaly
detection) as well as possible by taking advantage of the ability of
autoencoders to represent data in a latent space and the ability of feed
forward perceptrons (FFP) to learn one class when the data is highly
imbalanced. First, we create synthetic anomalies by randomly perturbing
(adding noise to) a few samples (e.g. 2\%) from the training set. Second, we
use the normal and the synthetic
samples as input to our model. We compared the performance of the proposed
method to 17 state-of-the-art unsupervised anomaly detection methods on
synthetic datasets and 57 real-world datasets. Our results show that this new
method generally outperforms most of the state-of-the-art methods and has the
best performance (AUC ROC and AUC PR) on the vast majority of reference
datasets. We also tested our method in a supervised way by using the actual
labels to train the model. The results show that it performs very well
compared to most state-of-the-art supervised algorithms.
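The random synthetic labeling step lends itself to a short sketch. Assuming tabular data in a NumPy array, the helper below perturbs a small fraction of training samples with Gaussian noise and labels the perturbed copies as synthetic anomalies; names, the noise model, and its scale are illustrative assumptions, not the paper's code.
\begin{verbatim}
import numpy as np

def make_synthetic_anomalies(X, frac=0.02, noise_scale=3.0, rng=None):
    # Randomly pick a small fraction of training samples, perturb them with
    # Gaussian noise scaled by each feature's standard deviation, and label
    # the perturbed copies as synthetic anomalies (1); all original samples
    # keep the normal label (0).
    rng = rng or np.random.default_rng()
    n = X.shape[0]
    idx = rng.choice(n, size=max(1, int(frac * n)), replace=False)
    X_noisy = X[idx] + noise_scale * X.std(axis=0) * rng.normal(size=X[idx].shape)
    X_train = np.vstack([X, X_noisy])
    y_train = np.concatenate([np.zeros(n), np.ones(len(idx))])
    return X_train, y_train
\end{verbatim}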
Federated Conformal Predictors for Distributed Uncertainty Quantification
Conformal prediction is emerging as a popular paradigm for providing rigorous
uncertainty quantification in machine learning since it can be easily applied
as a post-processing step to already trained models.
In this paper, we extend conformal prediction to the federated learning (FL)
setting.
The main challenge we face is data heterogeneity across the clients -- this
violates the fundamental tenet of \emph{exchangeability} required for conformal
prediction.
We propose a weaker notion of \emph{partial exchangeability}, better suited
to the FL setting, and use it to develop the Federated Conformal Prediction
(FCP) framework.
We show FCP enjoys rigorous theoretical guarantees and excellent empirical
performance on several computer vision and medical imaging datasets.
Our results demonstrate a practical approach to incorporating meaningful
uncertainty quantification in distributed and heterogeneous environments.
We provide the code used in our experiments at
\url{https://github.com/clu5/federated-conformal}.
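For context on the post-processing step the abstract starts from, here is a minimal sketch of standard (centralized) split conformal prediction, the exchangeability-based baseline that FCP generalizes; it is not the FCP algorithm itself, and all names are illustrative.
\begin{verbatim}
import numpy as np

def split_conformal_radius(scores, alpha=0.1):
    # scores: nonconformity scores |y_i - f(x_i)| on a held-out calibration
    # set. Returns the radius q such that [f(x) - q, f(x) + q] covers a new
    # exchangeable test point with probability >= 1 - alpha.
    n = len(scores)
    level = np.ceil((n + 1) * (1 - alpha)) / n   # finite-sample correction
    return np.quantile(scores, min(level, 1.0), method="higher")

# Toy usage with stand-in calibration targets and model predictions.
rng = np.random.default_rng(0)
y_cal, y_hat = rng.normal(size=500), rng.normal(size=500)
q = split_conformal_radius(np.abs(y_cal - y_hat), alpha=0.1)
\end{verbatim}
Data heterogeneity across clients breaks exactly the exchangeability assumption this guarantee rests on, which is what motivates the weaker notion of partial exchangeability above.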
Interpreting Distributional Reinforcement Learning: A Regularization Perspective
Distributional reinforcement learning~(RL) is a class of state-of-the-art
algorithms that estimate the whole distribution of the total return rather than
only its expectation. Despite the remarkable performance of distributional RL,
a theoretical understanding of its advantages over expectation-based RL remains
elusive. In this paper, we attribute the superiority of distributional RL to
its regularization effect in terms of the value distribution information
beyond its expectation. First, by leveraging a variant of the gross
error model from robust statistics, we decompose the value distribution into its
expectation and the remaining distribution part. As such, the extra benefit of
distributional RL compared with expectation-based RL is mainly interpreted as
the impact of a \textit{risk-sensitive entropy regularization} within the
Neural Fitted Z-Iteration framework. Meanwhile, we establish a bridge between
the risk-sensitive entropy regularization of distributional RL and the vanilla
entropy in maximum entropy RL, focusing specifically on actor-critic
algorithms. This reveals that distributional RL induces a corrected reward
function and thus promotes risk-sensitive exploration against the intrinsic
uncertainty of the environment. Finally, extensive experiments corroborate the
role of the regularization effect of distributional RL and uncover the mutual
impacts of different entropy regularizations. Our research paves the way toward
better interpreting the efficacy of distributional RL algorithms, especially
through the lens of regularization.
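Schematically, a gross-error-style decomposition of the return distribution $Z^\pi$ can be written as the mixture below, a point mass at the expectation plus a remainder carrying the distribution information beyond the mean; the paper's exact variant may define the remainder differently.
\[
F_{Z^\pi}(x) \;=\; (1-\epsilon)\,\mathbb{1}\!\left\{x \ge \mathbb{E}[Z^\pi]\right\}
\;+\; \epsilon\, F_{\mu}(x), \qquad \epsilon \in (0,1],
\]
where $\mathbb{1}\{x \ge \mathbb{E}[Z^\pi]\}$ is the CDF of a Dirac mass at $\mathbb{E}[Z^\pi]$ and $F_\mu$ is the remaining part whose entropy plays the role of the risk-sensitive regularizer.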