552 research outputs found

    Information-Geometric Optimization Algorithms: A Unifying Picture via Invariance Principles

    Get PDF
    We present a canonical way to turn any smooth parametric family of probability distributions on an arbitrary search space XX into a continuous-time black-box optimization method on XX, the \emph{information-geometric optimization} (IGO) method. Invariance as a design principle minimizes the number of arbitrary choices. The resulting \emph{IGO flow} conducts the natural gradient ascent of an adaptive, time-dependent, quantile-based transformation of the objective function. It makes no assumptions on the objective function to be optimized. The IGO method produces explicit IGO algorithms through time discretization. It naturally recovers versions of known algorithms and offers a systematic way to derive new ones. The cross-entropy method is recovered in a particular case, and can be extended into a smoothed, parametrization-independent maximum likelihood update (IGO-ML). For Gaussian distributions on Rd\mathbb{R}^d, IGO is related to natural evolution strategies (NES) and recovers a version of the CMA-ES algorithm. For Bernoulli distributions on {0,1}d\{0,1\}^d, we recover the PBIL algorithm. From restricted Boltzmann machines, we obtain a novel algorithm for optimization on {0,1}d\{0,1\}^d. All these algorithms are unified under a single information-geometric optimization framework. Thanks to its intrinsic formulation, the IGO method achieves invariance under reparametrization of the search space XX, under a change of parameters of the probability distributions, and under increasing transformations of the objective function. Theory strongly suggests that IGO algorithms have minimal loss in diversity during optimization, provided the initial diversity is high. First experiments using restricted Boltzmann machines confirm this insight. Thus IGO seems to provide, from information theory, an elegant way to spontaneously explore several valleys of a fitness landscape in a single run.Comment: Final published versio

    Theoretical Guarantees of Learning Ensembling Strategies with Applications to Time Series Forecasting

    Full text link
    Ensembling is among the most popular tools in machine learning (ML) due to its effectiveness in minimizing variance and thus improving generalization. Most ensembling methods for black-box base learners fall under the umbrella of "stacked generalization," namely training an ML algorithm that takes the inferences from the base learners as input. While stacking has been widely applied in practice, its theoretical properties are poorly understood. In this paper, we prove a novel result, showing that choosing the best stacked generalization from a (finite or finite-dimensional) family of stacked generalizations based on cross-validated performance does not perform "much worse" than the oracle best. Our result strengthens and significantly extends the results in Van der Laan et al. (2007). Inspired by the theoretical analysis, we further propose a particular family of stacked generalizations in the context of probabilistic forecasting, each one with a different sensitivity for how much the ensemble weights are allowed to vary across items, timestamps in the forecast horizon, and quantiles. Experimental results demonstrate the performance gain of the proposed method.Comment: ICML 202

    AnoRand: A Semi Supervised Deep Learning Anomaly Detection Method by Random Labeling

    Full text link
    Anomaly detection or more generally outliers detection is one of the most popular and challenging subject in theoretical and applied machine learning. The main challenge is that in general we have access to very few labeled data or no labels at all. In this paper, we present a new semi-supervised anomaly detection method called \textbf{AnoRand} by combining a deep learning architecture with random synthetic label generation. The proposed architecture has two building blocks: (1) a noise detection (ND) block composed of feed forward ferceptron and (2) an autoencoder (AE) block. The main idea of this new architecture is to learn one class (e.g. the majority class in case of anomaly detection) as well as possible by taking advantage of the ability of auto encoders to represent data in a latent space and the ability of Feed Forward Perceptron (FFP) to learn one class when the data is highly imbalanced. First, we create synthetic anomalies by randomly disturbing (add noise) few samples (e.g. 2\%) from the training set. Second, we use the normal and the synthetic samples as input to our model. We compared the performance of the proposed method to 17 state-of-the-art unsupervised anomaly detection method on synthetic datasets and 57 real-world datasets. Our results show that this new method generally outperforms most of the state-of-the-art methods and has the best performance (AUC ROC and AUC PR) on the vast majority of reference datasets. We also tested our method in a supervised way by using the actual labels to train the model. The results show that it has very good performance compared to most of state-of-the-art supervised algorithms

    Federated Conformal Predictors for Distributed Uncertainty Quantification

    Full text link
    Conformal prediction is emerging as a popular paradigm for providing rigorous uncertainty quantification in machine learning since it can be easily applied as a post-processing step to already trained models. In this paper, we extend conformal prediction to the federated learning setting. The main challenge we face is data heterogeneity across the clients -- this violates the fundamental tenet of \emph{exchangeability} required for conformal prediction. We propose a weaker notion of \emph{partial exchangeability}, better suited to the FL setting, and use it to develop the Federated Conformal Prediction (FCP) framework. We show FCP enjoys rigorous theoretical guarantees and excellent empirical performance on several computer vision and medical imaging datasets. Our results demonstrate a practical approach to incorporating meaningful uncertainty quantification in distributed and heterogeneous environments. We provide code used in our experiments \url{https://github.com/clu5/federated-conformal}.Comment: 23 pages, 18 figures, accepted to International Conference on Machine Learning (ICML) 202

    Interpreting Distributional Reinforcement Learning: A Regularization Perspective

    Full text link
    Distributional reinforcement learning~(RL) is a class of state-of-the-art algorithms that estimate the whole distribution of the total return rather than only its expectation. Despite the remarkable performance of distributional RL, a theoretical understanding of its advantages over expectation-based RL remains elusive. In this paper, we attribute the superiority of distributional RL to its regularization effect in terms of the value distribution information regardless of its expectation. Firstly, by leverage of a variant of the gross error model in robust statistics, we decompose the value distribution into its expectation and the remaining distribution part. As such, the extra benefit of distributional RL compared with expectation-based RL is mainly interpreted as the impact of a \textit{risk-sensitive entropy regularization} within the Neural Fitted Z-Iteration framework. Meanwhile, we establish a bridge between the risk-sensitive entropy regularization of distributional RL and the vanilla entropy in maximum entropy RL, focusing specifically on actor-critic algorithms. It reveals that distributional RL induces a corrected reward function and thus promotes a risk-sensitive exploration against the intrinsic uncertainty of the environment. Finally, extensive experiments corroborate the role of the regularization effect of distributional RL and uncover mutual impacts of different entropy regularization. Our research paves a way towards better interpreting the efficacy of distributional RL algorithms, especially through the lens of regularization
    • …
    corecore