Universal Reinforcement Learning Algorithms: Survey and Experiments

Authors: John Aslanides, Marcus Hutter, Jan Leike
Publication date: 30/05/2017

Many state-of-the-art reinforcement learning (RL) algorithms typically assume that the environment is an ergodic Markov Decision Process (MDP). In contrast, the field of universal reinforcement learning (URL) is concerned with algorithms that make as few assumptions as possible about the environment. The universal Bayesian agent AIXI and a family of related URL algorithms have been developed in this setting. While numerous theoretical optimality results have been proven for these agents, there has been no empirical investigation of their behavior to date. We present a short and accessible survey of these URL algorithms under a unified notation and framework, along with results of some experiments that qualitatively illustrate some properties of the resulting policies, and their relative performance on partially-observable gridworld environments. We also present an open-source reference implementation of the algorithms which we hope will facilitate further understanding of, and experimentation with, these ideas.

Comment: 8 pages, 6 figures, Twenty-sixth International Joint Conference on Artificial Intelligence (IJCAI-17)

arXiv.org e-Print Archive

Crossref

Expected loss analysis of thresholded authentication protocols in noisy conditions

Authors: Christos Dimitrakakis, Aikaterini Mitrokotsa, Serge Vaudenay
Publication date: 01/09/2010

A number of authentication protocols have been proposed recently, where at least some part of the authentication is performed during a phase, lasting $n$ rounds, with no error correction. This requires assigning an acceptable threshold for the number of detected errors. This paper describes a framework enabling an expected loss analysis for all the protocols in this family. Furthermore, computationally simple methods to obtain nearly optimal values of the threshold, as well as of the number of rounds, are suggested. Finally, a method to adaptively select both the number of rounds and the threshold is proposed.

Comment: 17 pages, 2 figures; draft
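The threshold choice described in this abstract can be illustrated with a minimal sketch. It is not the paper's framework: it assumes, purely for illustration, that round errors are independent, that a legitimate user errs with probability `p_honest` per round (channel noise), that an attacker errs with probability `p_attack`, and that false rejections and false acceptances carry fixed losses. All names and parameters below are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    if k < 0:
        return 0.0
    upper = min(k, n)
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(upper + 1))


def expected_loss(tau, n, p_honest, p_attack, loss_reject, loss_accept, prior_attack):
    """Expected loss when a session is accepted iff it shows at most tau errors.

    Assumption (not from the abstract): errors are i.i.d. per round.
    """
    false_reject = 1.0 - binom_cdf(tau, n, p_honest)  # honest user exceeds tau
    false_accept = binom_cdf(tau, n, p_attack)        # attacker stays within tau
    return ((1.0 - prior_attack) * loss_reject * false_reject
            + prior_attack * loss_accept * false_accept)


def best_threshold(n, p_honest, p_attack, loss_reject, loss_accept, prior_attack):
    """Exhaustively pick the threshold in 0..n with the smallest expected loss."""
    return min(
        range(n + 1),
        key=lambda tau: expected_loss(
            tau, n, p_honest, p_attack, loss_reject, loss_accept, prior_attack
        ),
    )
```

Calling `best_threshold(32, 0.05, 0.5, 1.0, 10.0, 0.5)` returns the error count that balances the two loss terms under these assumed values. The exhaustive search is only practical because there are just n + 1 candidate thresholds.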

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX

Modelling the fair value of annuities contracts: the impact of interest rate risk and mortality risk

Authors: L. Ballotta, G. Esposito, S. Haberman
Publication venue: Faculty of Actuarial Science & Insurance, City University London
Publication date: 01/01/2006

The purpose of this paper is to analyze the problem of the fair valuation of annuities contracts. The market consistent valuation of these products requires a pricing framework which includes the two main sources of risk affecting the value of the annuity, i.e. interest rate risk and mortality risk. As the IASB has not set any specific guidelines as to which models are the most appropriate for these risks, in this note we consider a range of different models calibrated with historical data. We calculate the fair value of the annuity as a portfolio of zero coupon bonds, each with maturity set equal to the date of the annuity payments; the weights in the portfolio are given by the survival probabilities. Moreover, we focus on the additional information provided by stochastic simulations in order to define a suitable risk margin. The nature of the risk margin is one of the key issues concerning the IASB and Solvency project
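The valuation rule stated in this abstract (a portfolio of zero coupon bonds weighted by survival probabilities) can be sketched as follows. The flat interest rate and constant mortality hazard are placeholder assumptions chosen only to keep the example short. The paper itself considers a range of calibrated stochastic models, none of which is reproduced here, and all function names are hypothetical.

```python
from math import exp


def flat_discount_factors(rate, n_years):
    """Zero coupon bond prices for maturities 1..n_years under an assumed flat rate."""
    return [exp(-rate * t) for t in range(1, n_years + 1)]


def constant_hazard_survival(hazard, n_years):
    """Survival probabilities to years 1..n_years under an assumed constant hazard."""
    return [exp(-hazard * t) for t in range(1, n_years + 1)]


def annuity_fair_value(payment, survival_probs, discount_factors):
    """Sum over payment dates of payment * P(alive at t) * price of a bond maturing at t."""
    return payment * sum(p * d for p, d in zip(survival_probs, discount_factors))
```

For example, `annuity_fair_value(1000.0, constant_hazard_survival(0.02, 25), flat_discount_factors(0.03, 25))` values a 25-year annuity paying 1000 per year under these illustrative inputs. Replacing either list with simulated scenarios gives a distribution of values rather than a single number.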

City Research Online

Metamodel-based importance sampling for structural reliability analysis

Authors: F. Deheeger, V. Dubourg, B. Sudret
Publication date: 03/05/2011

Structural reliability methods aim at computing the probability of failure of systems with respect to some prescribed performance functions. In modern engineering such functions usually resort to running an expensive-to-evaluate computational model (e.g. a finite element model). In this respect simulation methods, which may require $10^{3}$ to $10^{6}$ runs, cannot be used directly. Surrogate models such as quadratic response surfaces, polynomial chaos expansions or kriging (which are built from a limited number of runs of the original model) are then introduced as a substitute for the original model to cope with the computational cost. In practice, however, it is almost impossible to quantify the error made by this substitution. In this paper we propose to use a kriging surrogate of the performance function as a means to build a quasi-optimal importance sampling density. The probability of failure is eventually obtained as the product of an augmented probability computed by substituting the meta-model for the original performance function and a correction term which ensures that there is no bias in the estimation even if the meta-model is not fully accurate. The approach is applied to analytical and finite element reliability problems and proves efficient up to 100 random variables.

Comment: 20 pages, 7 figures, 2 tables. Preprint submitted to Probabilistic Engineering Mechanics
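The "augmented probability times correction term" idea in this abstract can be illustrated with a one-dimensional toy sketch. Everything specific here is an assumption made for illustration: the performance function `g`, the smooth classifier `pi_surrogate` standing in for a kriging-based failure probability, and the use of plain rejection sampling to draw from the density proportional to `pi_surrogate(x)` times the input density. The paper's own sampling scheme and surrogate are not reproduced.

```python
import random
from math import erf, sqrt


def std_normal_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def g(x):
    """Toy 'expensive' performance function; failure when g(x) <= 0 (assumed)."""
    return 2.0 - x


def pi_surrogate(x):
    """Hypothetical imperfect surrogate: smooth guess of P(failure | x) in (0, 1)."""
    return std_normal_cdf((x - 1.9) / 0.3)


def estimate_failure_probability(n_aug, n_corr, seed=0):
    """Return (p_f, p_aug, alpha) with p_f = p_aug * alpha, for X ~ N(0, 1)."""
    rng = random.Random(seed)
    # Augmented probability: E_f[pi(X)], which needs only the cheap surrogate.
    p_aug = sum(pi_surrogate(rng.gauss(0.0, 1.0)) for _ in range(n_aug)) / n_aug
    # Correction term: E_h[1{g(X) <= 0} / pi(X)] with h proportional to pi * f.
    # Accepting x ~ f with probability pi(x) yields exact samples from h.
    total, accepted = 0.0, 0
    while accepted < n_corr:
        x = rng.gauss(0.0, 1.0)
        if rng.random() < pi_surrogate(x):
            accepted += 1
            if g(x) <= 0.0:  # the true model is evaluated only on accepted samples
                total += 1.0 / pi_surrogate(x)
    alpha = total / n_corr
    return p_aug * alpha, p_aug, alpha
```

Because the expectation of the indicator divided by `pi_surrogate` under h equals the true failure probability divided by the augmented probability, the product is unbiased even though the surrogate boundary (1.9) differs from the true one (2.0). In this toy problem the exact answer is the standard normal CDF at -2, so the estimate can be checked directly.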

arXiv.org e-Print Archive

HAL Clermont Université

Learning Output Kernels for Multi-Task Problems

Author: Francesco Dinuzzo
Publication venue: Elsevier BV
Publication date: 01/01/2013

Simultaneously solving multiple related learning tasks is beneficial under a variety of circumstances, but the prior knowledge necessary to correctly model task relationships is rarely available in practice. In this paper, we develop a novel kernel-based multi-task learning technique that automatically reveals structural inter-task relationships. Building on the framework of output kernel learning (OKL), we introduce a method that jointly learns multiple functions and a low-rank multi-task kernel by solving a non-convex regularization problem. Optimization is carried out via a block coordinate descent strategy, where each subproblem is solved using suitable conjugate gradient (CG) type iterative methods for linear operator equations. The effectiveness of the proposed approach is demonstrated on pharmacological and collaborative filtering data.

arXiv.org e-Print Archive

CiteSeerX

Publikationsserver der Universität Tübingen

MPG.PuRe

Expectation consistency for calibration of neural networks

Authors: Lucas Clarté, Florent Krzakala, Bruno Loureiro, Lenka Zdeborová
Publication date: 04/08/2023

Despite their incredible performance, it is well reported that deep neural networks tend to be overoptimistic about their prediction confidence. Finding effective and efficient calibration methods for neural networks is therefore an important endeavour towards better uncertainty quantification in deep learning. In this manuscript, we introduce a novel calibration technique named expectation consistency (EC), consisting of a post-training rescaling of the last layer weights by enforcing that the average validation confidence coincides with the average proportion of correct labels. First, we show that the EC method achieves similar calibration performance to temperature scaling (TS) across different neural network architectures and data sets, all while requiring similar validation samples and computational resources. However, we argue that EC provides a principled method grounded on a Bayesian optimality principle known as the Nishimori identity. Next, we provide an asymptotic characterization of both TS and EC in a synthetic setting and show that their performance crucially depends on the target function. In particular, we discuss examples where EC significantly outperforms TS
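The matching condition described in this abstract (average validation confidence equal to the proportion of correct labels) can be sketched as a one-parameter search. This is a minimal illustration, not the authors' implementation: it assumes that rescaling the last layer amounts to dividing the logits by a single scalar `temperature`, takes confidence to be the largest softmax probability, and solves for the scalar by bisection. All function names are hypothetical.

```python
from math import exp, sqrt


def softmax(logits, temperature):
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mean_confidence(logit_rows, temperature):
    """Average of the largest softmax probability over the validation set."""
    return sum(max(softmax(row, temperature)) for row in logit_rows) / len(logit_rows)


def accuracy(logit_rows, labels):
    hits = sum(
        1 for row, y in zip(logit_rows, labels)
        if max(range(len(row)), key=row.__getitem__) == y
    )
    return hits / len(labels)


def expectation_consistency_temperature(logit_rows, labels, lo=1e-2, hi=1e2, iters=60):
    """Find the scalar at which mean confidence equals validation accuracy.

    Mean confidence decreases as temperature grows, so bisection applies
    whenever the target accuracy lies between the values at lo and hi.
    """
    target = accuracy(logit_rows, labels)
    for _ in range(iters):
        mid = sqrt(lo * hi)  # bisect on a log scale
        if mean_confidence(logit_rows, mid) > target:
            lo = mid  # still overconfident: increase the temperature
        else:
            hi = mid
    return sqrt(lo * hi)
```

Temperature scaling, by contrast, is usually fitted by minimizing the validation negative log-likelihood over the same scalar. The sketch differs only in the criterion used to choose it.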

arXiv.org e-Print Archive