Universal Reinforcement Learning Algorithms: Survey and Experiments
Many state-of-the-art reinforcement learning (RL) algorithms assume
that the environment is an ergodic Markov Decision Process (MDP). In contrast,
the field of universal reinforcement learning (URL) is concerned with
algorithms that make as few assumptions as possible about the environment. The
universal Bayesian agent AIXI and a family of related URL algorithms have been
developed in this setting. While numerous theoretical optimality results have
been proven for these agents, there has been no empirical investigation of
their behavior to date. We present a short and accessible survey of these URL
algorithms under a unified notation and framework, along with results of
experiments that qualitatively illustrate some properties of the resulting
policies, and their relative performance on partially-observable gridworld
environments. We also present an open-source reference implementation of the
algorithms which we hope will facilitate further understanding of, and
experimentation with, these ideas.
Comment: 8 pages, 6 figures, Twenty-sixth International Joint Conference on Artificial Intelligence (IJCAI-17)
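As a small illustration of the Bayesian-mixture idea underlying AIXI-style agents, the following sketch updates a posterior over a finite class of candidate environment models; the model class and its percept_prob interface are assumptions made here for brevity, not part of the paper's reference implementation.

import numpy as np

# Minimal sketch: Bayes-mixture posterior over a finite class of environment
# models, as used (in far greater generality) by AIXI-style URL agents.
# Each hypothetical model exposes percept_prob(history, action, percept).

def mixture_posterior(weights, models, history, action, percept):
    # Update w_i proportionally to w_i * P_i(percept | history, action).
    likelihoods = np.array([m.percept_prob(history, action, percept) for m in models])
    posterior = weights * likelihoods
    total = posterior.sum()
    return posterior / total if total > 0 else weights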
Expected loss analysis of thresholded authentication protocols in noisy conditions
A number of authentication protocols have been proposed recently, where at least some part of the authentication is performed during a phase lasting a fixed number of rounds, with no error correction. This requires assigning an acceptable threshold for the number of detected errors. This paper describes a framework enabling an expected loss analysis for all the protocols in this family. Furthermore, computationally simple methods for obtaining nearly optimal values of the threshold, as well as of the number of rounds, are suggested. Finally, a method to adaptively select both the number of rounds and the threshold is proposed.
Comment: 17 pages, 2 figures; draft
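A rough sketch of an expected-loss computation of the kind analyzed above, under a simple illustrative model (an assumption of this sketch, not the paper's framework) in which the n rounds are independent, errors are detected with probability p_hon for an honest prover and p_adv for an adversary, and the verifier accepts iff at most t errors are observed:

from scipy.stats import binom

def expected_loss(n, t, p_hon, p_adv, cost_false_reject, cost_false_accept):
    false_reject = binom.sf(t, n, p_hon)    # P(errors > t | honest prover)
    false_accept = binom.cdf(t, n, p_adv)   # P(errors <= t | adversary)
    return cost_false_reject * false_reject + cost_false_accept * false_accept

def best_threshold(n, p_hon, p_adv, c_fr, c_fa):
    # Exhaustive search is cheap: there are only n + 1 candidate thresholds.
    return min(range(n + 1), key=lambda t: expected_loss(n, t, p_hon, p_adv, c_fr, c_fa))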
Modelling the fair value of annuities contracts: the impact of interest rate risk and mortality risk
The purpose of this paper is to analyze the problem of the fair valuation of annuity contracts. The market-consistent valuation of these products requires a pricing framework that includes the two main sources of risk affecting the value of the annuity, i.e. interest rate risk and mortality risk. As the IASB has not set any specific guidelines as to which models are the most appropriate for these risks, in this note we consider a range of different models calibrated with historical data. We calculate the fair value of the annuity as a portfolio of zero-coupon bonds, each with maturity set equal to the date of an annuity payment; the weights in the portfolio are given by the survival probabilities. Moreover, we focus on the additional information provided by stochastic simulations in order to define a suitable risk margin. The nature of the risk margin is one of the key issues concerning the IASB and Solvency project.
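A minimal sketch of the valuation formula described above; the flat yield curve and constant mortality rate below are hypothetical stand-ins for the calibrated interest-rate and mortality models considered in the paper.

def annuity_fair_value(payment, discount_factors, survival_probs):
    # Fair value = sum over payment dates of payment * P(0, t) * S(t),
    # i.e. a portfolio of zero-coupon bonds weighted by survival probabilities.
    return sum(payment * p0t * st for p0t, st in zip(discount_factors, survival_probs))

# Illustrative inputs only: a flat 3% yield curve and a 1% annual mortality rate over 20 years.
discount = [1.03 ** -t for t in range(1, 21)]
survival = [0.99 ** t for t in range(1, 21)]
value = annuity_fair_value(100.0, discount, survival)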
Metamodel-based importance sampling for structural reliability analysis
Structural reliability methods aim at computing the probability of failure of
systems with respect to some prescribed performance functions. In modern
engineering such functions usually resort to running an expensive-to-evaluate
computational model (e.g. a finite element model). In this respect, simulation methods, which may require a very large number of model runs, cannot be used directly. Surrogate
models such as quadratic response surfaces, polynomial chaos expansions or
kriging (which are built from a limited number of runs of the original model)
are then introduced as a substitute of the original model to cope with the
computational cost. In practice, though, it is almost impossible to quantify the error made by this substitution. In this paper we propose to use a kriging
surrogate of the performance function as a means to build a quasi-optimal
importance sampling density. The probability of failure is eventually obtained
as the product of an augmented probability computed by substituting the
meta-model for the original performance function and a correction term which
ensures that there is no bias in the estimation even if the meta-model is not
fully accurate. The approach is applied to analytical and finite element
reliability problems and proves efficient up to 100 random variables.
Comment: 20 pages, 7 figures, 2 tables. Preprint submitted to Probabilistic Engineering Mechanics
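A sketch of the two-factor estimator described above, under simplifying assumptions: mu and sigma stand for the kriging mean and standard deviation of the performance function, sample_input draws from the input density, and g is the true (expensive) model; all of these, as well as the rejection step used to sample the importance density, are conveniences of this sketch rather than the paper's procedure.

import numpy as np
from scipy.stats import norm

def meta_is_failure_probability(g, mu, sigma, sample_input, n_aug=100_000, n_corr=500, rng=None):
    rng = rng or np.random.default_rng(0)
    pi = lambda x: norm.cdf(-mu(x) / sigma(x))   # probabilistic classification by the surrogate

    # Augmented failure probability, computed with the cheap surrogate only.
    x = sample_input(n_aug, rng)
    p_aug = pi(x).mean()

    # Sample the quasi-optimal density h(x) proportional to pi(x) * f(x) by rejection:
    # propose from the input density f and accept with probability pi(x).
    accepted = []
    while len(accepted) < n_corr:
        cand = sample_input(n_corr, rng)
        accepted.extend(cand[rng.uniform(size=n_corr) < pi(cand)])
    x_is = np.asarray(accepted[:n_corr])

    # Correction term using a few calls to the true model; it removes the bias
    # introduced by the surrogate even when the meta-model is not fully accurate.
    alpha = np.mean((g(x_is) <= 0.0) / pi(x_is))
    return p_aug * alpha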
Learning Output Kernels for Multi-Task Problems
Simultaneously solving multiple related learning tasks is beneficial under a
variety of circumstances, but the prior knowledge necessary to correctly model
task relationships is rarely available in practice. In this paper, we develop a
novel kernel-based multi-task learning technique that automatically reveals
structural inter-task relationships. Building on the framework of output
kernel learning (OKL), we introduce a method that jointly learns multiple
functions and a low-rank multi-task kernel by solving a non-convex
regularization problem. Optimization is carried out via a block coordinate
descent strategy, where each subproblem is solved using suitable conjugate
gradient (CG) type iterative methods for linear operator equations. The
effectiveness of the proposed approach is demonstrated on pharmacological and
collaborative filtering data.
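A sketch of one ingredient mentioned above: solving a coefficient subproblem as a linear operator equation with a CG-type method. The separable system K C L + lam * C = Y used here is a standard regularized least-squares choice assumed for illustration; it does not reproduce the paper's full non-convex objective or its output-kernel update.

import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def solve_coefficients(K, L, Y, lam):
    # With the output kernel L held fixed, the coefficients C satisfy
    # K C L + lam * C = Y.  For PSD K and L the underlying Kronecker-structured
    # operator is symmetric positive definite, so CG applies matrix-free.
    n, T = Y.shape
    def matvec(c_vec):
        C = c_vec.reshape(n, T)
        return (K @ C @ L + lam * C).ravel()
    A = LinearOperator((n * T, n * T), matvec=matvec)
    c_vec, info = cg(A, Y.ravel())
    return c_vec.reshape(n, T)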
Expectation consistency for calibration of neural networks
Despite their incredible performance, it is well reported that deep neural
networks tend to be overoptimistic about their prediction confidence. Finding
effective and efficient calibration methods for neural networks is therefore an
important endeavour towards better uncertainty quantification in deep learning.
In this manuscript, we introduce a novel calibration technique named
expectation consistency (EC), consisting of a post-training rescaling of the
last layer weights by enforcing that the average validation confidence
coincides with the average proportion of correct labels. First, we show that
the EC method achieves similar calibration performance to temperature scaling
(TS) across different neural network architectures and data sets, all while
requiring similar validation samples and computational resources. However, we
argue that EC provides a principled method grounded in a Bayesian optimality
principle known as the Nishimori identity. Next, we provide an asymptotic
characterization of both TS and EC in a synthetic setting and show that their
performance crucially depends on the target function. In particular, we discuss
examples where EC significantly outperforms TS.
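A minimal sketch of the expectation-consistency idea as stated above (illustrative only, not the authors' code): search for a single rescaling of the validation logits such that the average top-class confidence matches the validation accuracy, here by bisection in log-space.

import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def expectation_consistency_scale(logits, labels, lo=1e-3, hi=1e3, iters=60):
    # Average confidence increases monotonically with the scale, so bisection
    # converges whenever the accuracy lies between 1/num_classes and 1.
    accuracy = np.mean(logits.argmax(axis=1) == labels)
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        avg_conf = softmax(mid * logits).max(axis=1).mean()
        if avg_conf > accuracy:
            hi = mid
        else:
            lo = mid
    return np.sqrt(lo * hi)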