Discounting of reward sequences: a test of competing formal models of hyperbolic discounting
Humans are known to discount future rewards hyperbolically in time. Nevertheless, a formal recursive model of hyperbolic discounting remained elusive until recently, with the introduction of the hyperbolically discounted temporal difference (HDTD) model. Prior to that, models of learning (especially reinforcement learning) had relied on exponential discounting, which generally provides poorer fits to behavioral data. Recently, it has been shown that hyperbolic discounting can also be approximated by a summed distribution of exponentially discounted values, instantiated in the μAgents model. The HDTD model and the μAgents model differ in one key respect, namely how they treat sequences of rewards. The μAgents model is a particular implementation of a Parallel discounting model, which values sequences based on the summed value of the individual rewards, whereas the HDTD model contains a non-linear interaction. To discriminate among these models, we observed how subjects discounted a sequence of three rewards, and then we tested how well each candidate model fit the subject data. The results show that the Parallel model generally provides a better fit to the human data.
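The two ideas in the abstract above, that a sum of exponential discounters can approximate a hyperbolic one and that a Parallel model values a sequence as the sum of its individually discounted rewards, can be illustrated with a minimal Python sketch. It is not the authors' μAgents or HDTD implementation: the function names, the uniform spacing of the discount factors, and the discount rate `k` are assumptions made for illustration only. The sketch relies on the identity that the average of γ^t over γ uniform on (0, 1) equals 1/(1 + t).

```python
def hyperbolic_discount(delay, k=1.0):
    """Hyperbolic discount factor 1 / (1 + k * delay); k is an assumed rate."""
    return 1.0 / (1.0 + k * delay)


def mixture_discount(delay, n_agents=1000):
    """Average of exponential discounters gamma ** delay.

    The discount factors are the midpoints of n_agents equal bins on (0, 1),
    a hypothetical choice that approximates a uniform distribution of gamma.
    """
    gammas = [(i + 0.5) / n_agents for i in range(n_agents)]
    return sum(g ** delay for g in gammas) / n_agents


def parallel_value(rewards, delays, k=1.0):
    """Parallel valuation: each reward is discounted on its own, then summed."""
    return sum(r * hyperbolic_discount(d, k) for r, d in zip(rewards, delays))


if __name__ == "__main__":
    for t in (0, 1, 3, 10):
        print(t, hyperbolic_discount(t), mixture_discount(t))
    # A sequence of three rewards, as in the experiment described above
    # (the amounts and delays here are made up).
    print(parallel_value([10.0, 10.0, 10.0], [1, 2, 3]))
```

With k = 1 the two discount functions should agree closely at every delay. A model with a non-linear interaction between rewards, which the abstract attributes to HDTD, would not reduce to the plain sum computed by `parallel_value`.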
Optimality of Universal Bayesian Sequence Prediction for General Loss and Alphabet
Various optimality properties of universal sequence predictors based on
Bayes-mixtures in general, and Solomonoff's prediction scheme in particular,
will be studied. The probability of observing $x_t$ at time $t$, given past
observations $x_1 \ldots x_{t-1}$, can be computed with the chain rule if the true
generating distribution $\mu$ of the sequences $x_1 x_2 x_3 \ldots$ is known. If $\mu$
is unknown, but known to belong to a countable or continuous class $\mathcal{M}$,
one can base one's prediction on the Bayes-mixture $\xi$ defined as a
$w_\nu$-weighted sum or integral of distributions $\nu \in \mathcal{M}$. The cumulative
expected loss of the Bayes-optimal universal prediction scheme based on $\xi$
is shown to be close to the loss of the Bayes-optimal, but infeasible
prediction scheme based on $\mu$. We show that the bounds are tight and that no
other predictor can lead to significantly smaller bounds. Furthermore, for
various performance measures, we show Pareto-optimality of $\xi$ and give an
Occam's razor argument that the choice $w_\nu \sim 2^{-K(\nu)}$ for the weights
is optimal, where $K(\nu)$ is the length of the shortest program describing
$\nu$. The results are applied to games of chance, defined as a sequence of
bets, observations, and rewards. The prediction schemes (and bounds) are
compared to the popular predictors based on expert advice. Extensions to
infinite alphabets, partial, delayed and probabilistic prediction,
classification, and more active systems are briefly discussed. Comment: 34 page
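As a rough illustration of the Bayes-mixture idea in the abstract above, the following Python sketch predicts the next bit of a binary sequence by mixing a small, finite class of Bernoulli models. The finite class, the uniform prior, and the function name are all assumptions chosen for brevity; the paper itself treats countable or continuous classes, general alphabets, and general loss functions, none of which are covered here.

```python
def bayes_mixture_predict(thetas, prior, history):
    """Predict P(next bit = 1) under a Bayes mixture of Bernoulli models.

    thetas  : candidate probabilities of observing a 1, each strictly in (0, 1)
    prior   : prior weights over the candidates, summing to 1
    history : observed bits (0 or 1)

    Returns the mixture prediction and the posterior weights.
    """
    weights = list(prior)
    for x in history:
        # Multiply each weight by the likelihood its model assigns to x,
        # then renormalise to obtain the posterior weights.
        weights = [w * (th if x == 1 else 1.0 - th)
                   for w, th in zip(weights, thetas)]
        total = sum(weights)
        weights = [w / total for w in weights]
    # The mixture prediction is the posterior-weighted sum of model predictions.
    p_one = sum(w * th for w, th in zip(weights, thetas))
    return p_one, weights


if __name__ == "__main__":
    thetas = [0.2, 0.5, 0.8]
    prior = [1.0 / 3.0] * 3
    p, w = bayes_mixture_predict(thetas, prior, [1] * 20)
    print(p, w)
```

After a run of twenty ones, nearly all posterior weight should sit on the model with parameter 0.8, so the mixture prediction approaches that of the best model in the class. This is the qualitative behaviour behind the loss bounds the abstract describes.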
Source Coding When the Side Information May Be Delayed
For memoryless sources, delayed side information at the decoder does not
improve the rate-distortion function. However, this is not the case for more
general sources with memory, as demonstrated by a number of works focusing on
the special case of (delayed) feedforward. In this paper, a setting is studied
in which the encoder is potentially uncertain about the delay with which
measurements of the side information are acquired at the decoder. Assuming a
hidden Markov model for the sources, at first, a single-letter characterization
is given for the set-up where the side information delay is arbitrary and known
at the encoder, and the reconstruction at the destination is required to be
(near) lossless. Then, with delay equal to zero or one source symbol, a
single-letter characterization is given of the rate-distortion region for the
case where side information may be delayed or not, unbeknownst to the encoder.
The characterization is further extended to allow for additional information to
be sent when the side information is not delayed. Finally, examples for binary
and Gaussian sources are provided. Comment: revised July 201