1 research outputs found

    Factorized Decision Forecasting via Combining Value-based and Reward-based Estimation

    No full text
    Abstract — A powerful recent perspective for predicting sequential decisions learns the parameters of decision problems that produce observed behavior as (near) optimal solutions. Under this perspective, behavior is explained in terms of utilities, which can often be defined as functions of state and action features to enable generalization across decision tasks. Two approaches have been proposed from this perspective: estimate a feature-based reward function and recursively compute values from it, or directly estimate a feature-based value function. In this work, we investigate the combination of these two approaches into a single learning task using directed information theory and the principle of maximum entropy. This enables uncovering which type of estimate is most appropriate—in terms of predictive accuracy and/or computational benefit—for different portions of the decision space. I
    corecore