587 research outputs found

    The immune system and other cognitive systems

    Get PDF
    In the following pages we propose a theory on cognitive systems and the common strategies of perception, which are at the basis of their function. We demonstrate that these strategies are easily seen to be in place in known cognitive systems such as vision and language. Furthermore we show that taking these strategies into consideration implies a new outlook on immune function calling for a new appraisal of the immune system as a cognitive system

    Beyond the One Step Greedy Approach in Reinforcement Learning

    Get PDF
    The famous Policy Iteration algorithm alternates between policy improvement and policy evaluation. Implementations of this algorithm with several variants of the latter evaluation stage, e.g, nn-step and trace-based returns, have been analyzed in previous works. However, the case of multiple-step lookahead policy improvement, despite the recent increase in empirical evidence of its strength, has to our knowledge not been carefully analyzed yet. In this work, we introduce the first such analysis. Namely, we formulate variants of multiple-step policy improvement, derive new algorithms using these definitions and prove their convergence. Moreover, we show that recent prominent Reinforcement Learning algorithms are, in fact, instances of our framework. We thus shed light on their empirical success and give a recipe for deriving new algorithms for future study.Comment: ICML 201

    Reinforcement Learning with Trajectory Feedback

    Full text link
    The standard feedback model of reinforcement learning requires revealing the reward of every visited state-action pair. However, in practice, it is often the case that such frequent feedback is not available. In this work, we take a first step towards relaxing this assumption and require a weaker form of feedback, which we refer to as \emph{trajectory feedback}. Instead of observing the reward obtained after every action, we assume we only receive a score that represents the quality of the whole trajectory observed by the agent, namely, the sum of all rewards obtained over this trajectory. We extend reinforcement learning algorithms to this setting, based on least-squares estimation of the unknown reward, for both the known and unknown transition model cases, and study the performance of these algorithms by analyzing their regret. For cases where the transition model is unknown, we offer a hybrid optimistic-Thompson Sampling approach that results in a tractable algorithm.Comment: AAAI202
    • …
    corecore