The immune system and other cognitive systems
In the following pages we propose a theory of cognitive systems and of the common strategies of perception that underlie their function. We demonstrate that these strategies are readily identified in known cognitive systems such as vision and language. Furthermore, we show that taking these strategies into consideration implies a new outlook on immune function, calling for a new appraisal of the immune system as a cognitive system.
Beyond the One Step Greedy Approach in Reinforcement Learning
The famous Policy Iteration algorithm alternates between policy improvement
and policy evaluation. Implementations of this algorithm with several variants
of the latter evaluation stage, e.g., n-step and trace-based returns, have
been analyzed in previous works. However, the case of multiple-step lookahead
policy improvement, despite the recent increase in empirical evidence of its
strength, has to our knowledge not been carefully analyzed yet. In this work,
we introduce the first such analysis. Namely, we formulate variants of
multiple-step policy improvement, derive new algorithms using these definitions
and prove their convergence. Moreover, we show that recent prominent
Reinforcement Learning algorithms are, in fact, instances of our framework. We
thus shed light on their empirical success and give a recipe for deriving new
algorithms for future study.
Comment: ICML 2018
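To make the multiple-step improvement step concrete, here is a minimal tabular sketch of h-step greedy policy improvement inside policy iteration, assuming a fully known transition tensor P and reward matrix R; the function names and the exact-evaluation step are our illustration, not the paper's formulation.

```python
import numpy as np

def h_step_greedy(P, R, v, gamma, h):
    """Return the h-step lookahead greedy policy w.r.t. value estimate v.

    P: (S, A, S) transition probabilities, R: (S, A) rewards, v: (S,) values.
    Illustrative sketch; not the paper's exact algorithm.
    """
    v_k = v.copy()
    for _ in range(h - 1):                  # h-1 Bellman optimality backups
        v_k = (R + gamma * P @ v_k).max(axis=1)
    q = R + gamma * P @ v_k                 # final lookahead action values
    return q.argmax(axis=1)

def h_step_policy_iteration(P, R, gamma=0.95, h=3, iters=50):
    """Alternate h-step greedy improvement with exact policy evaluation."""
    S, A = R.shape
    v = np.zeros(S)
    pi = np.zeros(S, dtype=int)
    for _ in range(iters):
        pi = h_step_greedy(P, R, v, gamma, h)
        P_pi = P[np.arange(S), pi]          # (S, S) transitions under pi
        r_pi = R[np.arange(S), pi]          # (S,) rewards under pi
        v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    return pi, v
```

With h = 1 the loop performs no extra backups and this reduces to standard one-step greedy policy iteration; larger h corresponds to the multiple-step lookahead improvement the abstract describes.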
Reinforcement Learning with Trajectory Feedback
The standard feedback model of reinforcement learning requires revealing the
reward of every visited state-action pair. However, in practice, it is often
the case that such frequent feedback is not available. In this work, we take a
first step towards relaxing this assumption and require a weaker form of
feedback, which we refer to as \emph{trajectory feedback}. Instead of observing
the reward obtained after every action, we assume we only receive a score that
represents the quality of the whole trajectory observed by the agent, namely,
the sum of all rewards obtained over this trajectory. We extend reinforcement
learning algorithms to this setting, based on least-squares estimation of the
unknown reward, for both the known and unknown transition model cases, and
study the performance of these algorithms by analyzing their regret. For cases
where the transition model is unknown, we offer a hybrid optimistic-Thompson
Sampling approach that results in a tractable algorithm.
Comment: AAAI 2021
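As a concrete illustration of the least-squares reward estimation described above, the sketch below recovers per-(state, action) rewards from trajectory-level feedback by regressing observed trajectory sums on visitation-count features; the names (estimate_rewards, reg) and the ridge regularizer are our assumptions, not the paper's specification.

```python
import numpy as np

def estimate_rewards(trajectories, returns, n_states, n_actions, reg=1e-3):
    """Least-squares estimate of per-(s, a) rewards from trajectory-level feedback.

    trajectories: list of trajectories, each a list of (state, action) pairs.
    returns: observed sum of rewards for each trajectory (the only feedback).
    Illustrative sketch of the least-squares idea in the abstract; names are ours.
    """
    d = n_states * n_actions
    X = np.zeros((len(trajectories), d))    # visitation-count features
    for i, traj in enumerate(trajectories):
        for s, a in traj:
            X[i, s * n_actions + a] += 1.0
    y = np.asarray(returns, dtype=float)
    # Ridge-regularized least squares: r_hat = (X^T X + reg*I)^{-1} X^T y,
    # since each trajectory return is (approximately) X[i] dot r plus noise.
    r_hat = np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ y)
    return r_hat.reshape(n_states, n_actions)
```

Once the per-pair rewards are estimated this way, a standard RL algorithm can be run on the estimated reward, which is the overall structure the abstract outlines for the known-transition case.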