Eligibility Traces and Plasticity on Behavioral Time Scales: Experimental Support of neoHebbian Three-Factor Learning Rules
Most elementary behaviors such as moving the arm to grasp an object or
walking into the next room to explore a museum evolve on the time scale of
seconds; in contrast, neuronal action potentials occur on the time scale of a
few milliseconds. Learning rules of the brain must therefore bridge the gap
between these two different time scales.
Modern theories of synaptic plasticity have postulated that the co-activation
of pre- and postsynaptic neurons sets a flag at the synapse, called an
eligibility trace, that leads to a weight change only if an additional factor
is present while the flag is set. This third factor, signaling reward,
punishment, surprise, or novelty, could be implemented by the phasic activity
of neuromodulators or specific neuronal inputs signaling special events. While
the theoretical framework has been developed over the last decades,
experimental evidence in support of eligibility traces on the time scale of
seconds has been collected only during the last few years.
Here we review, in the context of three-factor rules of synaptic plasticity,
four key experiments that support the role of synaptic eligibility traces in
combination with a third factor as a biological implementation of neoHebbian
three-factor learning rules.
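As a concrete illustration, here is a minimal sketch of such a three-factor update in Python; the time constant tau_e, the learning rate eta, and the simple coincidence term are illustrative assumptions, not values taken from the reviewed experiments:

    # Sketch of a neoHebbian three-factor learning rule (assumed constants).
    tau_e = 1.0    # eligibility-trace time constant, order of seconds (assumed)
    eta = 0.01     # learning rate (assumed)
    dt = 0.001     # simulation step: 1 ms

    def three_factor_step(e, w, pre, post, third_factor):
        """One time step: pre/post are binary spike indicators;
        third_factor is a phasic signal (reward, surprise, novelty)."""
        hebbian = pre * post                   # co-activation sets the flag ...
        e = e + dt * (-e / tau_e) + hebbian    # ... which decays over seconds
        # The flag turns into a weight change only while the third factor is present.
        w = w + eta * third_factor * e * dt
        return e, w

    e, w = 0.0, 0.5
    e, w = three_factor_step(e, w, pre=1, post=1, third_factor=0.0)  # flag set, weight unchanged
    e, w = three_factor_step(e, w, pre=0, post=0, third_factor=1.0)  # third factor arrives: weight updated

The point of the sketch is the gap-bridging role of the trace: the weight can still change when the third factor arrives, even though the pre-post coincidence may have occurred seconds earlier.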
Learning with Options that Terminate Off-Policy
A temporally abstract action, or an option, is specified by a policy and a
termination condition: the policy guides option behavior, and the termination
condition roughly determines its length. Generally, learning with longer
options (like learning with multi-step returns) is known to be more efficient.
However, if the option set for the task is not ideal, and cannot express the
primitive optimal policy exactly, shorter options offer more flexibility and
can yield a better solution. Thus, the termination condition puts learning
efficiency at odds with solution quality. We propose to resolve this dilemma by
decoupling the behavior and target terminations, just as is done with
policies in off-policy learning. To this end, we give a new algorithm,
Q(\beta), that learns the solution with respect to any termination condition,
regardless of how the options actually terminate. We derive Q(\beta) by casting
learning with options into a common framework with well-studied multi-step
off-policy learning. We validate our algorithm empirically, and show that it
holds up to its motivating claims.
Comment: AAAI 201
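As a rough illustration of the decoupling, here is a simplified one-step backup toward a beta-weighted target; the paper's multi-step off-policy corrections are omitted, and the state/option interface and constant beta below are assumed stand-ins, not the authors' exact formulation:

    from collections import defaultdict

    # Sketch: learn option values under a *target* termination condition beta,
    # regardless of how the behavior actually terminated the option.
    alpha, gamma = 0.1, 0.99
    Q = defaultdict(float)  # Q[(state, option)]

    def backup(s, o, r, s_next, options, beta):
        # With probability beta(s_next) the target treats the option as
        # terminating (bootstrap with the best option); otherwise as
        # continuing (bootstrap with the same option).
        continue_val = Q[(s_next, o)]
        terminate_val = max(Q[(s_next, o2)] for o2 in options)
        target = r + gamma * ((1 - beta(s_next)) * continue_val
                              + beta(s_next) * terminate_val)
        Q[(s, o)] += alpha * (target - Q[(s, o)])

    options = ["left", "right"]
    beta = lambda s: 0.1  # small target termination: learn as if options were long
    backup(s=0, o="left", r=1.0, s_next=1, options=options, beta=beta)

Because beta enters only the target, the same experience can be evaluated under different termination conditions, which is the sense in which the options terminate off-policy.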