Action selection in modular reinforcement learning
Modular reinforcement learning is an approach to resolving the curse of dimensionality in traditional reinforcement learning. We design and implement a modular reinforcement learning algorithm based on three major components: Markov decision process decomposition, module training, and global action selection. We define and formalize the concepts of module class and module instance in the decomposition step. Under our decomposition framework, we train each module efficiently using the SARSA(λ) algorithm. We then design, implement, test, and compare three action selection algorithms based on different heuristics: Module Combination, Module Selection, and Module Voting. For the last two algorithms, we propose a method to compute module weights efficiently using the standard deviation of each module's Q-values. We show that the Module Combination and Module Voting algorithms produce satisfactory performance in our test domain.
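The weighting idea behind Module Selection and Module Voting is concrete enough to sketch. Below is a minimal Python illustration, assuming each trained module exposes a vector of Q-values for the current state; the function names and the greedy vote aggregation are assumptions for the example, not the paper's implementation.

```python
import numpy as np

def module_weight(q_values: np.ndarray) -> float:
    """Weight a module by the standard deviation of its Q-values:
    a module whose Q-values are widely spread has a strong preference
    among actions, so its opinion counts for more."""
    return float(np.std(q_values))

def module_combination(q_tables: list[np.ndarray]) -> int:
    """Module Combination: sum Q-values across modules, act greedily."""
    return int(np.argmax(np.sum(q_tables, axis=0)))

def module_voting(q_tables: list[np.ndarray]) -> int:
    """Module Voting: each module casts a weighted vote for its own
    greedy action; the action with the largest total vote is executed."""
    votes = np.zeros(len(q_tables[0]))
    for q in q_tables:
        votes[int(np.argmax(q))] += module_weight(q)
    return int(np.argmax(votes))
```

Module Selection would, plausibly, follow the greedy action of the single highest-weight module; the abstract does not spell out the exact rule, so the sketch is indicative only.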
Neuronal Activity in the Human Subthalamic Nucleus Encodes Decision Conflict during Action Selection
The subthalamic nucleus (STN), which receives excitatory inputs from the cortex and has direct connections with the inhibitory pathways of the basal ganglia, is well positioned to efficiently mediate action selection. Here, we use microelectrode recordings captured during deep brain stimulation surgery as participants engage in a decision task to examine the role of the human STN in action selection. We demonstrate that spiking activity in the STN increases when participants engage in a decision and that the level of spiking activity increases with the degree of decision conflict. These data implicate the STN as an important mediator of action selection during decision processes.
Macro action selection with deep reinforcement learning in StarCraft
StarCraft (SC) is one of the most popular and successful Real-Time Strategy (RTS) games. In recent years, SC has also been widely accepted as a challenging testbed for AI research because of its enormous state space, partially observed information, and multi-agent collaboration. With the help of the annual AIIDE and CIG competitions, a growing number of SC bots have been proposed and continuously improved. However, a large gap remains between the top-level bots and professional human players. One vital reason is that current SC bots mainly rely on predefined rules to select macro actions during their games. These rules are not scalable or efficient enough to cope with the enormous yet partially observed state space of the game. In this paper, we propose a deep reinforcement learning (DRL) framework to improve the selection of macro actions. Our framework combines the Ape-X DQN with a Long Short-Term Memory (LSTM) network. We use this framework to build our bot, named LastOrder. Our evaluation, based on training against all bots from the AIIDE 2017 StarCraft AI competition set, shows that LastOrder achieves an 83% win rate, outperforming 26 of the 28 entrants.
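The architectural combination named here, a recurrent core for partial observability feeding a Q-value head, can be outlined compactly. The sketch below assumes PyTorch; the layer sizes, the flat observation encoding, and the number of macro actions are invented for illustration and are not LastOrder's actual configuration.

```python
import torch.nn as nn

class MacroActionQNet(nn.Module):
    """Illustrative Ape-X-DQN-style Q-network with an LSTM core: the
    recurrent state summarizes the partially observed game history, and
    the head scores each candidate macro action."""

    def __init__(self, obs_dim: int = 128, hidden: int = 256, n_macros: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.q_head = nn.Linear(hidden, n_macros)

    def forward(self, obs_seq, memory=None):
        # obs_seq: (batch, time, obs_dim) sequence of observation features
        x = self.encoder(obs_seq)
        out, memory = self.lstm(x, memory)
        return self.q_head(out), memory  # per-step Q-values, updated memory
```

In an Ape-X setup, many actor processes would run copies of this network to fill a shared prioritized replay buffer while a single learner trains on it; that distributed machinery is omitted here.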
Kinematic dynamo action in a sphere. II. Symmetry selection
The magnetic fields of the planets are generated by dynamo action in their electrically conducting interiors. The Earth possesses an axial dipole magnetic field, but other planets have other configurations: Uranus has an equatorial dipole, for example. In a previous paper we explored a two-parameter class of flows, comprising convection rolls, differential rotation (D) and meridional circulation (M), for dynamo generation of steady fields with axial dipole symmetry by solving the kinematic dynamo equations. In this paper we explore generation of the remaining three allowed symmetries: axial quadrupole, equatorial dipole and equatorial quadrupole. The results have implications for the fully nonlinear dynamical dynamo because the flows qualitatively resemble those driven by thermal convection in a rotating sphere, and the symmetries define separable solutions of the nonlinear equations. Axial dipole solutions are generally preferred (they have lower critical magnetic Reynolds number) for D > 0, corresponding to westward surface drift. Axial quadrupoles are preferred for D < 0, corresponding to eastward drift; with westward drift (D > 0), axial dipoles are preferred. The equatorial dipole must change sign between the east and west hemispheres, and is not favoured by any elongation of the flux in longitude (caused by D) or polar concentrations (caused by M): equatorial dipoles are preferred for small D and M. Polar and equatorial concentrations can be related to dynamo waves and the sign of Parker's dynamo number. For the three-dimensional flow considered here, the sign of the dynamo number is related to the sense of spiralling of the convection rolls, which must be the same as the surface drift.
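The kinematic dynamo equations solved here reduce, in standard form, to the magnetic induction equation for a prescribed velocity field; the non-dimensionalization below is the textbook convention rather than a quotation from the paper, but it shows where the critical magnetic Reynolds number enters.

```latex
% Kinematic induction equation: for a prescribed flow \mathbf{u}, the
% field \mathbf{B} grows (dynamo action) once the magnetic Reynolds
% number R_m = UL/\eta exceeds a critical value that depends on the flow.
\frac{\partial \mathbf{B}}{\partial t}
  = \nabla \times (\mathbf{u} \times \mathbf{B})
  + \frac{1}{R_m}\,\nabla^{2}\mathbf{B},
\qquad \nabla \cdot \mathbf{B} = 0.
```

Comparing critical values of R_m across the four field symmetries is what "preferred" means in the abstract: the symmetry with the lowest critical value is the easiest to excite.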
Policy Learning with Hypothesis based Local Action Selection
For robots to manipulate in unknown and unstructured environments, they must be capable of operating under partial observability of the environment. Object occlusions and unmodeled environments are some of the factors that result in partial observability. A common scenario where this is encountered is manipulation in clutter. If the robot needs to locate an object of interest and manipulate it, it must perform a series of decluttering actions to accurately detect the object of interest. To perform such a series of actions, the robot also needs to account for the dynamics of objects in the environment and how they react to contact. This is a non-trivial problem, since one must reason not only about robot-object interactions but also about object-object interactions in the presence of contact. In the example scenario of manipulation in clutter, the state vector would have to account for the pose of the object of interest and the structure of the surrounding environment, and the process model would have to account for all the aforementioned robot-object and object-object interactions. The complexity of the process model grows exponentially as the number of objects in the scene increases, which is commonly the case in unstructured environments. Hence it is not reasonable to attempt to model all object-object and robot-object interactions explicitly. Under this setting, we propose a hypothesis-based action selection algorithm in which we construct a hypothesis set of the possible poses of an object of interest given the current evidence in the scene, and select actions based on the current set of hypotheses. This hypothesis set represents the belief about the structure of the environment and the number of poses the object of interest can take. The agent's only stopping criterion is that the uncertainty regarding the pose of the object is fully resolved.
Comment: RLDM abstract
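The selection loop the abstract describes can be sketched in a few lines. The following is a schematic Python illustration assuming a discrete hypothesis set over object poses; the greedy scoring heuristic, the observation and pruning functions, and all names are placeholders, not the authors' method.

```python
def hypothesis_action_selection(hypotheses, actions, observe, prune, score):
    """Illustrative loop: act until the pose hypotheses collapse.

    hypotheses: set of candidate object poses consistent with evidence
    actions:    available decluttering actions
    observe:    executes an action and returns new scene evidence
    prune:      drops hypotheses inconsistent with the evidence
    score:      expected hypothesis reduction for an action (heuristic)
    """
    while len(hypotheses) > 1:  # stop only once pose uncertainty is resolved
        # Greedily pick the action expected to eliminate the most hypotheses.
        action = max(actions, key=lambda a: score(a, hypotheses))
        evidence = observe(action)
        hypotheses = prune(hypotheses, evidence)
    return next(iter(hypotheses))  # the single remaining pose estimate
```

The key property mirrored from the abstract is the stopping criterion: the loop terminates only when the hypothesis set, the agent's belief over poses, has been reduced to certainty.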
An Action Selection Architecture for an Emotional Agent
An architecture for action selection is presented that links emotion, cognition, and behavior. It defines the information and emotion processes of an agent. The architecture has been implemented and used in a prototype environment.
Constrained action selection in children with developmental coordination disorder
The effect of advance (‘precue’) information on short aiming movements was explored in adults, high school children, and primary school children with and without developmental coordination disorder (DCD) (n = 10, 14, 16, and 10, respectively). Reaction times (RTs) in the DCD group were longer than in the other groups and were more influenced by the extent to which the precue constrained the possible action space. In contrast, RT did not vary as a function of precue condition in adults. Children with DCD also showed less accurate responses, despite their longer RTs. We suggest that the different precue effects reflect differences in the relative benefits of priming an action before definitive information about the movement goal is available; these benefits are an interacting function of the task and the skill level of the individual. Our experiment shows that children with DCD benefit from advance preparation in simple aiming movements, highlighting their low skill levels. This result suggests that goal-directed RTs may have diagnostic potential in the clinic.
The simulation of action disorganisation in complex activities of daily living
Action selection in everyday goal-directed tasks of moderate complexity is known to be subject to breakdown following extensive frontal brain injury. A model of action selection in such tasks is presented and used to explore three hypotheses concerning the origins of action disorganisation: that it is a consequence of reduced top-down excitation within a hierarchical action schema network coupled with increased bottom-up triggering of schemas from environmental sources; that it is a more general disturbance of schema activation, modelled by excessive noise in the schema network; and that it results from a general disturbance of the triggering of schemas by object representations. Results suggest that the action disorganisation syndrome is best accounted for by a general disturbance of schema activation, while altering the balance between top-down and bottom-up activation provides an account of a related disorder, utilisation behaviour. It is further suggested that ideational apraxia (which may result from lesions to left temporoparietal areas and which has similar behavioural consequences to action disorganisation syndrome on tasks of moderate complexity) is a consequence of a generalised disturbance of the triggering of schemas by object representations. Several predictions regarding differences between action disorganisation syndrome and ideational apraxia that follow from this interpretation are detailed.
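The three hypotheses map naturally onto parameters of an interactive-activation schema network, which suggests how such a model can be lesioned in simulation. The sketch below is a schematic reading in Python, assuming a simple additive activation update; the parameter names and the clipping range are illustrative and are not the model's actual equations.

```python
import numpy as np

def schema_activation_step(act, top_down, env_trigger, obj_trigger, rng,
                           w_top=1.0, w_env=1.0, w_obj=1.0, noise_sd=0.05):
    """One update of schema activations in a hierarchical schema network.

    The three lesion hypotheses correspond to parameter changes:
      - action disorganisation: raise noise_sd (general activation disturbance)
      - utilisation behaviour:  lower w_top and raise w_env (top-down/bottom-up)
      - ideational apraxia:     disturb w_obj (object-triggered schemas)
    """
    drive = (w_top * top_down        # excitation from parent schemas
             + w_env * env_trigger   # bottom-up environmental triggering
             + w_obj * obj_trigger)  # triggering by object representations
    noise = rng.normal(0.0, noise_sd, size=np.shape(act))
    return np.clip(act + drive + noise, 0.0, 1.0)  # keep activations bounded
```

Running a full task model repeatedly under each parameter change and comparing the resulting error patterns is the kind of simulation the abstract reports.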
