
2,557,270 research outputs found


Action selection in modular reinforcement learning

Author: Zhang Ruohan
Publication date: 16/09/2014

Modular reinforcement learning is an approach to resolving the curse of dimensionality in traditional reinforcement learning. We design and implement a modular reinforcement learning algorithm based on three major components: Markov decision process decomposition, module training, and global action selection. We define and formalize the concepts of module class and module instance in the decomposition step. Under our decomposition framework, we train each module efficiently using the SARSA(λ) algorithm. Then we design, implement, test, and compare three action selection algorithms based on different heuristics: Module Combination, Module Selection, and Module Voting. For the last two algorithms, we propose a method to calculate module weights efficiently, using the standard deviation of each module's Q-values. We show that the Module Combination and Module Voting algorithms produce satisfactory performance in our test domain.

Computer Science
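The three action selection heuristics named in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact definitions, so everything below is an assumption: the function names are hypothetical, each module is represented only by a list of Q-values for the current state, weights are the normalized population standard deviation of those Q-values, Module Combination sums Q-values, Module Selection defers to the highest-weight module, and Module Voting lets each module cast a weighted vote for its greedy action.

```python
from statistics import pstdev


def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def module_weights(q_tables):
    """Assumed weighting: normalized standard deviation of each module's Q-values."""
    spreads = [pstdev(q) for q in q_tables]
    total = sum(spreads)
    if total == 0:
        # All modules are indifferent; fall back to uniform weights.
        return [1.0 / len(q_tables)] * len(q_tables)
    return [s / total for s in spreads]


def module_combination(q_tables):
    """Assumed rule: pick the action with the largest summed Q-value."""
    n_actions = len(q_tables[0])
    totals = [sum(q[a] for q in q_tables) for a in range(n_actions)]
    return argmax(totals)


def module_selection(q_tables):
    """Assumed rule: the single highest-weight module picks its greedy action."""
    weights = module_weights(q_tables)
    return argmax(q_tables[argmax(weights)])


def module_voting(q_tables):
    """Assumed rule: each module votes for its greedy action with its weight."""
    weights = module_weights(q_tables)
    votes = [0.0] * len(q_tables[0])
    for weight, q in zip(weights, q_tables):
        votes[argmax(q)] += weight
    return argmax(votes)


# Hypothetical Q-values for three modules over three actions in one state.
example_q = [
    [1.0, 0.0, 0.0],
    [0.0, 0.9, 1.0],
    [0.0, 0.9, 1.0],
]
```

With these made-up numbers the heuristics can disagree: one opinionated module can dominate Module Selection while being outvoted under the other two rules.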

Texas ScholarWorks

Neuronal Activity in the Human Subthalamic Nucleus Encodes Decision Conflict during Action Selection

Author: Baltuch G. H.
Jaggi J. L.
Kahana M. J.
Lega B. C.
Weidemann Christoph T.
Zaghloul K. A.
Publication date: 01/01/2012

The subthalamic nucleus (STN), which receives excitatory inputs from the cortex and has direct connections with the inhibitory pathways of the basal ganglia, is well positioned to efficiently mediate action selection. Here, we use microelectrode recordings captured during deep brain stimulation surgery as participants engage in a decision task to examine the role of the human STN in action selection. We demonstrate that spiking activity in the STN increases when participants engage in a decision and that the level of spiking activity increases with the degree of decision conflict. These data implicate the STN as an important mediator of action selection during decision processes.

CiteSeerX

Crossref

Cronfa at Swansea University

CogPrints Cognitive Sciences Eprint Archive

Macro action selection with deep reinforcement learning in StarCraft

Author: Hu Renjie
Kuang Hongyu
Liu Yang
Sun Huyang
Xu Sijia
Zhuang Zhi
Publication date: 08/10/2019

StarCraft (SC) is one of the most popular and successful Real Time Strategy (RTS) games. In recent years, SC has also been widely accepted as a challenging testbed for AI research because of its enormous state space, partially observed information, multi-agent collaboration, and so on. With the help of the annual AIIDE and CIG competitions, a growing number of SC bots have been proposed and continuously improved. However, a large gap remains between the top-level bots and professional human players. One vital reason is that current SC bots mainly rely on predefined rules to select macro actions during their games. These rules are not scalable or efficient enough to cope with the enormous yet partially observed state space of the game. In this paper, we propose a deep reinforcement learning (DRL) framework to improve the selection of macro actions. Our framework is based on the combination of Ape-X DQN and Long Short-Term Memory (LSTM). We use this framework to build our bot, named LastOrder. Our evaluation, based on training against all bots from the AIIDE 2017 StarCraft AI competition set, shows that LastOrder achieves an 83% winning rate, outperforming 26 of the 28 entrants.

arXiv.org e-Print Archive

Crossref

Association for the Advancement of Artificial Intelligence: AAAI Publications

Kinematic dynamo action in a sphere. II. Symmetry selection

Author: Barber C.N.
Gibbons S.
Gubbins D.
Love J.J.
Publication venue: The Royal Society
Publication date: 08/07/2000

The magnetic fields of the planets are generated by dynamo action in their electrically conducting interiors. The Earth possesses an axial dipole magnetic field but other planets have other configurations: Uranus has an equatorial dipole, for example. In a previous paper we explored a two-parameter class of flows, comprising convection rolls, differential rotation (D) and meridional circulation (M), for dynamo generation of steady fields with axial dipole symmetry by solving the kinematic dynamo equations. In this paper we explore generation of the remaining three allowed symmetries: axial quadrupole, equatorial dipole and equatorial quadrupole. The results have implications for the fully nonlinear dynamical dynamo because the flows qualitatively resemble those driven by thermal convection in a rotating sphere, and the symmetries define separable solutions of the nonlinear equations. Axial dipole solutions are generally preferred (they have lower critical magnetic Reynolds number) for D > 0, corresponding to westward surface drift. Axial quadrupoles are preferred for D 0), axial dipoles are preferred. The equatorial dipole must change sign between east and west hemispheres, and is not favoured by any elongation of the flux in longitude (caused by D) or polar concentrations (caused by M): they are preferred for small D and M. Polar and equatorial concentrations can be related to dynamo waves and the sign of Parker's dynamo number. For the three-dimensional flow considered here, the sign of the dynamo number is related to the sense of spiralling of the convection rolls, which must be the same as the surface drift.

Crossref

White Rose Research Online

Policy Learning with Hypothesis based Local Action Selection

Author: Bohg Jeannette
Ratliff Nathan
Sankaran Bharath
Schaal Stefan
Publication date: 08/05/2015

For robots to manipulate objects in unknown and unstructured environments, they must be capable of operating under partial observability of the environment. Object occlusions and unmodeled environments are some of the factors that result in partial observability. A common scenario where this is encountered is manipulation in clutter. When the robot needs to locate an object of interest and manipulate it, it must perform a series of decluttering actions to accurately detect the object of interest. To perform such a series of actions, the robot also needs to account for the dynamics of objects in the environment and how they react to contact. This is a nontrivial problem, since one needs to reason not only about robot-object interactions but also about object-object interactions in the presence of contact. In the example scenario of manipulation in clutter, the state vector would have to account for the pose of the object of interest and the structure of the surrounding environment. The process model would have to account for all the aforementioned robot-object and object-object interactions. The complexity of the process model grows exponentially as the number of objects in the scene increases. This is commonly the case in unstructured environments. Hence it is not reasonable to attempt to model all object-object and robot-object interactions explicitly. Under this setting we propose a hypothesis-based action selection algorithm in which we construct a hypothesis set of the possible poses of an object of interest given the current evidence in the scene and select actions based on our current set of hypotheses. This hypothesis set represents the belief about the structure of the environment and the number of poses the object of interest can take. The agent's only stopping criterion is that the uncertainty regarding the pose of the object is fully resolved.

Comment: RLDM abstract
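The loop described in this abstract (keep a hypothesis set, act, prune the set, stop when one hypothesis remains) can be shown with a toy sketch. It is not the authors' algorithm: the abstract does not state how actions are scored, so the greedy rule here (pick the action that inspects the most remaining hypotheses) is an assumption, as are the one-dimensional cell layout, the idea that an action simply reveals a fixed set of cells, and every function and action name.

```python
def select_action(hypotheses, actions):
    """Assumed greedy rule: choose the action whose revealed cells cover the
    most remaining hypotheses (ties broken alphabetically by action name)."""
    return max(sorted(actions), key=lambda name: len(hypotheses & actions[name]))


def update(hypotheses, revealed, true_cell):
    """Prune the hypothesis set after observing the revealed cells."""
    if true_cell in revealed:
        return {true_cell}          # object seen: uncertainty resolved
    return hypotheses - revealed    # object not there: discard those cells


def locate(true_cell, cells, actions):
    """Act until a single pose hypothesis remains; return it and the actions taken."""
    hypotheses = set(cells)
    history = []
    while len(hypotheses) > 1:
        name = select_action(hypotheses, actions)
        if not hypotheses & actions[name]:
            break  # no available action can reduce the uncertainty further
        hypotheses = update(hypotheses, actions[name], true_cell)
        history.append(name)
    return hypotheses, history


# Hypothetical decluttering actions, each revealing a fixed set of shelf cells.
toy_actions = {
    "left": {0, 1},
    "middle": {2, 3},
    "right": {4, 5},
    "wide": {1, 2, 3},
}
result = locate(true_cell=4, cells=range(6), actions=toy_actions)
```

In this toy run the agent first clears the widest region, rules out three cells, and then finds the object on the right.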

arXiv.org e-Print Archive

MPG.PuRe

An Action Selection Architecture for an Emotional Agent

Author: Akker H.J.A. op den
Burghouts G.J.
Heylen D.K.J.
Nijholt A.
Poel M.
Publication venue: AAAI Press
Publication date: 01/01/2003

An architecture for action selection is presented linking emotion, cognition and behavior. It defines the information and emotion processes of an agent. The architecture has been implemented and used in a prototype environment.

CiteSeerX

University of Twente Research Information

Constrained action selection in children with developmental coordination disorder

Author: Brockman A.
Charles J.
Mon-Williams M.
Pettit L.
Plumb M.S.
Williams J.H.G.
Wilson A.D.
Publication venue: Elsevier BV
Publication date: 01/01/2008

The effect of advance (‘precue’) information on short aiming movements was explored in adults, high school children, and primary school children with and without developmental coordination disorder (n = 10, 14, 16, 10, respectively). Reaction times in the DCD group were longer than in the other groups and were more influenced by the extent to which the precue constrained the possible action space. In contrast, reaction time did not alter as a function of precue condition in adults. Children with DCD showed greater inaccuracy of response (despite the increased RT). We suggest that the different precue effects reflect differences in the relative benefits of priming an action prior to definitive information about the movement goal. The benefits are an interacting function of the task and the skill level of the individual. Our experiment shows that children with DCD gain a benefit from advance preparation in simple aiming movements, highlighting their low skill levels. This result suggests that goal-directed RTs may have diagnostic potential within the clinic.

Crossref

Federation ResearchOnline

White Rose Research Online

The simulation of action disorganisation in complex activities of daily living

Author: Anderson JR
Cooper RP
Duncan J
Fuster JM
Grafman J
Houghton G
Humphreys GW
Liepmann H
Luria AR
McClelland JL
Myrna F. Schwartz
Norman DA
Passingham RE
Peter Yule
Pick A
Poeck K
Reason JT
Richard P. Cooper
Rumiati RI
Schwartz MF
Shallice T
Sirigu A
Tim Shallice
Verfaellie M
Williams RJ
Yule P
Zipf GK
Publication venue: Informa UK Limited
Publication date: 01/01/2005

Action selection in everyday goal-directed tasks of moderate complexity is known to be subject to breakdown following extensive frontal brain injury. A model of action selection in such tasks is presented and used to explore three hypotheses concerning the origins of action disorganisation: that it is a consequence of reduced top-down excitation within a hierarchical action schema network coupled with increased bottom-up triggering of schemas from environmental sources, that it is a more general disturbance of schema activation modelled by excessive noise in the schema network, and that it results from a general disturbance of the triggering of schemas by object representations. Results suggest that the action disorganisation syndrome is best accounted for by a general disturbance to schema activation, while altering the balance between top-down and bottom-up activation provides an account of a related disorder, utilisation behaviour. It is further suggested that ideational apraxia (which may result from lesions to left temporoparietal areas and which has similar behavioural consequences to action disorganisation syndrome on tasks of moderate complexity) is a consequence of a generalised disturbance of the triggering of schemas by object representations. Several predictions regarding differences between action disorganisation syndrome and ideational apraxia that follow from this interpretation are detailed.

Crossref

Birkbeck Institutional Research Online

Sissa Digital Library