DRLViz: Understanding Decisions and Memory in Deep Reinforcement Learning

Author: Jaunet Theo
Vuillemot Romain
Wolf Christian
Publication date: 25/05/2020

We present DRLViz, a visual analytics interface for interpreting the internal memory of an agent (e.g., a robot) trained using deep reinforcement learning. This memory is composed of large temporal vectors that are updated as the agent moves through an environment. It is not trivial to understand because of its number of dimensions, dependencies on past vectors, spatial and temporal correlations, and correlations between dimensions. It is often referred to as a black box, as only the inputs (images) and outputs (actions) are intelligible to humans. DRLViz helps experts interpret decisions through memory reduction interactions and investigate the role of parts of the memory when errors have been made (e.g., a wrong direction). We report on DRLViz applied to video game simulators (ViZDoom) for a navigation scenario with item-gathering tasks. We also report on an expert evaluation of DRLViz, on its applicability to other scenarios and navigation problems beyond simulation games, and on its contribution to the interpretability and explainability of black-box models in the field of visual analytics.

arXiv.org e-Print Archive

What motivates academic dishonesty in students? A reinforcement sensitivity theory explanation

Author: Biggs J.
Burton J. H.
Chamorro‐Premuzic T.
Craig D.
Gray J. A.
Hayes A.
Olt M. R.
Thibodeau P.
Thompson N.
West S. G.
Publication venue: British Psychological Society
Publication date: 11/02/2019

BACKGROUND: Academic dishonesty (AD) is an increasing challenge for universities worldwide. The rise of the Internet has further increased opportunities for students to cheat. AIMS: In this study, we investigate the role of personality traits defined within Reinforcement Sensitivity Theory (RST) as potential determinants of AD. RST defines behaviour as resulting from approach (Reward Interest/reactivity, goal-drive, and Impulsivity) and avoidance (behavioural inhibition and Fight-Flight-Freeze) motivations. We further consider the role of deep, surface, or achieving study motivations in mediating/moderating the relationship between personality and AD. SAMPLE: A sample of UK undergraduates (N = 240). METHOD: All participants completed the RST Personality Questionnaire, a short-form version of the study process questionnaire, and a measure of engagement in AD, its perceived prevalence, and its seriousness. RESULTS: RST traits accounted for additional variance in AD. Mediation analysis suggested that GDP predicted dishonesty indirectly via a surface study approach, while the indirect effect via deep study processes suggested dishonesty was not likely. Likelihood of engagement in AD was positively associated with personality traits reflecting Impulsivity and Fight-Flight-Freeze behaviours. Surface study motivation moderated the Impulsivity effect, and achieving motivation moderated the FFFS effect, such that cheating was even more likely when high levels of these processes were used. CONCLUSIONS: The findings suggest that motivational personality traits defined within RST can explain variance in the likelihood of engaging in dishonest academic behaviours.

City Research Online

Crossref

Plymouth Electronic Archive and Research Library

The Power of Linear Recurrent Neural Networks

Author: Litz Sandra
Michael Olivia
Obst Oliver
Stolzenburg Frieder
Publication date: 10/03/2020

Recurrent neural networks are a powerful means of coping with time series. We show how a type of linearly activated recurrent neural network, which we call predictive neural networks, can approximate any time-dependent function f(t) given by a number of function values. The approximation can effectively be learned by simply solving a linear equation system; no backpropagation or similar methods are needed. Furthermore, the network size can be reduced by taking only the most relevant components. Thus, in contrast to others, our approach learns not only the network weights but also the network architecture. The networks have interesting properties: they end up in ellipse trajectories in the long run, and they allow the prediction of further values and compact representations of functions. We demonstrate this with several experiments, among them multiple superimposed oscillators (MSO), robotic soccer, and predicting stock prices. Predictive neural networks outperform the previous state of the art for the MSO task with a minimal number of units.

Comment: 22 pages, 14 figures and tables, revised implementation
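The idea of learning a linear recurrent predictor by solving a linear equation system, without backpropagation, can be illustrated with a minimal sketch. This is not the authors' predictive neural network: it swaps in a plain linear autoregressive model fitted by least squares, and the function names, the window size `order=4`, and the two-sinusoid test signal are all assumptions chosen for illustration.

```python
import numpy as np

def fit_linear_recurrence(series, order):
    """Fit x[t] = w . (x[t-order], ..., x[t-1]) by least squares.

    Illustrative stand-in (assumption), not the paper's method:
    the weights come from one linear solve, no gradient descent.
    """
    rows = np.array([series[i:i + order] for i in range(len(series) - order)])
    targets = series[order:]
    weights, *_ = np.linalg.lstsq(rows, targets, rcond=None)
    return weights

def predict(series, weights, steps):
    """Run the fitted linear recurrence forward, feeding outputs back in."""
    order = len(weights)
    history = list(series[-order:])
    forecast = []
    for _ in range(steps):
        nxt = float(np.dot(weights, history[-order:]))
        forecast.append(nxt)
        history.append(nxt)
    return np.array(forecast)

# Hypothetical test signal: two superimposed sinusoids (frequencies assumed).
t = np.arange(300)
signal = np.sin(0.2 * t) + np.sin(0.311 * t)
train, test = signal[:200], signal[200:]

weights = fit_linear_recurrence(train, order=4)
forecast = predict(train, weights, steps=len(test))
max_error = float(np.max(np.abs(forecast - test)))
```

A sum of two sinusoids satisfies an exact linear recurrence of order four, so in this toy setting four weights suffice and `max_error` should be close to zero; real data would generally need a larger window and would not be fitted exactly.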

arXiv.org e-Print Archive