DRLViz: Understanding Decisions and Memory in Deep Reinforcement Learning
We present DRLViz, a visual analytics interface for interpreting the internal
memory of an agent (e.g. a robot) trained using deep reinforcement learning.
This memory is composed of large temporal vectors updated as the agent moves
through an environment, and it is not trivial to understand due to its
dimensionality, dependencies on past vectors, spatial/temporal correlations, and
co-correlations between dimensions. It is often referred to as a black box,
since only the inputs (images) and outputs (actions) are intelligible to humans.
DRLViz assists experts in interpreting decisions through memory-reduction
interactions, and in investigating the role of parts of the memory when errors
occur (e.g. a wrong direction). We report on DRLViz applied in the context of
video game simulators (ViZDoom) for a navigation scenario with item-gathering
tasks. We also report on expert evaluations of DRLViz, its applicability to
other scenarios and navigation problems beyond simulation games, and its
contribution to black-box model interpretability and explainability in the
field of visual analytics.
How Transferable are Reasoning Patterns in VQA?
Since its inception, Visual Question Answering (VQA) has been notorious as a
task where models tend to exploit biases in datasets to find shortcuts instead
of performing high-level reasoning. Classical methods address this by removing
biases from training data, or by adding branches to models to detect and remove
biases. In this paper, we argue that uncertainty in vision is a dominant factor
preventing the successful learning of reasoning in vision-and-language
problems. We train a visual oracle and, in a large-scale study, provide
experimental evidence that it is much less prone to exploiting spurious dataset
biases than standard models. We propose to study the attention mechanisms at
work in the visual oracle and compare them with those of a SOTA
Transformer-based model. We provide an in-depth analysis and visualizations of
reasoning patterns obtained with an online visualization tool, which we make
publicly available (https://reasoningpatterns.github.io). We exploit these
insights by transferring reasoning patterns from the oracle to a SOTA
Transformer-based VQA model that takes standard noisy visual inputs, via
fine-tuning. In experiments we report higher overall accuracy, as well as
higher accuracy on infrequent answers for each question type, which provides
evidence of improved generalization and a decreased dependency on dataset
biases.
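The attention mechanisms compared in this study are row-stochastic maps produced by transformer layers. As a minimal, hypothetical sketch (not the paper's model), a single-head self-attention map over a handful of tokens looks like this:

```python
# Hypothetical sketch: a single-head self-attention map,
# softmax(QK^T / sqrt(d)) -- the kind of quantity inspected when
# comparing an oracle's attention to a standard model's.
import numpy as np

def attention_map(q, k):
    """Row-stochastic attention matrix: each row sums to 1."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(1)
tokens = rng.normal(size=(6, 32))   # 6 tokens, 32-dim embeddings (assumed)
A = attention_map(tokens, tokens)
print(A.shape)                      # (6, 6)
```

Each row of `A` is a distribution over tokens; visualizing such rows is what makes reasoning patterns observable.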
RLMViz: Interpreting the Memory of Deep Reinforcement Learning
We present RLMViz, a visual analytics interface for interpreting the internal memory of an agent (e.g., a robot) trained using deep reinforcement learning. This memory is composed of large temporal vectors updated before each action as the agent moves in an environment. It is not trivial to understand and is referred to as a black box, of which only the inputs (images) and outputs (actions) are understood, but not the inner workings. Using RLMViz, experts can form hypotheses on this memory, derive rules based on the agent's decisions to interpret them, gain an understanding of why errors have been made, and improve future training processes. We report on the main features of RLMViz, namely memory navigation and contextualization techniques using juxtaposed timelines. We also present our early findings using the ViZDoom simulator, a standard benchmark for DRL navigation scenarios.
What if we Reduce the Memory of an Artificial Doom Player?
SIM2REALVIZ: Visualizing the Sim2Real Gap for Robot Pose Estimation
The robotics community has started to rely heavily on increasingly realistic 3D simulators for large-scale training of robots on massive amounts of data. But once robots are deployed in the real world, the simulation gap, as well as changes in the real world (e.g. lighting, object displacements), leads to errors. In this paper, we introduce SIM2REALVIZ, a visual analytics tool to assist experts in understanding and reducing this gap for robot ego-pose estimation tasks, i.e. the estimation of a robot's position using trained models. SIM2REALVIZ displays details of a given model and the performance of its instances in both simulation and the real world. Experts can identify environment differences that impact model predictions at a given location, and explore hypotheses through direct interaction with the model to fix them. We detail the design of the tool, along with case studies on how the regression-to-the-mean bias is exploited and can be addressed, and on how models are perturbed by vanishing landmarks such as bikes.
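One way to picture the gap the tool surfaces: compare a model's position error on simulated versus real inputs at matching locations. The sketch below is a hypothetical illustration with synthetic numbers, not SIM2REALVIZ's actual computation:

```python
# Hypothetical sketch: quantifying a sim-to-real gap for ego-pose
# estimation as the per-location difference in position error
# between simulated and real inputs. All data here is synthetic.
import numpy as np

rng = np.random.default_rng(3)
gt = rng.uniform(0, 10, size=(50, 2))               # ground-truth (x, y) poses
pred_sim = gt + rng.normal(0, 0.1, size=gt.shape)   # small error in simulation
pred_real = gt + rng.normal(0, 0.5, size=gt.shape)  # larger error on real inputs

err_sim = np.linalg.norm(pred_sim - gt, axis=1)
err_real = np.linalg.norm(pred_real - gt, axis=1)
gap = err_real - err_sim    # positive where real-world inputs hurt the model

print(gap.shape)            # (50,)
```

Mapping `gap` back onto the environment is the kind of view that lets experts spot locations where real-world changes degrade predictions.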
SwimTrack: Swimmers and Stroke Rate Detection in Elite Race Videos
We present SwimTrack, a series of 5 multimedia tasks related to swimming-video analysis from live recordings of elite competitions. The tasks involve video, image, and audio analysis and may be tackled independently; solved together, they form a grand challenge: providing sport federations and coaches with novel methods to assess and enhance swimmers' performance, in particular through stroke-rate and stroke-length analysis. We share a unique collection of video footage covering all swimming race types, recorded from a spectator's point of view with variations such as lighting reflections, background clutter, noise from the motion of waves, and different points of view on swimmers. SwimTrack is the first challenge of this kind, covering a total of 4 elite swimming competitions. We plan to include a larger and even more diverse set of videos, as well as additional mini-challenges, once more recordings become available in a future version.
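Stroke rate is the periodicity of a repetitive motion, so one plausible building block (a hypothetical sketch, not a SwimTrack baseline) is to take a tracked 1D signal, such as a wrist's vertical position over frames, and read the dominant frequency off its spectrum:

```python
# Hypothetical sketch: estimating a stroke rate from a periodic 1D
# signal (e.g. a tracked wrist's vertical position) via FFT.
# The signal, frame rate, and rate below are synthetic stand-ins.
import numpy as np

fps = 25.0                      # assumed video frame rate
t = np.arange(0, 20, 1 / fps)   # 20 s of samples
true_rate_hz = 0.8              # 48 strokes per minute, for the demo
rng = np.random.default_rng(2)
signal = np.sin(2 * np.pi * true_rate_hz * t) + 0.3 * rng.normal(size=t.size)

spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
freqs = np.fft.rfftfreq(signal.size, d=1 / fps)
est_hz = freqs[np.argmax(spectrum)]   # dominant frequency = stroke rate
print(est_hz * 60)                    # strokes per minute, ~48
```

Real race footage would of course add the detection and tracking problems the tasks are about; this only shows the final periodicity step.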
VisQA: X-raying Vision and Language Reasoning in Transformers
Visual Question Answering systems target answering open-ended textual questions given input images. They are a testbed for learning high-level reasoning, with a primary use in HCI, for instance assistance for the visually impaired. Recent research has shown that state-of-the-art models tend to produce answers that exploit biases and shortcuts in the training data, sometimes without even looking at the input image, instead of performing the required reasoning steps. We present VisQA, a visual analytics tool that explores this question of reasoning vs. bias exploitation. It exposes the key element of state-of-the-art neural models: attention maps in transformers. Our working hypothesis is that the reasoning steps leading to model predictions are observable from attention distributions, which are particularly useful for visualization. The design process of VisQA was motivated by well-known bias examples from the fields of deep learning and vision-language reasoning, and the tool was evaluated in two ways. First, as the result of a collaboration between three fields (machine learning, vision-and-language reasoning, and data analytics), the work led to a better understanding of bias exploitation by neural models for VQA, which eventually had an impact on their design and training through the proposition of a method for transferring reasoning patterns from an oracle model. Second, we report on the design of VisQA and a goal-oriented evaluation targeting the analysis of a model's decision process by multiple experts, providing evidence that it makes the inner workings of models accessible to users.