DRLViz: Understanding Decisions and Memory in Deep Reinforcement Learning
We present DRLViz, a visual analytics interface to interpret the internal
memory of an agent (e.g. a robot) trained using deep reinforcement learning.
This memory is composed of large temporal vectors updated when the agent moves
in an environment and is not trivial to understand due to the number of
dimensions, dependencies on past vectors, spatial/temporal correlations, and
co-correlation between dimensions. It is often referred to as a black box, as
only inputs (images) and outputs (actions) are intelligible to humans. DRLViz
helps experts interpret decisions using memory reduction interactions and
investigate the role of parts of the memory when errors have been made (e.g.
wrong direction). We report on DRLViz applied in the context of video game
simulators (ViZDoom) for a navigation scenario with item gathering tasks. We
also report on an evaluation of DRLViz by experts, on its applicability to
other scenarios and navigation problems beyond simulation games, and on its
contribution to the interpretability and explainability of black-box models
in the field of visual analytics.
EgoMap: Projective mapping and structured egocentric memory for Deep RL
Tasks involving localization, memorization and planning in partially
observable 3D environments are an ongoing challenge in Deep Reinforcement
Learning. We present EgoMap, a spatially structured neural memory architecture.
EgoMap augments a deep reinforcement learning agent's performance in 3D
environments on challenging tasks with multi-step objectives. The EgoMap
architecture incorporates several inductive biases including a differentiable
inverse projection of CNN feature vectors onto a top-down spatially structured
map. The map is updated with ego-motion measurements through a differentiable
affine transform. We show this architecture outperforms both standard recurrent
agents and state-of-the-art agents with structured memory. We demonstrate that
incorporating these inductive biases into an agent's architecture allows for
stable training with reward alone, circumventing the expense of acquiring and
labelling expert trajectories. A detailed ablation study demonstrates the
impact of key aspects of the architecture, and through extensive qualitative
analysis, we show how the agent exploits its structured internal memory to
achieve higher performance.
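The ego-motion map update mentioned in the EgoMap abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `warp_egocentric_map`, the `[channel][row][col]` list layout, the agent-at-center convention, and the motion parameters `dx`, `dy`, `dtheta` (in cell units and radians) are all assumptions made for illustration. Nearest-neighbour sampling is used for brevity; it is not differentiable, and a differentiable version would typically use bilinear sampling instead.

```python
import math


def warp_egocentric_map(ego_map, dx, dy, dtheta):
    """Re-express a top-down egocentric map after the agent has moved.

    ego_map: nested list indexed [channel][row][col], agent at the grid center
             (hypothetical layout chosen for this sketch).
    dx, dy:  agent translation in the old frame, in cell units (assumed).
    dtheta:  agent rotation in radians (assumed).
    """
    channels = len(ego_map)
    height = len(ego_map[0])
    width = len(ego_map[0][0])
    cy, cx = (height - 1) / 2, (width - 1) / 2
    cos_t, sin_t = math.cos(dtheta), math.sin(dtheta)

    out = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
    for i in range(height):
        for j in range(width):
            # Coordinates of the output cell relative to the agent (new frame).
            x_new, y_new = j - cx, i - cy
            # Affine transform back into the old frame: rotate, then translate.
            x_old = cos_t * x_new - sin_t * y_new + dx
            y_old = sin_t * x_new + cos_t * y_new + dy
            # Nearest-neighbour lookup; cells outside the old map stay zero.
            src_j = round(x_old + cx)
            src_i = round(y_old + cy)
            if 0 <= src_i < height and 0 <= src_j < width:
                for ch in range(channels):
                    out[ch][i][j] = ego_map[ch][src_i][src_j]
    return out
```

In this sketch, a feature stored one cell ahead of the agent along the x axis ends up at the center cell after the agent moves one cell in that direction, which is the behaviour an egocentric map is meant to preserve.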
State of the Art on Diffusion Models for Visual Computing
The field of visual computing is rapidly advancing due to the emergence of
generative artificial intelligence (AI), which unlocks unprecedented
capabilities for the generation, editing, and reconstruction of images, videos,
and 3D scenes. In these domains, diffusion models are the generative AI
architecture of choice. Within the last year alone, the literature on
diffusion-based tools and applications has seen exponential growth, and relevant
papers are published across the computer graphics, computer vision, and AI
communities, with new works appearing daily on arXiv. This rapid growth of the
field makes it difficult to keep up with all recent developments. The goal of
this state-of-the-art report (STAR) is to introduce the basic mathematical
concepts of diffusion models and the implementation details and design choices
of the popular Stable Diffusion model, and to give an overview of important
aspects of these generative AI tools, including personalization, conditioning,
and inversion, among others. Moreover, we give a comprehensive overview of the
rapidly growing
literature on diffusion-based generation and editing, categorized by the type
of generated medium, including 2D images, videos, 3D objects, locomotion, and
4D scenes. Finally, we discuss available datasets, metrics, open challenges,
and social implications. This STAR provides an intuitive starting point to
explore this exciting topic for researchers, artists, and practitioners alike.