Factoring Shape, Pose, and Layout from the 2D Image of a 3D Scene
The goal of this paper is to take a single 2D image of a scene and recover
the 3D structure in terms of a small set of factors: a layout representing the
enclosing surfaces as well as a set of objects represented in terms of shape
and pose. We propose a convolutional neural network-based approach to predict
this representation and benchmark it on a large dataset of indoor scenes. Our
experiments evaluate a number of practical design questions, demonstrate that
we can infer this representation, and quantitatively and qualitatively
demonstrate its merits compared to alternate representations. Comment: Project URL with code: https://shubhtuls.github.io/factored3
Fireground location understanding by semantic linking of visual objects and building information models
This paper presents an outline for improved localization and situational awareness in fire emergency situations based on semantic technology and computer vision techniques. The novelty of our methodology lies in the semantic linking of video object recognition results from visual and thermal cameras with Building Information Models (BIM). The current limitations and possibilities of certain building information streams in the context of fire safety or fire incident management are addressed in this paper. Furthermore, our data management tools match higher-level semantic metadata descriptors of BIM with deep-learning based visual object recognition and classification networks. Based on these matches, the positions of cameras, objects, and events in the BIM model can be estimated, transforming it from a static source of information into a rich, dynamic data provider. Previous work has already investigated the possibility of linking BIM and low-cost point sensors for fireground understanding, but these approaches did not take into account the benefits of video analysis and recent developments in semantics and feature learning research. Finally, the strengths of the proposed approach compared to the state of the art are its (semi-)automatic workflow, generic and modular setup, and multi-modal strategy, which make it possible to automatically create situational awareness, improve localization, and facilitate overall understanding of the fire.
Distral: Robust Multitask Reinforcement Learning
Most deep reinforcement learning algorithms are data inefficient in complex
and rich environments, limiting their applicability to many scenarios. One
direction for improving data efficiency is multitask learning with shared
neural network parameters, where efficiency may be improved through transfer
across related tasks. In practice, however, this is not usually observed,
because gradients from different tasks can interfere negatively, making
learning unstable and sometimes even less data efficient. Another issue is the
different reward schemes between tasks, which can easily lead to one task
dominating the learning of a shared model. We propose a new approach for joint
training of multiple tasks, which we refer to as Distral (Distill & transfer
learning). Instead of sharing parameters between the different workers, we
propose to share a "distilled" policy that captures common behaviour across
tasks. Each worker is trained to solve its own task while constrained to stay
close to the shared policy, while the shared policy is trained by distillation
to be the centroid of all task policies. Both aspects of the learning process
are derived by optimizing a joint objective function. We show that our approach
supports efficient transfer on complex 3D environments, outperforming several
related methods. Moreover, the proposed learning process is more robust and
more stable---attributes that are critical in deep reinforcement learning
- …
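The "centroid" idea in the Distral abstract can be illustrated with a small sketch. This is not the paper's implementation. It assumes a single state with discrete actions, so each policy is just a probability vector, and it uses a simplified objective with only a KL penalty. The names `kl`, `centroid`, `regularized_objective`, and the coefficient `alpha` are hypothetical. The one fact relied on is that the distribution minimizing the summed KL divergence from a set of distributions to itself is their arithmetic mean.

```python
import math


def kl(p, q):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def centroid(policies):
    """Shared policy minimizing sum_i KL(pi_i || pi_0): the per-action mean."""
    n = len(policies)
    return [sum(column) / n for column in zip(*policies)]


def regularized_objective(expected_return, task_policy, shared_policy, alpha=1.0):
    """Simplified per-task objective (assumption): return minus a KL penalty
    that keeps the task policy close to the shared policy."""
    return expected_return - alpha * kl(task_policy, shared_policy)


# Two hypothetical task policies over two actions.
task_policies = [[0.8, 0.2], [0.2, 0.8]]
shared = centroid(task_policies)

# Total divergence from the task policies to the shared policy.
total_kl = sum(kl(p, shared) for p in task_policies)
```

In this toy setting the shared policy sits midway between the two task policies, and any other candidate shared policy yields a larger total KL divergence.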