Dynamic Occupancy Grid Prediction for Urban Autonomous Driving: A Deep Learning Approach with Fully Automatic Labeling
Long-term situation prediction plays a crucial role in the development of
intelligent vehicles. A major challenge still to overcome is the prediction of
complex downtown scenarios with multiple road users, e.g., pedestrians, bikes,
and motor vehicles, interacting with each other. This contribution tackles this
challenge by combining a Bayesian filtering technique for environment
representation, and machine learning as long-term predictor. More specifically,
a dynamic occupancy grid map is utilized as input to a deep convolutional
neural network. This yields the advantage of using spatially distributed
velocity estimates from a single time step for prediction, rather than a raw
data sequence, alleviating common problems dealing with input time series of
multiple sensors. Furthermore, convolutional neural networks have the inherent
characteristic of using context information, enabling the implicit modeling of
road user interaction. Pixel-wise balancing is applied in the loss function
counteracting the extreme imbalance between static and dynamic cells. One of
the major advantages is the unsupervised learning character due to fully
automatic label generation. The presented algorithm is trained and evaluated on
multiple hours of recorded sensor data and compared to Monte-Carlo simulation
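The pixel-wise balancing mentioned in the abstract above can be illustrated with a class-weighted binary cross-entropy. This is only a minimal sketch: the abstract does not give the actual loss, so the inverse-frequency weighting, the function name `balanced_bce`, and the 0/1 cell encoding are all assumptions made for illustration.

```python
import math

def balanced_bce(pred, target, eps=1e-7):
    """Class-balanced binary cross-entropy over a 2D grid (illustrative only).

    pred:   rows of predicted probabilities that a cell is dynamic
    target: rows of labels, 0 = static cell, 1 = dynamic cell
    """
    cells = [(p, t) for prow, trow in zip(pred, target)
             for p, t in zip(prow, trow)]
    n = len(cells)
    n_dyn = sum(t for _, t in cells)
    n_stat = n - n_dyn
    # Assumed weighting: inverse class frequency, so that static and dynamic
    # cells each carry half of the total weight regardless of their counts.
    w_dyn = n / (2 * n_dyn) if n_dyn else 0.0
    w_stat = n / (2 * n_stat) if n_stat else 0.0
    total = 0.0
    for p, t in cells:
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        if t == 1:
            total += -w_dyn * math.log(p)
        else:
            total += -w_stat * math.log(1 - p)
    return total / n

# Toy 3x3 grid with a single dynamic cell; the prediction says "static" everywhere.
target = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
pred = [[0.1] * 3 for _ in range(3)]
loss = balanced_bce(pred, target)
```

With this weighting, the single missed dynamic cell contributes as much to the loss as the eight static cells combined, so a predictor that outputs "static" everywhere is penalized far more than it would be under an unweighted average.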
Relational Reasoning Network (RRN) for Anatomical Landmarking
Accurately identifying anatomical landmarks is a crucial step in deformation
analysis and surgical planning for craniomaxillofacial (CMF) bones. Available
methods require segmentation of the object of interest for precise landmarking.
Unlike those, our purpose in this study is to perform anatomical landmarking
using the inherent relation of CMF bones without explicitly segmenting them. We
propose a new deep network architecture, called relational reasoning network
(RRN), to accurately learn the local and the global relations of the landmarks.
Specifically, we are interested in learning landmarks in CMF region: mandible,
maxilla, and nasal bones. The proposed RRN works in an end-to-end manner,
utilizing learned relations of the landmarks based on dense-block units and
without the need for segmentation. Given a few landmarks as input, the
proposed system accurately and efficiently localizes the remaining landmarks on
the aforementioned bones. For a comprehensive evaluation of RRN, we used
cone-beam computed tomography (CBCT) scans of 250 patients. The proposed system
identifies the landmark locations very accurately even when there are severe
pathologies or deformations in the bones. The proposed RRN has also revealed
unique relationships among the landmarks that help us draw several inferences
about the informativeness of the landmark points. RRN is invariant to the order
of landmarks, and it allowed us to discover the optimal configurations (number
and location) for landmarks to be localized within the object of interest
(mandible) or nearby objects (maxilla and nasal bones). To the best of our
knowledge, this is the first algorithm of its kind to find anatomical relations
of objects using deep learning.
Comment: 10 pages, 6 Figures, 3 Tables
Data-Driven Shape Analysis and Processing
Data-driven methods play an increasingly important role in discovering
geometric, structural, and semantic relationships between 3D shapes in
collections, and applying this analysis to support intelligent modeling,
editing, and visualization of geometric data. In contrast to traditional
approaches, a key feature of data-driven approaches is that they aggregate
information from a collection of shapes to improve the analysis and processing
of individual shapes. In addition, they are able to learn models that reason
about properties and relationships of shapes without relying on hard-coded
rules or explicitly programmed instructions. We provide an overview of the main
concepts and components of these techniques, and discuss their application to
shape classification, segmentation, matching, reconstruction, modeling and
exploration, as well as scene analysis and synthesis, through reviewing the
literature and relating the existing works with both qualitative and numerical
comparisons. We conclude our report with ideas that can inspire future research
in data-driven shape analysis and processing.
Comment: 10 pages, 19 figures
Context Based Visual Content Verification
This paper studies an intermediary visual content verification method based on
multi-level co-occurrences. Co-occurrence statistics are generally used to
determine relational properties between objects based on information collected
from data. As such, these measures depend heavily on the relative number of
occurrences and offer only limited accuracy when predicting objects in the real
world. To improve the accuracy of this method in the verification task, we
include context information such as location and type of environment. To train
our model, we provide a new annotated dataset, the Advanced Attribute VOC
(AAVOC), which contains additional properties of the image. We show that using
context improves verification accuracy by up to 16%.
Comment: 6 pages, 6 Figures, Published in Proceedings of the Information and
Digital Technology Conference, 201
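The idea of conditioning co-occurrence statistics on context can be sketched as follows. This is a hypothetical illustration, not the method from the paper: the abstract does not define its multi-level co-occurrences, so the function names, the single context label per image, and the toy data are all assumptions.

```python
from collections import Counter
from itertools import combinations

def build_stats(images):
    """Count objects and object pairs per context.

    images: list of (context, set_of_object_labels) -- hypothetical format.
    """
    pair_counts = Counter()  # (context, a, b) -> count, with a < b
    obj_counts = Counter()   # (context, a) -> count
    for context, labels in images:
        for a in labels:
            obj_counts[(context, a)] += 1
        for a, b in combinations(sorted(labels), 2):
            pair_counts[(context, a, b)] += 1
    return pair_counts, obj_counts

def cond_prob(candidate, given, context, pair_counts, obj_counts):
    """Estimate P(candidate present | given present, context) from counts."""
    a, b = sorted((candidate, given))
    denom = obj_counts[(context, given)]
    return pair_counts[(context, a, b)] / denom if denom else 0.0

# Toy annotations (invented for illustration only).
images = [
    ("indoor", {"sofa", "tv"}),
    ("indoor", {"sofa", "chair"}),
    ("outdoor", {"car", "person"}),
    ("outdoor", {"sofa"}),
]
pairs, objs = build_stats(images)
p_indoor = cond_prob("tv", "sofa", "indoor", pairs, objs)
p_outdoor = cond_prob("tv", "sofa", "outdoor", pairs, objs)
```

In this toy data a detected "tv" next to a "sofa" is plausible indoors but unsupported outdoors, which is the kind of distinction a context-free co-occurrence count would blur.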
Tracking by Prediction: A Deep Generative Model for Multi-Person localisation and Tracking
Current multi-person localisation and tracking systems rely too heavily
on appearance models for target re-identification, and almost no
approaches employ a complete deep learning solution for both objectives. We
present a novel, complete deep learning framework for multi-person localisation
and tracking. In this context we first introduce a lightweight sequential
Generative Adversarial Network architecture for person localisation, which
overcomes issues related to occlusions and noisy detections, typically found in
a multi-person environment. In the proposed tracking framework we build upon
recent advances in pedestrian trajectory prediction approaches and propose a
novel data association scheme based on predicted trajectories. This removes the
need for computationally expensive person re-identification systems based on
appearance features and generates human-like trajectories with minimal
fragmentation. The proposed method is evaluated on multiple public benchmarks
including both static and dynamic cameras and achieves
outstanding performance, especially compared with other recently proposed deep
neural network based approaches.
Comment: To appear in IEEE Winter Conference on Applications of Computer
Vision (WACV), 201
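Data association based on predicted trajectories, as described in the abstract above, can be sketched with a simple greedy nearest-neighbour matcher. The abstract does not specify the actual association scheme, so the greedy strategy, the function name `associate`, and the gating threshold `max_dist` are assumptions chosen for illustration.

```python
import math

def associate(predicted, detections, max_dist=1.0):
    """Greedily match predicted track positions to new detections.

    predicted:  dict track_id -> (x, y), the predicted next position
    detections: list of (x, y) detections in the current frame
    max_dist:   hypothetical gating threshold; farther pairs are never matched
    Returns a dict track_id -> index into detections.
    """
    # All (distance, track, detection) candidates, closest first.
    candidates = sorted(
        (math.dist(p, d), tid, j)
        for tid, p in predicted.items()
        for j, d in enumerate(detections)
    )
    matches, used = {}, set()
    for dist, tid, j in candidates:
        if dist > max_dist:
            break  # remaining candidates are even farther away
        if tid in matches or j in used:
            continue  # track or detection already assigned
        matches[tid] = j
        used.add(j)
    return matches

# Two tracks with predicted positions, three detections (toy values).
predicted = {1: (0.0, 0.0), 2: (5.0, 5.0)}
detections = [(5.2, 5.1), (0.1, -0.1), (9.0, 9.0)]
matches = associate(predicted, detections)
```

Because matching uses only predicted positions, no appearance features are compared. The unmatched detection at (9.0, 9.0) would typically start a new track.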
Vision-based deep execution monitoring
Execution monitoring of high-level robot actions can be effectively improved by
visually monitoring the state of the world in terms of the preconditions and
postconditions that hold before and after the execution of an action.
Furthermore, a policy for searching where to look, either to verify the
relations that specify the pre- and postconditions or to refocus in case of a
failure, can substantially improve robot execution in an uncharted
environment. Thanks to the results of deep learning, it is now possible to rely
strongly on visual perception and assume that the environment is observable.
In this work we present visual execution monitoring
for a robot executing tasks in an uncharted lab environment. The execution
monitor interacts with the environment via a visual stream that uses two DCNNs
to recognize the objects the robot has to deal with and manipulate, and a
non-parametric Bayes estimation to discover the relations from the DCNN
features. To recover from lack of focus and failures due to missed objects we
resort to visual search policies via deep reinforcement learning