9,774 research outputs found

    Dynamic Occupancy Grid Prediction for Urban Autonomous Driving: A Deep Learning Approach with Fully Automatic Labeling

    Long-term situation prediction plays a crucial role in the development of intelligent vehicles. A major challenge still to overcome is the prediction of complex downtown scenarios with multiple road users, e.g., pedestrians, bikes, and motor vehicles, interacting with each other. This contribution tackles this challenge by combining a Bayesian filtering technique for environment representation with machine learning as a long-term predictor. More specifically, a dynamic occupancy grid map is utilized as input to a deep convolutional neural network. This yields the advantage of using spatially distributed velocity estimates from a single time step for prediction, rather than a raw data sequence, alleviating common problems in dealing with input time series from multiple sensors. Furthermore, convolutional neural networks have the inherent characteristic of using context information, enabling the implicit modeling of road user interaction. Pixel-wise balancing is applied in the loss function to counteract the extreme imbalance between static and dynamic cells. One of the major advantages is the unsupervised learning character, owing to fully automatic label generation. The presented algorithm is trained and evaluated on multiple hours of recorded sensor data and compared to Monte-Carlo simulation.
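
    The pixel-wise balancing idea lends itself to a short illustration. Below is a minimal PyTorch sketch of a per-cell weighted cross-entropy that up-weights the rare dynamic cells; the weighting scheme and the `weight_dynamic` constant are illustrative assumptions, not the paper's exact loss.

```python
# Minimal sketch of a pixel-wise balanced loss, assuming a PyTorch setup;
# the weighting scheme and `weight_dynamic` value are illustrative, not
# the paper's exact formulation.
import torch.nn.functional as F

def balanced_grid_loss(pred, target, dynamic_mask, weight_dynamic=50.0):
    """Per-cell cross-entropy that up-weights the rare dynamic cells.

    pred:         (B, C, H, W) per-cell class logits
    target:       (B, H, W)    per-cell class labels
    dynamic_mask: (B, H, W)    1.0 where a cell is dynamic, else 0.0
    """
    per_cell = F.cross_entropy(pred, target, reduction="none")  # (B, H, W)
    # Static cells keep weight 1; dynamic cells are scaled up so the loss
    # is not dominated by the overwhelmingly more frequent static cells.
    weights = 1.0 + (weight_dynamic - 1.0) * dynamic_mask
    return (weights * per_cell).sum() / weights.sum()
```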

    Relational Reasoning Network (RRN) for Anatomical Landmarking

    Accurately identifying anatomical landmarks is a crucial step in deformation analysis and surgical planning for craniomaxillofacial (CMF) bones. Available methods require segmentation of the object of interest for precise landmarking. Unlike those, our purpose in this study is to perform anatomical landmarking using the inherent relations of CMF bones without explicitly segmenting them. We propose a new deep network architecture, called relational reasoning network (RRN), to accurately learn the local and the global relations of the landmarks. Specifically, we are interested in learning landmarks in the CMF region: the mandible, maxilla, and nasal bones. The proposed RRN works in an end-to-end manner, utilizing learned relations of the landmarks based on dense-block units and without the need for segmentation. Given a few landmarks as input, the proposed system accurately and efficiently localizes the remaining landmarks on the aforementioned bones. For a comprehensive evaluation of RRN, we used cone-beam computed tomography (CBCT) scans of 250 patients. The proposed system identifies the landmark locations very accurately even when there are severe pathologies or deformations in the bones. The proposed RRN has also revealed unique relationships among the landmarks that help us reason about the informativeness of the landmark points. RRN is invariant to the order of landmarks, and it allowed us to discover the optimal configurations (number and location) of landmarks to be localized within the object of interest (mandible) or nearby objects (maxilla and nasal bones). To the best of our knowledge, this is the first algorithm of its kind to find anatomical relations of objects using deep learning.
    Comment: 10 pages, 6 figures, 3 tables
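
    To make the relational idea concrete, here is a minimal PyTorch sketch of pairwise relational reasoning over known landmark coordinates, in the spirit of a relation network; the layer sizes, the sum aggregation, and all names are assumptions, not RRN's actual dense-block architecture. Note how summing over landmark pairs yields the order invariance the abstract mentions.

```python
# Illustrative relation-network-style sketch: regress missing landmarks
# from pairwise relations of the known ones. Layer sizes, the sum
# aggregation, and all names are assumptions, not RRN's architecture.
import torch
import torch.nn as nn

class RelationalLandmarkNet(nn.Module):
    def __init__(self, dim=3, n_missing=10, hidden=128):
        super().__init__()
        # g() encodes the relation of each ordered pair of known landmarks.
        self.g = nn.Sequential(nn.Linear(2 * dim, hidden), nn.ReLU(),
                               nn.Linear(hidden, hidden), nn.ReLU())
        # f() decodes the aggregated relations into missing landmark positions.
        self.f = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                               nn.Linear(hidden, n_missing * dim))
        self.n_missing, self.dim = n_missing, dim

    def forward(self, known):                    # known: (B, n_known, dim)
        B, n, d = known.shape
        xi = known.unsqueeze(2).expand(B, n, n, d)
        xj = known.unsqueeze(1).expand(B, n, n, d)
        pairs = torch.cat([xi, xj], dim=-1).reshape(B, n * n, 2 * d)
        # Summing over all pairs makes the output invariant to landmark order.
        rel = self.g(pairs).sum(dim=1)           # (B, hidden)
        return self.f(rel).reshape(B, self.n_missing, self.dim)
```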

    Data-Driven Shape Analysis and Processing

    Data-driven methods play an increasingly important role in discovering geometric, structural, and semantic relationships between 3D shapes in collections, and in applying this analysis to support intelligent modeling, editing, and visualization of geometric data. In contrast to traditional approaches, a key feature of data-driven approaches is that they aggregate information from a collection of shapes to improve the analysis and processing of individual shapes. In addition, they are able to learn models that reason about properties and relationships of shapes without relying on hard-coded rules or explicitly programmed instructions. We provide an overview of the main concepts and components of these techniques, and discuss their application to shape classification, segmentation, matching, reconstruction, modeling and exploration, as well as scene analysis and synthesis, reviewing the literature and relating existing works with both qualitative and numerical comparisons. We conclude our report with ideas that can inspire future research in data-driven shape analysis and processing.
    Comment: 10 pages, 19 figures
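
    The core data-driven principle, transferring information to a new shape from similar shapes in a collection rather than from hard-coded rules, can be sketched in a few lines. The classical D2 descriptor and k-nearest-neighbour voting below are one simple illustrative instantiation, not a method prescribed by the report.

```python
# Toy illustration of the data-driven principle: label a new shape by
# voting over its nearest neighbours in a descriptor space computed over
# the whole collection. D2 + k-NN voting are illustrative choices only.
import numpy as np

def d2_descriptor(points, n_pairs=2048, n_bins=32, seed=0):
    """Histogram of pairwise distances between random surface samples."""
    rng = np.random.default_rng(seed)
    pts = points - points.mean(axis=0)
    pts = pts / np.linalg.norm(pts, axis=1).max()   # scale-normalize
    i = rng.integers(0, len(pts), n_pairs)
    j = rng.integers(0, len(pts), n_pairs)
    dists = np.linalg.norm(pts[i] - pts[j], axis=1)
    hist, _ = np.histogram(dists, bins=n_bins, range=(0.0, 2.0))
    return hist / hist.sum()

def label_by_retrieval(query_points, collection, k=3):
    """collection: list of (points, label); vote over the k nearest shapes."""
    q = d2_descriptor(query_points)
    ranked = sorted(collection,
                    key=lambda s: np.linalg.norm(d2_descriptor(s[0]) - q))
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)
```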

    Context Based Visual Content Verification

    In this paper, an intermediary visual content verification method based on multi-level co-occurrences is studied. Co-occurrence statistics are generally used to determine relational properties between objects based on information collected from data. As such, these measures are heavily dependent on the relative number of occurrences and give only limited accuracy when predicting objects in the real world. In order to improve the accuracy of this method in the verification task, we include context information such as location, type of environment, etc. To train our model, we provide a new annotated dataset, the Advanced Attribute VOC (AAVOC), which contains additional properties of the image. We show that the use of context greatly improves the accuracy of verification, with up to a 16% improvement.
    Comment: 6 pages, 6 figures, published in Proceedings of the Information and Digital Technology Conference, 201
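
    A minimal sketch of context-conditioned co-occurrence scoring, as the abstract describes it, might look as follows; the counting scheme, the Laplace smoothing, and all names are assumptions rather than the paper's exact formulation.

```python
# Sketch of context-conditioned co-occurrence verification; counting
# scheme, smoothing, and names are assumptions, not the paper's method.
from collections import Counter, defaultdict

class ContextCooccurrenceVerifier:
    def __init__(self, alpha=1.0):
        # (context, co-occurring object) -> counts of verified objects
        self.pair_counts = defaultdict(Counter)
        self.alpha = alpha                       # Laplace smoothing

    def fit(self, images):
        """images: iterable of (context, set_of_object_labels)."""
        for context, objects in images:
            for obj in objects:
                for other in objects - {obj}:
                    self.pair_counts[(context, other)][obj] += 1

    def score(self, obj, others, context):
        """Plausibility of `obj` given co-detected objects and the context."""
        scores = []
        for other in others:
            counts = self.pair_counts[(context, other)]
            total = sum(counts.values())
            scores.append((counts[obj] + self.alpha) /
                          (total + self.alpha * max(len(counts), 1)))
        return sum(scores) / max(len(scores), 1)
```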

    Tracking by Prediction: A Deep Generative Model for Multi-Person Localisation and Tracking

    Current multi-person localisation and tracking systems have an over-reliance on the use of appearance models for target re-identification, and almost no approaches employ a complete deep learning solution for both objectives. We present a novel, complete deep learning framework for multi-person localisation and tracking. In this context we first introduce a lightweight sequential Generative Adversarial Network architecture for person localisation, which overcomes issues related to occlusions and noisy detections typically found in a multi-person environment. In the proposed tracking framework we build upon recent advances in pedestrian trajectory prediction approaches and propose a novel data association scheme based on predicted trajectories. This removes the need for computationally expensive person re-identification systems based on appearance features and generates human-like trajectories with minimal fragmentation. The proposed method is evaluated on multiple public benchmarks, including both static and dynamic cameras, and achieves outstanding performance, especially among other recently proposed deep neural network based approaches.
    Comment: To appear in IEEE Winter Conference on Applications of Computer Vision (WACV), 201
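
    The prediction-based data association can be illustrated with a short sketch: detections are matched to tracks by their distance to each track's predicted next position rather than by appearance features. The constant-velocity predictor below is a stand-in for the paper's learned trajectory predictor, and the gating threshold is an assumption.

```python
# Sketch of prediction-based data association; the constant-velocity
# predictor stands in for the paper's learned trajectory predictor, and
# the gating threshold is an assumption.
import numpy as np
from scipy.optimize import linear_sum_assignment

def predict_next(track):
    """Constant-velocity stand-in: track is a list of past (x, y) arrays."""
    if len(track) < 2:
        return track[-1]
    return track[-1] + (track[-1] - track[-2])

def associate(tracks, detections, gate=50.0):
    """Return (track_idx, detection_idx) matches; detections: (M, 2) array."""
    preds = np.stack([predict_next(t) for t in tracks])          # (N, 2)
    cost = np.linalg.norm(preds[:, None] - detections[None], axis=-1)
    rows, cols = linear_sum_assignment(cost)                     # Hungarian
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < gate]
```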

    Vision-based deep execution monitoring

    Execution monitoring of high-level robot actions can be effectively improved by visually monitoring the state of the world in terms of preconditions and postconditions that hold before and after the execution of an action. Furthermore, a policy for deciding where to look, either to verify the relations that specify the pre- and postconditions or to refocus in case of a failure, can tremendously improve robot execution in an uncharted environment. Thanks to the remarkable results of deep learning, it is now possible to rely strongly on visual perception and assume that the environment is observable. In this work we present visual execution monitoring for a robot executing tasks in an uncharted lab environment. The execution monitor interacts with the environment via a visual stream that uses two DCNNs for recognizing the objects the robot has to deal with and manipulate, and non-parametric Bayesian estimation to discover the relations from the DCNN features. To recover from lack of focus and failures due to missed objects, we resort to visual search policies learned via deep reinforcement learning.
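
    The precondition/postcondition loop at the heart of such a monitor can be sketched in a few lines; `detect_relations` below is a hypothetical interface standing in for the paper's DCNN-plus-Bayesian relation estimator.

```python
# Toy sketch of the precondition/postcondition loop driving the monitor;
# `detect_relations` is a hypothetical interface standing in for the
# DCNN-plus-Bayesian relation estimator described above.
from typing import Callable, Set, Tuple

Relation = Tuple[str, str, str]               # e.g. ("on", "cup", "table")

def monitor_action(execute: Callable[[], None],
                   detect_relations: Callable[[], Set[Relation]],
                   preconditions: Set[Relation],
                   postconditions: Set[Relation]) -> bool:
    """Execute an action only if its preconditions are observed to hold,
    then verify its postconditions; returns True iff the action succeeded."""
    if not preconditions <= detect_relations():
        return False    # hand control to the search / refocus policy
    execute()
    return postconditions <= detect_relations()
```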