RGB-D datasets using Microsoft Kinect or similar sensors: a survey
RGB-D data has turned out to be a very useful representation of an indoor scene for solving fundamental computer vision problems. It combines the advantages of the color image, which provides appearance information about an object, with those of the depth image, which is immune to variations in color, illumination, rotation angle and scale. With the invention of the low-cost Microsoft Kinect sensor, which was initially used for gaming and later became a popular device for computer vision, high-quality RGB-D data can be acquired easily. In recent years, more and more RGB-D image/video datasets dedicated to various applications have become available, and these are of great importance for benchmarking the state of the art. In this paper, we systematically survey popular RGB-D datasets for different applications including object recognition, scene classification, hand gesture recognition, 3D simultaneous localization and mapping, and pose estimation. We provide insights into the characteristics of each important dataset, and compare the popularity and difficulty of those datasets. Overall, the main goal of this survey is to give a comprehensive description of the available RGB-D datasets and thus to guide researchers in the selection of suitable datasets for evaluating their algorithms.
Multimodal Polynomial Fusion for Detecting Driver Distraction
Distracted driving is deadly, claiming 3,477 lives in the U.S. in 2015 alone.
Although there has been a considerable amount of research on modeling the
distracted behavior of drivers under various conditions, accurate automatic
detection using multiple modalities and especially the contribution of using
the speech modality to improve accuracy has received little attention. This
paper introduces a new multimodal dataset for distracted driving behavior and
discusses automatic distraction detection using features from three modalities:
facial expression, speech and car signals. Detailed multimodal feature analysis
shows that adding more modalities monotonically increases the predictive
accuracy of the model. Finally, a simple and effective multimodal fusion
technique using a polynomial fusion layer shows superior distraction detection
results compared to the baseline SVM and neural network models.
Comment: INTERSPEECH 201
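The abstract does not describe the polynomial fusion layer itself, so the following is only a generic sketch of the idea, not the authors' method. It assumes the per-modality feature vectors are concatenated and expanded into all monomials up to a chosen degree, which a learned linear layer could then weight. The function name `polynomial_fusion` and the toy feature values are hypothetical.

```python
from itertools import combinations_with_replacement
from typing import List, Sequence


def polynomial_fusion(modalities: Sequence[Sequence[float]], degree: int = 2) -> List[float]:
    """Expand concatenated modality features into all monomials up to `degree`.

    Assumption: this is a plain polynomial feature expansion; a real model
    would follow it with a trainable linear layer and a classifier.
    """
    # A leading constant 1.0 makes lower-order terms appear in the expansion.
    x = [1.0] + [value for features in modalities for value in features]
    fused = []
    for indices in combinations_with_replacement(range(len(x)), degree):
        term = 1.0
        for i in indices:
            term *= x[i]
        fused.append(term)
    return fused


# Hypothetical one-dimensional features for face, speech and car signals.
face, speech, car = [2.0], [3.0], [0.5]
fused = polynomial_fusion([face, speech, car], degree=2)
```

With three scalar features and degree 2, the expansion holds the constant, the three original features, and every pairwise product, so cross-modal interactions such as face times speech become explicit inputs.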
Efficient Data Collection in Multimedia Vehicular Sensing Platforms
Vehicles provide an ideal platform for urban sensing applications, as they
can be equipped with all kinds of sensing devices that can continuously monitor
the environment around the travelling vehicle. In this work we are particularly
concerned with the use of vehicles as building blocks of a multimedia mobile
sensor system able to capture camera snapshots of the streets to support
traffic monitoring and urban surveillance tasks. However, cameras are high
data-rate sensors while wireless infrastructures used for vehicular
communications may face performance constraints. Thus, data redundancy
mitigation is of paramount importance in such systems. To address this issue, in
this paper we exploit sub-modular optimisation techniques to design efficient
and robust data collection schemes for multimedia vehicular sensor networks. We
also explore an alternative approach for data collection that operates on
longer time scales and relies only on localised decisions rather than
centralised computations. We use network simulations with realistic vehicular
mobility patterns to verify the performance gains of our proposed schemes
compared to a baseline solution that ignores data redundancy. Simulation
results show that our data collection techniques can ensure a more accurate
coverage of the road network while significantly reducing the amount of
transferred data.
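As a rough illustration of why submodular optimisation suits redundancy mitigation, the sketch below uses the standard greedy heuristic for a coverage objective. It is not the paper's scheme. It assumes each snapshot is described by the set of road segments it covers, and the names `greedy_select`, `snapshots` and `budget` are hypothetical.

```python
from typing import Dict, List, Set


def greedy_select(snapshots: Dict[str, Set[int]], budget: int) -> List[str]:
    """Greedily pick up to `budget` snapshots that maximise covered road segments."""
    covered: Set[int] = set()
    chosen: List[str] = []
    remaining = dict(snapshots)
    for _ in range(budget):
        # Marginal gain of a snapshot = segments it covers that are still uncovered.
        best = max(remaining, key=lambda name: len(remaining[name] - covered), default=None)
        if best is None or not (remaining[best] - covered):
            break  # every remaining snapshot is fully redundant
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen


# Hypothetical example: snapshots from three vehicles and the segments they cover.
snapshots = {
    "vehicle_a": {1, 2, 3},
    "vehicle_b": {2, 3},
    "vehicle_c": {4, 5},
}
selected = greedy_select(snapshots, budget=2)
```

Here the snapshot from `vehicle_b` is never transferred, because everything it covers is already covered by `vehicle_a`. Coverage is a monotone submodular function, which is the property that gives the greedy heuristic its well-known approximation guarantee.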
Fully Convolutional Neural Networks for Dynamic Object Detection in Grid Maps
Grid maps are widely used in robotics to represent obstacles in the
environment and differentiating dynamic objects from static infrastructure is
essential for many practical applications. In this work, we present a method
that uses a deep convolutional neural network (CNN) to infer whether grid cells
are covering a moving object or not. Compared to tracking approaches that use
e.g. a particle filter to estimate grid cell velocities and then make a
decision for individual grid cells based on this estimate, our approach uses
the entire grid map as the input image for a CNN that inspects a larger area around
each cell and thus takes the structural appearance in the grid map into account
to make a decision. Compared to our reference method, our concept yields a
performance increase from 83.9% to 97.2%. A runtime-optimized version of our
approach yields similar improvements with an execution time of just 10
milliseconds.
Comment: This is a shorter version of the master's thesis of Florian Piewak, and
it was accepted at IV 201