Geometry-Aware Video Object Detection for Static Cameras
In this paper we propose a geometry-aware model for video object detection.
Specifically, we consider the setting in which cameras can be well approximated as
static, e.g. in video surveillance scenarios, and scene pseudo depth maps can
therefore be inferred easily from the object scale on the image plane. We make
the following contributions: First, we extend the recent anchor-free detector
(CornerNet [17]) to video object detection. In order to exploit the
spatio-temporal information while maintaining high efficiency, the proposed
model accepts video clips as input, and only makes predictions for the starting
and the ending frames, i.e. heatmaps of object bounding box corners and the
corresponding embeddings for grouping. Second, to tackle the challenge from
scale variations in object detection, scene geometry information, e.g. derived
depth maps, is explicitly incorporated into deep networks for multi-scale
feature selection and for the network prediction. Third, we validate the
proposed architectures on an autonomous driving dataset generated from the
Carla simulator [5], and on a real dataset for human detection (DukeMTMC
dataset [28]). Compared with existing competitive single-stage or
two-stage detectors, the proposed geometry-aware spatio-temporal network
achieves significantly better results.
Comment: Accepted at BMVC 2019 as ORA
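The abstract above notes that, for a static camera, pseudo depth can be inferred from object scale on the image plane. A minimal sketch of that idea follows, assuming a pinhole camera and objects of roughly known real height; the function names, the focal length, and the 1.7 m height are hypothetical illustration values, not taken from the paper.

```python
def pseudo_depth(box_height_px, focal_px=1000.0, real_height_m=1.7):
    """Pinhole-model depth estimate: Z = f * H / h.

    focal_px and real_height_m are assumed values for illustration only.
    """
    if box_height_px <= 0:
        raise ValueError("box height must be positive")
    return focal_px * real_height_m / box_height_px


def fit_height_vs_row(samples):
    """Least-squares line h = a * y + b through (foot_row_y, box_height_px) pairs.

    With a static camera and a roughly planar ground, the expected object
    height in pixels varies approximately linearly with the image row of
    the object's foot point, so one fit describes the whole scene.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_y = sum(y for y, _ in samples) / n
    mean_h = sum(h for _, h in samples) / n
    var_y = sum((y - mean_y) ** 2 for y, _ in samples)
    if var_y == 0:
        raise ValueError("samples must span more than one image row")
    a = sum((y - mean_y) * (h - mean_h) for y, h in samples) / var_y
    b = mean_h - a * mean_y
    return a, b


if __name__ == "__main__":
    # Hypothetical detections: (foot row in pixels, box height in pixels).
    a, b = fit_height_vs_row([(100, 20), (200, 40), (300, 60)])
    expected_height = a * 250 + b
    print(pseudo_depth(expected_height))
```

Evaluating the fitted line at every image row and passing the result through `pseudo_depth` would give a per-row pseudo depth map; how the paper actually derives and consumes its depth maps is not specified in the abstract.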
RGBD Datasets: Past, Present and Future
Since the launch of the Microsoft Kinect, scores of RGBD datasets have been
released. These have propelled advances in areas from reconstruction to gesture
recognition. In this paper we explore the field, reviewing datasets across
eight categories: semantics, object pose estimation, camera tracking, scene
reconstruction, object tracking, human actions, faces and identification. By
extracting relevant information in each category we help researchers to find
appropriate data for their needs, and we consider which datasets have succeeded
in driving computer vision forward and why.
Finally, we examine the future of RGBD datasets. We identify key areas which
are currently underexplored, and suggest that future directions may include
synthetic data and dense reconstructions of static and dynamic scenes.
Comment: 8 pages excluding references (CVPR style
The highD Dataset: A Drone Dataset of Naturalistic Vehicle Trajectories on German Highways for Validation of Highly Automated Driving Systems
Scenario-based testing for the safety validation of highly automated vehicles
is a promising approach that is being examined in research and industry. This
approach heavily relies on data from real-world scenarios to derive the
necessary scenario information for testing. Measurement data should be
collected at a reasonable effort, contain naturalistic behavior of road users
and include all data relevant for a description of the identified scenarios in
sufficient quality. However, the current measurement methods fail to meet at
least one of the requirements. Thus, we propose a novel method to measure data
from an aerial perspective for scenario-based validation fulfilling the
mentioned requirements. Furthermore, we provide a large-scale naturalistic
vehicle trajectory dataset from German highways called highD. We evaluate the
data in terms of quantity, variety and contained scenarios. Our dataset
consists of 16.5 hours of measurements from six locations with 110 000
vehicles, a total driven distance of 45 000 km and 5600 recorded complete lane
changes. The highD dataset is available online at: http://www.highD-dataset.com
Comment: IEEE International Conference on Intelligent Transportation Systems
(ITSC) 201
Robust Dense Mapping for Large-Scale Dynamic Environments
We present a stereo-based dense mapping algorithm for large-scale dynamic
urban environments. In contrast to other existing methods, we simultaneously
reconstruct the static background, the moving objects, and the potentially
moving but currently stationary objects separately, which is desirable for
high-level mobile robotic tasks such as path planning in crowded environments.
We use both instance-aware semantic segmentation and sparse scene flow to
classify objects as either background, moving, or potentially moving, thereby
ensuring that the system is able to model objects with the potential to
transition from static to dynamic, such as parked cars. Given camera poses
estimated from visual odometry, both the background and the (potentially)
moving objects are reconstructed separately by fusing the depth maps computed
from the stereo input. In addition to visual odometry, sparse scene flow is
also used to estimate the 3D motions of the detected moving objects, in order
to reconstruct them accurately. A map pruning technique is further developed to
improve reconstruction accuracy and reduce memory consumption, leading to
increased scalability. We evaluate our system thoroughly on the well-known
KITTI dataset. Our system is capable of running on a PC at approximately 2.5Hz,
with the primary bottleneck being the instance-aware semantic segmentation,
which is a limitation we hope to address in future work. The source code is
available from the project website (http://andreibarsan.github.io/dynslam).
Comment: Presented at IEEE International Conference on Robotics and Automation
(ICRA), 201
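The three-way split described in the abstract above (background, moving, potentially moving) can be illustrated with a small sketch. It assumes each detected instance carries a semantic class label and a residual motion magnitude estimated from scene flow after compensating for camera motion; the class list, the threshold, and all names are hypothetical and are not taken from the paper or its source code.

```python
from enum import Enum


class MotionState(Enum):
    BACKGROUND = "background"
    MOVING = "moving"
    POTENTIALLY_MOVING = "potentially_moving"


# Hypothetical set of classes that are able to move on their own.
MOVABLE_CLASSES = frozenset({"car", "truck", "bus", "person", "bicycle"})


def classify_instance(semantic_class, residual_motion_m, threshold_m=0.1):
    """Assign a motion state to one segmented instance.

    residual_motion_m: assumed per-frame 3D displacement of the instance
    after removing camera ego-motion (e.g. from sparse scene flow).
    threshold_m: assumed cut-off below which the instance counts as stationary.
    """
    if semantic_class not in MOVABLE_CLASSES:
        return MotionState.BACKGROUND
    if residual_motion_m > threshold_m:
        return MotionState.MOVING
    # Movable class, but currently at rest (e.g. a parked car).
    return MotionState.POTENTIALLY_MOVING


if __name__ == "__main__":
    for label, motion in [("building", 0.0), ("car", 0.8), ("car", 0.02)]:
        print(label, motion, classify_instance(label, motion).value)
```

Keeping a separate state for stationary but movable instances is what lets a mapper reconstruct parked cars apart from the static background, as the abstract describes.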