Dynamic Occupancy Grid Prediction for Urban Autonomous Driving: A Deep Learning Approach with Fully Automatic Labeling
Long-term situation prediction plays a crucial role in the development of
intelligent vehicles. A major challenge still to overcome is the prediction of
complex downtown scenarios with multiple road users, e.g., pedestrians, bikes,
and motor vehicles, interacting with each other. This contribution tackles this
challenge by combining a Bayesian filtering technique for environment
representation with machine learning as a long-term predictor. More specifically,
a dynamic occupancy grid map is utilized as input to a deep convolutional
neural network. This yields the advantage of using spatially distributed
velocity estimates from a single time step for prediction, rather than a raw
data sequence, alleviating common problems in handling input time series of
multiple sensors. Furthermore, convolutional neural networks have the inherent
characteristic of using context information, enabling the implicit modeling of
road user interaction. Pixel-wise balancing is applied in the loss function
to counteract the extreme imbalance between static and dynamic cells. One of
the major advantages is the unsupervised learning character due to fully
automatic label generation. The presented algorithm is trained and evaluated on
multiple hours of recorded sensor data and compared to Monte-Carlo simulation
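
The abstract does not spell out the balanced loss, so the following is only a minimal sketch of one common form of pixel-wise balancing: a binary cross-entropy in which each cell is weighted by the inverse frequency of its class. The function name `balanced_bce`, the flat-list grid layout, and the inverse-frequency weighting are illustrative assumptions, not the authors' formulation.

```python
import math

def balanced_bce(pred, target, eps=1e-7):
    """Class-balanced binary cross-entropy over a flat list of grid cells.

    pred   -- predicted probability that each cell is dynamic (hypothetical layout)
    target -- 1 for a dynamic cell, 0 for a static cell
    """
    n = len(target)
    n_pos = sum(target)
    n_neg = n - n_pos
    # Inverse-frequency weights (an assumption): both classes contribute
    # equally to the loss, however rare the dynamic cells are.
    w_pos = n / (2.0 * n_pos) if n_pos else 0.0
    w_neg = n / (2.0 * n_neg) if n_neg else 0.0
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        if t == 1:
            total += -w_pos * math.log(p)
        else:
            total += -w_neg * math.log(1.0 - p)
    return total / n
```

With one dynamic cell among four, the dynamic cell receives weight 2 and each static cell weight 2/3, so missing the single dynamic cell costs more than misclassifying a single static one.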
Listening for Sirens: Locating and Classifying Acoustic Alarms in City Scenes
This paper addresses the detection of alerting acoustic events and sound source
localisation in an urban scenario. Specifically, we are interested in spotting
the presence of horns and sirens of emergency vehicles. In order to obtain a
reliable system able to operate robustly despite the presence of traffic noise,
which can be copious, unstructured and unpredictable, we propose to treat the
spectrograms of incoming stereo signals as images, and apply semantic
segmentation, based on a Unet architecture, to extract the target sound from
the background noise. In a multi-task learning scheme, together with signal
denoising, we perform acoustic event classification to identify the nature of
the alerting sound. Lastly, we use the denoised signals to localise the
acoustic source on the horizon plane, by regressing the direction of arrival of
the sound through a CNN architecture. Our experimental evaluation shows an
average classification rate of 94%, and a median absolute error on the
localisation of 7.5° when operating on audio frames of 0.5 s, and of
2.5° when operating on frames of 2.5 s. The system offers excellent
performance in particularly challenging scenarios, where the noise level is
remarkably high.
Comment: 6 pages, 9 figures
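
The localisation result above is reported as a median absolute error in degrees. As a rough illustration of how such a metric can be computed for direction-of-arrival estimates, the sketch below wraps each angular difference into [-180°, 180°) before taking the median. The function names and the wrap-around handling are assumptions; the abstract does not say how the authors compute the error.

```python
import statistics

def angular_error_deg(pred, true):
    """Smallest absolute difference between two angles, in degrees."""
    # Wrap the signed difference into [-180, 180) so 350 vs 10 counts as 20.
    diff = (pred - true + 180.0) % 360.0 - 180.0
    return abs(diff)

def median_absolute_error_deg(preds, trues):
    """Median of the per-frame absolute angular errors (hypothetical helper)."""
    return statistics.median(
        angular_error_deg(p, t) for p, t in zip(preds, trues)
    )
```

For example, `median_absolute_error_deg([350, 95, 30], [10, 90, 30])` evaluates the per-frame errors 20, 5 and 0 and returns their median, 5.0.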
Network Uncertainty Informed Semantic Feature Selection for Visual SLAM
In order to facilitate long-term localization using a visual simultaneous
localization and mapping (SLAM) algorithm, careful feature selection can help
ensure that reference points persist over long durations and the runtime and
storage complexity of the algorithm remain consistent. We present SIVO
(Semantically Informed Visual Odometry and Mapping), a novel
information-theoretic feature selection method for visual SLAM which
incorporates semantic segmentation and neural network uncertainty into the
feature selection pipeline. Our algorithm selects the points that provide the
highest reduction in Shannon entropy between the entropy of the current state
and the joint entropy of the state given the addition of the new feature, together with
the classification entropy of that feature from a Bayesian neural network. Each
selected feature significantly reduces the uncertainty of the vehicle state and
has repeatedly been detected as a static object (building, traffic sign, etc.)
with high confidence. This selection strategy generates a sparse
map which can facilitate long-term localization. The KITTI odometry dataset is
used to evaluate our method, and we also compare our results against ORB_SLAM2.
Overall, SIVO performs comparably to the baseline method while reducing the map
size by almost 70%.
Comment: Published in: 2019 16th Conference on Computer and Robot Vision (CRV
- …
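
The SIVO abstract describes an entropy-based selection criterion without giving its exact form. The sketch below is one possible reading under stated assumptions: the state is Gaussian with a diagonal covariance, so the entropy reduction from adding a feature is 0.5 * sum(ln(prior_var / post_var)), and the feature's classification entropy is subtracted as a penalty. The function names, the diagonal-covariance simplification, and the way the two terms are combined are all hypothetical.

```python
import math

def categorical_entropy(probs):
    """Shannon entropy (in nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def gaussian_entropy_reduction(prior_var, post_var):
    """Entropy drop of a Gaussian state with diagonal covariance.

    prior_var / post_var are lists of per-dimension variances before and
    after adding the feature (diagonal covariance is an assumption).
    """
    return 0.5 * sum(math.log(a / b) for a, b in zip(prior_var, post_var))

def feature_score(prior_var, post_var, class_probs):
    """Hypothetical score: state-entropy reduction minus classification entropy."""
    return (gaussian_entropy_reduction(prior_var, post_var)
            - categorical_entropy(class_probs))
```

Under this reading, a feature scores highly only if it both shrinks the state uncertainty and is classified with low entropy. A feature that cuts a variance from 4 to 1 but has a 50/50 class distribution scores approximately zero, because the ln 2 gain is cancelled by the ln 2 classification entropy.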