Predicting Future Instance Segmentation by Forecasting Convolutional Features
Anticipating future events is an important prerequisite towards intelligent
behavior. Video forecasting has been studied as a proxy task towards this goal.
Recent work has shown that to predict semantic segmentation of future frames,
forecasting at the semantic level is more effective than forecasting RGB frames
and then segmenting these. In this paper we consider the more challenging
problem of future instance segmentation, which additionally segments out
individual objects. To deal with a varying number of output labels per image,
we develop a predictive model in the space of fixed-sized convolutional
features of the Mask R-CNN instance segmentation model. We apply the "detection
head" of Mask R-CNN on the predicted features to produce the instance
segmentation of future frames. Experiments show that this approach
significantly improves over strong baselines based on optical flow and
repurposed instance segmentation architectures.
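The core idea of the first paper is to forecast in a fixed-size feature space rather than in pixel or label space, then decode with a frozen detection head. A minimal sketch of the forecasting step, using a simple learned linear mixing of past feature maps as a stand-in for the paper's convolutional predictor (all names and shapes here are illustrative, not the authors' implementation):

```python
import numpy as np

def forecast_features(past_feats, kernel):
    """Predict the next frame's feature map as a learned combination
    of the k most recent feature maps. In the paper this predictor is
    a convolutional network; here it is a toy linear mix.

    past_feats: (k, C, H, W) array of past backbone features.
    kernel:     (k,) mixing weights (hypothetical learned parameters).
    Returns a (C, H, W) predicted feature map, which would then be fed
    to the frozen Mask R-CNN detection head.
    """
    return np.tensordot(kernel, past_feats, axes=1)

# Toy usage: 3 past feature maps with 4 channels at 8x8 resolution.
rng = np.random.default_rng(0)
past = rng.standard_normal((3, 4, 8, 8))
weights = np.array([0.2, 0.3, 0.5])
pred = forecast_features(past, weights)
assert pred.shape == (4, 8, 8)
```

Because the output lives in the same feature space the detection head was trained on, the varying number of instances per image is handled by the head itself, not by the predictor.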
How to Train Your Dragon: Tamed Warping Network for Semantic Video Segmentation
Real-time semantic segmentation on high-resolution videos is challenging due
to the strict requirements of speed. Recent approaches have utilized the
inter-frame continuity to reduce redundant computation by warping the feature
maps across adjacent frames, greatly speeding up the inference phase. However,
their accuracy drops significantly owing to the imprecise motion estimation and
error accumulation. In this paper, we propose to introduce a simple and
effective correction stage right after the warping stage to form a framework
named Tamed Warping Network (TWNet), aiming to improve the accuracy and
robustness of warping-based models. The experimental results on the Cityscapes
dataset show that with the correction, the accuracy (mIoU) significantly
increases from 67.3% to 71.6%, and the speed edges down from 65.5 FPS to 61.8
FPS. For non-rigid categories such as "human" and "object", the IoU
improvements exceed 18 percentage points.
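The framework described above warps features from the previous frame using estimated motion, then applies a correction stage to damp the accumulated error. A crude sketch of both stages (nearest-neighbor warping and a simple blend; the paper's correction module is a learned network, and all names here are illustrative):

```python
import numpy as np

def warp(feat, flow):
    """Warp a (C, H, W) feature map by a per-pixel flow field
    (2, H, W) using nearest-neighbor lookup -- a crude stand-in for
    the bilinear warping used by warping-based segmentation models."""
    C, H, W = feat.shape
    out = np.zeros_like(feat)
    for y in range(H):
        for x in range(W):
            # Sample from the source location given by the flow,
            # clamped to the image bounds.
            sy = int(np.clip(round(y - flow[1, y, x]), 0, H - 1))
            sx = int(np.clip(round(x - flow[0, y, x]), 0, W - 1))
            out[:, y, x] = feat[:, sy, sx]
    return out

def correct(warped, current_feat, alpha=0.5):
    """Hypothetical correction stage: blend warped features with
    features computed cheaply from the current frame, so that errors
    from imprecise motion estimation do not accumulate across frames."""
    return (1 - alpha) * warped + alpha * current_feat
```

The reported numbers (mIoU up from 67.3% to 71.6% at a small FPS cost) reflect exactly this trade-off: the correction stage adds a little computation after warping in exchange for robustness.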
Drive Video Analysis for the Detection of Traffic Near-Miss Incidents
Because of their recent introduction, self-driving cars and advanced driver
assistance system (ADAS) equipped vehicles have had little opportunity to
learn from the dangerous traffic scenarios (including near-miss incidents)
that give normal drivers strong motivation to drive safely. Accordingly, as
a means of providing such learning material, this paper presents a novel traffic
database that contains information on a large number of traffic near-miss
incidents that were obtained by mounting driving recorders in more than 100
taxis over the course of a decade. The study makes the following two main
contributions: (i) In order to assist automated systems in detecting near-miss
incidents based on database instances, we created a large-scale traffic
near-miss incident database (NIDB) that consists of video clips of dangerous
events captured by monocular driving recorders. (ii) To illustrate the
applicability of NIDB traffic near-miss incidents, we provide two primary
database-related improvements: parameter fine-tuning using various near-miss
scenes from NIDB, and foreground/background separation for motion
representation. Then, using our new database in conjunction with a monocular
driving recorder, we developed a near-miss recognition method that provides
automated systems with a performance level that is comparable to a human-level
understanding of near-miss incidents (64.5% vs. 68.4% at near-miss recognition,
61.3% vs. 78.7% at near-miss detection).
Comment: Accepted to ICRA 201
Map-Guided Curriculum Domain Adaptation and Uncertainty-Aware Evaluation for Semantic Nighttime Image Segmentation
We address the problem of semantic nighttime image segmentation and improve
the state-of-the-art, by adapting daytime models to nighttime without using
nighttime annotations. Moreover, we design a new evaluation framework to
address the substantial uncertainty of semantics in nighttime images. Our
central contributions are: 1) a curriculum framework to gradually adapt
semantic segmentation models from day to night through progressively darker
times of day, exploiting cross-time-of-day correspondences between daytime
images from a reference map and dark images to guide the label inference in the
dark domains; 2) a novel uncertainty-aware annotation and evaluation framework
and metric for semantic segmentation, including image regions beyond human
recognition capability in the evaluation in a principled fashion; 3) the Dark
Zurich dataset, comprising 2416 unlabeled nighttime and 2920 unlabeled twilight
images with correspondences to their daytime counterparts plus a set of 201
nighttime images with fine pixel-level annotations created with our protocol,
which serves as a first benchmark for our novel evaluation. Experiments show
that our map-guided curriculum adaptation significantly outperforms
state-of-the-art methods on nighttime sets both for standard metrics and our
uncertainty-aware metric. Furthermore, our uncertainty-aware evaluation reveals
that selective invalidation of predictions can improve results on data with
ambiguous content such as our benchmark and profit safety-oriented applications
involving invalid inputs.
Comment: IEEE T-PAMI 202
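The curriculum framework in the last paper adapts a segmentation model through progressively darker times of day, using correspondences with daytime reference images to guide label inference in each darker domain. A minimal sketch of that control flow, with deliberately generic stand-ins for the model, pseudo-labeling, and training steps (none of these names are from the paper):

```python
def curriculum_adapt(model, domains, pseudo_label_fn, train_fn):
    """Adapt a model through an ordered sequence of domains
    (e.g. day -> twilight -> night).

    domains:         list of image sets, ordered from easiest to darkest.
    pseudo_label_fn: produces labels for an image using the current
                     model (in the paper, guided by cross-time-of-day
                     correspondences with daytime reference images).
    train_fn:        fine-tunes the model on images and pseudo-labels.
    """
    for images in domains:
        # Infer labels in the new, darker domain with the model adapted
        # to the previous (lighter) domain, then fine-tune on them.
        labels = [pseudo_label_fn(model, img) for img in images]
        model = train_fn(model, images, labels)
    return model
```

The key design choice is that each stage only has to bridge a small illumination gap, so pseudo-labels stay reliable enough to train on, whereas a single day-to-night jump would not.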