Video-based situation assessment for road safety
In recent decades, situational awareness (SA) has been a major research
subject in connection with autonomous vehicles and intelligent transportation
systems. Situational awareness concerns the safety of road
users, including drivers, passengers, pedestrians and animals. Moreover,
it holds key information regarding the nature of upcoming situations.
In order to build robust automatic SA systems that sense
the environment, a variety of sensors, such as global positioning systems,
radars and cameras, have been used. However, due to the high
cost, complex installation procedures and high computational load of
automatic situational awareness systems, they are unlikely to become
standard for vehicles in the near future.
In this thesis, a novel video-based framework for the automatic assessment
of risk of collision in a road scene is proposed. The framework
uses only the video from a monocular camera as input, avoiding
the need for additional, and frequently expensive, sensors. The framework
has two main parts: a novel ontology tool for the assessment of
risk of collision, and semantic feature extraction based on computer-vision
methods.
The ontology tool is designed to represent the various relations between
the most important risk factors, such as risk from object and
road environmental risk. The semantic features related to these factors
are based on computer vision methods, such as pedestrian detection
and tracking, road-region detection and road-type classification. The
quality of these methods is important for achieving accurate results,
especially with respect to video segmentation. This thesis, therefore,
proposes a new criterion of high-quality video segmentation: the inclusion
of temporal-region consistency. On the basis of this new criterion, an
online method for the evaluation of video segmentation quality is proposed.
This method is more consistent than the state-of-the-art method
in terms of perceptual-segmentation quality, for both synthetic and real
video datasets. Furthermore, building on the Gaussian mixture model,
one of the successful approaches to video segmentation, new online
methods for both road-type classification and road-region detection are
proposed.
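The abstract does not give implementation details; as a rough illustration of GMM-based segmentation, pixels can be clustered by colour with an off-the-shelf Gaussian mixture model. This is a minimal sketch with scikit-learn on a synthetic frame — the feature choice and component count are assumptions for the example, not the thesis's method:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_segment(frame, n_components=2, seed=0):
    """Cluster the pixels of an RGB frame with a Gaussian mixture model.

    Toy illustration of GMM-based segmentation: each pixel's colour is a
    sample; components play the role of regions (e.g. road vs. non-road).
    """
    h, w, c = frame.shape
    pixels = frame.reshape(-1, c).astype(float)
    gmm = GaussianMixture(n_components=n_components, random_state=seed)
    labels = gmm.fit_predict(pixels)
    return labels.reshape(h, w)

# Synthetic frame: bright "sky" band on top, dark "road" band below.
frame = np.zeros((40, 60, 3))
frame[:20] = [200, 200, 255]   # sky-like colour
frame[20:] = [80, 80, 80]      # road-like colour
frame += np.random.default_rng(0).normal(0, 5, frame.shape)

seg = gmm_segment(frame)
print(seg.shape)  # (40, 60)
```

In practice a video method would fit the mixture over time and add spatial or motion features, but the per-pixel clustering step looks essentially like this.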
The proposed vision-based road-type classification method achieves
higher classification accuracy than the state-of-the-art method, for each
road type individually. Consequently, it achieves higher overall
classification accuracy. Likewise, the proposed vision-based road-region
detection method achieves high accuracy compared with the
state-of-the-art methods, according to two measures: pixel-wise percentage
accuracy and area under the receiver operating characteristic
(ROC) curve (AUC).
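Both reported measures are standard and straightforward to reproduce. A minimal sketch with scikit-learn, using an invented toy ground truth rather than the thesis's data:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def pixel_accuracy(pred_mask, gt_mask):
    """Percentage of pixels whose binary label matches the ground truth."""
    return 100.0 * np.mean(pred_mask == gt_mask)

def road_auc(scores, gt_mask):
    """Area under the ROC curve for per-pixel road scores."""
    return roc_auc_score(gt_mask.ravel(), scores.ravel())

# Toy ground truth: road occupies the bottom half of a 10x10 frame.
gt = np.zeros((10, 10), dtype=int)
gt[5:] = 1

# Hypothetical detector scores, increasing toward the bottom of the frame.
scores = np.linspace(0, 1, 10)[:, None] * np.ones((10, 10))
pred = (scores > 0.5).astype(int)

print(pixel_accuracy(pred, gt))  # 100.0
print(road_auc(scores, gt))      # 1.0
```

AUC is the more informative of the two when road pixels are a minority of the frame, since pixel accuracy can be inflated by the dominant background class.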
Finally, the performance of the automatic risk-assessment framework
is evaluated. At this stage, the framework includes only the
assessment of pedestrian risk in the road scene. Using the semantic
information obtained via computer-vision methods, the framework's
performance is assessed for two datasets: first, a new dataset proposed
in Chapter 7, which comprises six videos, and second, a dataset
comprising five examples selected from an established, publicly available
dataset. Both datasets consist of real-world videos illustrating pedestrian
movement. The experimental results show that the proposed
framework achieves high accuracy in the assessment of risk resulting
from pedestrian behaviour in road scenes.
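As an illustration of how such semantic cues might feed a rule-based assessment of pedestrian risk, the toy function below combines a pedestrian's position, distance and the road type into a coarse risk level. All thresholds and weights here are invented for the example and are not taken from the thesis's ontology:

```python
def pedestrian_risk(in_road_region, distance_m, road_type):
    """Combine semantic cues into a coarse risk-of-collision level.

    Illustrative rules only: the road-type weights and thresholds are
    hypothetical, loosely mirroring the ontology's two factor groups
    (risk from object, road-environment risk).
    """
    env_weight = {"motorway": 1.5, "urban": 1.2, "residential": 1.0}[road_type]
    object_risk = 1.0 if in_road_region else 0.3    # pedestrian on the road itself
    object_risk *= max(0.0, 1.0 - distance_m / 50)  # closer pedestrians are riskier
    score = env_weight * object_risk
    if score > 0.8:
        return "high"
    if score > 0.3:
        return "medium"
    return "low"

print(pedestrian_risk(True, 10.0, "urban"))         # high
print(pedestrian_risk(False, 40.0, "residential"))  # low
```

The point of the ontology is precisely to make such relations explicit and extensible, rather than hard-coding them as above.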
Lucid Data Dreaming for Video Object Segmentation
Convolutional networks reach top quality in pixel-level video object
segmentation but require a large amount of training data (1k~100k) to deliver
such results. We propose a new training strategy which achieves
state-of-the-art results across three evaluation datasets while using 20x~1000x
less annotated data than competing methods. Our approach is suitable for both
single and multiple object segmentation. Instead of using large training sets
hoping to generalize across domains, we generate in-domain training data using
the provided annotation on the first frame of each video to synthesize ("lucid
dream") plausible future video frames. In-domain per-video training data allows
us to train high quality appearance- and motion-based models, as well as tune
the post-processing stage. This approach allows us to reach competitive results
even when training from only a single annotated frame, without ImageNet
pre-training. Our results indicate that using a larger training set is not
automatically better, and that for the video object segmentation task a smaller
training set that is closer to the target domain is more effective. This
changes the mindset regarding how many training samples and general
"objectness" knowledge are required for the video object segmentation task.
Comment: Accepted in the International Journal of Computer Vision (IJCV).
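A drastically simplified sketch of the "lucid dreaming" idea: given the single annotated first frame, the object can be pasted at new positions to synthesize extra in-domain training pairs. The real method also inpaints the exposed background, deforms the object and perturbs illumination; the helper below is a toy stand-in:

```python
import numpy as np

def synthesize_frame(frame, mask, dx, dy):
    """Paste the annotated object at a shifted location to "dream" a
    plausible future frame, and return the new frame plus its mask.

    Toy version: no background inpainting, no appearance perturbation.
    """
    h, w = mask.shape
    new_frame = frame.copy()
    new_mask = np.zeros_like(mask)
    ys, xs = np.nonzero(mask)
    ty, tx = ys + dy, xs + dx
    keep = (ty >= 0) & (ty < h) & (tx >= 0) & (tx < w)
    new_frame[ty[keep], tx[keep]] = frame[ys[keep], xs[keep]]
    new_mask[ty[keep], tx[keep]] = 1
    return new_frame, new_mask

# One annotated frame: a bright 3x3 object on a dark background.
frame = np.zeros((8, 8, 3))
mask = np.zeros((8, 8), dtype=int)
mask[2:5, 2:5] = 1
frame[mask == 1] = 255

dreamed, dreamed_mask = synthesize_frame(frame, mask, dx=2, dy=1)
print(dreamed_mask.sum())  # 9: the object survives the shift intact
```

Repeating this with many random shifts (and, in the full method, scalings, rotations and illumination changes) yields a per-video training set without any external annotation.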
Click Carving: Segmenting Objects in Video with Point Clicks
We present a novel form of interactive video object segmentation where a few
clicks by the user helps the system produce a full spatio-temporal segmentation
of the object of interest. Whereas conventional interactive pipelines take the
user's initialization as a starting point, we show the value in the system
taking the lead even in initialization. In particular, for a given video frame,
the system precomputes a ranked list of thousands of possible segmentation
hypotheses (also referred to as object region proposals) using image and motion
cues. Then, the user looks at the top ranked proposals, and clicks on the
object boundary to carve away erroneous ones. This process iterates (typically
2-3 times), and each time the system revises the top ranked proposal set, until
the user is satisfied with a resulting segmentation mask. Finally, the mask is
propagated across the video to produce a spatio-temporal object tube. On three
challenging datasets, we provide extensive comparisons with both existing work
and simpler alternative methods. In all, the proposed Click Carving approach
strikes an excellent balance of accuracy and human effort. It outperforms all
similarly fast methods, and is competitive or better than those requiring 2 to
12 times the effort.
Comment: A preliminary version of the material in this document was filed as
University of Texas technical report no. UT AI16-0
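A toy sketch of the carving step: given candidate proposal masks and a user click on the true object boundary, proposals can be re-ranked by how close their boundary passes to the click. The scoring below is an invented simplification of the paper's ranking, not its actual implementation:

```python
import numpy as np

def boundary_points(mask):
    """Coordinates of mask pixels with at least one background 4-neighbour."""
    p = np.pad(mask, 1).astype(bool)
    core = p[1:-1, 1:-1]
    neigh_bg = (~p[:-2, 1:-1] | ~p[2:, 1:-1] | ~p[1:-1, :-2] | ~p[1:-1, 2:])
    ys, xs = np.nonzero(core & neigh_bg)
    return np.stack([ys, xs], axis=1)

def rerank_proposals(proposals, click):
    """Sort segmentation proposals by distance of their boundary to a click.

    Toy stand-in for Click Carving's re-ranking: proposals whose boundary
    passes near the user's boundary click float to the top; distant ones
    are effectively carved away.
    """
    click = np.asarray(click)
    dists = [np.min(np.linalg.norm(boundary_points(m) - click, axis=1))
             for m in proposals]
    order = np.argsort(dists)
    return [proposals[i] for i in order], order

# Two hypothetical proposals on a 10x10 frame; the user clicks at (2, 2),
# which lies on the boundary of the smaller proposal.
small = np.zeros((10, 10), dtype=int); small[2:5, 2:5] = 1
large = np.zeros((10, 10), dtype=int); large[1:9, 1:9] = 1
ranked, order = rerank_proposals([large, small], click=(2, 2))
print(order)  # [1 0]: the small proposal now ranks first
```

Iterating this click-and-re-rank loop two or three times, as the paper describes, quickly converges on the proposal matching the user's intended object.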