Video-based situation assessment for road safety
In recent decades, situational awareness (SA) has been a major research
subject in connection with autonomous vehicles and intelligent transportation
systems. Situational awareness concerns the safety of road
users, including drivers, passengers, pedestrians and animals. Moreover,
it holds key information regarding the nature of upcoming situations.
In order to build robust automatic SA systems that sense
the environment, a variety of sensors, such as global positioning systems,
radars and cameras, have been used. However, due to the high
cost, complex installation procedures and high computational load of
automatic situational awareness systems, they are unlikely to become
standard for vehicles in the near future.
In this thesis, a novel video-based framework for the automatic assessment
of risk of collision in a road scene is proposed. The framework
uses only the video from a monocular camera as input, avoiding
the need for additional, and frequently expensive, sensors. The framework
has two main parts: a novel ontology tool for the assessment of
risk of collision, and semantic feature extraction based on computer-vision
methods.
The ontology tool is designed to represent the various relations between
the most important risk factors, such as risk from object and
road environmental risk. The semantic features related to these factors
are based on computer vision methods, such as pedestrian detection
and tracking, road-region detection and road-type classification. The
quality of these methods is important for achieving accurate results,
especially with respect to video segmentation. This thesis, therefore,
proposes a new criterion of high-quality video segmentation: the inclusion
of temporal-region consistency. On the basis of this new criterion, an
online method for the evaluation of video segmentation quality is proposed.
This method is more consistent than the state-of-the-art method
in terms of perceptual-segmentation quality, for both synthetic and real
video datasets. Furthermore, building on the Gaussian mixture model,
one of the successful approaches to video segmentation, new online
methods for both road-type classification and road-region detection are
proposed.
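The abstract does not give implementation details; as a rough illustration of GMM-based segmentation, pixels can be clustered by colour with an off-the-shelf Gaussian mixture model. This is a minimal sketch with scikit-learn on a synthetic frame — the feature choice and component count are assumptions for the example, not the thesis's method:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_segment(frame, n_components=2, seed=0):
    """Cluster the pixels of an RGB frame with a Gaussian mixture model.

    Toy illustration of GMM-based segmentation: each pixel's colour is a
    sample; components play the role of regions (e.g. road vs. non-road).
    """
    h, w, c = frame.shape
    pixels = frame.reshape(-1, c).astype(float)
    gmm = GaussianMixture(n_components=n_components, random_state=seed)
    labels = gmm.fit_predict(pixels)
    return labels.reshape(h, w)

# Synthetic frame: bright "sky" band on top, dark "road" band below.
frame = np.zeros((40, 60, 3))
frame[:20] = [200, 200, 255]   # sky-like colour
frame[20:] = [80, 80, 80]      # road-like colour
frame += np.random.default_rng(0).normal(0, 5, frame.shape)

seg = gmm_segment(frame)
print(seg.shape)  # (40, 60)
```

In practice a video method would fit the mixture over time and add spatial or motion features, but the per-pixel clustering step looks essentially like this.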
The proposed vision-based road-type classification method achieves
higher classification accuracy than the state-of-the-art method, for each
road type individually. Consequently, it achieves higher overall
classification accuracy. Likewise, the proposed vision-based road-region
detection method achieves high accuracy compared with the
state-of-the-art methods, according to two measures: pixel-wise percentage
accuracy and area under the receiver operating characteristic
(ROC) curve (AUC).
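Both reported measures are standard and straightforward to reproduce. A minimal sketch with scikit-learn, using an invented toy ground truth rather than the thesis's data:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def pixel_accuracy(pred_mask, gt_mask):
    """Percentage of pixels whose binary label matches the ground truth."""
    return 100.0 * np.mean(pred_mask == gt_mask)

def road_auc(scores, gt_mask):
    """Area under the ROC curve for per-pixel road scores."""
    return roc_auc_score(gt_mask.ravel(), scores.ravel())

# Toy ground truth: road occupies the bottom half of a 10x10 frame.
gt = np.zeros((10, 10), dtype=int)
gt[5:] = 1

# Hypothetical detector scores, increasing toward the bottom of the frame.
scores = np.linspace(0, 1, 10)[:, None] * np.ones((10, 10))
pred = (scores > 0.5).astype(int)

print(pixel_accuracy(pred, gt))  # 100.0
print(road_auc(scores, gt))      # 1.0
```

AUC is the more informative of the two when road pixels are a minority of the frame, since pixel accuracy can be inflated by the dominant background class.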
Finally, the performance of the automatic risk-assessment framework
is evaluated. At this stage, the framework includes only the
assessment of pedestrian risk in the road scene. Using the semantic
information obtained via computer-vision methods, the framework's
performance is assessed for two datasets: first, a new dataset proposed
in Chapter 7, which comprises six videos, and second, a dataset
comprising five examples selected from an established, publicly available
dataset. Both datasets consist of real-world videos illustrating pedestrian
movement. The experimental results show that the proposed
framework achieves high accuracy in the assessment of risk resulting
from pedestrian behaviour in road scenes.
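As an illustration of how such semantic cues might feed a rule-based assessment of pedestrian risk, the toy function below combines a pedestrian's position, distance and the road type into a coarse risk level. All thresholds and weights here are invented for the example and are not taken from the thesis's ontology:

```python
def pedestrian_risk(in_road_region, distance_m, road_type):
    """Combine semantic cues into a coarse risk-of-collision level.

    Illustrative rules only: the road-type weights and thresholds are
    hypothetical, loosely mirroring the ontology's two factor groups
    (risk from object, road-environment risk).
    """
    env_weight = {"motorway": 1.5, "urban": 1.2, "residential": 1.0}[road_type]
    object_risk = 1.0 if in_road_region else 0.3    # pedestrian on the road itself
    object_risk *= max(0.0, 1.0 - distance_m / 50)  # closer pedestrians are riskier
    score = env_weight * object_risk
    if score > 0.8:
        return "high"
    if score > 0.3:
        return "medium"
    return "low"

print(pedestrian_risk(True, 10.0, "urban"))         # high
print(pedestrian_risk(False, 40.0, "residential"))  # low
```

The point of the ontology is precisely to make such relations explicit and extensible, rather than hard-coding them as above.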
Lucid Data Dreaming for Video Object Segmentation
Convolutional networks reach top quality in pixel-level video object
segmentation but require a large amount of training data (1k~100k) to deliver
such results. We propose a new training strategy which achieves
state-of-the-art results across three evaluation datasets while using 20x~1000x
less annotated data than competing methods. Our approach is suitable for both
single and multiple object segmentation. Instead of using large training sets
hoping to generalize across domains, we generate in-domain training data using
the provided annotation on the first frame of each video to synthesize ("lucid
dream") plausible future video frames. In-domain per-video training data allows
us to train high quality appearance- and motion-based models, as well as tune
the post-processing stage. This approach allows us to reach competitive results
even when training from only a single annotated frame, without ImageNet
pre-training. Our results indicate that using a larger training set is not
automatically better, and that for the video object segmentation task a smaller
training set that is closer to the target domain is more effective. This
changes the mindset regarding how many training samples and general
"objectness" knowledge are required for the video object segmentation task.
Comment: Accepted in the International Journal of Computer Vision (IJCV).
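A drastically simplified sketch of the "lucid dreaming" idea: given the single annotated first frame, the object can be pasted at new positions to synthesize extra in-domain training pairs. The real method also inpaints the exposed background, deforms the object and perturbs illumination; the helper below is a toy stand-in:

```python
import numpy as np

def synthesize_frame(frame, mask, dx, dy):
    """Paste the annotated object at a shifted location to "dream" a
    plausible future frame, and return the new frame plus its mask.

    Toy version: no background inpainting, no appearance perturbation.
    """
    h, w = mask.shape
    new_frame = frame.copy()
    new_mask = np.zeros_like(mask)
    ys, xs = np.nonzero(mask)
    ty, tx = ys + dy, xs + dx
    keep = (ty >= 0) & (ty < h) & (tx >= 0) & (tx < w)
    new_frame[ty[keep], tx[keep]] = frame[ys[keep], xs[keep]]
    new_mask[ty[keep], tx[keep]] = 1
    return new_frame, new_mask

# One annotated frame: a bright 3x3 object on a dark background.
frame = np.zeros((8, 8, 3))
mask = np.zeros((8, 8), dtype=int)
mask[2:5, 2:5] = 1
frame[mask == 1] = 255

dreamed, dreamed_mask = synthesize_frame(frame, mask, dx=2, dy=1)
print(dreamed_mask.sum())  # 9: the object survives the shift intact
```

Repeating this with many random shifts (and, in the full method, scalings, rotations and illumination changes) yields a per-video training set without any external annotation.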
Click Carving: Segmenting Objects in Video with Point Clicks
We present a novel form of interactive video object segmentation where a few
clicks by the user helps the system produce a full spatio-temporal segmentation
of the object of interest. Whereas conventional interactive pipelines take the
user's initialization as a starting point, we show the value in the system
taking the lead even in initialization. In particular, for a given video frame,
the system precomputes a ranked list of thousands of possible segmentation
hypotheses (also referred to as object region proposals) using image and motion
cues. Then, the user looks at the top ranked proposals, and clicks on the
object boundary to carve away erroneous ones. This process iterates (typically
2-3 times), and each time the system revises the top ranked proposal set, until
the user is satisfied with a resulting segmentation mask. Finally, the mask is
propagated across the video to produce a spatio-temporal object tube. On three
challenging datasets, we provide extensive comparisons with both existing work
and simpler alternative methods. In all, the proposed Click Carving approach
strikes an excellent balance of accuracy and human effort. It outperforms all
similarly fast methods, and is competitive or better than those requiring 2 to
12 times the effort.
Comment: A preliminary version of the material in this document was filed as
University of Texas technical report no. UT AI16-0
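A toy sketch of the carving step: given candidate proposal masks and a user click on the true object boundary, proposals can be re-ranked by how close their boundary passes to the click. The scoring below is an invented simplification of the paper's ranking, not its actual implementation:

```python
import numpy as np

def boundary_points(mask):
    """Coordinates of mask pixels with at least one background 4-neighbour."""
    p = np.pad(mask, 1).astype(bool)
    core = p[1:-1, 1:-1]
    neigh_bg = (~p[:-2, 1:-1] | ~p[2:, 1:-1] | ~p[1:-1, :-2] | ~p[1:-1, 2:])
    ys, xs = np.nonzero(core & neigh_bg)
    return np.stack([ys, xs], axis=1)

def rerank_proposals(proposals, click):
    """Sort segmentation proposals by distance of their boundary to a click.

    Toy stand-in for Click Carving's re-ranking: proposals whose boundary
    passes near the user's boundary click float to the top; distant ones
    are effectively carved away.
    """
    click = np.asarray(click)
    dists = [np.min(np.linalg.norm(boundary_points(m) - click, axis=1))
             for m in proposals]
    order = np.argsort(dists)
    return [proposals[i] for i in order], order

# Two hypothetical proposals on a 10x10 frame; the user clicks at (2, 2),
# which lies on the boundary of the smaller proposal.
small = np.zeros((10, 10), dtype=int); small[2:5, 2:5] = 1
large = np.zeros((10, 10), dtype=int); large[1:9, 1:9] = 1
ranked, order = rerank_proposals([large, small], click=(2, 2))
print(order)  # [1 0]: the small proposal now ranks first
```

Iterating this click-and-re-rank loop two or three times, as the paper describes, quickly converges on the proposal matching the user's intended object.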