1,166 research outputs found

    VIDEO FOREGROUND LOCALIZATION FROM TRADITIONAL METHODS TO DEEP LEARNING

    These days, detection of Visual Attention Regions (VAR), such as moving objects, has become an integral part of many Computer Vision applications, viz. pattern recognition, object detection and classification, video surveillance, autonomous driving, human-machine interaction (HMI), and so forth. Moving object identification using bounding boxes has matured to the level of localizing the objects along their rigid borders, a process called foreground localization (FGL). Over the decades, many image segmentation methodologies have been studied, devised, and extended to suit video FGL. Despite that, video foreground (FG) segmentation remains an intriguing yet appealing problem due to its ill-posed nature and myriad applications. Maintaining spatial and temporal coherence, particularly at object boundaries, remains challenging and computationally burdensome. It gets even harder when the background is dynamic, as with swaying tree branches or a shimmering body of water, when there are illumination variations or shadows cast by the moving objects, or when the video sequences have jittery frames caused by vibrating or unstable camera mounts on a surveillance post or moving robot. At the same time, in the analysis of traffic flow or human activity, the performance of an intelligent system depends substantially on how robustly it localizes the VAR, i.e., the FG. The natural question, then, is: what is the best way to deal with these challenges? The goal of this thesis is to investigate plausible real-time implementations, from traditional approaches to modern deep learning (DL) models, for FGL that are applicable to many video content-aware applications (VCAA). It focuses mainly on improving existing methodologies by harnessing multimodal spatial and temporal cues for a well-delineated FGL.
The first part of the dissertation is dedicated to enhancing conventional sample-based and Gaussian mixture model (GMM)-based video FGL using a probability mass function (PMF), temporal median filtering, the fusion of CIEDE2000 color similarity, color distortion, and illumination measures, and an appropriate adaptive threshold to extract the FG pixels. Subjective and objective evaluations show the improvements over a number of similar conventional methods. The second part of the thesis focuses on exploiting and improving deep convolutional neural networks (DCNN) for the same problem. To that end, three models akin to an encoder-decoder (EnDec) network are implemented with various strategies to improve the quality of the FG segmentation. The strategies include, but are not limited to, double encoding with slow decoding feature learning, multi-view receptive field feature fusion, and incorporating spatiotemporal cues through long short-term memory (LSTM) units in both the subsampling and upsampling subnetworks. Experimental studies are carried out on conditions ranging from baseline to challenging video sequences to prove the effectiveness of the proposed DCNNs. The analysis demonstrates their architectural efficiency over other methods, while quantitative and qualitative experiments show the competitive performance of the proposed models compared to the state-of-the-art.
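
As a rough illustration of the temporal median filtering mentioned in this abstract, the sketch below builds a per-pixel median background from a few frames and thresholds the difference against a new frame. It is a simplification under stated assumptions, not the thesis's method: frames are flat lists of gray levels, the threshold is fixed rather than adaptive, and no CIEDE2000 or illumination measures are fused. The names `median_background` and `foreground_mask` are hypothetical.

```python
from statistics import median


def median_background(frames):
    """Per-pixel temporal median over equally sized flat frames (assumed gray levels)."""
    return [median(values) for values in zip(*frames)]


def foreground_mask(frame, background, threshold):
    """Mark a pixel as foreground (1) when it differs from the background by more
    than a fixed threshold; the abstract describes an adaptive one instead."""
    return [1 if abs(p - b) > threshold else 0 for p, b in zip(frame, background)]


# Three tiny two-pixel frames; the second pixel changes abruptly in the last one.
frames = [[10, 50], [12, 52], [11, 200]]
background = median_background(frames)
mask = foreground_mask(frames[-1], background, threshold=30)
```

The median tolerates the single outlier in the second pixel, so the background estimate stays near the stable values and only the abrupt change is flagged.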

    Computer Vision Techniques for Background Modeling in Urban Traffic Monitoring

    Jose Manuel Milla, Sergio Luis Toral, Manuel Vargas and Federico Barrero (2010). Computer Vision Techniques for Background Modeling in Urban Traffic Monitoring, Urban Transport and Hybrid Vehicles, Seref Soylu (Ed.), ISBN: 978-953-307-100-8, InTech, DOI: 10.5772/10179. Available from: http://www.intechopen.com/books/urban-transport-and-hybrid-vehicles/computer-vision-techniques-for-background-modeling-in-urban-traffic-monitoring
    In this chapter, several background modelling techniques are described, analyzed and tested. In particular, different algorithms based on the sigma-delta filter are considered because of their suitability for embedded systems, where computational limitations constrain a real-time implementation. Qualitative and quantitative comparisons are performed among the different algorithms. The results show that the sigma-delta algorithm with confidence measurement exhibits the best performance, both in adapting to the specific characteristics of urban traffic scenes and in computational requirements. A prototype based on an ARM processor has been implemented to test the different versions of the sigma-delta algorithm and to illustrate several applications related to vehicle traffic monitoring, along with implementation details.
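
For readers unfamiliar with sigma-delta background estimation, the sketch below shows the basic form as it is commonly described: the background and a variance-like term each move by at most one gray level per frame toward their targets, which keeps the update cheap enough for embedded hardware. This is an assumption-laden illustration, not the chapter's confidence-measurement variant; the function name, the flat-list frame layout, and the parameters `n`, `v_min`, and `v_max` are hypothetical.

```python
def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def sigma_delta_step(frame, background, variance, n=2, v_min=1, v_max=255):
    """Update background and variance in place for one frame; return the mask.

    frame, background, and variance are flat lists of integer gray levels.
    """
    mask = []
    for i, pixel in enumerate(frame):
        # Background follows the input by one gray level per frame.
        background[i] += sign(pixel - background[i])
        diff = abs(pixel - background[i])
        if diff != 0:
            # Variance term follows n * diff by one level per frame, kept in range.
            variance[i] += sign(n * diff - variance[i])
            variance[i] = max(v_min, min(v_max, variance[i]))
        # A pixel is foreground when its difference exceeds the variance term.
        mask.append(1 if diff > variance[i] else 0)
    return mask


background = [100, 100]
variance = [1, 1]
mask = sigma_delta_step([100, 200], background, variance)
```

Only increments, comparisons, and absolute differences are needed per pixel, which is consistent with the abstract's point about suitability for computationally limited systems.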

    An Experimental Evaluation of Foreground Detection Algorithms in Real Scenes

    Foreground detection is an important preliminary step of many video analysis systems. Many algorithms have been proposed in recent years, but there is not yet a consensus on which approach is the most effective, even when the problem is limited to a single category of videos. This paper aims to be a first step towards a reliable assessment of the most commonly used approaches. In particular, four notable algorithms that perform foreground detection are evaluated using quantitative measures to assess their relative merits and demerits. The evaluation is carried out on a large, publicly available dataset composed of videos representing different realistic application scenarios. The obtained performance is presented and discussed, highlighting the conditions under which each algorithm can represent the most effective solution.

    Pedestrian Detection and Tracking in Video Surveillance System: Issues, Comprehensive Review, and Challenges

    Pedestrian detection and monitoring in a surveillance system are critical for numerous application areas, including unusual event detection, human gait, congestion or crowded vicinity evaluation, gender classification, fall detection in the elderly, etc. Researchers' primary focus is to develop surveillance systems that can work in a dynamic environment, but there are major issues and challenges involved in designing such systems. These challenges occur at three different levels of pedestrian detection, viz. video acquisition, human detection, and tracking. The challenges in acquiring video include illumination variation, abrupt motion, complex background, shadows, object deformation, etc. Human detection and tracking challenges include varied poses, occlusion, tracking in crowd-dense areas, etc. These result in a lower recognition rate. A brief summary of surveillance systems, along with comparisons of pedestrian detection and tracking techniques in video surveillance, is presented in this chapter. The publicly available pedestrian benchmark databases as well as future research directions on pedestrian detection are also discussed.

    Segmentation of Moving Objects in Video Sequences with a Dynamic Background

    Segmentation of objects from a video sequence is one of the basic operations commonly employed in vision-based systems. The quality of the segmented object has a profound effect on the performance of such systems. Segmentation of an object becomes a challenging problem when the background scenes of a video sequence are not static or contain the cast shadow of the object. This thesis is concerned with developing cost-effective methods for object segmentation from video sequences having dynamic backgrounds and cast shadows. A novel technique for the segmentation of foreground from video sequences with a dynamic background is developed. The segmentation problem is treated as a problem of classifying the foreground and background pixels of the frames of a sequence using the pixel color components as multiple features of the images. The individual features representing the pixel gray levels, hue and saturation levels are first extracted and then linearly recombined with suitable weights to form a scalar-valued feature image. The multiple features incorporated into this scalar-valued feature image allow a simple classification scheme to be devised in the framework of a support vector machine classifier. Unlike some other data classification approaches for foreground segmentation, in which a priori knowledge of the shape and size of the moving foreground is essential, the proposed method obtains training samples in an automated manner. The proposed technique is shown not to be limited by the number, patterns or dimensions of the objects. The foreground of a video frame is the region of the frame that contains the object as well as its cast shadow. A process of object segmentation generally results in segmenting the entire foreground. Thus, shadow removal from the segmented foreground is essential for object segmentation. A novel computationally efficient shadow removal technique based on multiple features is proposed.
Multiple object masks, each based on a single feature, are constructed and merged together to form a single object mask. The main idea of the proposed technique is that an object pixel is unlikely to be indistinguishable from the shadow pixels with respect to all the features used simultaneously. Extensive simulations are performed by applying the proposed and some existing techniques to challenging video sequences for object segmentation and shadow removal. The subjective and objective results demonstrate the effectiveness and superiority of the schemes developed in this thesis.
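
One way to read the mask-merging idea in this abstract is sketched below. The abstract does not state the merge operator, so the logical OR used here is an assumption: a pixel is kept as object if at least one single-feature mask separates it from shadow. The function name and the flat-list mask layout are hypothetical.

```python
def merge_object_masks(masks):
    """Merge per-feature object masks with a logical OR (assumed operator).

    masks is a list of equally sized flat lists of 0/1 values, one per feature.
    """
    return [int(any(values)) for values in zip(*masks)]


# Hypothetical masks from two features, e.g. gray level and saturation.
gray_mask = [1, 0, 0]
saturation_mask = [0, 0, 1]
object_mask = merge_object_masks([gray_mask, saturation_mask])
```

Under this reading, a pixel is dropped as shadow only when every feature fails to distinguish it, which matches the abstract's stated intuition.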