217 research outputs found

    VIDEO FOREGROUND LOCALIZATION FROM TRADITIONAL METHODS TO DEEP LEARNING

    Get PDF
    These days, detection of Visual Attention Regions (VAR), such as moving objects has become an integral part of many Computer Vision applications, viz. pattern recognition, object detection and classification, video surveillance, autonomous driving, human-machine interaction (HMI), and so forth. The moving object identification using bounding boxes has matured to the level of localizing the objects along their rigid borders and the process is called foreground localization (FGL). Over the decades, many image segmentation methodologies have been well studied, devised, and extended to suit the video FGL. Despite that, still, the problem of video foreground (FG) segmentation remains an intriguing task yet appealing due to its ill-posed nature and myriad of applications. Maintaining spatial and temporal coherence, particularly at object boundaries, persists challenging, and computationally burdensome. It even gets harder when the background possesses dynamic nature, like swaying tree branches or shimmering water body, and illumination variations, shadows cast by the moving objects, or when the video sequences have jittery frames caused by vibrating or unstable camera mounts on a surveillance post or moving robot. At the same time, in the analysis of traffic flow or human activity, the performance of an intelligent system substantially depends on its robustness of localizing the VAR, i.e., the FG. To this end, the natural question arises as what is the best way to deal with these challenges? Thus, the goal of this thesis is to investigate plausible real-time performant implementations from traditional approaches to modern-day deep learning (DL) models for FGL that can be applicable to many video content-aware applications (VCAA). It focuses mainly on improving existing methodologies through harnessing multimodal spatial and temporal cues for a delineated FGL. The first part of the dissertation is dedicated for enhancing conventional sample-based and Gaussian mixture model (GMM)-based video FGL using probability mass function (PMF), temporal median filtering, and fusing CIEDE2000 color similarity, color distortion, and illumination measures, and picking an appropriate adaptive threshold to extract the FG pixels. The subjective and objective evaluations are done to show the improvements over a number of similar conventional methods. The second part of the thesis focuses on exploiting and improving deep convolutional neural networks (DCNN) for the problem as mentioned earlier. Consequently, three models akin to encoder-decoder (EnDec) network are implemented with various innovative strategies to improve the quality of the FG segmentation. The strategies are not limited to double encoding - slow decoding feature learning, multi-view receptive field feature fusion, and incorporating spatiotemporal cues through long-shortterm memory (LSTM) units both in the subsampling and upsampling subnetworks. Experimental studies are carried out thoroughly on all conditions from baselines to challenging video sequences to prove the effectiveness of the proposed DCNNs. The analysis demonstrates that the architectural efficiency over other methods while quantitative and qualitative experiments show the competitive performance of the proposed models compared to the state-of-the-art

    DxNAT - Deep Neural Networks for Explaining Non-Recurring Traffic Congestion

    Full text link
    Non-recurring traffic congestion is caused by temporary disruptions, such as accidents, sports games, adverse weather, etc. We use data related to real-time traffic speed, jam factors (a traffic congestion indicator), and events collected over a year from Nashville, TN to train a multi-layered deep neural network. The traffic dataset contains over 900 million data records. The network is thereafter used to classify the real-time data and identify anomalous operations. Compared with traditional approaches of using statistical or machine learning techniques, our model reaches an accuracy of 98.73 percent when identifying traffic congestion caused by football games. Our approach first encodes the traffic across a region as a scaled image. After that the image data from different timestamps is fused with event- and time-related data. Then a crossover operator is used as a data augmentation method to generate training datasets with more balanced classes. Finally, we use the receiver operating characteristic (ROC) analysis to tune the sensitivity of the classifier. We present the analysis of the training time and the inference time separately

    Carried baggage detection and recognition in video surveillance with foreground segmentation

    Get PDF
    Security cameras installed in public spaces or in private organizations continuously record video data with the aim of detecting and preventing crime. For that reason, video content analysis applications, either for real time (i.e. analytic) or post-event (i.e. forensic) analysis, have gained high interest in recent years. In this thesis, the primary focus is on two key aspects of video analysis, reliable moving object segmentation and carried object detection & identification. A novel moving object segmentation scheme by background subtraction is presented in this thesis. The scheme relies on background modelling which is based on multi-directional gradient and phase congruency. As a post processing step, the detected foreground contours are refined by classifying the edge segments as either belonging to the foreground or background. Further contour completion technique by anisotropic diffusion is first introduced in this area. The proposed method targets cast shadow removal, gradual illumination change invariance, and closed contour extraction. A state of the art carried object detection method is employed as a benchmark algorithm. This method includes silhouette analysis by comparing human temporal templates with unencumbered human models. The implementation aspects of the algorithm are improved by automatically estimating the viewing direction of the pedestrian and are extended by a carried luggage identification module. As the temporal template is a frequency template and the information that it provides is not sufficient, a colour temporal template is introduced. The standard steps followed by the state of the art algorithm are approached from a different extended (by colour information) perspective, resulting in more accurate carried object segmentation. The experiments conducted in this research show that the proposed closed foreground segmentation technique attains all the aforementioned goals. The incremental improvements applied to the state of the art carried object detection algorithm revealed the full potential of the scheme. The experiments demonstrate the ability of the proposed carried object detection algorithm to supersede the state of the art method

    Fundamentals

    Get PDF
    Volume 1 establishes the foundations of this new field. It goes through all the steps from data collection, their summary and clustering, to different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are inspected with respect to resource requirements and how to enhance scalability on diverse computing architectures ranging from embedded systems to large computing clusters
    • …
    corecore