    VIDEO FOREGROUND LOCALIZATION FROM TRADITIONAL METHODS TO DEEP LEARNING

    These days, the detection of Visual Attention Regions (VAR), such as moving objects, has become an integral part of many computer vision applications, viz. pattern recognition, object detection and classification, video surveillance, autonomous driving, human-machine interaction (HMI), and so forth. Moving object identification using bounding boxes has matured to the level of localizing objects along their rigid borders, a process called foreground localization (FGL). Over the decades, many image segmentation methodologies have been well studied, devised, and extended to suit video FGL. Despite that, video foreground (FG) segmentation remains an intriguing yet appealing task due to its ill-posed nature and myriad of applications. Maintaining spatial and temporal coherence, particularly at object boundaries, remains challenging and computationally burdensome. It gets even harder when the background is dynamic, like swaying tree branches or a shimmering water body, under illumination variations and shadows cast by the moving objects, or when the video sequences have jittery frames caused by vibrating or unstable camera mounts on a surveillance post or moving robot. At the same time, in the analysis of traffic flow or human activity, the performance of an intelligent system substantially depends on its robustness in localizing the VAR, i.e., the FG. To this end, the natural question arises: what is the best way to deal with these challenges? Thus, the goal of this thesis is to investigate plausible real-time performant implementations, from traditional approaches to modern-day deep learning (DL) models, for FGL applicable to many video content-aware applications (VCAA). It focuses mainly on improving existing methodologies by harnessing multimodal spatial and temporal cues for a delineated FGL. The first part of the dissertation is dedicated to enhancing conventional sample-based and Gaussian mixture model (GMM)-based video FGL using the probability mass function (PMF), temporal median filtering, fusing CIEDE2000 color similarity, color distortion, and illumination measures, and picking an appropriate adaptive threshold to extract the FG pixels. Subjective and objective evaluations show the improvements over a number of similar conventional methods. The second part of the thesis focuses on exploiting and improving deep convolutional neural networks (DCNN) for the aforementioned problem. Consequently, three encoder-decoder (EnDec)-style models are implemented with various innovative strategies to improve the quality of the FG segmentation, including, but not limited to, double-encoding slow-decoding feature learning, multi-view receptive field feature fusion, and incorporating spatiotemporal cues through long short-term memory (LSTM) units in both the subsampling and upsampling subnetworks. Experimental studies are carried out thoroughly on all conditions, from baselines to challenging video sequences, to prove the effectiveness of the proposed DCNNs. The analysis demonstrates the architectural efficiency over other methods, while quantitative and qualitative experiments show the competitive performance of the proposed models compared to the state-of-the-art.
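    As an illustration of the conventional pipeline in the first part, below is a minimal sketch, assuming OpenCV and NumPy, of GMM-based foreground localization with temporal median filtering and an adaptive threshold. The stock MOG2 subtractor, the five-mask history, and the Otsu threshold are stand-ins for the thesis's enhanced formulation, not its exact method.

```python
# Sketch: GMM-based foreground localization with temporal median filtering.
# MOG2 stands in for the thesis's enhanced GMM model; thresholding is illustrative.
import collections

import cv2
import numpy as np

HISTORY = 5  # number of past masks kept for the temporal median filter (assumed)

def foreground_masks(video_path: str):
    """Yield one binary foreground mask per frame of the input video."""
    subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=True)
    recent_masks = collections.deque(maxlen=HISTORY)
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        raw = subtractor.apply(frame)  # 0 = background, 127 = shadow, 255 = FG
        raw[raw == 127] = 0            # discard pixels flagged as cast shadows
        recent_masks.append(raw)
        # Temporal median over the last HISTORY masks suppresses flicker from
        # dynamic backgrounds (swaying branches, shimmering water).
        median = np.median(np.stack(recent_masks), axis=0).astype(np.uint8)
        # Adaptive threshold: Otsu picks the cut per frame (illustrative choice).
        _, mask = cv2.threshold(median, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        yield mask
    cap.release()
```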

    Low-light Pedestrian Detection in Visible and Infrared Image Feeds: Issues and Challenges

    Pedestrian detection has become a cornerstone of several high-level tasks, including autonomous driving, intelligent transportation, and traffic surveillance. Several works have focused on pedestrian detection using visible images, mainly in the daytime. However, the task becomes far more challenging when environmental conditions change to poor lighting or nighttime. Recently, new ideas have emerged that use alternative sources, such as Far InfraRed (FIR) temperature sensor feeds, for detecting pedestrians in low-light conditions. This study comprehensively reviews recent developments in low-light pedestrian detection approaches. It systematically categorizes and analyzes various algorithms, from region-based to non-region-based and graph-based learning methodologies, highlighting their approaches, implementation issues, and challenges. It also outlines the key benchmark datasets that can be used for research and development of advanced pedestrian detection algorithms, particularly in low-light situations.

    Intelligent Real-Time Face-Mask Detection System with Hardware Acceleration for COVID-19 Mitigation

    This paper proposes and implements a dedicated hardware-accelerated real-time face-mask detection system using deep learning (DL). The proposed face-mask detection model (MaskDetect) was benchmarked on three embedded platforms: Raspberry Pi 4B with either a Google Coral USB TPU or an Intel Neural Compute Stick 2 VPU, and NVIDIA Jetson Nano. MaskDetect was independently quantised and optimised for each hardware-accelerated implementation. An ablation study was carried out on the proposed model and its quantised implementations on the embedded hardware configurations above, in comparison to other popular transfer-learning models, such as VGG16, ResNet-50V2, and InceptionV3, which are compatible with these acceleration hardware platforms. The ablation study revealed that MaskDetect achieved excellent average face-mask detection performance, with accuracy above 94% across all embedded platforms except the Coral, which averaged nearly 90%. With respect to detection performance (accuracy), inference speed (frames per second (FPS)), and product cost, the study revealed that implementation on the Jetson Nano is the best choice for real-time face-mask detection: it achieved 94.2% detection accuracy and twice the FPS of its desktop hardware counterpart.
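    On the quantisation step, below is a minimal sketch, assuming TensorFlow Lite post-training full-integer quantisation, of how a Keras detector could be prepared for integer accelerators such as the Coral TPU. The function name, calibration data, and file names are illustrative; the paper's per-platform pipelines (TPU, VPU, Jetson) may differ.

```python
# Sketch: post-training full-integer quantisation of a Keras detector for
# edge accelerators. Model, calibration set, and paths are illustrative.
import numpy as np
import tensorflow as tf

def quantise(keras_model: tf.keras.Model, calib_images: np.ndarray) -> bytes:
    """Return a fully integer-quantised TFLite flatbuffer."""
    def representative_dataset():
        # A few hundred calibration samples set the activation ranges.
        for image in calib_images[:200]:
            yield [image[np.newaxis].astype(np.float32)]

    converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_dataset
    # Force int8 kernels so the graph maps onto TPU/VPU integer units.
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.uint8
    converter.inference_output_type = tf.uint8
    return converter.convert()

# Usage (hypothetical model file):
# model = tf.keras.models.load_model("maskdetect.h5")
# open("maskdetect_int8.tflite", "wb").write(quantise(model, calib_images))
```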

    Accurate Flow Regime Classification and Void Fraction Measurement in Two-Phase Flowmeters Using Frequency-Domain Feature Extraction and Neural Networks

    No full text
    Two-phase flow is important in many areas of science, engineering, and industry; two-phase flow comprising gas and liquid phases is a common occurrence in oil and gas related industries. This study considers three flow regimes, homogeneous, annular, and stratified, with void fractions ranging from 5% to 90%, simulated via the Monte Carlo N-Particle (MCNP) code. In the proposed model, two NaI detectors record the photons emitted by a cesium-137 source that pass through the pipe. The fast Fourier transform (FFT) is then applied to transfer the recorded signals to the frequency domain, where it is possible to extract hidden features that are not visible in time-domain analysis. Four distinctive features of the registered signals were extracted: the average value, the amplitude of the dominant frequency, the standard deviation (STD), and the skewness. These features were compared to determine which offers the best separation. Furthermore, artificial neural networks (ANN) were utilized to increase the efficiency of two-phase flowmeters: two multi-layer perceptron (MLP) networks were adopted, one for classifying the considered regimes and one for estimating the volumetric percentages. Applying the proposed model, the outlined flow regimes were accurately classified, and the volumetric percentages were estimated with a low root mean square error (RMSE) of 1.1%.
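    Below is a minimal sketch, assuming NumPy and SciPy, of the four frequency-domain features described above, computed from the one-sided amplitude spectrum of a detector signal. Whether the paper takes these statistics over the spectrum or the raw signal, and its exact windowing, are assumptions here.

```python
# Sketch: the four frequency-domain features described above, extracted from
# one detector signal. Computing them over the amplitude spectrum is assumed.
import numpy as np
from scipy.stats import skew

def frequency_features(signal: np.ndarray) -> np.ndarray:
    """Return [mean, dominant-frequency amplitude, STD, skewness] of the
    one-sided amplitude spectrum of a detector signal."""
    spectrum = np.abs(np.fft.rfft(signal))  # one-sided amplitude spectrum
    spectrum = spectrum[1:]                 # drop the DC bin
    return np.array([
        spectrum.mean(),  # average value
        spectrum.max(),   # amplitude of the dominant frequency
        spectrum.std(),   # standard deviation (STD)
        skew(spectrum),   # skewness
    ])

# These 4-element vectors (one per NaI detector) would then feed the two MLPs:
# one classifying the flow regime, one regressing the volumetric percentage.
```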
