163 research outputs found

    Accelerating Foreground Object Detection for UHD Video

    Waseda University degree number: Shin 7460 (Waseda University)

    Background Subtraction Methods in Video Streams: A Review

    Background subtraction is one of the most important steps in image and video processing. Some parts of an image or video are unnecessary for the task at hand and should be removed, because they increase execution time and memory requirements. Several subtraction methods have been proposed to date, but finding the best-suited one remains an open issue, which this study addresses. Furthermore, each application calls for a specific subtraction technique, and recognizing this helps researchers achieve faster, higher-performing systems. This paper presents a comparative study of existing background subtraction methods, ranging from simple background subtraction to more complex statistical techniques. The goal is to provide a view of the strengths and drawbacks of the most widely used methods, which are compared in terms of memory requirements, computational time, and robustness across different videos. It is hoped that this analysis helps researchers address the difficulty of selecting the most suitable background subtraction method.
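
    The two ends of the spectrum surveyed above can be illustrated in a few lines of code. The sketch below contrasts plain frame differencing with OpenCV's Gaussian-mixture subtractor (MOG2); it is a generic illustration with an assumed input file and threshold, not the review's evaluation code.

        # Simple frame differencing vs. a statistical (GMM) background model.
        import cv2

        cap = cv2.VideoCapture("input.mp4")          # hypothetical input video
        mog2 = cv2.createBackgroundSubtractorMOG2()  # per-pixel Gaussian mixture

        ok, prev = cap.read()
        prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

        while True:
            ok, frame = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

            # Simple method: threshold the absolute inter-frame difference.
            diff = cv2.absdiff(gray, prev_gray)
            _, fg_simple = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)

            # Statistical method: MOG2 adapts a Gaussian mixture per pixel.
            fg_mog2 = mog2.apply(frame)

            prev_gray = gray

        cap.release()

    The simple differencing costs almost nothing per pixel, while MOG2 trades extra memory and computation for robustness to gradual illumination change, which is exactly the trade-off the review compares.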

    Computer vision based techniques for fall detection with application towards assisted living

    In this thesis, new computer vision based techniques are proposed to detect falls of an elderly person living alone, an important problem in assisted living. Different types of information extracted from video recordings are exploited for fall detection using both analytical and machine learning techniques. Initially, a particle filter is used to extract a 2D cue, head velocity, to determine a likely fall event. The human body region is then extracted with a modern background subtraction algorithm. Ellipse fitting is used to represent this shape, and its orientation angle is employed for fall detection. An analytical method is used by setting proper thresholds against which the head velocity and orientation angle are compared for fall discrimination. Movement amplitude is then integrated into the fall detector to reduce false alarms. Since 2D features can generate false alarms and are not invariant to viewing direction, more robust 3D features are next extracted from a 3D person representation formed from video measurements from multiple calibrated cameras. Instead of using thresholds, different data fitting methods are applied to construct models corresponding to fall activities, which are then used to distinguish falls from non-falls. In the final part of the work, two practical fall detection schemes that use only one uncalibrated camera are tested in a real home environment. These approaches are based on 2D features that describe human body posture. The extracted features are used to construct either a supervised method for posture classification or an unsupervised method for abnormal posture detection. Rules set according to the characteristics of fall activities are finally used to build robust fall detection methods. Extensive evaluation studies are included to confirm the efficiency of the schemes.
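
    The ellipse-orientation cue described above is easy to sketch. The fragment below computes a blob orientation from the central image moments of a binary foreground mask and flags a near-horizontal posture as a possible fall; the 45-degree threshold is illustrative, not a value from the thesis.

        # Orientation-based fall cue from a binary foreground mask.
        import cv2
        import numpy as np

        def body_orientation(fg_mask):
            """Major-axis angle of the foreground blob, in degrees from the
            horizontal image axis (about 90 = upright, about 0 = lying)."""
            m = cv2.moments(fg_mask, binaryImage=True)
            if m["m00"] == 0:
                return None  # empty mask, no person found
            theta = 0.5 * np.arctan2(2.0 * m["mu11"], m["mu20"] - m["mu02"])
            return abs(np.degrees(theta))

        def is_fall_posture(fg_mask, angle_threshold=45.0):
            # Flag a likely fall when the body leans closer to horizontal
            # than the threshold; in practice this is combined with other
            # cues such as head velocity to suppress false alarms.
            theta = body_orientation(fg_mask)
            return theta is not None and theta < angle_threshold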

    Improved Behavior Monitoring and Classification Using Cues Parameters Extraction from Camera Array Images

    Behavior monitoring and classification is a mechanism for automatically identifying or verifying individuals based on human detection, tracking, and behavior recognition in video sequences captured by a depth camera. In this paper, we design a system that precisely classifies the nature of 3D body postures obtained by Kinect using an advanced recognizer. We propose novel features suited to depth data; these features are robust to noise, invariant to translation and scaling, and capable of capturing fast movements of human body parts. Finally, an advanced hidden Markov model is used to recognize different activities. In extensive experiments, our system consistently outperforms prior methods on three depth-based behavior datasets, i.e., IM-DailyDepthActivity, MSRDailyActivity3D, and MSRAction3D, in both posture classification and behavior recognition. Moreover, our system handles rotation of the subject's body parts, self-occlusion, and missing body parts, which allows it to track complex activities and improves the recognition rate. Owing to the easy accessibility, low cost, and simple deployment of depth cameras, the proposed system can be applied to various consumer applications, including patient monitoring, automatic video surveillance, smart homes/offices, and 3D games.
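
    The recognition stage, an HMM over per-frame posture features, can be outlined as follows. The sketch trains one Gaussian HMM per activity and classifies a new sequence by maximum log-likelihood; hmmlearn is an assumed dependency, and the feature vectors are placeholders for the paper's depth-based cues.

        # One Gaussian HMM per activity; classify by log-likelihood.
        import numpy as np
        from hmmlearn import hmm

        def train_activity_models(sequences_by_label, n_states=4):
            """sequences_by_label maps an activity name to a list of
            (T_i, D) feature-sequence arrays; returns one HMM per label."""
            models = {}
            for label, seqs in sequences_by_label.items():
                X = np.vstack(seqs)               # stacked observations
                lengths = [len(s) for s in seqs]  # per-sequence lengths
                model = hmm.GaussianHMM(n_components=n_states, n_iter=50)
                model.fit(X, lengths)
                models[label] = model
            return models

        def classify(models, sequence):
            # Pick the activity whose model explains the sequence best.
            return max(models, key=lambda lbl: models[lbl].score(sequence))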

    New Insights of Background Estimation and Region Localization

    Subtracting the background in a crowded scene is a crucial and challenging task in surveillance monitoring. Because of the similarity between foreground objects and the background, separating the background from moving foreground objects is known to be difficult. Most previous works in this field cannot reliably distinguish the foreground from the background under gradual or sudden illumination changes, high-frequency background motion, background geometry changes, and noise. After the foreground objects are obtained, segmentation is needed to localize the object regions. Image segmentation is a useful tool in many areas, such as object recognition, image processing, medical image analysis, and 3D reconstruction. To produce a reliable foreground image, a carefully estimated background model is needed. To tackle illumination and motion changes, this paper establishes an effective new approach to background subtraction and segmentation that accurately detects and segments foreground people. The scene background is estimated with a new method, Mean Subtraction Background Estimation (MS), which identifies and modifies the pixels extracted from the difference between the background and the current frame. Unlike other works, the initial background is calculated by MS rather than taken directly from the first frame. The paper then performs foreground segmentation in noisy scenes through foreground detection and localizes the detected areas by analyzing various segmentation methods. Experiments on a challenging public crowd counting dataset achieve better accuracy than state-of-the-art results, indicating the effectiveness of the proposed work.
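
    A generic running-mean background model conveys the idea behind mean-based estimation: the first frame only seeds the model, which is then refined frame by frame. The update rate and threshold below are illustrative, and this sketch is not the paper's MS method.

        # Running-mean background model for grayscale frames.
        import numpy as np

        class MeanBackgroundModel:
            def __init__(self, first_frame, alpha=0.05, threshold=30):
                # Seed with the first frame, then refine as a running mean.
                self.background = first_frame.astype(np.float64)
                self.alpha = alpha          # mean update rate
                self.threshold = threshold  # difference level marking foreground

            def apply(self, frame):
                frame = frame.astype(np.float64)
                diff = np.abs(frame - self.background)
                fg_mask = (diff > self.threshold).astype(np.uint8) * 255
                # Update the mean only at background pixels so foreground
                # objects do not bleed into the model.
                bg = fg_mask == 0
                self.background[bg] += self.alpha * (frame[bg] - self.background[bg])
                return fg_mask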

    VIDEO FOREGROUND LOCALIZATION FROM TRADITIONAL METHODS TO DEEP LEARNING

    These days, detection of Visual Attention Regions (VAR), such as moving objects, has become an integral part of many Computer Vision applications, viz. pattern recognition, object detection and classification, video surveillance, autonomous driving, human-machine interaction (HMI), and so forth. Moving object identification using bounding boxes has matured to the level of localizing objects along their rigid borders, a process called foreground localization (FGL). Over the decades, many image segmentation methodologies have been well studied, devised, and extended to suit video FGL. Despite that, the problem of video foreground (FG) segmentation remains an intriguing yet appealing task due to its ill-posed nature and myriad of applications. Maintaining spatial and temporal coherence, particularly at object boundaries, remains challenging and computationally burdensome. It becomes even harder when the background is dynamic, with swaying tree branches or a shimmering water body, illumination variations, or shadows cast by the moving objects, or when the video sequences have jittery frames caused by vibrating or unstable camera mounts on a surveillance post or moving robot. At the same time, in the analysis of traffic flow or human activity, the performance of an intelligent system substantially depends on its robustness in localizing the VAR, i.e., the FG. To this end, the natural question arises: what is the best way to deal with these challenges? Thus, the goal of this thesis is to investigate plausible real-time performant implementations, from traditional approaches to modern-day deep learning (DL) models, for FGL that can be applicable to many video content-aware applications (VCAA). It focuses mainly on improving existing methodologies by harnessing multimodal spatial and temporal cues for delineated FGL. The first part of the dissertation is dedicated to enhancing conventional sample-based and Gaussian mixture model (GMM)-based video FGL using probability mass function (PMF) estimates, temporal median filtering, and fusing CIEDE2000 color similarity, color distortion, and illumination measures, and picking an appropriate adaptive threshold to extract the FG pixels. Subjective and objective evaluations show the improvements over a number of similar conventional methods. The second part of the thesis focuses on exploiting and improving deep convolutional neural networks (DCNN) for the problem mentioned earlier. Consequently, three models akin to an encoder-decoder (EnDec) network are implemented with various innovative strategies to improve the quality of the FG segmentation. The strategies include double-encoding slow-decoding feature learning, multi-view receptive field feature fusion, and incorporating spatiotemporal cues through long short-term memory (LSTM) units in both the subsampling and upsampling subnetworks. Experimental studies are carried out thoroughly on all conditions, from baselines to challenging video sequences, to prove the effectiveness of the proposed DCNNs. The analysis demonstrates the architectural efficiency of the proposed models over other methods, while quantitative and qualitative experiments show their competitive performance compared to the state of the art.
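
    As a point of reference for the EnDec family mentioned above, a minimal encoder-decoder for per-pixel foreground prediction can be written in a few lines of PyTorch. The layer sizes are placeholders, and the LSTM and multi-view components of the thesis models are omitted.

        # Tiny encoder-decoder producing a per-pixel foreground probability.
        import torch
        import torch.nn as nn

        class TinyEnDec(nn.Module):
            def __init__(self):
                super().__init__()
                self.encoder = nn.Sequential(      # subsampling subnetwork
                    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                    nn.MaxPool2d(2),
                    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                    nn.MaxPool2d(2),
                )
                self.decoder = nn.Sequential(      # upsampling subnetwork
                    nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),
                    nn.ConvTranspose2d(16, 1, 2, stride=2),
                )

            def forward(self, x):
                # Sigmoid maps logits to foreground probabilities in [0, 1].
                return torch.sigmoid(self.decoder(self.encoder(x)))

        model = TinyEnDec()
        frame = torch.randn(1, 3, 224, 224)   # dummy RGB frame
        mask = model(frame)                   # (1, 1, 224, 224) FG map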