Acceleration of Foreground Object Detection for UHD Video
Waseda University degree number: Shin 7460
Background Subtraction Methods in Video Streams: A Review
Background subtraction is one of the most important steps in image and video processing. Some parts of an image or video sequence are unnecessary and should be removed, because they increase execution time and memory requirements. Several subtraction methods have been proposed to date, but finding the best-suited one remains an open issue, which this study addresses. Furthermore, each application calls for a specific subtraction technique, and knowing this helps researchers achieve faster and better performance in their work. This paper presents a comparative study of several existing background subtraction methods, ranging from simple background subtraction to more complex statistical techniques. The goal of this study is to provide a view of the strengths and drawbacks of the most widely used methods. The methods are compared in terms of memory requirements, computational time, and robustness across different videos. It is hoped that this analysis helps researchers address the difficulty of selecting the most suitable method for background subtraction.
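The simple-to-statistical spectrum this review surveys can be illustrated with a minimal sketch: plain frame differencing versus a running-average background model. All thresholds, parameter names, and the grayscale-array representation below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def frame_difference(prev, curr, thresh=25):
    """Simplest subtraction: absolute difference against the previous frame."""
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
    return (diff > thresh).astype(np.uint8)

def running_average_mask(frames, alpha=0.05, thresh=25):
    """Statistical variant: maintain a running-average background model
    and flag pixels that deviate from it by more than a threshold."""
    bg = frames[0].astype(np.float64)
    masks = []
    for f in frames[1:]:
        diff = np.abs(f.astype(np.float64) - bg)
        masks.append((diff > thresh).astype(np.uint8))
        bg = (1 - alpha) * bg + alpha * f  # slowly absorb scene changes
    return masks
```

The running-average model trades a little memory (one float image) for robustness to gradual illumination change, which frame differencing lacks; this is exactly the kind of cost/robustness trade-off the comparison addresses.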
Computer vision based techniques for fall detection with application towards assisted living
In this thesis, new computer vision based techniques are proposed to detect falls of an elderly person living alone. This is an important problem in assisted living.
Different types of information extracted from video recordings are exploited for fall detection, using both analytical and machine learning techniques. Initially, a particle filter is used to extract a 2D cue, head velocity, to determine a likely fall event. The human body region is then extracted with a modern background subtraction algorithm. Ellipse fitting is used to represent this shape, and its orientation angle is employed for fall detection. An analytical method is used by setting proper thresholds against which the head velocity and orientation angle are compared for fall discrimination. Movement amplitude is then integrated into the fall detector to reduce false alarms.
Since 2D features can generate false alarms and are not invariant to viewing direction, more robust 3D features are next extracted from a 3D person representation formed from video measurements from multiple calibrated cameras. Instead of using thresholds, different data-fitting methods are applied to construct models corresponding to fall activities. These are then used to distinguish falls from non-falls.
In the final work, two practical fall detection schemes which use only one uncalibrated camera are tested in a real home environment. These approaches are based on 2D features which describe human body posture. The extracted features are then applied to construct either a supervised method for posture classification or an unsupervised method for abnormal posture detection. Rules set according to the characteristics of fall activities are finally used to build robust fall detection methods. Extensive evaluation studies are included to confirm the efficiency of the schemes.
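The analytical thresholding stage of this thesis (head velocity, ellipse orientation angle, movement amplitude) can be sketched as a simple rule. The threshold values, parameter names, and the 90°-upright convention below are illustrative assumptions, not the thesis's tuned settings.

```python
def fall_alert(head_velocity, orientation_deg, motion_amplitude,
               v_thresh=2.0, angle_thresh=45.0, amp_thresh=0.1):
    """A fall is flagged when the head moves fast, the fitted body
    ellipse tips far from vertical, and the overall movement amplitude
    is large enough to suppress false alarms (e.g. sitting down slowly).
    All thresholds here are placeholders for illustration."""
    fast_head = head_velocity > v_thresh
    tilted = abs(orientation_deg - 90.0) > angle_thresh  # 90 deg ~= upright
    large_motion = motion_amplitude > amp_thresh
    return fast_head and tilted and large_motion
```

Requiring all three cues jointly is what reduces false alarms: any single cue (a fast head movement alone, say) can be triggered by normal activity.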
Improved Behavior Monitoring and Classification Using Cues Parameters Extraction from Camera Array Images
Behavior monitoring and classification is a mechanism used to automatically identify or verify individuals based on human detection, tracking, and behavior recognition from video sequences captured by a depth camera. In this paper, we design a system that precisely classifies the nature of 3D body postures obtained by Kinect using an advanced recognizer. We propose novel features that are suitable for depth data. These features are robust to noise, invariant to translation and scaling, and capable of monitoring fast movements of human body parts. Finally, an advanced hidden Markov model is used to recognize different activities. In extensive experiments, our system consistently outperforms prior work on three depth-based behavior datasets, i.e., IM-DailyDepthActivity, MSRDailyActivity3D, and MSRAction3D, in both posture classification and behavior recognition. Moreover, our system handles rotation of the subject's body parts, self-occlusion, and missing body parts, which allows it to track complex activities and improves the recognition rate. Owing to the easy accessibility, low cost, and simple deployment of depth cameras, the proposed system can be applied to various consumer applications, including patient-monitoring systems, automatic video surveillance, smart homes/offices, and 3D games.
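The HMM recognition step mentioned above typically scores a feature sequence under one HMM per activity and picks the highest-likelihood model. A minimal sketch of that scoring with the scaled forward algorithm follows; the toy parameters in the usage below are invented for illustration, not learned from depth data, and the paper's "advanced" HMM variant is not reproduced here.

```python
import numpy as np

def hmm_log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | model) for a discrete-emission
    HMM with initial distribution pi, transition matrix A (A[i, j] =
    P(state j | state i)), and emission matrix B (B[i, k] = P(symbol k |
    state i)). At test time, the activity whose model scores highest wins."""
    alpha = pi * B[:, obs[0]]          # joint prob of first obs and state
    c = alpha.sum()
    log_p = np.log(c)
    alpha = alpha / c                  # rescale to avoid underflow
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # predict states, weight by emission
        c = alpha.sum()
        log_p += np.log(c)
        alpha = alpha / c
    return log_p
```

Classification is then `argmax` of this score over the per-activity models.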
New Insights of Background Estimation and Region Localization
Subtracting the background in a crowded scene is a crucial and challenging task in surveillance monitoring. Because of the similarity between foreground objects and the background, distinguishing moving foreground objects from the background is known to be difficult. Most previous works in this field cannot separate foreground from background under gradual or sudden illumination changes, high-frequency background motion, background geometry changes, and noise. After the foreground objects are obtained, segmentation is needed to localize the object regions. Image segmentation is a useful tool in many areas, such as object recognition, image processing, medical image analysis, and 3D reconstruction. To provide a reliable foreground image, a carefully estimated background model is needed. To tackle illumination and motion changes, this paper establishes an effective new approach to background subtraction and segmentation that accurately detects and segments foreground people. The scene background is estimated with a new method, Mean Subtraction Background Estimation (MS), which identifies and updates pixels extracted from the difference between the background and the current frame. Unlike other works, the initial background is calculated by MS instead of simply taking the first frame. The paper then performs foreground segmentation in noisy scenes via foreground detection and localizes the detected areas by analyzing various segmentation methods. Experiments on a challenging public crowd-counting dataset achieve better accuracy than state-of-the-art results, indicating the effectiveness of the proposed work.
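A rough illustration of a mean-based background estimate in the spirit described above: initialize from the per-pixel temporal mean rather than the raw first frame, then refresh only pixels whose difference from the current frame is small. The update rule, parameter names, and thresholds are my own assumptions, not the paper's exact MS formulation.

```python
import numpy as np

def mean_subtraction_background(frames, thresh=30, alpha=0.1):
    """Estimate a background model from a short grayscale sequence and
    return it with a foreground mask for the last frame."""
    stack = np.stack([f.astype(np.float64) for f in frames])
    bg = stack.mean(axis=0)             # initial model: temporal mean
    for f in stack:
        diff = np.abs(f - bg)
        is_bg = diff <= thresh          # pixels that look like background
        bg[is_bg] = (1 - alpha) * bg[is_bg] + alpha * f[is_bg]
    fg_mask = np.abs(stack[-1] - bg) > thresh
    return bg, fg_mask.astype(np.uint8)
```

Updating only background-like pixels keeps transient foreground (people in a crowd) from being absorbed into the model, while the mean initialization dilutes any foreground present in the first frame.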
Video Foreground Localization: From Traditional Methods to Deep Learning
These days, detection of Visual Attention Regions (VAR), such as moving objects, has become an integral part of many Computer Vision applications, viz. pattern recognition, object detection and classification, video surveillance, autonomous driving, human-machine interaction (HMI), and so forth. Moving object identification using bounding boxes has matured to the level of localizing objects along their rigid borders, a process called foreground localization (FGL). Over the decades, many image segmentation methodologies have been well studied, devised, and extended to suit video FGL. Despite that, the problem of video foreground (FG) segmentation remains an intriguing yet appealing task due to its ill-posed nature and myriad of applications. Maintaining spatial and temporal coherence, particularly at object boundaries, remains challenging and computationally burdensome. It gets even harder when the background is dynamic, like swaying tree branches or a shimmering water body, when there are illumination variations or shadows cast by the moving objects, or when the video sequences have jittery frames caused by vibrating or unstable camera mounts on a surveillance post or moving robot. At the same time, in the analysis of traffic flow or human activity, the performance of an intelligent system substantially depends on its robustness in localizing the VAR, i.e., the FG. To this end, a natural question arises: what is the best way to deal with these challenges? Thus, the goal of this thesis is to investigate plausible real-time performant implementations, from traditional approaches to modern-day deep learning (DL) models, for FGL that can be applied to many video content-aware applications (VCAA). It focuses mainly on improving existing methodologies by harnessing multimodal spatial and temporal cues for a delineated FGL.
The first part of the dissertation is dedicated to enhancing conventional sample-based and Gaussian mixture model (GMM)-based video FGL using a probability mass function (PMF), temporal median filtering, fusing CIEDE2000 color similarity, color distortion, and illumination measures, and picking an appropriate adaptive threshold to extract the FG pixels. Subjective and objective evaluations are done to show the improvements over a number of similar conventional methods. The second part of the thesis focuses on exploiting and improving deep convolutional neural networks (DCNN) for the problem mentioned earlier. Consequently, three models akin to encoder-decoder (EnDec) networks are implemented with various innovative strategies to improve the quality of the FG segmentation. The strategies include double encoding - slow decoding feature learning, multi-view receptive field feature fusion, and incorporating spatiotemporal cues through long short-term memory (LSTM) units in both the subsampling and upsampling subnetworks. Experimental studies are carried out thoroughly on all conditions, from baselines to challenging video sequences, to prove the effectiveness of the proposed DCNNs. The analysis demonstrates the architectural efficiency over other methods, while quantitative and qualitative experiments show the competitive performance of the proposed models compared to the state-of-the-art.
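The GMM stage with an adaptive threshold that the first part builds on can be suggested with a deliberately simplified stand-in: one Gaussian per pixel instead of a mixture, where a pixel is foreground when it lies more than k standard deviations from the running mean, so the threshold adapts per pixel to local variance. The values of k and alpha and the single-Gaussian reduction are my assumptions, not the thesis's method.

```python
import numpy as np

def single_gaussian_foreground(frames, k=2.5, alpha=0.05):
    """Per-pixel running Gaussian background model over a grayscale
    sequence; returns one foreground mask per frame after the first."""
    mu = frames[0].astype(np.float64)
    var = np.full_like(mu, 100.0)       # generous initial variance
    masks = []
    for f in frames[1:]:
        f = f.astype(np.float64)
        d2 = (f - mu) ** 2
        fg = d2 > (k ** 2) * var        # adaptive, per-pixel threshold
        masks.append(fg.astype(np.uint8))
        bg = ~fg                        # update model only on background
        mu[bg] = (1 - alpha) * mu[bg] + alpha * f[bg]
        var[bg] = (1 - alpha) * var[bg] + alpha * d2[bg]
    return masks
```

A full GMM keeps several such Gaussians per pixel so that bimodal backgrounds (swaying branches, shimmering water) are each covered by their own mode.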
Automated Detection and Counting of Pedestrians on an Urban Roadside
This thesis implements an automated system that counts pedestrians with 85% accuracy. Two approaches have been considered and evaluated in terms of count accuracy, cost, and ease of deployment. The first approach employs the Autoscope Solo Terra, a traffic camera which is widely used to monitor vehicular traffic. The Solo Terra supports an image processing-based detector that counts the number of objects crossing user-defined areas in the captured image. Since the count is updated based only on the amount of movement across the selected regions, a second approach has been considered that uses a histogram of oriented gradients (HoG), an advanced vision-based algorithm proposed by Dalal et al., which distinguishes a pedestrian from a non-pedestrian based on an omega shape formed by the head and shoulders of a human being. The implemented detection software processes video frames that are streamed from a low-cost digital camera. The frames are divided into sub-regions which are scanned for an omega shape whenever movement is detected in those regions. It has been found that the HoG-based approach degrades in performance due to occlusion under dense pedestrian traffic conditions, whereas the Solo Terra approach appears to be more robust; however, undercounts and overcounts were encountered using the Solo Terra approach. To combat the disadvantages of both approaches, they were integrated into a single system where the count is incremented predominantly by the Solo Terra, and the HoG-based approach corrects the obtained count under certain conditions. A preliminary prototype of the integrated system has been verified.
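The core of the HoG idea used above, reduced to a single cell, is to compute gradient magnitude and orientation and vote into an orientation histogram. Real detectors (Dalal-Triggs) add overlapping-block normalization and a trained linear SVM on top; this sketch shows only the histogram step, with cell size and bin count as illustrative choices.

```python
import numpy as np

def hog_cell_histogram(cell, n_bins=9):
    """Orientation histogram for one grayscale cell: each pixel votes
    with its gradient magnitude into the bin of its (unsigned) gradient
    orientation."""
    cell = cell.astype(np.float64)
    gx = np.zeros_like(cell)
    gy = np.zeros_like(cell)
    gx[:, 1:-1] = cell[:, 2:] - cell[:, :-2]      # central differences
    gy[1:-1, :] = cell[2:, :] - cell[:-2, :]
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0  # unsigned orientation
    bin_idx = np.minimum((ang / (180.0 / n_bins)).astype(int), n_bins - 1)
    hist = np.zeros(n_bins)
    np.add.at(hist, bin_idx.ravel(), mag.ravel())
    return hist
```

Because the histogram pools over a cell, the descriptor is robust to small shifts, which is what lets a sliding-window scan of sub-regions find the head-and-shoulders pattern.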