377 research outputs found

    Distortion Correction for 3D Scan of Trunk Swaying Human Body Segments

    We propose a method for accurately acquiring the 3D shape of human body segments. Using a light stripe triangulation range finder, we can acquire an accurate 3D shape of a motionless object in dozens of seconds. If the object moves during scanning, the acquired shape is distorted. Humans naturally sway slightly to keep their balance while standing, even when the subject makes an effort to stay still to avoid distorting the acquired shape. Our method corrects the distortion based on the subject's motion measured during scanning. Experimental results show the accuracy of the proposed method: trunk swaying degrades the accuracy of light stripe triangulation from 1 mm to 10 mm, whereas our method keeps the accuracy as good as 2 mm.

    Properties of pedestrians walking in line: Stepping behavior

    In human crowds, interactions among individuals give rise to a variety of self-organized collective motions that help the group to effectively solve the problem of coordination. However, it is still not known exactly how humans adjust their behavior locally, nor what the direct consequences are for the emergent organization. One of the underlying mechanisms for adjusting individual motions is the stepping dynamics. In this paper, we present a first quantitative analysis of the stepping behavior in a one-dimensional pedestrian flow studied under controlled laboratory conditions. We find that the step length is proportional to the velocity of the pedestrian, and is directly related to the space available in front of him, while the variations of the step duration are much smaller. This is in contrast with locomotion studies performed on isolated pedestrians and shows that the local density has a direct influence on the stepping characteristics. Furthermore, we study the phenomenon of synchronization (walking in lockstep) and show its dependence on flow density. We show that the synchronization of steps is particularly important at high densities, which has a direct impact on studies of optimizing pedestrian flow in congested situations. However, small synchronization and antisynchronization effects are also found at very low densities, for which no steric constraints exist between successive pedestrians, showing the natural tendency to synchronize according to perceived visual signals. Comment: 8 pages, 5 figures

    Human Respiration Rate Measurement with High-Speed Digital Fringe Projection Technique

    This paper proposes a non-contact continuous respiration monitoring method based on Fringe Projection Profilometry (FPP). The method aims to overcome the limitations of traditional intrusive techniques by providing continuous monitoring without interfering with normal breathing. The FPP sensor captures three-dimensional (3D) respiratory motion from the chest wall and abdomen, and the analysis algorithms extract respiratory parameters. The system achieved a high Signal-to-Noise Ratio (SNR) of 37 dB with an ideal sinusoidal respiration signal. Experimental results demonstrated a mean correlation of 0.95 and a mean Root-Mean-Square Error (RMSE) of 0.11 breaths per minute (bpm) when compared with a reference signal obtained from a spirometer.
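
The two agreement metrics named in this abstract, correlation and RMSE against a reference signal, can be illustrated with a minimal sketch. This is not the authors' code: the function name and the sample values are hypothetical, and the correlation is assumed to be the Pearson coefficient, which the abstract does not specify.

```python
import math

def correlation_and_rmse(estimated, reference):
    """Return (Pearson correlation, RMSE) for two equal-length series."""
    if len(estimated) != len(reference) or len(estimated) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(estimated)
    mean_e = sum(estimated) / n
    mean_r = sum(reference) / n
    # Covariance and variance sums (unnormalised; the factors cancel).
    cov = sum((e - mean_e) * (r - mean_r) for e, r in zip(estimated, reference))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_e == 0 or var_r == 0:
        raise ValueError("correlation is undefined for a constant series")
    corr = cov / math.sqrt(var_e * var_r)
    # Root of the mean squared difference between the two series.
    rmse = math.sqrt(sum((e - r) ** 2 for e, r in zip(estimated, reference)) / n)
    return corr, rmse

# Hypothetical respiration rates in breaths per minute (illustrative only).
fpp_bpm = [12.0, 14.0, 16.0, 18.0]
spirometer_bpm = [12.1, 13.9, 16.1, 17.9]
corr, rmse = correlation_and_rmse(fpp_bpm, spirometer_bpm)
```

Reporting both values is useful because a high correlation alone does not rule out a constant offset between the sensor and the reference, while RMSE captures that offset directly.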


    These days, detection of Visual Attention Regions (VAR), such as moving objects, has become an integral part of many computer vision applications, viz. pattern recognition, object detection and classification, video surveillance, autonomous driving, human-machine interaction (HMI), and so forth. Moving object identification using bounding boxes has matured to the level of localizing objects along their rigid borders, a process called foreground localization (FGL). Over the decades, many image segmentation methodologies have been studied, devised, and extended to suit video FGL. Despite that, video foreground (FG) segmentation remains an intriguing yet appealing task due to its ill-posed nature and myriad applications. Maintaining spatial and temporal coherence, particularly at object boundaries, remains challenging and computationally burdensome. It gets even harder when the background is dynamic, as with swaying tree branches or a shimmering body of water, when there are illumination variations or shadows cast by the moving objects, or when the video sequences have jittery frames caused by vibrating or unstable camera mounts on a surveillance post or moving robot. At the same time, in the analysis of traffic flow or human activity, the performance of an intelligent system depends substantially on how robustly it localizes the VAR, i.e., the FG. The natural question is therefore how best to deal with these challenges. The goal of this thesis is to investigate plausible real-time implementations, from traditional approaches to modern deep learning (DL) models, for FGL that can be applied to many video content-aware applications (VCAA). It focuses mainly on improving existing methodologies by harnessing multimodal spatial and temporal cues for a delineated FGL.
The first part of the dissertation is dedicated to enhancing conventional sample-based and Gaussian mixture model (GMM)-based video FGL using a probability mass function (PMF), temporal median filtering, and the fusion of CIEDE2000 color similarity, color distortion, and illumination measures, together with an appropriate adaptive threshold to extract the FG pixels. Subjective and objective evaluations show the improvements over a number of similar conventional methods. The second part of the thesis focuses on exploiting and improving deep convolutional neural networks (DCNN) for the same problem. Three models akin to an encoder-decoder (EnDec) network are implemented with various innovative strategies to improve the quality of the FG segmentation. The strategies include, but are not limited to, double encoding with slow decoding feature learning, multi-view receptive field feature fusion, and the incorporation of spatiotemporal cues through long short-term memory (LSTM) units in both the subsampling and upsampling subnetworks. Experimental studies are carried out thoroughly on all conditions, from baseline to challenging video sequences, to prove the effectiveness of the proposed DCNNs. The analysis demonstrates their architectural efficiency over other methods, while quantitative and qualitative experiments show the competitive performance of the proposed models compared to the state of the art.

    Viewpoint-Free Photography for Virtual Reality

    Viewpoint-free photography, i.e., interactively controlling the viewpoint of a photograph after capture, is a standing challenge. In this thesis, we investigate algorithms to enable viewpoint-free photography for virtual reality (VR) from casual capture, i.e., from footage easily captured with consumer cameras. We build on an extensive body of work in image-based rendering (IBR). Given images of an object or scene, IBR methods aim to predict the appearance of an image taken from a novel perspective. Most IBR methods focus on full or near-interpolation, where the output viewpoints either lie directly between captured images, or nearby. These methods are not suitable for VR, where the user has a significant range of motion and can look in all directions. Thus, it is essential to create viewpoint-free photos with a wide field of view and sufficient positional freedom to cover the range of motion a user might experience in VR. We focus on two VR experiences: 1) Seated VR experiences, where the user can lean in different directions. This simplifies the problem, as the scene is only observed from a small range of viewpoints. Thus, we focus on easy capture, showing how to turn panorama-style capture into 3D photos, a simple representation for viewpoint-free photos, and also how to speed up processing so users can see the final result on-site. 2) Room-scale VR experiences, where the user can explore vastly different perspectives. This is challenging: more input footage is needed, maintaining real-time display rates becomes difficult, and view-dependent appearance and object backsides need to be modelled, all while preventing noticeable mistakes. We address these challenges by: (1) creating refined geometry for each input photograph, (2) using a fast tiled rendering algorithm to achieve real-time display rates, and (3) using a convolutional neural network to hide visual mistakes during compositing.
Overall, we provide evidence that viewpoint-free photography is feasible from casual capture. We thoroughly compare with the state of the art, showing that our methods achieve both a numerical improvement and a clear increase in visual quality for both seated and room-scale VR experiences.

    Detection and Simulation of Dangerous Human Crowd Behavior

    Tragically, gatherings of large human crowds quite often end in crowd disasters such as the recent catastrophe at the Loveparade 2010. In the past, research on pedestrian and crowd dynamics focused on the simulation of pedestrian motion. As yet, however, there is no automatic system that can detect hazardous situations in crowds and thus help to prevent these tragic incidents. In this thesis, we analyze pedestrian behavior in large crowds and observe characteristic motion patterns. Based on our findings, we present a computer vision system that detects unusual events and critical situations from video streams and alerts security personnel so that they can take the necessary actions. We evaluate the system's performance on synthetic, experimental, and real-world data. In particular, we show its effectiveness on the surveillance videos recorded at the Loveparade crowd stampede. Since our method is based on optical flow computations, it meets two crucial prerequisites in video surveillance: firstly, it works in real time and, secondly, the privacy of the people being monitored is preserved. In addition, we integrate the observed motion patterns into models for simulating pedestrian motion and show that the proposed simulation model produces realistic trajectories. We employ this model to simulate large human crowds and use techniques from computer graphics to render synthetic videos for further evaluation of our automatic video surveillance system.

    A spatio-temporal learning approach for crowd activity modelling to detect anomalies

    With security and surveillance gaining paramount importance in recent years, it has become important to reliably automate some surveillance tasks for monitoring crowded areas. Automation also supports human operators who are overwhelmed by the large number of security screens they must monitor. Crowd events such as excess usage throughout the day, sudden peaks in crowd volume, and chaotic motion (obvious to spot) all emerge over time, which requires constant monitoring in order to be informed of the event build-up. To ease this task, the computer vision community has been addressing some surveillance tasks using image processing and machine learning techniques. Currently, tasks such as crowd density estimation or people counting, crowd detection, and abnormal crowd event detection are being addressed. Most of the work has focused on crowd detection and estimation, with the focus slowly shifting to crowd event learning for abnormality detection. This thesis addresses crowd abnormality detection. However, by way of the modelling approach used, the tasks of crowd detection and estimation are implicitly handled as well. The existing approaches in the literature have a number of drawbacks that keep them from scaling to arbitrary public scenes. Most works use simple scene settings where motion occurs wholly in the near-field or far-field of the camera view. Thus, with assumptions on the expected location of person motion, small blobs are arbitrarily filtered out as noise when they may be legitimate motion in the far-field. Such an approach makes it difficult to deal with complex scenes where entry/exit points occur in the centre of the scene, or where multiple pathways run from the near-field to the far-field of the camera view and produce blobs of differing sizes. Further, most authors assume the number of directions that people's motion should exhibit rather than discover what these may be.
Approaches with such assumptions would lose accuracy when dealing with, say, a railway platform that shows a number of motion directions, namely two-way, one-way, dispersive, etc. Finally, very few works use time as a video feature to model the human intuition of time-of-day abnormalities; that is, certain motion patterns may be abnormal if they have not been seen before at a given time of day. Most works use time as an extra qualifier to spatial data for trajectory definition. In this thesis, most of these drawbacks are addressed in the modelling of crowd activity. Firstly, no assumptions are made on scene structure or on the blob sizes resulting therefrom. The optical flow algorithm used is robust, and even the noise present (which is in fact unwanted motion of swaying hands and legs, as opposed to that of the torso) is fairly consistent and can therefore be factored into the modelling. Blobs, no matter what their size, are not discarded, as they may be legitimate emerging motion in the far-field. The modelling also deals with paths extending from the far-field to the near-field of the camera view and segments these such that each segment contains self-comparable fields of motion. A normalisation factor for comparisons across near-field and far-field motion would imply prior knowledge of the scene. As the system is intended for generic public locations with varying scene structures, normalisation is not an option in the processing used, and yet the near-field and far-field motion changes are accounted for. Secondly, this thesis describes a system that learns the true distribution of motion along the detected paths and maintains these. The approach does so without generalising the direction distributions, which would cause a loss in precision.
No impositions are made on expected motion: if the underlying motion is well defined (one-way or two-way), it is represented as a well-defined distribution, and if it presents itself as a mixture of directions, it is represented as such. Finally, time as a video feature is used to allow activity to reinforce itself on a daily basis, such that motion patterns for a given time and space begin to define themselves through reinforcement; this acts as the model used for abnormality detection in time and space (spatio-temporal). The system has been tested on real-world datasets with varying fields of camera view. The testing has shown no false negatives and very few false positives, and the system detects crowd abnormalities quite well with respect to the ground truths of the datasets used.
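
The idea of letting motion directions reinforce themselves per time of day and per path segment can be illustrated with a minimal sketch. This is not the thesis's model: the class name `DirectionModel`, the eight direction bins, the hourly time slots, the segment labels, and the 0.05 threshold are all assumptions made for illustration, and the flow vectors are taken as given rather than computed from video.

```python
import math
from collections import defaultdict

N_BINS = 8  # assumed number of direction bins (45 degrees each)

class DirectionModel:
    """Histogram of observed motion directions per (hour, segment) pair."""

    def __init__(self):
        # Maps (hour, segment) to a list of counts, one per direction bin.
        self.counts = defaultdict(lambda: [0] * N_BINS)

    @staticmethod
    def direction_bin(dx, dy):
        # Map a flow vector to the index of the angular bin it falls in.
        angle = math.atan2(dy, dx) % (2 * math.pi)
        return int(angle / (2 * math.pi / N_BINS)) % N_BINS

    def reinforce(self, hour, segment, dx, dy):
        # Each observation strengthens the pattern for its time and place.
        self.counts[(hour, segment)][self.direction_bin(dx, dy)] += 1

    def probability(self, hour, segment, dx, dy):
        # Relative frequency of this direction at this time and place.
        hist = self.counts.get((hour, segment))
        if hist is None:
            return 0.0  # nothing has ever been seen here at this hour
        return hist[self.direction_bin(dx, dy)] / sum(hist)

    def is_abnormal(self, hour, segment, dx, dy, threshold=0.05):
        # Flag motion that is rare for this time and place (assumed rule).
        return self.probability(hour, segment, dx, dy) < threshold

# Hypothetical usage: morning flow on one segment moves roughly rightwards.
model = DirectionModel()
for _ in range(50):
    model.reinforce(8, "platform_a", 1.0, 0.1)
```

Because the histogram is learned rather than fixed in advance, a one-way path ends up with a single dominant bin while a dispersive area keeps a mixture, which mirrors the abstract's point about not imposing the number of directions.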