
    Robust pedestrian detection and tracking in crowded scenes

    In this paper, a robust computer vision approach to detecting and tracking pedestrians in unconstrained crowded scenes is presented. Pedestrian detection is performed via a 3D clustering process within a region-growing framework. The clustering process avoids hard thresholds by using biometrically inspired constraints and a number of plan-view statistics. Pedestrian tracking is achieved by formulating the track matching process as a weighted bipartite graph and using a Weighted Maximum Cardinality Matching scheme. The approach is evaluated using both indoor and outdoor sequences, captured using a variety of camera placements and orientations, that feature significant challenges in terms of the number of pedestrians present, their interactions and scene lighting conditions. The evaluation is performed against a manually generated ground truth for all sequences. Results indicate highly accurate performance of the proposed approach in all cases.
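    The track-matching step lends itself to a compact illustration. Below is a minimal sketch of weighted maximum-cardinality matching on a bipartite track/detection graph using networkx; the gating distance, the plan-view positions and the weighting scheme are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of track-to-detection association via weighted
# maximum-cardinality bipartite matching (not the authors' code).
import networkx as nx

def match_tracks_to_detections(tracks, detections, max_dist=1.5):
    """tracks, detections: dicts mapping id -> (x, y) plan-view position in metres."""
    G = nx.Graph()
    for t_id, t_pos in tracks.items():
        for d_id, d_pos in detections.items():
            dist = ((t_pos[0] - d_pos[0]) ** 2 + (t_pos[1] - d_pos[1]) ** 2) ** 0.5
            if dist <= max_dist:  # gate physically implausible jumps
                # Closer pairs get larger weight; matching maximizes total weight.
                G.add_edge(("t", t_id), ("d", d_id), weight=max_dist - dist)
    # maxcardinality=True first matches as many pairs as possible, then
    # maximizes total weight among those matchings.
    matching = nx.max_weight_matching(G, maxcardinality=True)
    pairs = {}
    for a, b in matching:
        t, d = (a, b) if a[0] == "t" else (b, a)
        pairs[t[1]] = d[1]
    return pairs
```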

    Pedestrian detection and tracking using stereo vision techniques

    Automated pedestrian detection, counting and tracking has received significant attention from the computer vision community in recent years. Many of the person detection techniques described in the literature work well in controlled environments, such as laboratory settings with a small number of people. This allows various assumptions to be made that simplify this complex problem. The performance of these techniques, however, tends to deteriorate when presented with unconstrained environments where pedestrian appearances, numbers, orientations, movements, occlusions and lighting conditions violate these convenient assumptions. Recently, 3D stereo information has been proposed as a technique to overcome some of these issues and to guide pedestrian detection. This thesis presents such an approach, whereby after obtaining robust 3D information via a novel disparity estimation technique, pedestrian detection is performed via a 3D point clustering process within a region-growing framework. This clustering process avoids hard thresholds by using biometrically inspired constraints and a number of plan-view statistics. This pedestrian detection technique requires no external training and is able to robustly handle challenging real-world unconstrained environments from various camera positions and orientations. In addition, this thesis presents a continuous detect-and-track approach, with additional kinematic constraints and explicit occlusion analysis, to obtain robust temporal tracking of pedestrians over time. These approaches are experimentally validated using challenging datasets consisting of both synthetic data and real-world sequences gathered from a number of environments. In each case, the techniques are evaluated using both 2D and 3D ground-truth methodologies.
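    As a rough illustration of the plan-view statistics mentioned above, the sketch below accumulates occupancy and height maps from a stereo-derived point cloud; the grid extents, cell size and coordinate conventions are assumptions made for the example. A region-growing step could then seed clusters at occupancy peaks and expand them while biometric constraints (plausible human height and footprint) remain satisfied.

```python
# Hedged sketch: plan-view occupancy and height statistics from a cloud of
# 3D points recovered from stereo (illustrative, not the thesis implementation).
import numpy as np

def plan_view_maps(points, cell=0.1, x_range=(-5.0, 5.0), z_range=(0.0, 10.0)):
    """points: (N, 3) array of (x, y, z) in metres, y = height above ground."""
    nx_ = int((x_range[1] - x_range[0]) / cell)
    nz_ = int((z_range[1] - z_range[0]) / cell)
    occupancy = np.zeros((nz_, nx_))   # number of points per ground cell
    height = np.zeros((nz_, nx_))      # tallest point seen in each cell
    ix = ((points[:, 0] - x_range[0]) / cell).astype(int)
    iz = ((points[:, 2] - z_range[0]) / cell).astype(int)
    ok = (ix >= 0) & (ix < nx_) & (iz >= 0) & (iz < nz_)
    for x, z, y in zip(ix[ok], iz[ok], points[ok, 1]):
        occupancy[z, x] += 1
        height[z, x] = max(height[z, x], y)
    return occupancy, height
```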

    BEV-Net: Assessing Social Distancing Compliance by Joint People Localization and Geometric Reasoning

    Social distancing, an essential public health measure to limit the spread of contagious diseases, has gained significant attention since the outbreak of the COVID-19 pandemic. In this work, the problem of visual social distancing compliance assessment in busy public areas, with wide field-of-view cameras, is considered. A dataset of crowd scenes with people annotations under a bird's eye view (BEV) and ground truth for metric distances is introduced, and several measures for the evaluation of social distance detection systems are proposed. A multi-branch network, BEV-Net, is proposed to localize individuals in world coordinates and identify high-risk regions where social distancing is violated. BEV-Net combines detection of head and feet locations, camera pose estimation, a differentiable homography module to map the image into BEV coordinates, and geometric reasoning to produce a BEV map of the people locations in the scene. Experiments on complex crowded scenes demonstrate the power of the approach and show superior performance over baselines derived from methods in the literature. Applications of interest to public health decision makers are finally discussed. Datasets, code and pretrained models are publicly available on GitHub. (Published as a conference paper at the International Conference on Computer Vision, 2021.)
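    The geometric core of such a system can be sketched compactly: map detected foot pixels into BEV world coordinates with a ground-plane homography, then test pairwise metric distances. The homography H is assumed given (e.g., from a camera pose estimate); this is not the BEV-Net code.

```python
# Illustrative sketch: image-to-BEV mapping and social-distance violation
# check, assuming a known image->ground homography H.
import numpy as np

def image_to_bev(H, feet_px):
    """H: (3, 3) image->ground homography; feet_px: (N, 2) pixel coordinates."""
    pts = np.concatenate([feet_px, np.ones((len(feet_px), 1))], axis=1)
    world = (H @ pts.T).T
    return world[:, :2] / world[:, 2:3]   # dehomogenize to metric ground coords

def violating_pairs(bev_xy, min_dist=2.0):
    """Return index pairs of people closer than min_dist metres in the BEV."""
    diffs = bev_xy[:, None, :] - bev_xy[None, :, :]
    d = np.linalg.norm(diffs, axis=-1)
    i, j = np.where((d < min_dist) & (d > 0))
    return {(a, b) for a, b in zip(i, j) if a < b}
```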

    Unsupervised Methods for Camera Pose Estimation and People Counting in Crowded Scenes

    Most visual crowd counting methods rely on training with labeled data to learn a mapping between features in the image and the number of people in the scene. However, the exact nature of this mapping may change as a function of different scene and viewing conditions, limiting the ability of such supervised systems to generalize to novel conditions and thus preventing broad deployment. Here I propose an alternative, unsupervised strategy anchored on a 3D simulation that automatically learns how groups of people appear in the image and adapts to the signal-processing parameters of the current viewing scenario. To implement this 3D strategy, knowledge of the camera parameters is required. Most methods for automatic camera calibration make assumptions about regularities in scene structure or motion patterns, which do not always apply. I propose a novel motion-based approach for recovering camera tilt that does not require tracking. Having an automatic camera calibration method allows for the implementation of an accurate crowd counting algorithm that reasons in 3D. The system is evaluated on various datasets and compared against state-of-the-art methods.
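    To make the 3D reasoning concrete, the sketch below back-projects a detected person's foot pixel to a ground-plane distance using the calibration quantities the thesis recovers (camera height, tilt, focal length); the function and parameter names are illustrative, not taken from the thesis.

```python
# Minimal pinhole-geometry sketch: foot pixel -> ground-plane distance,
# given camera height h, downward tilt theta and focal length f.
import math

def foot_pixel_to_ground_distance(v, cy, f, h, theta):
    """v: foot row (px); cy: principal-point row (px); f: focal length (px);
    h: camera height (m); theta: downward tilt (rad). Returns distance (m)."""
    phi = math.atan2(v - cy, f)   # ray angle below the optical axis
    angle = theta + phi           # total depression angle below the horizontal
    if angle <= 0:
        raise ValueError("pixel lies at or above the horizon")
    return h / math.tan(angle)
```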

    Vision-Based Production of Personalized Video

    In this paper we present a novel vision-based system for the automated production of personalized video souvenirs for visitors in leisure and cultural heritage venues. Visitors are visually identified and tracked through a camera network, and the system produces a personalized DVD souvenir at the end of a visitor's stay, allowing visitors to relive their experiences. We describe how visitors are identified by fusing facial and body features, how they are tracked, how the tracker recovers from failures due to occlusions, and how the final product is annotated and compiled. Our experiments demonstrate the feasibility of the proposed approach.
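    A minimal sketch of score-level fusion of facial and body cues for visitor identification follows; the cosine-similarity features and the fixed weighting are assumptions made for illustration, not the paper's exact scheme.

```python
# Hedged sketch of fusing face and body-appearance similarity scores for
# visitor re-identification (illustrative weighting, not the paper's method).
import numpy as np

def fused_similarity(face_a, face_b, body_a, body_b, w_face=0.6):
    """All arguments are L2-normalized feature vectors; higher score means
    the two observations are more likely the same visitor."""
    s_face = float(np.dot(face_a, face_b))   # cosine similarity of face features
    s_body = float(np.dot(body_a, body_b))   # cosine similarity of body features
    return w_face * s_face + (1.0 - w_face) * s_body

def identify(query_face, query_body, gallery):
    """gallery: dict of visitor_id -> (face_vec, body_vec); returns best match."""
    return max(gallery, key=lambda vid: fused_similarity(
        query_face, gallery[vid][0], query_body, gallery[vid][1]))
```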

    Back-Action Evading Measurements of Nanomechanical Motion Approaching Quantum Limits

    The application of quantum mechanics to macroscopic motion suggests many counterintuitive phenomena. While the quantum nature of the motion of individual atoms and molecules has long been successfully studied, an equivalent demonstration of the motion of a near-macroscopic structure remains a challenge in experimental physics. A nanomechanical resonator is an excellent system for such a study. It typically contains > 10^10 atoms, and it may be modeled in terms of macroscopic parameters such as bulk density and elasticity. Yet it behaves like a simple harmonic oscillator, with mass low enough and resonant frequency high enough for its quantum zero-point motion and single energy quanta to be experimentally accessible. In pursuit of quantum phenomena in a mechanical oscillator, two important goals are to prepare the oscillator in its quantum ground state, and to measure its position with a precision limited by the Heisenberg uncertainty principle. In this work we have demonstrated techniques that advance towards both of these goals. Our system comprises a 30 μm × 170 nm, 2.2 pg, 5.57 MHz nanomechanical resonator capacitively coupled to a 5 GHz superconducting microwave resonator. The microwave resonator and nanomechanical resonator are fabricated together on a single silicon chip and measured in a dilution refrigerator at temperatures below 150 mK. At these temperatures the coupling of the motion to the thermal environment is very small, resulting in a very high mechanical Q, approaching ∼10^6. By driving with a microwave pump signal, we observed sidebands generated by the mechanical motion and used these to measure the thermal motion of the resonator. Applying a pump tone red-detuned from the microwave resonance, we used the microwave field to damp the mechanical resonator, extracting energy and "cooling" the motion in a manner similar to optical cooling of trapped atoms. Starting from a mode temperature of ∼150 mK, we reached ∼40 mK by this "back-action cooling" technique, corresponding to an occupation factor only ∼150 times above the ground state of motion. We also determined the precision of our device in measurement of position. Quantum mechanics dictates that, in a continuous position measurement, the precision may be no better than the zero-point motion of the resonator. Increasing the coupling of the resonator to the detector will eventually result in back-action driving of the motion, adding imprecision and enforcing this limit. We demonstrated that our system is capable of precisions approaching this limit, and identified the primary experimental factors preventing us from reaching it: noise added to the measurement by our amplifier, and excess dissipation appearing in our microwave resonator at high pump powers. Furthermore, by applying both red- and blue-detuned phase-coherent microwave pump signals, we demonstrated back-action evading (BAE) measurement sensitive to only a single quadrature of the motion. By avoiding the back-action driving in the measured quadrature, such a technique has the potential for precisions surpassing the limit of the zero-point motion. With this method, we achieved a measurement precision of ∼100 fm, or 4 times the quantum zero-point motion of the mechanical resonator. We found that the measured quadrature is insensitive to back-action driving by at least a factor of 82 relative to the unmeasured quadrature. We also identified a mechanical parametric amplification effect which arises during the BAE measurement. This effect sets limits on the BAE performance but also mechanically preamplifies the motion, resulting in a position resolution 1.3 times the zero-point motion. We discuss how to overcome the experimental limits set by amplifier noise, pump power and parametric amplification. These results serve to define the path forward for demonstrating truly quantum-limited measurement and non-classical states of motion in a nearly macroscopic object.
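    For readers less familiar with the scheme, the standard quadrature picture behind back-action evasion can be summarized as follows; the notation (omega_m, omega_c, x_zp) is textbook convention and this is a generic sketch, not a derivation from the thesis.

```latex
% Standard BAE quadrature picture (textbook notation, not thesis-specific).
% Decompose the motion into slowly varying quadratures at frequency \omega_m:
x(t) = X_1(t)\cos(\omega_m t) + X_2(t)\sin(\omega_m t),
\qquad
[\hat{X}_1, \hat{X}_2] = \frac{i\hbar}{m\omega_m}.
% Because the quadratures do not commute, continuously monitoring both is
% limited at the level of the zero-point motion,
x_{\mathrm{zp}} = \sqrt{\frac{\hbar}{2 m \omega_m}} .
% Pumping at both \omega_c + \omega_m and \omega_c - \omega_m with equal
% amplitudes couples the cavity field to a single quadrature,
H_{\mathrm{int}} \propto \hat{X}_1\,(\hat{a} + \hat{a}^{\dagger}),
% so the measurement back-action is diverted into X_2, and X_1 may in
% principle be resolved below x_zp.
```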

    Accurate Long-Term Multiple People Tracking Using Video and Body-Worn IMUs

    Most modern approaches for video-based multiple people tracking rely on human appearance to exploit similarities between person detections. Consequently, tracking accuracy degrades if this kind of information is not discriminative or if people change apparel. In contrast, we present a method to fuse video information with additional motion signals from body-worn inertial measurement units (IMUs). In particular, we propose a neural network to relate person detections with IMU orientations, and formulate a graph labeling problem to obtain a tracking solution that is globally consistent with the video and inertial recordings. The fusion of visual and inertial cues provides several advantages. The association of detection boxes in the video and IMU devices is based on motion, which is independent of a person's outward appearance. Furthermore, inertial sensors provide motion information irrespective of visual occlusions. Hence, once detections in the video are associated with an IMU device, intermediate positions can be reconstructed from corresponding inertial sensor data, which would be unstable using video only. Since no dataset exists for this new setting, we release a dataset of challenging tracking sequences, containing video and IMU recordings together with ground-truth annotations. We evaluate our approach on our new dataset, achieving an average IDF1 score of 91.2%. The proposed method is applicable to any situation that allows one to equip people with inertial sensors.
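    One ingredient of such a fusion can be sketched as a per-frame assignment between detections and IMU devices based on orientation agreement. The actual method solves a global graph labeling problem over whole sequences, so the sketch below (geodesic rotation distance plus a Hungarian assignment) is only a simplified stand-in with illustrative names.

```python
# Hedged sketch: assign person detections to IMU devices by comparing an
# orientation inferred from each detection with each IMU's reported
# orientation (not the paper's global graph-labeling formulation).
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.transform import Rotation as R

def assign_detections_to_imus(det_quats, imu_quats):
    """det_quats: (N, 4) per-detection orientations as (x, y, z, w) quaternions;
    imu_quats: (M, 4) orientations reported by the IMUs."""
    cost = np.zeros((len(det_quats), len(imu_quats)))
    for i, q_det in enumerate(det_quats):
        for j, q_imu in enumerate(imu_quats):
            rel = R.from_quat(q_det).inv() * R.from_quat(q_imu)
            cost[i, j] = rel.magnitude()      # geodesic angle in radians
    rows, cols = linear_sum_assignment(cost)  # minimize total angular error
    return list(zip(rows, cols))
```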

    Multitask variational autoencoding of human-to-human object handover

    Assistive robots that operate alongside humans require the ability to understand and replicate human behaviours during a handover. A handover is defined as a joint action between two participants in which a giver hands an object over to a receiver. In this paper, we present a method for learning human-to-human handovers observed from motion capture data. Given the giver and receiver pose from a single timestep, and the object label in the form of a word embedding, our Multitask Variational Autoencoder jointly forecasts their pose as well as the orientation of the object held by the giver at handover. Our method contrasts sharply with existing works for human pose forecasting, which employ deep autoregressive models requiring a sequence of inputs. Furthermore, our method is novel in that it learns both the human pose and object orientation in a joint manner. Experimental results on the publicly available Handover Orientation and Motion Capture Dataset show that our proposed method outperforms the autoregressive baselines for handover pose forecasting by approximately 20% while being on par for object orientation prediction, with a runtime that is 5x faster.
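    A minimal PyTorch sketch in the spirit of the described architecture follows: a shared encoder over both poses and the object word embedding, a Gaussian latent, and two decoder heads for pose forecasting and object orientation. All layer sizes, dimensions and names are assumptions, not the published model.

```python
# Hedged sketch of a multitask VAE with two decoder heads (illustrative
# dimensions; not the authors' published architecture).
import torch
import torch.nn as nn

class MultitaskVAE(nn.Module):
    def __init__(self, pose_dim=63, embed_dim=300, latent_dim=32, hidden=256):
        super().__init__()
        in_dim = 2 * pose_dim + embed_dim     # giver pose + receiver pose + word embedding
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)
        self.pose_head = nn.Sequential(       # forecasts giver+receiver pose at handover
            nn.Linear(latent_dim, hidden), nn.ReLU(), nn.Linear(hidden, 2 * pose_dim))
        self.orient_head = nn.Sequential(     # predicts object orientation (quaternion)
            nn.Linear(latent_dim, hidden), nn.ReLU(), nn.Linear(hidden, 4))

    def forward(self, giver, receiver, obj_embed):
        h = self.encoder(torch.cat([giver, receiver, obj_embed], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        quat = self.orient_head(z)
        quat = quat / quat.norm(dim=-1, keepdim=True)            # unit quaternion
        return self.pose_head(z), quat, mu, logvar
```

    Training would combine reconstruction losses on both heads with the usual KL term, which is the standard way a single latent can serve both tasks jointly.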

    Activity Representation from Video Using Statistical Models on Shape Manifolds

    Activity recognition from video data is a key computer vision problem with applications in surveillance, elderly care, etc. This problem involves modeling a representative shape which contains significant information about the underlying activity. In this dissertation, we present several approaches for view-invariant activity recognition via modeling shapes on various shape spaces and Riemannian manifolds. The first two parts of this dissertation deal with activity modeling and recognition using tracks of landmark feature points. The motion trajectories of points extracted from objects involved in the activity are used to build deformation shape models for each activity, and these models are used for classification and detection of unusual activities. In the first part of the dissertation, these models are represented by the recovered 3D deformation basis shapes corresponding to the activity, using a non-rigid structure-from-motion formulation. We use a theory for estimating the amount of deformation for these models from the visual data. We study the special case of ground-plane activities in detail because of its importance in video surveillance applications. In the second part of the dissertation, we propose to model the activity by learning an affine-invariant deformation subspace representation that captures the space of possible body poses associated with the activity. These subspaces can be viewed as points on a Grassmann manifold. We propose several statistical classification models on the Grassmann manifold that capture the statistical variations of the shape data while following the intrinsic Riemannian geometry of these manifolds. The last part of this dissertation addresses the problem of recognizing human gestures from silhouette images. We represent a human gesture as a temporal sequence of human poses, each characterized by a contour of the associated human silhouette. The shape of a contour is viewed as a point on the shape space of closed curves and, hence, each gesture is characterized and modeled as a trajectory on this shape space. We utilize the Riemannian geometry of this space to propose a template-based and a graphical-model-based approach for modeling these trajectories. The two models are designed to account for the different invariance requirements in gesture recognition, and also to capture the statistical variations associated with the contour data.
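    The Grassmann-manifold machinery mentioned above rests on principal angles between subspaces; the sketch below computes the corresponding geodesic distance, on top of which nearest-mean or other statistical classifiers can be built. It illustrates the geometry only and is not the dissertation's code.

```python
# Hedged sketch: geodesic distance between two k-dimensional subspaces on the
# Grassmann manifold, via principal angles (illustrative of the geometry).
import numpy as np
from scipy.linalg import subspace_angles

def grassmann_distance(A, B):
    """A, B: (n, k) matrices whose columns span the two subspaces."""
    theta = subspace_angles(A, B)   # principal angles, each in [0, pi/2]
    return np.linalg.norm(theta)    # arc-length (geodesic) distance

# A nearest-class-mean classifier over activity subspaces can then rank
# classes by this distance while respecting the manifold's intrinsic geometry.
```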