
    Real-World Repetition Estimation by Div, Grad and Curl

    We consider the problem of estimating repetition in video, such as performing push-ups, cutting a melon or playing the violin. Existing work shows good results under the assumption of static and stationary periodicity. As realistic video is rarely perfectly static and stationary, the often-preferred Fourier-based measurements are inapt. Instead, we adopt the wavelet transform to better handle non-static and non-stationary video dynamics. From the flow field and its differentials, we derive three fundamental motion types and three motion continuities of intrinsic periodicity in 3D. On top of this, the 2D perception of 3D periodicity considers two extreme viewpoints. What follows are 18 fundamental cases of recurrent perception in 2D. In practice, to deal with the variety of repetitive appearance, our theory implies measuring time-varying flow and its differentials (gradient, divergence and curl) over segmented foreground motion. For experiments, we introduce the new QUVA Repetition dataset, reflecting reality by including non-static and non-stationary videos. On the task of counting repetitions in video, we obtain favorable results compared to a deep learning alternative.
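As a rough illustration of the differentials named above, the following Python sketch computes divergence, curl and gradient magnitude from a dense optical flow field with NumPy. It assumes the flow components (u, v) are already available from some optical flow method and that a foreground mask exists; the function names `flow_differentials` and `foreground_signal` are hypothetical, and the sketch covers neither the wavelet analysis nor the foreground segmentation used in the paper.

```python
import numpy as np


def flow_differentials(u, v):
    """Divergence, curl (z-component) and gradient magnitude of a dense 2D flow field.

    u, v: 2D arrays of shape (H, W) holding the horizontal (x) and vertical (y)
    flow components. Derivatives are central differences from np.gradient,
    assuming unit pixel spacing.
    """
    # np.gradient returns the derivative along axis 0 (rows, y) first, then axis 1 (columns, x)
    du_dy, du_dx = np.gradient(np.asarray(u, dtype=float))
    dv_dy, dv_dx = np.gradient(np.asarray(v, dtype=float))
    divergence = du_dx + dv_dy  # expansion (> 0) or contraction (< 0)
    curl = dv_dx - du_dy        # rotation in the image plane
    grad_mag = np.sqrt(du_dx**2 + du_dy**2 + dv_dx**2 + dv_dy**2)  # Frobenius norm of the flow Jacobian
    return divergence, curl, grad_mag


def foreground_signal(values, mask):
    """Average one differential map over a (hypothetical) boolean foreground mask: one sample per frame."""
    return float(values[mask].mean())
```

Collecting `foreground_signal` over all frames yields one 1D time series per differential, on which a time-frequency analysis such as a wavelet transform could then be run. For a purely rotational field u = -y, v = x, the divergence is 0 and the curl is 2 everywhere; for a purely expanding field u = x, v = y, the divergence is 2 and the curl is 0.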

    Continuous Action Recognition Based on Sequence Alignment

    Continuous action recognition is more challenging than isolated recognition because classification and segmentation must be carried out simultaneously. We build on the well-known dynamic time warping (DTW) framework and devise a novel visual alignment technique, namely dynamic frame warping (DFW), which performs isolated recognition based on a per-frame representation of videos and on aligning a test sequence with a model sequence. Moreover, we propose two extensions that enable recognition to be performed concurrently with segmentation, namely one-pass DFW and two-pass DFW. These two methods have their roots in the domain of continuous speech recognition and, to the best of our knowledge, their extension to continuous visual action recognition has been overlooked. We test and illustrate the proposed techniques with a recently released dataset (RAVEL) and with two public-domain datasets widely used in action recognition (Hollywood-1 and Hollywood-2). We also compare the performance of the proposed isolated and continuous recognition algorithms with that of several recently published methods.
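The alignment underlying DFW builds on classic dynamic time warping. The sketch below implements plain DTW between two sequences of per-frame descriptors and uses it for isolated recognition by picking the nearest model sequence. It is not the authors' DFW, one-pass DFW or two-pass DFW; the Euclidean frame distance and the function names are assumptions made for illustration.

```python
import numpy as np


def dtw_distance(test, model):
    """Classic DTW cost between two sequences of per-frame feature vectors.

    test: array-like of shape (n, d); model: array-like of shape (m, d).
    The local cost is the Euclidean distance between frame descriptors (assumed).
    """
    test = np.asarray(test, dtype=float)
    model = np.asarray(model, dtype=float)
    n, m = len(test), len(model)
    # Pairwise frame-to-frame distances, shape (n, m)
    cost = np.linalg.norm(test[:, None, :] - model[None, :, :], axis=-1)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Allowed steps: diagonal match, repeat a model frame, repeat a test frame
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    return float(acc[n, m])


def classify(test, models):
    """Isolated recognition: label of the model sequence with the lowest DTW cost.

    models: dict mapping an action label to its model sequence (hypothetical layout).
    """
    return min(models, key=lambda label: dtw_distance(test, models[label]))
```

The quadratic double loop is kept for readability; a band constraint around the diagonal is a common way to reduce the search when sequences are long.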

    Structure from Recurrent Motion: From Rigidity to Recurrency

    This paper proposes a new method for Non-Rigid Structure-from-Motion (NRSfM) from a long monocular video sequence observing a non-rigid object performing a recurrent and possibly repetitive dynamic action. Departing from the traditional idea of using a linear low-order or low-rank shape model for the task of NRSfM, our method exploits the property of shape recurrency (i.e., many deforming shapes tend to repeat themselves in time). We show that recurrency is in fact a generalized rigidity. Based on this, we reduce NRSfM problems to rigid ones provided that a certain recurrency condition is satisfied. Given such a reduction, standard rigid-SfM techniques are directly applicable (without any change) to the reconstruction of non-rigid dynamic shapes. To implement this idea as a practical approach, this paper develops efficient algorithms for automatic recurrency detection, as well as camera view clustering via a rigidity check. Experiments on both simulated sequences and real data demonstrate the effectiveness of the method. Since this paper offers a novel perspective on rethinking structure-from-motion, we hope it will inspire other new problems in the field. Comment: To appear in CVPR 201
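One standard way to test whether two views could show the same rigid shape is the rank constraint from affine factorization: under an orthographic or affine camera, the centred 2D tracks of a rigid point set from two views, stacked into a 4 × P matrix, have rank at most 3. The sketch below measures how far that matrix is from rank 3. It is an assumed stand-in for the paper's rigidity check, whose exact form is not given here; it requires known point correspondences and at least four points, and the function name is hypothetical.

```python
import numpy as np


def rigidity_residual(pts_a, pts_b):
    """Rank-3 test for two views of the same P points (each array of shape (2, P), P >= 4).

    Assumes an orthographic/affine camera and known point correspondences.
    Returns sigma_4 / sigma_1 of the centred 4 x P measurement matrix; values
    near zero are consistent with one rigid shape seen from two viewpoints.
    """
    W = np.vstack([pts_a, pts_b]).astype(float)
    if W.shape[0] != 4 or W.shape[1] < 4:
        raise ValueError("expected two (2, P) arrays with P >= 4")
    W -= W.mean(axis=1, keepdims=True)  # remove the per-view translation
    s = np.linalg.svd(W, compute_uv=False)  # singular values in descending order
    return float(s[3] / s[0])
```

A small ratio suggests that the two frames can be treated as views of a single rigid shape, which is the situation in which rigid-SfM could be applied to them; the acceptance threshold is left open.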

    Segmentation, Recognition, and Alignment of Collaborative Group Motion

    Modeling and recognition of human motion in videos has broad applications in behavioral biometrics, content-based visual data analysis, security and surveillance, as well as the design of interactive environments. Significant progress has been made in the past two decades by way of new models, methods, and implementations. In this dissertation, we focus our attention on a relatively less investigated sub-area called collaborative group motion analysis. Collaborative group motions are those that typically involve multiple objects, wherein the motion patterns of individual objects may vary significantly in both space and time, but the collective motion pattern of the ensemble allows characterization in terms of geometry and statistics. The motions or activities of an individual object therefore constitute only local information. A framework that synthesizes all local information into a holistic view, and that explicitly characterizes interactions among objects, involves large-scale global reasoning and is of significant complexity. In this dissertation, we first review relevant previous contributions on human motion/activity modeling and recognition, and then propose several approaches to answer a sequence of traditional vision questions, including 1) which of the motion elements are relevant to a group motion pattern of interest (Segmentation); 2) what the underlying motion pattern is (Recognition); and 3) how similar two motion ensembles are and how we can 'optimally' transform one to match the other (Alignment). Our primary practical scenario is American football plays, where the corresponding problems are 1) who the offensive players are; 2) what offensive strategy they are using; and 3) whether two plays use the same strategy and how we can remove the spatio-temporal misalignment between them due to internal or external factors. 
The proposed approaches depart from the traditional modeling paradigm and instead explore concise descriptors, hierarchies, stochastic mechanisms, or compact generative models to achieve both effectiveness and efficiency. In particular, the intrinsic geometry of the spaces of the involved features, descriptors, and quantities is exploited, and statistical tools are established on these nonlinear manifolds. These initial attempts have identified new challenging problems in complex motion analysis, as well as in more general tasks in video dynamics. The insights gained from nonlinear geometric modeling and analysis in this dissertation may prove useful for a broader class of computer vision applications.
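For the spatial part of the Alignment question, a minimal sketch under strong assumptions: given two plays with known player correspondence at a single time instant (for example, hypothetical player positions at the snap), the orthogonal Procrustes solution finds the rotation and translation that best map one formation onto the other. This is a generic technique swapped in for illustration; it ignores temporal misalignment and the manifold-based tools described in the dissertation, and the function name is hypothetical.

```python
import numpy as np


def procrustes_align(source, target):
    """Rotate and translate `source` onto `target` (both shape (N, 2), rows in correspondence).

    Orthogonal Procrustes solution via the SVD of the cross-covariance matrix,
    restricted to proper rotations. Returns the aligned points and the residual
    sum of squared distances.
    """
    src = np.asarray(source, dtype=float)
    tgt = np.asarray(target, dtype=float)
    mu_s, mu_t = src.mean(axis=0), tgt.mean(axis=0)
    A, B = src - mu_s, tgt - mu_t
    U, _, Vt = np.linalg.svd(A.T @ B)
    R = U @ Vt
    if np.linalg.det(R) < 0:  # avoid a reflection: flip the axis with the smallest singular value
        U[:, -1] *= -1
        R = U @ Vt
    aligned = A @ R + mu_t
    return aligned, float(np.sum((aligned - tgt) ** 2))
```

The residual gives a crude similarity score between two formations once rotation and translation have been factored out.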

    Key body pose detection and movement assessment of fitness performances

    Motion segmentation plays an important role in human motion analysis. Understanding the intrinsic features of human activities represents a challenge for modern science. Current solutions usually involve computationally demanding processing and achieve the best results using expensive, intrusive motion capture devices. In this thesis, research has been carried out to develop a series of methods for affordable and effective human motion assessment in the context of stand-up physical exercises. The objective of the research was to address the need for an autonomous system that could be deployed in nursing homes or elderly people's houses, as well as in the rehabilitation of high-profile sports performers. Firstly, such a system has to be designed so that instructions on physical exercises, especially in the case of elderly people, can be delivered in an understandable way. Secondly, it has to deal with the problem that some individuals may find it difficult to keep up with the programme due to physical impediments. They may also be discouraged because the activities are not stimulating or the instructions are hard to follow. In this thesis, a series of methods for automatic assessment production, as a combination of worded feedback and motion visualisation, is presented. The methods comprise two major steps. First, a series of key body poses are identified using a model built by a multi-class classifier from a set of frame-wise features extracted from the motion data. Second, motion alignment (or synchronisation) with a reference performance (the tutor) is established in order to produce a second assessment model. Numerical assessment first, and textual feedback afterwards, are delivered to the user along with a 3D skeletal animation to enrich the assessment experience. This animation is produced after the expert's demonstration is transformed to the user's current level of performance, in order to encourage the user to engage with the programme. 
The key body pose identification stage proceeds as follows. First, the principal components of the input motion data are calculated in order to reduce its dimensionality. Then, candidate key body poses are inferred from a set of training samples using multi-class, supervised machine learning techniques. Finally, cluster analysis is used to refine the result. Key body pose identification is guaranteed to be invariant to the repetitiveness and symmetry of the performance. Results show the effectiveness of the proposed approach when compared against Dynamic Time Warping and Hierarchical Aligned Cluster Analysis. The synchronisation sub-system takes advantage of the cyclic nature of the stretches that form part of the stand-up exercises under study in order to remove key body poses identified out of sequence (i.e., false positives). Two approaches are considered for performing cycle analysis: a trivial sequential algorithm and a proposed Genetic Algorithm (GA), with and without prior knowledge of cyclic sequence patterns. When the two approaches are compared, the GA with prior knowledge shows a lower false positive rate but also a higher false negative rate. The GAs are also evaluated on randomly generated periodic string sequences. The automatic assessment follows a similar approach to that of key body pose identification. A multi-class, multi-target machine learning classifier is trained with features extracted from the previous motion alignment. The inferred numerical assessment levels (one per identified key body pose and involved body joint) are translated into human-understandable language via a highly customisable context-free grammar. Finally, visual feedback is produced in the form of a synchronised skeletal animation of both the user's performance and the tutor's. 
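The first step of key body pose identification, dimensionality reduction with principal components, can be sketched with a singular value decomposition in NumPy. The feature layout (one row per frame, one column per motion feature) and the function name are assumptions; the subsequent classifier and the cluster refinement are not shown.

```python
import numpy as np


def pca_reduce(frames, n_components):
    """Project frame-wise motion features onto their leading principal components.

    frames: array of shape (T, D), one row per frame (assumed layout).
    Returns the reduced data (T, n_components), the components, the mean
    frame and the fraction of variance explained by each kept component.
    """
    X = np.asarray(frames, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are principal directions, ordered by decreasing singular value
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    explained = (s[:n_components] ** 2) / np.sum(s ** 2)
    return Xc @ components.T, components, mean, explained
```

The `explained` fractions offer one simple way to choose `n_components`, for instance by keeping enough components to cover most of the variance; the reduced rows would then be passed to the classifier.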
If the user's performance is well below standard, an affine offset transformation of the skeletal motion data series to an in-between performance is applied, in order to avoid discouraging the user while still providing a reference for improvement. At the end of this thesis, the limitations of the methods in real-world circumstances are studied. Issues such as gimbal lock in the angular motion data, the limited accuracy of the motion capture system, and the scaling of the training set are discussed. Finally, conclusions are drawn and future work is discussed.
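The in-between performance can be pictured as a linear blend of two synchronised motion series. The sketch below assumes both series are joint positions of shape (T, J, 3) that have already been time-aligned, and it moves the tutor's motion a fraction `alpha` of the way toward the user's. The function name, the parameter `alpha` and the use of a single global blend factor are assumptions, not the thesis's exact transformation.

```python
import numpy as np


def in_between_performance(tutor, user, alpha=0.5):
    """Shift a synchronised tutor motion toward the user's (hypothetical helper).

    tutor, user: joint positions of shape (T, J, 3), already time-aligned.
    alpha = 0 returns the tutor's motion, alpha = 1 the user's; values in
    between apply a per-frame, per-joint offset of alpha * (user - tutor).
    """
    tutor = np.asarray(tutor, dtype=float)
    user = np.asarray(user, dtype=float)
    if tutor.shape != user.shape:
        raise ValueError("motions must be synchronised to the same shape")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return tutor + alpha * (user - tutor)
```

Blending joint positions rather than Euler angles sidesteps gimbal lock, although a linear blend of positions does not preserve bone lengths exactly.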

    Carried baggage detection and recognition in video surveillance with foreground segmentation

    Security cameras installed in public spaces or in private organizations continuously record video data with the aim of detecting and preventing crime. For that reason, video content analysis applications, for either real-time (i.e., analytic) or post-event (i.e., forensic) analysis, have gained high interest in recent years. In this thesis, the primary focus is on two key aspects of video analysis: reliable moving object segmentation, and carried object detection and identification. A novel moving object segmentation scheme based on background subtraction is presented. The scheme relies on background modelling based on multi-directional gradient and phase congruency. As a post-processing step, the detected foreground contours are refined by classifying edge segments as belonging to either the foreground or the background. Furthermore, a contour completion technique based on anisotropic diffusion is introduced to this area for the first time. The proposed method targets cast shadow removal, invariance to gradual illumination change, and closed contour extraction. A state-of-the-art carried object detection method is employed as a benchmark algorithm. This method includes silhouette analysis by comparing human temporal templates with unencumbered human models. The implementation of the algorithm is improved by automatically estimating the viewing direction of the pedestrian, and it is extended with a carried luggage identification module. As the temporal template is a frequency template and the information it provides is not sufficient, a colour temporal template is introduced. The standard steps of the state-of-the-art algorithm are approached from a different perspective, extended with colour information, resulting in more accurate carried object segmentation. The experiments conducted in this research show that the proposed closed foreground segmentation technique attains all the aforementioned goals. 
The incremental improvements applied to the state-of-the-art carried object detection algorithm revealed the full potential of the scheme. The experiments demonstrate the ability of the proposed carried object detection algorithm to supersede the state-of-the-art method.
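A frequency temporal template, as used by the benchmark method, records how often each pixel is foreground across a sequence of aligned silhouettes. The sketch below computes it with NumPy, together with a hypothetical colour variant (the mean colour of each pixel over the frames in which it is foreground) and a simple thresholded comparison against an unencumbered model template. Silhouette alignment and the viewing-direction estimate are assumed to have been done already, the thesis's actual colour template may differ, and all function names are illustrative.

```python
import numpy as np


def frequency_template(silhouettes):
    """Fraction of frames in which each pixel is foreground.

    silhouettes: binary array of shape (T, H, W), assumed already aligned and scaled.
    Returns an (H, W) float array with values in [0, 1].
    """
    return np.asarray(silhouettes, dtype=bool).mean(axis=0)


def colour_template(frames, silhouettes):
    """Hypothetical colour variant: mean colour of each pixel over the frames where it is foreground.

    frames: array of shape (T, H, W, 3); pixels that are never foreground are set to 0.
    """
    masks = np.asarray(silhouettes, dtype=bool)
    frames = np.asarray(frames, dtype=float)
    counts = masks.sum(axis=0)[..., None]           # (H, W, 1) foreground counts
    sums = (frames * masks[..., None]).sum(axis=0)  # (H, W, 3) colour sums over foreground frames
    out = np.zeros_like(sums)
    np.divide(sums, counts, out=out, where=counts > 0)
    return out


def protrusion_candidates(template, model_template, threshold=0.5):
    """Hypothetical carried-object cue: pixels far more often foreground than the unencumbered model predicts."""
    return (np.asarray(template) - np.asarray(model_template)) > threshold
```

Pixels flagged by `protrusion_candidates` would only be a starting point; the benchmark method's own protrusion analysis and the luggage identification module are not reproduced here.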