    Real-time action recognition using a multilayer descriptor with variable size

    Video analysis technology has become less expensive and more powerful in terms of storage resources and resolution capacity, promoting progress in a wide range of applications. Video-based human action detection has been used for several tasks in surveillance environments, such as forensic investigation, patient monitoring, medical training, accident prevention, and traffic monitoring. We present a method for action identification based on adaptive training of a multilayer descriptor applied to a single classifier. Cumulative motion shapes (CMSs) are extracted according to the number of frames present in the video. Each CMS is employed as a self-sufficient layer in the training stage but belongs to the same descriptor. A robust classification is achieved through individual responses of classifiers for each layer, and the dominant result is used as the final outcome. Experiments are conducted on five public datasets (Weizmann, KTH, MuHAVi, IXMAS, and URADL) to demonstrate the effectiveness of the method in terms of accuracy in real time. © 2016 SPIE and IS&T. Funding: Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP); Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq).
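
    The "dominant result" step described in this abstract can be read as a majority vote over the per-layer classifier outputs. The following is a minimal sketch under that reading, not the authors' implementation; the function name, the example labels, and the tie-breaking rule (first label seen wins) are assumptions.

```python
from collections import Counter


def dominant_label(layer_predictions):
    """Return the label predicted by the largest number of layers.

    Ties go to the label that appears first in the input. This tie rule
    is an assumption; the abstract does not say how ties are resolved.
    """
    if not layer_predictions:
        raise ValueError("need at least one layer prediction")
    counts = Counter(layer_predictions)
    # max() returns the first maximal key in insertion order.
    return max(counts, key=counts.get)


# Hypothetical per-layer outputs for one video described by five CMS layers.
votes = ["walk", "run", "walk", "walk", "jump"]
print(dominant_label(votes))
```

    Voting over layers means a single misclassified layer does not change the final label as long as most layers agree.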

    Articulated human tracking and behavioural analysis in video sequences

    Recently, there has been a dramatic growth of interest in the observation and tracking of human subjects through video sequences. Arguably, the principal impetus has come from the perceived demand for technological surveillance; however, applications in entertainment, intelligent domiciles and medicine are also increasing. This thesis examines human articulated tracking and the classification of human movement, first separately and then as a sequential process. First, this thesis considers the development and training of a 3D model of human body structure and dynamics. To process video sequences, an observation model is also designed with a multi-component likelihood based on edge, silhouette and colour. This is defined on the articulated limbs, and visible from a single or multiple cameras, each of which may be calibrated from that sequence. Second, for behavioural analysis, we develop a methodology in which actions and activities are described by semantic labels generated from a Movement Cluster Model (MCM). Third, a Hierarchical Partitioned Particle Filter (HPPF) was developed for human tracking that allows multi-level parameter search consistent with the body structure. This tracker relies on the articulated motion prediction provided by the MCM at pose or limb level. Fourth, tracking and movement analysis are integrated to generate a probabilistic activity description with action labels. The implemented algorithms for tracking and behavioural analysis are tested extensively and independently against ground truth on human tracking and surveillance datasets. Dynamic models are shown to predict and generate synthetic motion, while MCM recovers both periodic and non-periodic activities, defined either on the whole body or at the limb level. Tracking results are comparable with the state of the art; however, the integrated behaviour analysis adds to the value of the approach. Funding: Overseas Research Students Awards Scheme (ORSAS).

    Representation and recognition of human actions in video

    PhD thesis. Automated human action recognition plays a critical role in the development of human-machine communication, by aiming for a more natural interaction between artificial intelligence and human society. Recent developments in technology have permitted a shift from traditional human action recognition performed in a well-constrained laboratory environment to realistic unconstrained scenarios. This advancement has given rise to new problems and challenges still not addressed by the available methods. Thus, the aim of this thesis is to study innovative approaches that address the challenging problems of human action recognition from video captured in unconstrained scenarios. To this end, novel action representations, feature selection methods, fusion strategies and classification approaches are formulated. More specifically, a novel interest-point-based action representation is first introduced; this representation seeks to describe actions as clouds of interest points accumulated at different temporal scales. The idea behind this method consists of extracting holistic features from the point clouds and explicitly and globally describing the spatial and temporal action dynamics. Since the proposed clouds-of-points representation exploits alternative and complementary information compared to the conventional interest-point-based methods, a more solid representation is then obtained by fusing the two representations, adopting a Multiple Kernel Learning strategy. The validity of the proposed approach in recognising actions from a well-known benchmark dataset is demonstrated, as well as the superior performance achieved by fusing representations. Since the proposed method appears limited by the presence of a dynamic background and fast camera movements, a novel trajectory-based representation is formulated. Different from interest points, trajectories can simultaneously retain motion and appearance information even in noisy and crowded scenarios.
Additionally, they can handle drastic camera movements and support robust region-of-interest estimation. An equally important contribution is the proposed collaborative feature selection performed to remove redundant and noisy components. In particular, a novel feature selection method based on Multi-Class Delta Latent Dirichlet Allocation (MC-DLDA) is introduced. Crucially, to enrich the final action representation, the trajectory representation is adaptively fused with a conventional interest point representation. The proposed approach is extensively validated on different datasets, and the reported performances are comparable with the state of the art. The obtained results also confirm the fundamental contribution of both collaborative feature selection and adaptive fusion. Finally, the problem of realistic human action classification in very ambiguous scenarios is taken into account. In these circumstances, standard feature selection methods and multi-class classifiers appear inadequate due to sparse training sets, high intra-class variation and inter-class similarity. Thus, both the feature selection and classification problems need to be redesigned. The proposed idea is to iteratively decompose the classification task into subtasks and select the optimal feature set and classifier in accordance with the subtask context. To this end, a cascaded feature selection and action classification approach is introduced. The proposed cascade aims to classify actions by exploiting as much information as possible, while simplifying the multi-class classification into a cascade of binary separations. Specifically, instead of separating multiple action classes simultaneously, the overall task is automatically divided into easier binary sub-tasks.
Experiments have been carried out using challenging public datasets; the obtained results demonstrate that, with an identical action representation, the cascaded classifier significantly outperforms standard multi-class classifiers.
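
    The idea of replacing one multi-class decision with a cascade of binary separations can be sketched as a small decision structure in which every node solves one binary sub-task. This is an illustrative sketch only: the thesis divides the task automatically and selects features and classifiers per sub-task, whereas the `Split` class, the hand-written threshold rules, the feature layout `(speed, arm_motion)` and the action labels here are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Union


@dataclass
class Split:
    """One binary sub-task: route a sample to one of two groups of classes."""
    goes_left: Callable[[Sequence[float]], bool]  # hypothetical binary classifier
    left: Union["Split", str]                     # sub-cascade or final label
    right: Union["Split", str]


def classify(node, features):
    """Walk the cascade until a single action label remains."""
    while isinstance(node, Split):
        node = node.left if node.goes_left(features) else node.right
    return node


# Each node looks only at the feature it needs, standing in for
# per-sub-task feature selection. Features are (speed, arm_motion).
cascade = Split(
    goes_left=lambda f: f[0] < 0.5,  # low vs. high body speed
    left=Split(lambda f: f[1] < 0.5, "stand", "wave"),
    right=Split(lambda f: f[0] < 2.0, "walk", "run"),
)

print(classify(cascade, (0.1, 0.9)))
```

    Because each node only separates two groups, each binary decision can use a different feature subset and classifier, which is the property the abstract emphasises.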

    Feature-based annealing particle filter for robust motion capture

    This thesis presents a new annealing method for particle filtering aimed at body pose estimation. Particle filters are Monte Carlo methods commonly employed in non-linear and non-Gaussian Bayesian problems, such as the estimation of human dynamics. However, they are inefficient in high-dimensional state spaces. The annealed particle filter copes with such spaces by introducing a layered stochastic search. Our algorithm aims at generalizing and enhancing the classical annealed particle filter. Different image features are exploited in a sequential importance sampling scheme to build better proposal distributions from the likelihood. This technique, termed Feature-Based Annealing, is inferred from the required function properties in the annealing process and the properties of the weighting functions obtained with common image features in the field of body tracking. Comparative results between the proposed strategy and the common annealed particle filter are shown to assess the robustness of the algorithm.
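
    The layered stochastic search of the classical annealed particle filter can be sketched as below. This illustrates the baseline technique only, not the Feature-Based Annealing proposed in the thesis; the 1-D state, the Gaussian toy likelihood, the annealing exponents `betas` and the noise schedule are assumptions chosen for illustration.

```python
import math
import random


def annealed_search(log_likelihood, particles, betas, noise_scales, rng):
    """One annealing run: for each layer, weight, resample, then diffuse."""
    for beta, sigma in zip(betas, noise_scales):
        # Smoothed weights w_i proportional to exp(beta * log p(z | x_i));
        # a small beta flattens the likelihood so early layers explore broadly.
        logw = [beta * log_likelihood(x) for x in particles]
        top = max(logw)
        weights = [math.exp(lw - top) for lw in logw]  # subtract max for stability
        # Resample in proportion to the weights, then add layer-specific noise.
        chosen = rng.choices(particles, weights=weights, k=len(particles))
        particles = [x + rng.gauss(0.0, sigma) for x in chosen]
    return particles


def toy_log_likelihood(x):
    """Hypothetical observation model peaked at the true state 2.0."""
    return -((x - 2.0) ** 2) / (2 * 0.2 ** 2)


rng = random.Random(0)
initial = [rng.uniform(-10.0, 10.0) for _ in range(200)]
final = annealed_search(
    toy_log_likelihood,
    initial,
    betas=[0.01, 0.05, 0.2, 1.0],        # weighting sharpens layer by layer
    noise_scales=[1.0, 0.5, 0.2, 0.05],  # diffusion shrinks layer by layer
    rng=rng,
)
estimate = sum(final) / len(final)
print(round(estimate, 2))
```

    Early layers use a flattened weighting function so particles are not trapped in a narrow peak; later layers sharpen it and reduce the diffusion noise so the particle set concentrates around the mode.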

    Activity Analysis; Finding Explanations for Sets of Events

    Automatic activity recognition is the computational process of analysing visual input and reasoning about detections to understand the performed events. In all but the simplest scenarios, an activity involves multiple interleaved events, some related and others independent. The activity in a car park or at a playground would typically include many events. This research assumes the possible events and any constraints between the events can be defined for the given scene. Analysing the activity should thus recognise a complete and consistent set of events; this is referred to as a global explanation of the activity. By seeking a global explanation that satisfies the activity’s constraints, infeasible interpretations can be avoided, and ambiguous observations may be resolved. An activity’s events and any natural constraints are defined using a grammar formalism. Attribute Multiset Grammars (AMG) are chosen because they allow defining hierarchies, as well as attribute rules and constraints. When used for recognition, detectors are employed to gather a set of detections. Parsing the set of detections by the AMG provides a global explanation. To find the best parse tree given a set of detections, a Bayesian network models the probability distribution over the space of possible parse trees. Heuristic and exhaustive search techniques are proposed to find the maximum a posteriori global explanation. The framework is tested for two activities: the activity in a bicycle rack, and around a building entrance. The first case study involves people locking bicycles onto a bicycle rack and picking them up later. The best global explanation for all detections gathered during the day resolves local ambiguities from occlusion or clutter. Intensive testing on 5 full days proved global analysis achieves higher recognition rates. The second case study tracks people and any objects they are carrying as they enter and exit a building entrance. 
A complete sequence of the person entering and exiting multiple times is recovered by the global explanation.

    Eigenvector-based Dimensionality Reduction for Human Activity Recognition and Data Classification

    In the context of appearance-based human motion compression, representation, and recognition, we have proposed a robust framework based on the eigenspace technique. First, we introduce a new appearance-based template matching approach, which we named Motion Intensity Image, for compressing a human motion video into a simple and concise, yet very expressive representation. Second, a learning strategy based on the eigenspace technique is employed for dimensionality reduction using PCA and FDA, which provide maximum data variance and maximum class separability, respectively. Third, a new compound eigenspace is introduced for multiple directed motion recognition that also accounts for possible changes in scale. This method extracts two more features that are used to control the recognition process. A similarity measure, based on Euclidean distance, has been employed for matching dimensionally reduced testing templates against a projected set of known motion templates. For nonlinear classification, we have introduced a new eigenvector-based recognition model, built upon the idea of the kernel technique. A practical study on the use of the kernel technique with 18 different functions has been carried out. This study shows how crucial choosing the right kernel function is for the success of the subsequent linear discrimination in the feature space for a particular problem. Furthermore, building upon the theory of reproducing kernels, we have proposed a new robust nonparametric discriminant analysis approach with kernels. Our proposed technique can efficiently find a nonparametric kernel representation where linear discriminants can perform better. Data classification is achieved by integrating the linear version of the NDA with the kernel mapping. Based on the kernel trick, we have provided a new formulation for Fisher's criterion, defined in terms of the Gram matrix only.
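
    For orientation, the standard two-class kernel Fisher discriminant shows how Fisher's criterion can be written using kernel evaluations (Gram matrix entries) only. This is the textbook form and not necessarily the new formulation proposed in this work. The symbols are assumptions introduced here: $\kappa$ is the kernel, $\ell_i$ the number of training samples in class $i$, $x_k^{(i)}$ the $k$-th sample of class $i$, $\boldsymbol{\alpha}$ the expansion coefficients of the discriminant direction over all training samples, $I$ the identity, and $\mathbf{1}_{\ell_i}$ the $\ell_i \times \ell_i$ matrix with all entries equal to $1/\ell_i$.

```latex
J(\boldsymbol{\alpha})
  = \frac{\boldsymbol{\alpha}^{\top} M \boldsymbol{\alpha}}
         {\boldsymbol{\alpha}^{\top} N \boldsymbol{\alpha}},
\qquad
M = (\mathbf{m}_1 - \mathbf{m}_2)(\mathbf{m}_1 - \mathbf{m}_2)^{\top},
\qquad
(\mathbf{m}_i)_j = \frac{1}{\ell_i} \sum_{k=1}^{\ell_i} \kappa\bigl(x_j, x_k^{(i)}\bigr),

N = \sum_{i=1,2} K_i \left( I - \mathbf{1}_{\ell_i} \right) K_i^{\top},
\qquad
(K_i)_{nm} = \kappa\bigl(x_n, x_m^{(i)}\bigr).
```

    Every quantity above is built from kernel values between training samples, so the feature-space mapping never has to be computed explicitly.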

    Marker-less human body part detection, labelling and tracking for human activity recognition

    This thesis focuses on the development of a real-time and cost-effective marker-less computer vision method for significant body point or part detection (i.e., the head, arm, shoulder, knee, and feet), labelling and tracking, and its application to activity recognition. This work comprises three parts: significant body point detection and labelling, significant body point tracking, and activity recognition. Implicit body models are proposed based on human anthropometry, kinesiology, and human-vision-inspired criteria to detect and label significant body points. The key idea of the proposed method is to fit the knowledge from the implicit body models rather than fitting the predefined models in order to detect and label significant body points. The advantages of this method are that it does not require manual annotation, an explicit fitting procedure, or a training (learning) phase, and it is applicable to humans with different anthropometric proportions. The experimental results show that the proposed method robustly detects and labels significant body points in various activities of two different (low and high) resolution data sets. Furthermore, a Particle Filter with memory and feedback is proposed that combines temporal information of the previous observation and estimation with feedback to track significant body points in occlusion. In addition, in order to overcome the problem presented by the most occluded body part, i.e., the arm, a Motion Flow method is proposed. This method considers the human arm as a pendulum attached to the shoulder joint and defines conjectures to track the arm. The former method is invoked by default and the latter is used at the user's choice.
The experimental results show that the two proposed methods, i.e., the Particle Filter and Motion Flow methods, robustly track significant body points in various activities of the above-mentioned two data sets and also enhance the performance of significant body point detection. A hierarchical relaxed partitioning system is then proposed that employs features extracted from the significant body points for activity recognition when multiple overlaps exist in the feature space. The working principle of the proposed method is based on the relaxed hierarchy (postpone uncertain decisions) and hierarchical strategy (group similar or confusing classes) while partitioning each class at different levels of the hierarchy. The advantages of the proposed method lie in its real-time speed, ease of implementation and extension, and non-intensive training. The experimental results show that it acquires valuable features and outperforms relevant state-of-the-art methods while being comparable to other methods, i.e., the holistic and local feature approaches. In this context, the contribution of this thesis is three-fold: pioneering a method for automated human body part detection and labelling; developing methods for tracking human body parts in occlusion; and designing a method for robust and efficient human action recognition.