946 research outputs found

    Efficient duration and hierarchical modeling for human activity recognition

    A challenge in building pervasive and smart spaces is to learn and recognize human activities of daily living (ADLs). In this paper, we address this problem and argue that in dealing with ADLs, it is beneficial to exploit both their typical duration patterns and inherent hierarchical structures. We exploit efficient duration modeling using the novel Coxian distribution to form the Coxian hidden semi-Markov model (CxHSMM) and apply it to the problem of learning and recognizing ADLs with complex temporal dependencies. The Coxian duration model has several advantages over existing duration parameterizations using multinomial or exponential family distributions, including its denseness in the space of nonnegative distributions, low number of parameters, computational efficiency and the existence of closed-form estimation solutions. Further, we combine both hierarchical and duration extensions of the hidden Markov model (HMM) to form the novel switching hidden semi-Markov model (SHSMM), and empirically compare its performance with existing models. The model can learn what an occupant normally does during the day from unsegmented training data and then perform online activity classification, segmentation and abnormality detection. Experimental results show that Coxian modeling outperforms a range of baseline models for the task of activity segmentation. We also achieve a recognition accuracy competitive with the current state-of-the-art multinomial duration model, while gaining a significant reduction in computation. Furthermore, cross-validation model selection on the number of phases K in the Coxian indicates that only a small K is required to achieve the optimal performance. Finally, our models are further tested in a more challenging setting in which the tracking is often lost and the activities considerably overlap.
With a small number of labels supplied during training in a partially supervised learning mode, our models again deliver reliable performance with a small number of phases, making our proposed framework an attractive choice for activity modeling.
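To make the Coxian duration model in the abstract above more concrete, here is a minimal sketch of a discrete K-phase Coxian probability mass function. It assumes one common parameterization, which the abstract itself does not spell out: the duration starts in phase j with probability mu[j], then passes through phases j, j+1, ..., K in order, staying in phase i for a geometrically distributed number of steps with parameter lam[i]. The function name `coxian_pmf` and its arguments are illustrative, not taken from the paper.

```python
def coxian_pmf(mu, lam, max_d):
    """Return [P(D = 1), ..., P(D = max_d)] for a discrete Coxian duration D.

    Assumed parameterization (not stated in the abstract):
      mu[i]  - probability of starting in phase i (entries sum to 1)
      lam[i] - per-step probability of leaving phase i (geometric sojourn)
    The duration ends when the last phase is left.
    """
    K = len(mu)
    alpha = list(mu)  # alpha[i]: probability of being in phase i at the start of a step
    pmf = []
    for _ in range(max_d):
        # The duration ends at this step if we are in the last phase and leave it.
        pmf.append(alpha[K - 1] * lam[K - 1])
        nxt = [0.0] * K
        for i in range(K):
            nxt[i] += alpha[i] * (1.0 - lam[i])   # stay in phase i
            if i + 1 < K:
                nxt[i + 1] += alpha[i] * lam[i]   # advance to phase i + 1
        alpha = nxt
    return pmf


# With a single phase the Coxian reduces to a geometric distribution.
geometric = coxian_pmf([1.0], [0.5], 3)

# A two-phase example: only 2K = 4 numbers describe the whole duration law.
two_phase = coxian_pmf([0.3, 0.7], [0.2, 0.4], 2000)
mean_duration = sum(d * p for d, p in enumerate(two_phase, start=1))
```

Under this parameterization the number of parameters grows with K rather than with the maximum duration, which is consistent with the "low number of parameters" advantage claimed over a multinomial duration model.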

    Efficient duration modelling in the hierarchical hidden semi-Markov models and their applications

    Modeling patterns in temporal data has arisen as an important problem in engineering and science. This has led to the popularity of several dynamic models, in particular the renowned hidden Markov model (HMM) [Rabiner, 1989]. Despite its widespread success in many cases, the standard HMM often fails to model more complex data whose elements are correlated hierarchically or over a long period. Such problems are, however, frequently encountered in practice. Existing efforts to overcome this weakness often address only one of these two aspects at a time, mainly due to computational intractability. Motivated by this modeling challenge in many real-world problems, in particular video surveillance and segmentation, this thesis aims to develop tractable probabilistic models that can jointly model duration and hierarchical information in a unified framework. We believe that jointly exploiting statistical strength from both properties will lead to more accurate and robust models for the needed task. To tackle the modeling aspect, we base our work on an intersection between dynamic graphical models and statistics of lifetime modeling. Realizing that the key bottleneck found in the existing works lies in the choice of the distribution for a state, we have successfully integrated the discrete Coxian distribution [Cox, 1955], a special class of phase-type distributions, into the HMM to form a novel and powerful stochastic model termed the Coxian Hidden Semi-Markov Model (CxHSMM). We show that this model can still be expressed as a dynamic Bayesian network, and inference and learning can be derived analytically. Most importantly, it has four advantages over existing semi-Markov modelling: the parameter space is compact, computation is fast (almost the same as the HMM), closed-form estimation can be derived, and the Coxian is flexible enough to approximate a large class of distributions.
Next, we exploit hierarchical decomposition in the data by borrowing an analogy from the hierarchical hidden Markov model in [Fine et al., 1998, Bui et al., 2004] and introduce a new type of shallow structured graphical model that combines both duration and hierarchical modelling into a unified framework, termed the Coxian Switching Hidden Semi-Markov Model (CxSHSMM). The top layer is a Markov sequence of switching variables, while the bottom layer is a sequence of concatenated CxHSMMs whose parameters are determined by the switching variable at the top. Again, we provide a thorough analysis along with inference and learning machinery. We also show that semi-Markov models with arbitrary depth structure can easily be developed. In all cases we further address two practical issues: missing observations due to unstable tracking and the use of partially labelled data to improve training accuracy. Motivated by real-world problems, our application contribution is a framework to recognize complex activities of daily living (ADLs) and detect anomalies to provide better intelligent caring services for the elderly. Coarser activities with their own duration distributions are represented using the CxHSMM. Complex activities are made of a sequence of coarser activities and represented at the top level in the CxSHSMM. Extensive experiments are conducted to evaluate our solutions against existing methods. In many cases, the superiority of the joint modeling and the Coxian parameterization over traditional methods is confirmed. The robustness of our proposed models is further demonstrated in a series of more challenging experiments, in which the tracking is often lost and activities considerably overlap. Our final contribution is an application of the switching Coxian model to segment education-oriented videos into coherent topical units. Our results again demonstrate that such segmentation processes can benefit greatly from the joint modeling of duration and hierarchy.

    Multi-Modal Models for Fine-grained Action Segmentation in Situated Environments

    Automated methods for analyzing human activities from video or sensor data are critical for enabling new applications in human-robot interaction, surgical data modeling, video summarization, and beyond. Despite decades of research in the fields of robotics and computer vision, current approaches are inadequate for modeling complex activities outside of constrained environments or without using heavily instrumented sensor suites. In this dissertation, I address the problem of fine-grained action segmentation by developing solutions that generalize from domain-specific to general-purpose for applications in surgical workflow, surveillance, and cooking. A key technical challenge, which is central to this dissertation, is how to capture complex temporal patterns from sensor data. For a given task, users may perform the same action at different speeds or styles, and each user may carry out actions in a different order. I present a series of temporal models that address these modes of variability. First, I define the notion of a convolutional action primitive, which captures how low-level sensor signals change as a function of the action a user is performing. Second, I generalize this idea to video with a Spatiotemporal Convolutional Neural Network, which captures relationships between objects in an image and how they change temporally. Lastly, I discuss a hierarchical variant that applies to video or sensor data, called a Temporal Convolutional Network (TCN), which models actions at multiple temporal scales. In certain domains (e.g., surgical training), TCNs can be used to successfully bridge the gap in performance between domain-specific and general-purpose solutions. A key scientific challenge concerns the evaluation of predicted action segmentations. In many applications, action labels may be ill-defined and if one asks two different annotators when a given action starts and stops they may give answers that are seconds apart. 
I argue that the standard action segmentation metrics are insufficient for evaluating real-world segmentation performance and propose two alternatives. Qualitatively, these metrics are better at capturing the efficacy of models in the described applications. I conclude with a case study on surgical workflow analysis, which has the potential to improve surgical education and operating room efficiency. Current work almost exclusively relies on extensive instrumentation, which is difficult and costly to acquire. I show that our spatiotemporal video models are capable of capturing important surgical attributes (e.g., organs, tools) and achieve state-of-the-art performance on two challenging datasets. The models and methodology described have demonstrably improved the ability to temporally segment complex human activities, in many cases without sophisticated instrumentation.
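The abstract above does not name its two proposed metrics, so the following is only an illustration of the underlying argument, not a reconstruction of the dissertation's method. It contrasts frame-wise accuracy with a segment-level edit score (a normalized Levenshtein distance over collapsed label sequences, chosen here as an assumption) on a prediction that flickers between labels. All function names are hypothetical.

```python
def segments(labels):
    """Collapse consecutive duplicate frame labels into a segment sequence."""
    out = []
    for lab in labels:
        if not out or out[-1] != lab:
            out.append(lab)
    return out


def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


def frame_accuracy(pred, true):
    """Fraction of frames whose predicted label matches the ground truth."""
    return sum(p == t for p, t in zip(pred, true)) / len(true)


def edit_score(pred, true):
    """1 minus the normalized edit distance between the segment sequences."""
    sp, st = segments(pred), segments(true)
    return 1.0 - levenshtein(sp, st) / max(len(sp), len(st))


true = ["a"] * 10 + ["b"] * 10
pred = list(true)
pred[3] = "b"    # one-frame flicker inside the first action
pred[14] = "a"   # one-frame flicker inside the second action

acc = frame_accuracy(pred, true)   # high: only 2 of 20 frames are wrong
score = edit_score(pred, true)     # low: 6 predicted segments versus 2 true ones
```

In this toy case the frame-wise accuracy stays high while the segment-level score drops sharply, which is the kind of over-segmentation error that a frame-wise metric alone tends to hide.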

    Recognizing Teamwork Activity In Observations Of Embodied Agents

    This thesis presents contributions to the theory and practice of team activity recognition. A particular focus of our work is to improve our ability to collect and label representative samples, thus making team activity recognition more efficient. A second focus of our work is improving the robustness of the recognition process in the presence of noisy and distorted data. The main contributions of this thesis are as follows: We developed a software tool, the Teamwork Scenario Editor (TSE), for the acquisition, segmentation and labeling of teamwork data. Using the TSE we acquired a corpus of labeled team actions from both synthetic and real-world sources. We developed an approach through which representations of idealized team actions can be acquired in the form of Hidden Markov Models, which are trained using a small set of representative examples segmented and labeled with the TSE. We developed a set of team-oriented feature functions, which extract discrete features from the high-dimensional continuous data. The features were chosen such that they mimic the features used by humans when recognizing teamwork actions. We developed a technique to recognize the likely roles played by agents in teams even before the team action is recognized. Through experimental studies we show that the feature functions and role recognition module significantly increase the recognition accuracy, while allowing arbitrarily shuffled inputs and noisy data.

    Predefined pattern detection in large time series

    Predefined pattern detection from time series is an interesting and challenging task. In order to reduce its computational cost and increase effectiveness, a number of time series representation methods and similarity measures have been proposed. Most of the existing methods focus on full sequence matching, that is, sequences with clearly defined beginnings and endings, where all data points contribute to the match. These methods, however, do not account for temporal and magnitude deformations in the data and prove ineffective in several real-world scenarios where noise and external phenomena introduce diversity in the class of patterns to be matched. In this paper, we present a novel pattern detection method, which is based on the notions of templates, landmarks, constraints and trust regions. We employ the Minimum Description Length (MDL) principle for the time series preprocessing step, which helps to preserve all the prominent features and prevents the template from overfitting. Templates are provided by common users or domain experts, and represent interesting patterns we want to detect from time series. Instead of utilising templates to match all the potential subsequences in the time series, we translate the time series and templates into landmark sequences, and detect patterns from the landmark sequence of the time series. Through defining constraints within the template landmark sequence, we effectively extract all the landmark subsequences from the time series landmark sequence, and obtain a number of landmark segments (time series subsequences or instances). We model each landmark segment through scaling the template in both temporal and magnitude dimensions. To suppress the influence of noise, we introduce the concept of a trust region, which not only helps to achieve an improved instance model, but also helps to catch the accurate boundaries of instances of the given template.
Based on the similarities derived from instance models, we introduce the probability density function to calculate a similarity threshold. The threshold can be used to judge whether a landmark segment is a true instance of the given template or not. To evaluate the effectiveness and efficiency of the proposed method, we apply it to two real-world datasets. The results show that our method is capable of detecting patterns with temporal and magnitude deformations with competitive performance.
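As a rough illustration of the step in the abstract above where a time series is translated into a landmark sequence, here is a minimal sketch. The abstract does not define what a landmark is; this sketch assumes landmarks are strict local extrema (turning points), which is one common choice. The function name `local_extrema` is hypothetical.

```python
def local_extrema(series):
    """Return (index, value, kind) for each strict local maximum or minimum.

    Assumption: a landmark is a point strictly above or strictly below both
    of its neighbours. Endpoints and flat plateaus are ignored in this sketch.
    """
    landmarks = []
    for i in range(1, len(series) - 1):
        prev, cur, nxt = series[i - 1], series[i], series[i + 1]
        if cur > prev and cur > nxt:
            landmarks.append((i, cur, "max"))
        elif cur < prev and cur < nxt:
            landmarks.append((i, cur, "min"))
    return landmarks


# A short series with one peak and one trough.
example = [0, 1, 0, -1, 0]
example_landmarks = local_extrema(example)
```

Matching on such a landmark sequence instead of on every raw subsequence is what lets a template be stretched in time or magnitude and still line up with an instance, since only the order and relative position of turning points need to agree.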