Going Deeper into Action Recognition: A Survey
Understanding human actions in visual data is tied to advances in
complementary research areas including object recognition, human dynamics,
domain adaptation and semantic segmentation. Over the last decade, human action
analysis has evolved from early schemes, often limited to controlled
environments, to today's advanced solutions that can learn from millions of
videos and apply to almost all daily activities. Given the broad range of
applications, from video surveillance to human-computer interaction, scientific
milestones in action recognition are being reached ever more rapidly, so that
methods that were once state of the art quickly become obsolete. This motivated us to
provide a comprehensive review of the notable steps taken towards recognizing
human actions. To this end, we start our discussion with the pioneering methods
that use handcrafted representations, and then, navigate into the realm of deep
learning based approaches. We aim to remain objective throughout this survey,
touching upon encouraging improvements as well as inevitable setbacks, in the
hope of raising fresh questions and motivating new research directions for the
reader.
Anomaly Detection, Rule Adaptation and Rule Induction Methodologies in the Context of Automated Sports Video Annotation.
Automated video annotation is a topic of considerable interest in computer vision due to its applications in video search, object-based video encoding and enhanced broadcast content. Sports broadcasting in particular is the subject of current research attention because of its fixed, rule-governed content. This research work aims to develop, analyze and demonstrate novel methodologies that can be useful in adaptive and automated video annotation systems. In this thesis, we present methodologies for addressing the problems of anomaly detection, rule adaptation and rule induction for court-based sports such as tennis and badminton. We first introduce an HMM induction strategy for a court-model-based method that uses the court structure, in the form of a lattice, for the two related modalities of singles and doubles tennis to tackle the problems of anomaly detection and rectification. We also introduce a second anomaly detection methodology based on the disparity between the low-level vision-based classifiers and the high-level contextual classifier. A further approach, which addresses the problem of rule adaptation, employs convex hulling of the anomalous states. We also investigate a number of novel hierarchical HMM generating methods for stochastic induction of game rules. These include Cartesian product Label-based Hierarchical Bottom-up Clustering (CLHBC), which employs prior information within the label structures, and a new constrained variant of the classical Chinese Restaurant Process (CRP) that is relevant to sports games. We also propose two hybrid methodologies in this context and compare them against the flat Markov model. Finally, we show that these methods generalize to other rule-based environments.
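For readers unfamiliar with the classical Chinese Restaurant Process named in the abstract above, the following is a minimal sketch of the textbook version only. It is not the constrained variant or the hierarchical HMM induction the thesis proposes; the function name, the `alpha` concentration parameter and the `seed` argument are illustrative assumptions.

```python
import random


def chinese_restaurant_process(n_customers, alpha, seed=None):
    """Sample a partition from the classical CRP (illustrative sketch).

    The next customer, arriving when n are already seated, joins an
    existing table with probability size / (n + alpha) and opens a new
    table with probability alpha / (n + alpha). Requires alpha > 0.
    """
    rng = random.Random(seed)
    table_sizes = []   # table_sizes[k] = number of customers at table k
    assignments = []   # assignments[i] = table index of customer i
    for n in range(n_customers):
        # Draw a point on [0, n + alpha]; the first n units of mass belong
        # to existing tables, the remaining alpha units to a new table.
        r = rng.uniform(0, n + alpha)
        cumulative = 0.0
        chosen = len(table_sizes)  # default: open a new table
        for idx, size in enumerate(table_sizes):
            cumulative += size
            if r < cumulative:
                chosen = idx
                break
        if chosen == len(table_sizes):
            table_sizes.append(1)
        else:
            table_sizes[chosen] += 1
        assignments.append(chosen)
    return assignments, table_sizes
```

Calling `chinese_restaurant_process(50, alpha=1.0, seed=0)` returns one table index per customer together with the table sizes; larger `alpha` values tend to produce more tables.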
Multilevel Chinese takeaway process and label-based processes for rule induction in the context of automated sports video annotation
We propose four variants of a novel hierarchical hidden Markov model strategy for rule induction in the context of automated sports video annotation, including a multilevel Chinese takeaway process (MLCTP) based on the Chinese restaurant process and a novel Cartesian product label-based hierarchical bottom-up clustering (CLHBC) method that employs prior information contained within label structures. Our results show significant improvement over the flat Markov model: optimal performance is obtained using a hybrid method, which combines the MLCTP-generated hierarchical topological structures with CLHBC-generated event labels. We also show that the proposed methods generalize to other rule-based environments, including human driving behavior and human actions.
Deep state-space modeling for explainable representation, analysis, and generation of professional human poses
The analysis of human movements has been extensively studied due to its wide
variety of practical applications. Nevertheless, the state of the art still
faces scientific challenges while modeling human movements. Firstly, new models
that account for the stochasticity of human movement and the physical structure
of the human body are required to accurately predict the evolution of full-body
motion descriptors over time. Secondly, the explainability of existing deep
learning algorithms regarding their body posture predictions while generating
human movements still needs to be improved as they lack comprehensible
representations of human movement. This paper addresses these challenges by
introducing three novel approaches for creating explainable representations of
human movement. In this work, full-body movement is formulated as a state-space
model of a dynamic system whose parameters are estimated using deep learning
and statistical algorithms. The representations adhere to the structure of the
Gesture Operational Model (GOM), which describes movement through its spatial
and temporal assumptions. Two approaches correspond to deep state-space models
that apply nonlinear network parameterization to provide interpretable posture
predictions. The third method trains GOM representations using one-shot
training with Kalman Filters. This training strategy enables users to model
single movements and estimate their mathematical representation using
procedures that require less computational power than deep learning algorithms.
Ultimately, two applications of the generated representations are presented.
The first is for the accurate generation of human movements, and the second is
for body dexterity analysis of professional movements, where dynamic
associations between body joints and meaningful motion descriptors are
identified.
Comment: Under review
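As a rough illustration of the state-space idea mentioned in the abstract above, the sketch below runs a scalar Kalman filter over a noisy one-dimensional signal, such as a single joint angle. It assumes a simple random-walk state model and is not the Gesture Operational Model or the authors' one-shot training procedure; the function name and all parameters are hypothetical.

```python
def kalman_filter_1d(measurements, process_var, meas_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter under an assumed random-walk state model.

    State model:       x_t = x_{t-1} + w_t,  w_t ~ N(0, process_var)
    Measurement model: z_t = x_t + v_t,      v_t ~ N(0, meas_var)
    Returns the filtered state estimate after each measurement.
    """
    x, p = x0, p0  # current state estimate and its variance
    estimates = []
    for z in measurements:
        # Predict: the state estimate is carried over, uncertainty grows.
        p = p + process_var
        # Update: blend prediction and measurement using the Kalman gain.
        k = p / (p + meas_var)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates
```

Feeding the filter a constant signal, for example `kalman_filter_1d([1.0] * 20, 1e-4, 0.1)`, yields estimates that move from the initial guess `x0` toward the measured value as the variance `p` shrinks.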