Segmental Spatiotemporal CNNs for Fine-grained Action Segmentation
Joint segmentation and classification of fine-grained actions is important
for applications of human-robot interaction, video surveillance, and human
skill evaluation. However, despite substantial recent progress in large-scale
action classification, the performance of state-of-the-art fine-grained action
recognition approaches remains low. We propose a model for action segmentation
which combines low-level spatiotemporal features with a high-level segmental
classifier. Our spatiotemporal CNN comprises a spatial component that
uses convolutional filters to capture information about objects and their
relationships, and a temporal component that uses large 1D convolutional
filters to capture information about how object relationships change across
time. These features are used in tandem with a semi-Markov model that models
transitions from one action to another. We introduce an efficient constrained
segmental inference algorithm for this model that is orders of magnitude faster
than the current approach. We highlight the effectiveness of our Segmental
Spatiotemporal CNN on cooking and surgical action datasets for which we observe
substantially improved performance relative to recent baseline methods.

Comment: Updated from the ECCV 2016 version. We fixed an important mathematical error and made the section on segmental inference clearer.
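The temporal component described above applies large 1D convolutional filters across per-frame feature vectors to capture how object relationships change over time. Below is a minimal NumPy sketch of that idea only; it is not the authors' implementation, and the function names, filter shapes, and the ReLU nonlinearity are illustrative assumptions.

```python
import numpy as np

def temporal_conv1d(frame_features, filters):
    """Apply large 1D temporal filters over per-frame spatial features.

    frame_features: (T, D) array, one row of spatial-CNN features per frame.
    filters: (K, L, D) array of K temporal filters spanning L frames.
    Returns: (T - L + 1, K) ReLU activations describing how the
    D-dimensional features evolve within each L-frame window.
    """
    T, D = frame_features.shape
    K, L, _ = filters.shape
    out = np.zeros((T - L + 1, K))
    for t in range(T - L + 1):
        window = frame_features[t:t + L]  # (L, D) temporal window
        # Correlate every filter with the window, then apply ReLU.
        out[t] = np.maximum(
            0.0, np.tensordot(filters, window, axes=([1, 2], [0, 1]))
        )
    return out

# Illustrative usage: 20 frames of 8-dim features, 4 filters of length 5.
rng = np.random.default_rng(0)
acts = temporal_conv1d(rng.normal(size=(20, 8)), rng.normal(size=(4, 5, 8)))
```

A long filter length `L` is what lets a single activation summarize an extended motion, which is the property the abstract attributes to the temporal component.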
Pose sentences : a new representation for understanding human actions
Ankara: The Department of Computer Engineering and the Institute of Engineering and Science of Bilkent University, 2008. Thesis (Master's) -- Bilkent University, 2008. Includes bibliographical references, leaves 55-58.

In this thesis we address the problem of human action recognition from video sequences.
Our main contribution to the literature is the compact use of poses to
represent videos and, most importantly, the treatment of actions as pose-sentences,
exploiting string-matching approaches for classification. We focus on single actions,
where the actor performs one simple action through the video sequence. We
represent actions as documents consisting of words, where a word refers to a pose
in a frame. We think pose information is a powerful source for describing actions.
In search of a robust pose descriptor, we make use of four well-known techniques
to extract pose information: Histogram of Oriented Gradients, k-Adjacent Segments,
Shape Context and Optical Flow Histograms. To represent actions, first
we generate a codebook which will act as a dictionary for our action dataset.
Action sequences are then represented as sequences of pose-words, that is, as pose-sentences.
The similarity between two actions is obtained using string-matching
techniques. We also apply a bag-of-poses approach for comparison purposes and
show the superiority of pose-sentences. We test the efficiency of our method with
two widely used benchmark datasets, Weizmann and KTH. We show that pose is
indeed very descriptive while representing actions, and without having to examine
complex dynamic characteristics of actions, one can apply simple techniques
with equally successful results.

Hatun, Kardelen. M.S.
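The pose-sentence idea above reduces action comparison to comparing two symbol sequences, for which edit distance is a standard string-matching choice. The sketch below illustrates that reduction under stated assumptions; the thesis does not specify which string-matching algorithm or normalization it uses, so the Levenshtein distance, the length normalization, and the nearest-neighbour rule here are illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance between two pose-sentences (sequences of
    pose-word ids from the codebook), using a single rolling row."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,            # delete from a
                        dp[j - 1] + 1,        # insert into a
                        prev + (a[i - 1] != b[j - 1]))  # substitute
            prev = cur
    return dp[-1]

def classify(query, labelled):
    """Nearest-neighbour classification of a query pose-sentence against
    labelled examples, by length-normalised edit distance.
    labelled: list of (action_label, pose_sentence) pairs."""
    return min(
        labelled,
        key=lambda lv: edit_distance(query, lv[1]) / max(len(query), len(lv[1])),
    )[0]

# Illustrative usage with made-up pose-word ids:
examples = [("wave", [1, 1, 2, 3]), ("jump", [4, 5, 5, 6])]
label = classify([1, 1, 2, 2, 3], examples)  # closest to "wave"
```

By contrast, the bag-of-poses baseline mentioned in the abstract would discard the ordering of pose-words entirely, which is exactly the information edit distance preserves.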
Gesture Recognition in Robotic Surgery: a Review
OBJECTIVE: Surgical activity recognition is a fundamental step in computer-assisted interventions. This paper reviews the state-of-the-art in methods for automatic recognition of fine-grained gestures in robotic surgery, focusing on recent data-driven approaches, and outlines the open questions and future research directions.

METHODS: An article search was performed on 5 bibliographic databases with combinations of the following search terms: robotic, robot-assisted, JIGSAWS, surgery, surgical, gesture, fine-grained, surgeme, action, trajectory, segmentation, recognition, parsing. Selected articles were classified based on the level of supervision required for training and divided into different groups representing major frameworks for time series analysis and data modelling.

RESULTS: A total of 52 articles were reviewed. The research field is showing rapid expansion, with the majority of articles published in the last 4 years. Deep-learning-based temporal models with discriminative feature extraction and multi-modal data integration have demonstrated promising results on small surgical datasets. Currently, unsupervised methods perform significantly less well than the supervised approaches.

CONCLUSION: The development of large and diverse open-source datasets of annotated demonstrations is essential for the development and validation of robust solutions for surgical gesture recognition. While new strategies for discriminative feature extraction and knowledge transfer, or unsupervised and semi-supervised approaches, can mitigate the need for data and labels, they have not yet been demonstrated to achieve comparable performance. Important future research directions include detection and forecast of gesture-specific errors and anomalies.

SIGNIFICANCE: This paper is a comprehensive and structured analysis of surgical gesture recognition methods aiming to summarize the status of this rapidly evolving field.