Hand Hygiene Assessment via Joint Step Segmentation and Key Action Scorer
Hand hygiene is a standard six-step hand-washing action proposed by the World
Health Organization (WHO). However, there is no effective way to monitor
whether medical staff perform it correctly, which creates a risk of disease spread.
Existing action assessment works usually make an overall quality prediction on
an entire video. However, the internal structures of hand hygiene action are
important in hand hygiene assessment. Therefore, we propose a novel
fine-grained learning framework that jointly performs step segmentation and
key action scoring for accurate hand hygiene assessment. Existing
temporal segmentation methods usually employ multi-stage convolutional networks
to improve segmentation robustness, but they are prone to over-segmentation
because they lack long-range dependencies. To address this issue, we design
a multi-stage convolution-transformer network for step segmentation. Based on
the observation that each hand-washing step involves several key actions which
determine the hand-washing quality, we design a set of key action scorers to
evaluate the quality of key actions in each step. In addition, there is no
unified dataset for hand hygiene assessment. Therefore, under the supervision of
medical staff, we contribute a video dataset that contains 300 video sequences
with fine-grained annotations. Extensive experiments on the dataset suggest
that our method assesses hand hygiene videos well and achieves outstanding
performance.
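The over-segmentation problem mentioned in this abstract can be illustrated with a minimal sketch. It assumes per-frame step labels are already available; the function name `labels_to_segments` and the example label sequences are hypothetical and are not taken from the paper.

```python
from itertools import groupby
from typing import List, Tuple


def labels_to_segments(labels: List[int]) -> List[Tuple[int, int, int]]:
    """Collapse per-frame labels into (label, start, end_exclusive) segments."""
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        segments.append((label, start, start + length))
        start += length
    return segments


# Hypothetical per-frame step predictions for a short clip.
smooth = [1, 1, 1, 1, 2, 2, 2, 2]  # two clean steps
noisy = [1, 1, 2, 1, 2, 2, 1, 2]   # same steps, with frame-level flicker

print(len(labels_to_segments(smooth)))  # 2 segments
print(len(labels_to_segments(noisy)))   # 6 segments: over-segmented
```

A frame-wise predictor with no long-range context can produce output like `noisy`, where isolated wrong frames split one step into many short segments.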
Segmental Spatiotemporal CNNs for Fine-grained Action Segmentation
Joint segmentation and classification of fine-grained actions is important
for applications of human-robot interaction, video surveillance, and human
skill evaluation. However, despite substantial recent progress in large-scale
action classification, the performance of state-of-the-art fine-grained action
recognition approaches remains low. We propose a model for action segmentation
which combines low-level spatiotemporal features with a high-level segmental
classifier. Our spatiotemporal CNN comprises a spatial component that
uses convolutional filters to capture information about objects and their
relationships, and a temporal component that uses large 1D convolutional
filters to capture information about how object relationships change across
time. These features are used in tandem with a semi-Markov model that models
transitions from one action to another. We introduce an efficient constrained
segmental inference algorithm for this model that is orders of magnitude faster
than the current approach. We highlight the effectiveness of our Segmental
Spatiotemporal CNN on cooking and surgical action datasets for which we observe
substantially improved performance relative to recent baseline methods.
Comment: Updated from the ECCV 2016 version. We fixed an important
mathematical error and made the section on segmental inference clearer.
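To make the idea of constrained segmental inference concrete, here is a generic dynamic-programming sketch. It is not claimed to be the algorithm from the paper: it assumes that per-frame class scores are given, that a segment's score is the sum of its frame scores, that adjacent segments must have different labels, and that the constraint is an upper bound on the number of segments. The function name `segment_with_at_most_k` and the example scores are hypothetical.

```python
from typing import List, Tuple

NEG_INF = float("-inf")


def segment_with_at_most_k(
    scores: List[List[float]], max_segments: int
) -> Tuple[float, List[int]]:
    """Best per-frame labeling that uses at most `max_segments` segments.

    scores[t][c] is the (assumed given) score of class c at frame t.
    """
    T, C = len(scores), len(scores[0])
    # best[k][c]: best score so far using k + 1 segments, the last labeled c.
    best = [[NEG_INF] * C for _ in range(max_segments)]
    for c in range(C):
        best[0][c] = scores[0][c]
    back = [[[None] * C for _ in range(max_segments)]]  # back[0] is unused

    for t in range(1, T):
        new = [[NEG_INF] * C for _ in range(max_segments)]
        ptr = [[None] * C for _ in range(max_segments)]
        for k in range(max_segments):
            for c in range(C):
                # Option 1: extend the current segment.
                cand, src = best[k][c], (k, c)
                # Option 2: open a new segment with a different label.
                if k > 0:
                    for p in range(C):
                        if p != c and best[k - 1][p] > cand:
                            cand, src = best[k - 1][p], (k - 1, p)
                if cand > NEG_INF:
                    new[k][c] = cand + scores[t][c]
                    ptr[k][c] = src
        best = new
        back.append(ptr)

    # Pick the best final state, then follow back-pointers to recover labels.
    k, c = max(
        ((k, c) for k in range(max_segments) for c in range(C)),
        key=lambda kc: best[kc[0]][kc[1]],
    )
    total = best[k][c]
    labels = [0] * T
    for t in range(T - 1, -1, -1):
        labels[t] = c
        if t > 0:
            k, c = back[t][k][c]
    return total, labels


# Hypothetical scores: frame 1 weakly prefers class 1, which a frame-wise
# argmax would turn into a spurious one-frame segment.
scores = [[1.0, 0.0], [0.4, 0.6], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
print(segment_with_at_most_k(scores, max_segments=2)[1])  # [0, 0, 0, 1, 1, 1]
```

With `max_segments=4` the same scores yield `[0, 1, 0, 1, 1, 1]`, so the bound on the segment count is what suppresses the spurious segment here.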
Advanced LSTM: A Study about Better Time Dependency Modeling in Emotion Recognition
Long short-term memory (LSTM) is commonly used as the basic recurrent unit in
recurrent neural networks (RNNs). However, conventional LSTM assumes that the state
at the current time step depends on the previous time step. This assumption constrains
the time dependency modeling capability. In this study, we propose a new
variation of LSTM, advanced LSTM (A-LSTM), for better temporal context
modeling. We employ A-LSTM in a weighted pooling RNN for emotion recognition. The
A-LSTM outperforms the conventional LSTM by a relative 5.5%. The A-LSTM-based
weighted pooling RNN can also complement the state-of-the-art emotion
classification framework. This shows the advantage of A-LSTM.