58,342 research outputs found
Multi-Label Zero-Shot Human Action Recognition via Joint Latent Ranking Embedding
Human action recognition refers to automatic recognizing human actions from a
video clip. In reality, there often exist multiple human actions in a video
stream. Such a video stream is often weakly-annotated with a set of relevant
human action labels at a global level rather than assigning each label to a
specific video episode corresponding to a single action, which leads to a
multi-label learning problem. Furthermore, there are many meaningful human
actions in reality but it would be extremely difficult to collect/annotate
video clips regarding all of various human actions, which leads to a zero-shot
learning scenario. To the best of our knowledge, there is no work that has
addressed all the above issues together in human action recognition. In this
paper, we formulate a real-world human action recognition task as a multi-label
zero-shot learning problem and propose a framework to tackle this problem in a
holistic way. Our framework holistically tackles the issue of unknown temporal
boundaries between different actions for multi-label learning and exploits the
side information regarding the semantic relationship between different human
actions for knowledge transfer. Consequently, our framework leads to a joint
latent ranking embedding for multi-label zero-shot human action recognition. A
novel neural architecture of two component models and an alternate learning
algorithm are proposed to carry out the joint latent ranking embedding
learning. Thus, multi-label zero-shot recognition is done by measuring
relatedness scores of action labels to a test video clip in the joint latent
visual and semantic embedding spaces. We evaluate our framework with different
settings, including a novel data split scheme designed especially for
evaluating multi-label zero-shot learning, on two datasets: Breakfast and
Charades. The experimental results demonstrate the effectiveness of our
framework.Comment: 27 pages, 10 figures and 7 tables. Technical report submitted to a
journal. More experimental results/references were added and typos were
correcte
- …