21,737 research outputs found

    Generalized Many-Way Few-Shot Video Classification

    Get PDF
    Few-shot learning methods operate in low data regimes. The aim is to learn with few training examples per class. Although significant progress has been made in few-shot image classification, few-shot video recognition is relatively unexplored and methods based on 2D CNNs are unable to learn temporal information. In this work we thus develop a simple 3D CNN baseline, surpassing existing methods by a large margin. To circumvent the need of labeled examples, we propose to leverage weakly-labeled videos from a large dataset using tag retrieval followed by selecting the best clips with visual similarities, yielding further improvement. Our results saturate current 5-way benchmarks for few-shot video classification and therefore we propose a new challenging benchmark involving more classes and a mixture of classes with varying supervision

    Multi-Label Zero-Shot Human Action Recognition via Joint Latent Ranking Embedding

    Get PDF
    Human action recognition refers to automatic recognizing human actions from a video clip. In reality, there often exist multiple human actions in a video stream. Such a video stream is often weakly-annotated with a set of relevant human action labels at a global level rather than assigning each label to a specific video episode corresponding to a single action, which leads to a multi-label learning problem. Furthermore, there are many meaningful human actions in reality but it would be extremely difficult to collect/annotate video clips regarding all of various human actions, which leads to a zero-shot learning scenario. To the best of our knowledge, there is no work that has addressed all the above issues together in human action recognition. In this paper, we formulate a real-world human action recognition task as a multi-label zero-shot learning problem and propose a framework to tackle this problem in a holistic way. Our framework holistically tackles the issue of unknown temporal boundaries between different actions for multi-label learning and exploits the side information regarding the semantic relationship between different human actions for knowledge transfer. Consequently, our framework leads to a joint latent ranking embedding for multi-label zero-shot human action recognition. A novel neural architecture of two component models and an alternate learning algorithm are proposed to carry out the joint latent ranking embedding learning. Thus, multi-label zero-shot recognition is done by measuring relatedness scores of action labels to a test video clip in the joint latent visual and semantic embedding spaces. We evaluate our framework with different settings, including a novel data split scheme designed especially for evaluating multi-label zero-shot learning, on two datasets: Breakfast and Charades. The experimental results demonstrate the effectiveness of our framework.Comment: 27 pages, 10 figures and 7 tables. Technical report submitted to a journal. More experimental results/references were added and typos were correcte

    Recent Advances in Transfer Learning for Cross-Dataset Visual Recognition: A Problem-Oriented Perspective

    Get PDF
    This paper takes a problem-oriented perspective and presents a comprehensive review of transfer learning methods, both shallow and deep, for cross-dataset visual recognition. Specifically, it categorises the cross-dataset recognition into seventeen problems based on a set of carefully chosen data and label attributes. Such a problem-oriented taxonomy has allowed us to examine how different transfer learning approaches tackle each problem and how well each problem has been researched to date. The comprehensive problem-oriented review of the advances in transfer learning with respect to the problem has not only revealed the challenges in transfer learning for visual recognition, but also the problems (e.g. eight of the seventeen problems) that have been scarcely studied. This survey not only presents an up-to-date technical review for researchers, but also a systematic approach and a reference for a machine learning practitioner to categorise a real problem and to look up for a possible solution accordingly
    • …
    corecore