9,191 research outputs found

    Compound memory networks for few-shot video classification

    Full text link
    © Springer Nature Switzerland AG 2018. In this paper, we propose a new memory network structure for few-shot video classification by making the following contributions. First, we propose a compound memory network (CMN) structure under the key-value memory network paradigm, in which each key memory involves multiple constituent keys. These constituent keys work collaboratively for training, which enables the CMN to obtain an optimal video representation in a larger space. Second, we introduce a multi-saliency embedding algorithm which encodes a variable-length video sequence into a fixed-size matrix representation by discovering multiple saliencies of interest. For example, given a video of car auction, some people are interested in the car, while others are interested in the auction activities. Third, we design an abstract memory on top of the constituent keys. The abstract memory and constituent keys form a layered structure, which makes the CMN more efficient and capable of being scaled, while also retaining the representation capability of the multiple keys. We compare CMN with several state-of-the-art baselines on a new few-shot video classification dataset and show the effectiveness of our approach

    Label Independent Memory for Semi-Supervised Few-shot Video Classification.

    Full text link
    In this paper, we propose to leverage freely available unlabeled video data to facilitate few-shot video classification. In this semi-supervised few-shot video classification task, millions of unlabeled data are available for each episode during training. These videos can be extremely imbalanced, while they have profound visual and motion dynamics. To tackle the semi-supervised few-shot video classification problem, we make the following contributions. First, we propose a label independent memory (LIM) to cache label related features, which enables a similarity search over a large set of videos. LIM produces a class prototype for few-shot training. This prototype is an aggregated embedding for each class, which is more robust to noisy video features. Second, we integrate a multi-modality compound memory network to capture both RGB and flow information. We propose to store the RGB and flow representation in two separate memory networks, but they are jointly optimized via a unified loss. In this way, mutual communications between the two modalities are leveraged to achieve better classification performance. Third, we conduct extensive experiments on the few-shot Kinetics-100, Something-Something-100 datasets, which validates the effectiveness of leveraging the accessible unlabeled data for few-shot classification

    Compound Prototype Matching for Few-shot Action Recognition

    Full text link
    Few-shot action recognition aims to recognize novel action classes using only a small number of labeled training samples. In this work, we propose a novel approach that first summarizes each video into compound prototypes consisting of a group of global prototypes and a group of focused prototypes, and then compares video similarity based on the prototypes. Each global prototype is encouraged to summarize a specific aspect from the entire video, for example, the start/evolution of the action. Since no clear annotation is provided for the global prototypes, we use a group of focused prototypes to focus on certain timestamps in the video. We compare video similarity by matching the compound prototypes between the support and query videos. The global prototypes are directly matched to compare videos from the same perspective, for example, to compare whether two actions start similarly. For the focused prototypes, since actions have various temporal variations in the videos, we apply bipartite matching to allow the comparison of actions with different temporal positions and shifts. Experiments demonstrate that our proposed method achieves state-of-the-art results on multiple benchmarks.Comment: ECCV 202
    • …
    corecore