Compare More Nuanced: Pairwise Alignment Bilinear Network For Few-shot Fine-grained Learning
The recognition ability of human beings is developed in a progressive way.
Usually, children learn to discriminate various objects from coarse to
fine-grained with limited supervision. Inspired by this learning process, we
propose a simple yet effective model for Few-Shot Fine-Grained (FSFG)
recognition, which tries to tackle the challenging fine-grained recognition
task using meta-learning. The proposed method, named Pairwise Alignment
Bilinear Network (PABN), is an end-to-end deep neural network. Unlike
traditional deep bilinear networks for fine-grained classification, which adopt
self-bilinear pooling to capture the subtle features of images, the
proposed model uses a novel pairwise bilinear pooling to compare the nuanced
differences between base images and query images for learning a deep distance
metric. In order to match base image features with query image features, we
design feature alignment losses before the proposed pairwise bilinear pooling.
Experimental results on four fine-grained classification datasets and one generic
few-shot dataset demonstrate that the proposed model outperforms both the
state-of-the-art few-shot fine-grained and general few-shot methods.
Comment: ICME 2019 Ora
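The contrast the abstract draws between self-bilinear pooling and pairwise bilinear pooling can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names are hypothetical, the feature maps are random stand-ins for CNN outputs, and the sketch assumes the base and query features are already spatially aligned. It omits both the feature alignment losses and the learned distance metric.

```python
import numpy as np


def self_bilinear_pool(feats):
    """Sum over locations of the outer product of each descriptor with itself.

    feats: array of shape (locations, channels).
    Returns a flattened vector of length channels * channels.
    """
    return (feats.T @ feats).ravel()


def pairwise_bilinear_pool(base_feats, query_feats):
    """Sum over locations of outer products between base and query descriptors.

    Assumes both maps have shape (locations, channels) and that row i of
    each map describes the same (aligned) location.
    """
    if base_feats.shape != query_feats.shape:
        raise ValueError("base and query feature maps must have the same shape")
    return (base_feats.T @ query_feats).ravel()


# Random stand-ins for a 7x7 feature map with 8 channels (illustrative sizes).
rng = np.random.default_rng(0)
base = rng.standard_normal((49, 8))
query = rng.standard_normal((49, 8))

pair_vec = pairwise_bilinear_pool(base, query)
print(pair_vec.shape)
```

When the base and query maps are identical, the pairwise form reduces to the self-bilinear form, which is one way to see the pairwise version as a generalization that compares two images instead of describing one.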
Compact Bilinear Pooling
Bilinear models have been shown to achieve impressive performance on a wide
range of visual tasks, such as semantic segmentation, fine-grained recognition
and face recognition. However, bilinear features are high dimensional,
typically on the order of hundreds of thousands to a few million, which makes
them impractical for subsequent analysis. We propose two compact bilinear
representations with the same discriminative power as the full bilinear
representation but with only a few thousand dimensions. Our compact
representations allow back-propagation of classification errors enabling an
end-to-end optimization of the visual recognition system. The compact bilinear
representations are derived through a novel kernelized analysis of bilinear
pooling, which provides insights into the discriminative power of bilinear
pooling and a platform for further research in compact pooling methods.
Experiments illustrate the utility of the proposed representations for
image classification and few-shot learning across several datasets.
Comment: Camera ready version for CVP
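The abstract does not name its two compact representations, so the following is only a sketch of one well-known way to approximate bilinear features in a few thousand dimensions or fewer: Count Sketch projections combined by circular convolution through the FFT (the Tensor Sketch construction). The function names and the small sizes are assumptions chosen for illustration.

```python
import numpy as np


def count_sketch_matrix(in_dim, out_dim, rng):
    """Build a Count Sketch projection as a dense (in_dim, out_dim) matrix.

    Each input dimension is sent to one random output bucket with a random sign.
    """
    buckets = rng.integers(0, out_dim, size=in_dim)
    signs = rng.choice([-1.0, 1.0], size=in_dim)
    proj = np.zeros((in_dim, out_dim))
    proj[np.arange(in_dim), buckets] = signs
    return proj


def compact_bilinear_pool(feats, proj1, proj2):
    """Approximate sum-pooled bilinear features with a Tensor Sketch.

    feats: (locations, channels). Returns a vector of length out_dim.
    The product in the Fourier domain is a circular convolution of the two
    count sketches, which equals a count sketch of the outer product.
    """
    out_dim = proj1.shape[1]
    f1 = np.fft.rfft(feats @ proj1, axis=1)
    f2 = np.fft.rfft(feats @ proj2, axis=1)
    per_location = np.fft.irfft(f1 * f2, n=out_dim, axis=1)
    return per_location.sum(axis=0)


# Illustrative sizes: 6 channels would give 36 full bilinear dimensions.
rng = np.random.default_rng(0)
channels, out_dim = 6, 16
proj1 = count_sketch_matrix(channels, out_dim, rng)
proj2 = count_sketch_matrix(channels, out_dim, rng)
feats = rng.standard_normal((10, channels))

compact = compact_bilinear_pool(feats, proj1, proj2)
print(compact.shape)
```

Because every step is a linear map, an FFT, or an elementwise product, a construction of this kind stays differentiable, which is consistent with the abstract's point about back-propagating classification errors end to end.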
Hierarchical Attention Network for Action Segmentation
The temporal segmentation of events is an essential task and a precursor for
the automatic recognition of human actions in video. Several attempts have
been made to capture frame-level salient aspects through attention, but they
lack the capacity to effectively map the temporal relationships between
frames, as they only capture a limited span of temporal dependencies. To this
end, we propose a complete end-to-end supervised learning approach that can
better learn relationships between actions over time, thus improving the
overall segmentation performance. The proposed hierarchical recurrent attention
framework analyses the input video at multiple temporal scales, to form
embeddings at frame level and segment level, and perform fine-grained action
segmentation. This generates a simple, lightweight, yet extremely effective
architecture for segmenting continuous video streams and has multiple
application domains. We evaluate our system on multiple challenging public
benchmark datasets, including MERL Shopping, 50 Salads, and Georgia Tech
Egocentric datasets, and achieve state-of-the-art performance. The evaluated
datasets encompass numerous video capture settings, including
static overhead camera views and dynamic, ego-centric head-mounted camera
views, demonstrating the direct applicability of the proposed framework in a
variety of settings.
Comment: Published in Pattern Recognition Letter
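The idea of forming embeddings at frame level and then at segment level can be illustrated with a two-level soft attention pooling sketch. This is a simplified stand-in and not the paper's architecture: it leaves out the recurrent layers entirely, and the fixed segment length, the query vectors, and all function names are assumptions made for the example.

```python
import numpy as np


def softmax(scores):
    """Numerically stable softmax over a 1-D array."""
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()


def attention_pool(items, query):
    """Attention-weighted average of the rows of `items`.

    items: (n, dim); query: (dim,). Returns (pooled vector, weights).
    """
    weights = softmax(items @ query)
    return weights @ items, weights


def hierarchical_attention(frames, segment_len, frame_query, segment_query):
    """Pool frames into segment embeddings, then segments into one embedding."""
    segments = [frames[i:i + segment_len]
                for i in range(0, len(frames), segment_len)]
    # Frame-level attention: one embedding per segment.
    segment_embs = np.stack(
        [attention_pool(seg, frame_query)[0] for seg in segments])
    # Segment-level attention: one embedding for the whole clip.
    video_emb, segment_weights = attention_pool(segment_embs, segment_query)
    return segment_embs, video_emb, segment_weights


# Random stand-ins: 12 frames with 4-dimensional features, 4 frames per segment.
rng = np.random.default_rng(0)
frames = rng.standard_normal((12, 4))
frame_query = rng.standard_normal(4)
segment_query = rng.standard_normal(4)

segment_embs, video_emb, segment_weights = hierarchical_attention(
    frames, 4, frame_query, segment_query)
print(segment_embs.shape, video_emb.shape)
```

In a trained model the query vectors would be learned parameters and the inputs would come from a feature extractor; here they are random so the sketch stays self-contained.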