Compact Bilinear Pooling
Bilinear models have been shown to achieve impressive performance on a wide
range of visual tasks, such as semantic segmentation, fine-grained recognition
and face recognition. However, bilinear features are high dimensional,
typically on the order of hundreds of thousands to a few million, which makes
them impractical for subsequent analysis. We propose two compact bilinear
representations with the same discriminative power as the full bilinear
representation but with only a few thousand dimensions. Our compact
representations allow back-propagation of classification errors, enabling
end-to-end optimization of the visual recognition system. The compact bilinear
representations are derived through a novel kernelized analysis of bilinear
pooling, which provides insights into the discriminative power of bilinear
pooling and a platform for further research in compact pooling methods.
Experiments illustrate the utility of the proposed representations for
image classification and few-shot learning across several datasets.
Comment: Camera-ready version for CVP
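The abstract does not say which compact projections it uses. As an illustration only, the sketch below assumes Tensor Sketch, a standard randomized approximation of the squared-inner-product kernel: each local feature is Count-Sketched twice with independent hashes and the two sketches are circularly convolved, so that the sum of per-location sketches approximates, in expected inner product, the full bilinear feature (the sum of outer products, c² dimensions). All function names are hypothetical, and the direct O(d²) convolution stands in for the FFT one would use in practice.

```python
import random


def count_sketch(x, h, s, d):
    """Count Sketch: project vector x (length c) to d dims using hash h and signs s."""
    out = [0.0] * d
    for i, xi in enumerate(x):
        out[h[i]] += s[i] * xi
    return out


def circular_convolve(a, b):
    """Circular convolution of two length-d vectors (an FFT would be used in practice)."""
    d = len(a)
    return [sum(a[m] * b[(k - m) % d] for m in range(d)) for k in range(d)]


def make_tensor_sketch(c, d, seed=0):
    """Draw the random hashes and signs once; all inputs must share them."""
    rng = random.Random(seed)
    h1 = [rng.randrange(d) for _ in range(c)]
    h2 = [rng.randrange(d) for _ in range(c)]
    s1 = [rng.choice((-1, 1)) for _ in range(c)]
    s2 = [rng.choice((-1, 1)) for _ in range(c)]
    return h1, h2, s1, s2


def tensor_sketch(x, params, d):
    """d-dim sketch of the outer product x x^T, without forming it."""
    h1, h2, s1, s2 = params
    return circular_convolve(count_sketch(x, h1, s1, d), count_sketch(x, h2, s2, d))


def full_bilinear(features):
    """Full bilinear pooling: sum over locations of x x^T, flattened to c*c dims."""
    c = len(features[0])
    return [sum(x[i] * x[j] for x in features) for i in range(c) for j in range(c)]


def compact_bilinear(features, params, d):
    """Compact approximation: sum over locations of the per-location Tensor Sketch."""
    pooled = [0.0] * d
    for x in features:
        for k, v in enumerate(tensor_sketch(x, params, d)):
            pooled[k] += v
    return pooled


if __name__ == "__main__":
    feats = [[float(i + j) for j in range(8)] for i in range(4)]  # 4 locations, c = 8
    params = make_tensor_sketch(c=8, d=16, seed=1)
    print(len(full_bilinear(feats)))                # 64
    print(len(compact_bilinear(feats, params, 16)))  # 16
```

For example, with c = 512 channels the full representation has 512² = 262,144 dimensions, whereas the sketch dimension d is a free choice and can be set to a few thousand. Because every step is linear in each sketch, gradients pass through it, which is what allows end-to-end training.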
Dynamic Face Video Segmentation via Reinforcement Learning
For real-time semantic video segmentation, most recent works utilised a
dynamic framework with a key scheduler to make online key/non-key decisions.
Some works used a fixed key scheduling policy, while others proposed adaptive
key scheduling methods based on heuristic strategies, both of which may lead to
suboptimal global performance. To overcome this limitation, we model the online
key decision process in dynamic video segmentation as a deep reinforcement
learning problem and learn an efficient and effective scheduling policy from
expert information about decision history and from the process of maximising
global return. Moreover, we study the application of dynamic video segmentation
to face videos, a field that has not been investigated before. By evaluating on
the 300VW dataset, we show that our reinforcement key scheduler outperforms
various baselines in terms of both effective key selection and running speed.
Further results on the Cityscapes dataset
demonstrate that our proposed method can also generalise to other scenarios. To
the best of our knowledge, this is the first work to use reinforcement learning
for online key-frame decisions in dynamic video segmentation, and the first
to apply it to face videos.
Comment: CVPR 2020. 300VW with segmentation labels is available at:
https://github.com/mapleandfire/300VW-Mas
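The abstract describes a dynamic framework in which each incoming frame is either a key frame (full, expensive segmentation) or a non-key frame (cheaper reuse of earlier results), with a scheduler making that call online. The minimal sketch below shows the loop with the two baseline families the abstract mentions, a fixed-interval policy and a heuristic threshold policy, plus a stochastic policy slot where a reinforcement-learned scheduler would plug in. The state features (frames since the last key frame and a scalar change score), the logistic policy, and every function name are hypothetical; the paper's actual state, network, reward, and training procedure are not reproduced here.

```python
import math
import random
from typing import Callable

# Hypothetical scheduler signature: (frame index, frames since last key, change score) -> is key?
Scheduler = Callable[[int, int, float], bool]


def fixed_scheduler(interval: int) -> Scheduler:
    """Fixed policy: a key frame every `interval` frames."""
    return lambda t, since_key, change: since_key >= interval


def threshold_scheduler(tau: float) -> Scheduler:
    """Heuristic adaptive policy: a key frame whenever the change score exceeds `tau`."""
    return lambda t, since_key, change: change > tau


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_policy(w_since: float, w_change: float, bias: float, seed: int = 0) -> Scheduler:
    """Stochastic policy pi(key | state); an RL method would learn these weights."""
    rng = random.Random(seed)

    def decide(t: int, since_key: int, change: float) -> bool:
        p_key = _sigmoid(w_since * since_key + w_change * change + bias)
        return rng.random() < p_key  # sample the key/non-key action

    return decide


def run_dynamic_segmentation(frames, change_scores, scheduler, segment, propagate):
    """Process frames online; return per-frame labels and the indices chosen as key frames."""
    labels, key_frames = [], []
    current = None
    since_key = 0
    for t, frame in enumerate(frames):
        since_key += 1
        # The first frame is always a key frame; afterwards the scheduler decides.
        if t == 0 or scheduler(t, since_key, change_scores[t]):
            current = segment(frame)  # expensive full segmentation
            key_frames.append(t)
            since_key = 0
        else:
            current = propagate(current, frame)  # cheap reuse of the last result
        labels.append(current)
    return labels, key_frames


if __name__ == "__main__":
    frames = list(range(10))
    change = [0.0, 0.1, 0.9, 0.2, 0.1, 0.8, 0.1, 0.1, 0.1, 0.95]

    def seg(f):  # stand-in for a segmentation network
        return f"seg{f}"

    def prop(prev, f):  # stand-in for label propagation
        return prev

    for name, sched in [("fixed", fixed_scheduler(3)), ("threshold", threshold_scheduler(0.5))]:
        _, keys = run_dynamic_segmentation(frames, change, sched, seg, prop)
        print(name, keys)  # fixed [0, 3, 6, 9]; threshold [0, 2, 5, 9]
```

In an RL formulation, the weights of such a policy would be trained to maximise a global return, as the abstract puts it; since the return is not defined here, the weights above are simply set by hand.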
Error Correction for Dense Semantic Image Labeling
Pixelwise semantic image labeling is an important, yet challenging, task with
many applications. Typical approaches to this problem involve either
training deep networks on vast numbers of images to directly infer the
labels or using probabilistic graphical models to jointly model the
dependencies of the input (i.e. images) and output (i.e. labels). Yet, the
former approaches do not capture the structure of the output labels, which is
crucial for the performance of dense labeling, and the latter rely on carefully
hand-designed priors that require costly parameter tuning via optimization
techniques, which in turn leads to long inference times. To alleviate these
restrictions, we explore how to arrive at dense semantic pixel labels given
both the input image and an initial estimate of the output labels. We propose a
parallel architecture that: 1) exploits context information through a
LabelPropagation network that propagates correct labels from nearby pixels to
improve object boundaries, 2) uses a LabelReplacement network to directly
replace possibly erroneous, initial labels with new ones, and 3) combines the
different intermediate results via a Fusion network to obtain the final
per-pixel label. We experimentally validate our approach on two datasets,
for semantic segmentation and face parsing respectively, and show
improvements over the state of the art. We also provide both a
quantitative and a qualitative analysis of the generated results.
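The abstract names three sub-networks but not their outputs. Purely as an illustration, the sketch below assumes that LabelPropagation yields a per-pixel offset telling each pixel which nearby pixel's initial label distribution to copy, that LabelReplacement yields a fresh per-pixel class distribution, and that the Fusion network yields a per-pixel weight for blending the two. In the real system these quantities would be predicted by networks from the image and the initial labels; here they are supplied by hand, and all function names are hypothetical.

```python
from typing import List, Tuple

Prob = List[float]          # class distribution at one pixel
ProbMap = List[List[Prob]]  # H x W grid of class distributions


def propagate_labels(initial: ProbMap, offsets: List[List[Tuple[int, int]]]) -> ProbMap:
    """Copy each pixel's distribution from the nearby pixel at its (dy, dx) offset, clamped to the image."""
    h, w = len(initial), len(initial[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            dy, dx = offsets[y][x]
            sy = min(max(y + dy, 0), h - 1)
            sx = min(max(x + dx, 0), w - 1)
            row.append(list(initial[sy][sx]))
        out.append(row)
    return out


def fuse(prop: ProbMap, repl: ProbMap, weights: List[List[float]]) -> ProbMap:
    """Per-pixel convex combination: weight * propagated + (1 - weight) * replaced."""
    return [
        [[wt * p + (1.0 - wt) * r for p, r in zip(prop[y][x], repl[y][x])]
         for x, wt in enumerate(row)]
        for y, row in enumerate(weights)
    ]


def to_labels(probs: ProbMap) -> List[List[int]]:
    """Final per-pixel label = argmax over classes."""
    return [[max(range(len(p)), key=p.__getitem__) for p in row] for row in probs]


if __name__ == "__main__":
    # A 1 x 3 strip with 2 classes; the middle pixel's initial label is wrong.
    initial = [[[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]]
    offsets = [[(0, 0), (0, -1), (0, 0)]]  # middle pixel borrows from its left neighbor
    replaced = [[[0.8, 0.2], [0.7, 0.3], [0.1, 0.9]]]
    weights = [[1.0, 0.5, 0.0]]
    fused = fuse(propagate_labels(initial, offsets), replaced, weights)
    print(to_labels(initial), "->", to_labels(fused))  # [[0, 1, 1]] -> [[0, 0, 1]]
```

The per-pixel weight lets the fusion step trust propagation near boundaries and replacement elsewhere; in this toy the middle pixel blends both and recovers the correct class.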
- …