2,172 research outputs found
The Devil is in the Decoder: Classification, Regression and GANs
Many machine vision applications, such as semantic segmentation and depth
prediction, require predictions for every pixel of the input image. Models for
such problems usually consist of encoders which decrease spatial resolution
while learning a high-dimensional representation, followed by decoders who
recover the original input resolution and result in low-dimensional
predictions. While encoders have been studied rigorously, relatively few
studies address the decoder side. This paper presents an extensive comparison
of a variety of decoders for a variety of pixel-wise tasks ranging from
classification, regression to synthesis. Our contributions are: (1) Decoders
matter: we observe significant variance in results between different types of
decoders on various problems. (2) We introduce new residual-like connections
for decoders. (3) We introduce a novel decoder: bilinear additive upsampling.
(4) We explore prediction artifacts
Integrated Face Analytics Networks through Cross-Dataset Hybrid Training
Face analytics benefits many multimedia applications. It consists of a number
of tasks, such as facial emotion recognition and face parsing, and most
existing approaches generally treat these tasks independently, which limits
their deployment in real scenarios. In this paper we propose an integrated Face
Analytics Network (iFAN), which is able to perform multiple tasks jointly for
face analytics with a novel carefully designed network architecture to fully
facilitate the informative interaction among different tasks. The proposed
integrated network explicitly models the interactions between tasks so that the
correlations between tasks can be fully exploited for performance boost. In
addition, to solve the bottleneck of the absence of datasets with comprehensive
training data for various tasks, we propose a novel cross-dataset hybrid
training strategy. It allows "plug-in and play" of multiple datasets annotated
for different tasks without the requirement of a fully labeled common dataset
for all the tasks. We experimentally show that the proposed iFAN achieves
state-of-the-art performance on multiple face analytics tasks using a single
integrated model. Specifically, iFAN achieves an overall F-score of 91.15% on
the Helen dataset for face parsing, a normalized mean error of 5.81% on the
MTFL dataset for facial landmark localization and an accuracy of 45.73% on the
BNU dataset for emotion recognition with a single model.Comment: 10 page
- …