25,808 research outputs found
High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs
We present a new method for synthesizing high-resolution photo-realistic
images from semantic label maps using conditional generative adversarial
networks (conditional GANs). Conditional GANs have enabled a variety of
applications, but the results are often limited to low-resolution and still far
from realistic. In this work, we generate 2048x1024 visually appealing results
with a novel adversarial loss, as well as new multi-scale generator and
discriminator architectures. Furthermore, we extend our framework to
interactive visual manipulation with two additional features. First, we
incorporate object instance segmentation information, which enables object
manipulations such as removing/adding objects and changing the object category.
Second, we propose a method to generate diverse results given the same input,
allowing users to edit the object appearance interactively. Human opinion
studies demonstrate that our method significantly outperforms existing methods,
advancing both the quality and the resolution of deep image synthesis and
editing.Comment: v2: CVPR camera ready, adding more results for edge-to-photo example
Object Detection in 20 Years: A Survey
Object detection, as of one the most fundamental and challenging problems in
computer vision, has received great attention in recent years. Its development
in the past two decades can be regarded as an epitome of computer vision
history. If we think of today's object detection as a technical aesthetics
under the power of deep learning, then turning back the clock 20 years we would
witness the wisdom of cold weapon era. This paper extensively reviews 400+
papers of object detection in the light of its technical evolution, spanning
over a quarter-century's time (from the 1990s to 2019). A number of topics have
been covered in this paper, including the milestone detectors in history,
detection datasets, metrics, fundamental building blocks of the detection
system, speed up techniques, and the recent state of the art detection methods.
This paper also reviews some important detection applications, such as
pedestrian detection, face detection, text detection, etc, and makes an in-deep
analysis of their challenges as well as technical improvements in recent years.Comment: This work has been submitted to the IEEE TPAMI for possible
publicatio
Integrated Face Analytics Networks through Cross-Dataset Hybrid Training
Face analytics benefits many multimedia applications. It consists of a number
of tasks, such as facial emotion recognition and face parsing, and most
existing approaches generally treat these tasks independently, which limits
their deployment in real scenarios. In this paper we propose an integrated Face
Analytics Network (iFAN), which is able to perform multiple tasks jointly for
face analytics with a novel carefully designed network architecture to fully
facilitate the informative interaction among different tasks. The proposed
integrated network explicitly models the interactions between tasks so that the
correlations between tasks can be fully exploited for performance boost. In
addition, to solve the bottleneck of the absence of datasets with comprehensive
training data for various tasks, we propose a novel cross-dataset hybrid
training strategy. It allows "plug-in and play" of multiple datasets annotated
for different tasks without the requirement of a fully labeled common dataset
for all the tasks. We experimentally show that the proposed iFAN achieves
state-of-the-art performance on multiple face analytics tasks using a single
integrated model. Specifically, iFAN achieves an overall F-score of 91.15% on
the Helen dataset for face parsing, a normalized mean error of 5.81% on the
MTFL dataset for facial landmark localization and an accuracy of 45.73% on the
BNU dataset for emotion recognition with a single model.Comment: 10 page
- …