2,148 research outputs found
Sketch-a-Net that Beats Humans
We propose a multi-scale multi-channel deep neural network framework that,
for the first time, yields sketch recognition performance surpassing that of
humans. Our superior performance is a result of explicitly embedding the unique
characteristics of sketches in our model: (i) a network architecture designed
for sketch rather than natural photo statistics, (ii) a multi-channel
generalisation that encodes sequential ordering in the sketching process, and
(iii) a multi-scale network ensemble with joint Bayesian fusion that accounts
for the different levels of abstraction exhibited in free-hand sketches. We
show that state-of-the-art deep networks specifically engineered for photos of
natural objects fail to perform well on sketch recognition, regardless whether
they are trained using photo or sketch. Our network on the other hand not only
delivers the best performance on the largest human sketch dataset to date, but
also is small in size making efficient training possible using just CPUs.Comment: Accepted to BMVC 2015 (oral
Deep Sketch-Photo Face Recognition Assisted by Facial Attributes
In this paper, we present a deep coupled framework to address the problem of
matching sketch image against a gallery of mugshots. Face sketches have the
essential in- formation about the spatial topology and geometric details of
faces while missing some important facial attributes such as ethnicity, hair,
eye, and skin color. We propose a cou- pled deep neural network architecture
which utilizes facial attributes in order to improve the sketch-photo
recognition performance. The proposed Attribute-Assisted Deep Con- volutional
Neural Network (AADCNN) method exploits the facial attributes and leverages the
loss functions from the facial attributes identification and face verification
tasks in order to learn rich discriminative features in a common em- bedding
subspace. The facial attribute identification task increases the inter-personal
variations by pushing apart the embedded features extracted from individuals
with differ- ent facial attributes, while the verification task reduces the
intra-personal variations by pulling together all the fea- tures that are
related to one person. The learned discrim- inative features can be well
generalized to new identities not seen in the training data. The proposed
architecture is able to make full use of the sketch and complementary fa- cial
attribute information to train a deep model compared to the conventional
sketch-photo recognition methods. Exten- sive experiments are performed on
composite (E-PRIP) and semi-forensic (IIIT-D semi-forensic) datasets. The
results show the superiority of our method compared to the state- of-the-art
models in sketch-photo recognition algorithm
Deep Shape Matching
We cast shape matching as metric learning with convolutional networks. We
break the end-to-end process of image representation into two parts. Firstly,
well established efficient methods are chosen to turn the images into edge
maps. Secondly, the network is trained with edge maps of landmark images, which
are automatically obtained by a structure-from-motion pipeline. The learned
representation is evaluated on a range of different tasks, providing
improvements on challenging cases of domain generalization, generic
sketch-based image retrieval or its fine-grained counterpart. In contrast to
other methods that learn a different model per task, object category, or
domain, we use the same network throughout all our experiments, achieving
state-of-the-art results in multiple benchmarks.Comment: ECCV 201
End-to-End Photo-Sketch Generation via Fully Convolutional Representation Learning
Sketch-based face recognition is an interesting task in vision and multimedia
research, yet it is quite challenging due to the great difference between face
photos and sketches. In this paper, we propose a novel approach for
photo-sketch generation, aiming to automatically transform face photos into
detail-preserving personal sketches. Unlike the traditional models synthesizing
sketches based on a dictionary of exemplars, we develop a fully convolutional
network to learn the end-to-end photo-sketch mapping. Our approach takes whole
face photos as inputs and directly generates the corresponding sketch images
with efficient inference and learning, in which the architecture are stacked by
only convolutional kernels of very small sizes. To well capture the person
identity during the photo-sketch transformation, we define our optimization
objective in the form of joint generative-discriminative minimization. In
particular, a discriminative regularization term is incorporated into the
photo-sketch generation, enhancing the discriminability of the generated person
sketches against other individuals. Extensive experiments on several standard
benchmarks suggest that our approach outperforms other state-of-the-art methods
in both photo-sketch generation and face sketch verification.Comment: 8 pages, 6 figures. Proceeding in ACM International Conference on
Multimedia Retrieval (ICMR), 201
SketchyGAN: Towards Diverse and Realistic Sketch to Image Synthesis
Synthesizing realistic images from human drawn sketches is a challenging
problem in computer graphics and vision. Existing approaches either need exact
edge maps, or rely on retrieval of existing photographs. In this work, we
propose a novel Generative Adversarial Network (GAN) approach that synthesizes
plausible images from 50 categories including motorcycles, horses and couches.
We demonstrate a data augmentation technique for sketches which is fully
automatic, and we show that the augmented data is helpful to our task. We
introduce a new network building block suitable for both the generator and
discriminator which improves the information flow by injecting the input image
at multiple scales. Compared to state-of-the-art image translation methods, our
approach generates more realistic images and achieves significantly higher
Inception Scores.Comment: Accepted to CVPR 201
Image-to-Image Translation with Conditional Adversarial Networks
We investigate conditional adversarial networks as a general-purpose solution
to image-to-image translation problems. These networks not only learn the
mapping from input image to output image, but also learn a loss function to
train this mapping. This makes it possible to apply the same generic approach
to problems that traditionally would require very different loss formulations.
We demonstrate that this approach is effective at synthesizing photos from
label maps, reconstructing objects from edge maps, and colorizing images, among
other tasks. Indeed, since the release of the pix2pix software associated with
this paper, a large number of internet users (many of them artists) have posted
their own experiments with our system, further demonstrating its wide
applicability and ease of adoption without the need for parameter tweaking. As
a community, we no longer hand-engineer our mapping functions, and this work
suggests we can achieve reasonable results without hand-engineering our loss
functions either.Comment: Website: https://phillipi.github.io/pix2pix/, CVPR 201
Detach and Adapt: Learning Cross-Domain Disentangled Deep Representation
While representation learning aims to derive interpretable features for
describing visual data, representation disentanglement further results in such
features so that particular image attributes can be identified and manipulated.
However, one cannot easily address this task without observing ground truth
annotation for the training data. To address this problem, we propose a novel
deep learning model of Cross-Domain Representation Disentangler (CDRD). By
observing fully annotated source-domain data and unlabeled target-domain data
of interest, our model bridges the information across data domains and
transfers the attribute information accordingly. Thus, cross-domain joint
feature disentanglement and adaptation can be jointly performed. In the
experiments, we provide qualitative results to verify our disentanglement
capability. Moreover, we further confirm that our model can be applied for
solving classification tasks of unsupervised domain adaptation, and performs
favorably against state-of-the-art image disentanglement and translation
methods.Comment: CVPR 2018 Spotligh
- …