A Survey of Deep Facial Attribute Analysis
Facial attribute analysis has received considerable attention as deep
learning techniques have made remarkable breakthroughs in this field over the
past few years. Deep learning based facial attribute analysis consists of two basic
sub-issues: facial attribute estimation (FAE), which recognizes whether facial
attributes are present in given images, and facial attribute manipulation
(FAM), which synthesizes or removes desired facial attributes. In this paper,
we provide a comprehensive survey of deep facial attribute analysis from the
perspectives of both estimation and manipulation. First, we summarize a general
pipeline that deep facial attribute analysis follows, which comprises two
stages: data preprocessing and model construction. Additionally, we introduce
the underlying theories of this two-stage pipeline for both FAE and FAM.
Second, the datasets and performance metrics commonly used in facial attribute
analysis are presented. Third, we create a taxonomy of state-of-the-art methods
and review deep FAE and FAM algorithms in detail. Furthermore, several
additional facial attribute related issues are introduced, as well as relevant
real-world applications. Finally, we discuss possible challenges and promising
future research directions. Comment: submitted to the International Journal of Computer Vision (IJCV).
Connecting the Dots Between MLE and RL for Sequence Prediction
Sequence prediction models can be learned from example sequences with a
variety of training algorithms. Maximum likelihood learning is simple and
efficient, yet can suffer from compounding error at test time. Reinforcement
learning such as policy gradient addresses the issue but can have prohibitively
poor exploration efficiency. A rich set of other algorithms such as RAML, SPG,
and data noising, have also been developed from different perspectives. This
paper establishes a formal connection between these algorithms. We present a
generalized entropy regularized policy optimization formulation, and show that
the apparently distinct algorithms can all be reformulated as special instances
of the framework, with the only difference being the configurations of a reward
function and a couple of hyperparameters. The unified interpretation offers a
systematic view of the varying properties of exploration and learning
efficiency. Moreover, inspired by the framework, we present a new algorithm
that dynamically interpolates among the family of algorithms for scheduled
sequence model learning. Experiments on machine translation, text
summarization, and game imitation learning demonstrate the superiority of the
proposed algorithm. Comment: Major revision. The first two authors contributed equally.
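The abstract only names the framework, so the following is a minimal sketch of a generic entropy-regularized policy optimization objective, written in LaTeX, under the assumption of a reward function R, an auxiliary distribution q, and two weighting hyperparameters alpha and beta; the paper's exact notation and reward configurations may differ.

% Generic entropy-regularized policy optimization objective (sketch):
\mathcal{L}(q, \theta) =
    \mathbb{E}_{q(\mathbf{y} \mid \mathbf{x})}\!\left[ R(\mathbf{y}) \right]
    - \alpha \, \mathrm{KL}\!\left( q(\mathbf{y} \mid \mathbf{x}) \,\|\, p_\theta(\mathbf{y} \mid \mathbf{x}) \right)
    + \beta \, \mathrm{H}(q)
% Different choices of R, \alpha, and \beta are what distinguish the algorithms:
% e.g., a delta reward peaked on the reference sequence gives MLE-like training,
% while using a task metric as R gives policy-gradient-style training.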
Multi-Scale Video Frame-Synthesis Network with Transitive Consistency Loss
Traditional approaches to interpolate/extrapolate frames in a video sequence
require accurate pixel correspondences between images, e.g., using optical
flow. Their results depend on the accuracy of optical flow estimation and can
exhibit heavy artifacts when flow estimation fails. Recently, methods based on
auto-encoders have shown impressive progress; however, they are usually trained
for specific interpolation/extrapolation settings and lack flexibility.
To reduce these limitations, we propose a unified network that
parameterizes the position of the frame of interest and can therefore infer
interpolated/extrapolated frames within the same framework. To achieve this, we
introduce a transitive consistency loss to better regularize the network. We
adopt a multi-scale structure for the network so that the parameters can be
shared across multiple layers. Our approach avoids expensive global optimization
of optical flow methods, and is efficient and flexible for video
interpolation/extrapolation applications. Experimental results have shown that
our method performs favorably against state-of-the-art methods.
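As a rough illustration of how a transitive consistency term can regularize a frame-synthesis network, the sketch below assumes a model synth(frame_a, frame_b, t) that produces the frame at relative temporal position t between (or beyond) two input frames; the function name, signature, and the particular round trip used are illustrative assumptions, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def transitive_consistency_loss(synth, frame0, frame1, t=0.5):
    # Synthesize an intermediate frame at position t from the two inputs
    # (frame0 is taken to sit at time 0, frame1 at time 1).
    mid = synth(frame0, frame1, t)
    # Re-synthesize the original endpoints using the generated frame and
    # penalize the discrepancy: a round trip through the network should
    # approximately return the inputs.
    rec0 = synth(mid, frame1, -t / (1.0 - t))  # time 0 relative to (mid, frame1)
    rec1 = synth(frame0, mid, 1.0 / t)         # time 1 relative to (frame0, mid)
    return F.l1_loss(rec0, frame0) + F.l1_loss(rec1, frame1)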
FingerNet: An Unified Deep Network for Fingerprint Minutiae Extraction
Minutiae extraction is of critical importance in automated fingerprint
recognition. Previous works on rolled/slap fingerprints failed on latent
fingerprints due to noisy ridge patterns and complex background noise. In this
paper, we propose a new way to design a deep convolutional network that combines
domain knowledge with the representation ability of deep learning. In terms of
orientation estimation, segmentation, enhancement and minutiae extraction,
several typical traditional methods that perform well on rolled/slap fingerprints
are transformed into convolutional operations and integrated into a unified plain
network. We demonstrate that this pipeline is equivalent to a shallow network
with fixed weights. The network is then expanded to enhance its representation
ability and the weights are released to learn complex background variance from
data, while preserving end-to-end differentiability. Experimental results on
NIST SD27 latent database and FVC 2004 slap database demonstrate that the
proposed algorithm outperforms the state-of-the-art minutiae extraction
algorithms. Code is made publicly available at:
https://github.com/felixTY/FingerNet
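The idea of expressing a hand-crafted step as a convolution with fixed weights and then "releasing" those weights for end-to-end learning can be illustrated with a small, hypothetical PyTorch sketch; the kernel below is a generic placeholder filter, not one of the actual filters used by FingerNet.

import torch
import torch.nn as nn

# Build a conv layer whose weights are initialized from a hand-crafted filter.
kernel = torch.ones(1, 1, 5, 5) / 25.0   # placeholder hand-crafted kernel
conv = nn.Conv2d(1, 1, kernel_size=5, padding=2, bias=False)

with torch.no_grad():
    conv.weight.copy_(kernel)

# Stage 1: fixed weights -- the layer reproduces the traditional operation.
conv.weight.requires_grad_(False)

# Stage 2: "release" the weights so they can be refined from data,
# while the layer stays differentiable end to end.
conv.weight.requires_grad_(True)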
Quality-aware Unpaired Image-to-Image Translation
Generative Adversarial Networks (GANs) have been widely used for the
image-to-image translation task. While these models rely heavily on labeled
image pairs, some GAN variants have recently been proposed to tackle the
unpaired image translation task. These models exploit supervision at the
domain level with a reconstruction process for unpaired image translation. On
the other hand, parallel works have shown that leveraging perceptual loss
functions based on high level deep features could enhance the generated image
quality. Nevertheless, as these GAN-based models either depend on a
pretrained deep network or rely on labeled image pairs, they
cannot be directly applied to the unpaired image translation task. Moreover,
despite the improvements brought by perceptual losses derived from deep neural
networks, few researchers have explored improving the
generated image quality with classical image quality measures. To tackle the
above two challenges, in this paper, we propose a unified quality-aware
GAN-based framework for unpaired image-to-image translation, where a
quality-aware loss is explicitly incorporated by comparing each source image
and the reconstructed image at the domain level. Specifically, we design two
detailed implementations of the quality loss. The first defines a
quality-aware loss based on a classical image quality assessment measure. The
second proposes an adaptive deep network based
loss. Finally, extensive experimental results on many real-world datasets
clearly show the quality improvement of our proposed framework, and the
superiority of leveraging classical image quality measures for unpaired image
translation compared to the deep network based model. Comment: IEEE Transactions on Multimedia.
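One way to read the classical quality-aware loss is as a full-reference image quality metric applied between each source image and its domain-level reconstruction. The sketch below uses SSIM purely as an illustrative stand-in for such a metric and assumes the torchmetrics package is available; the paper's actual measure and weighting are not specified here.

import torch
from torchmetrics.functional import structural_similarity_index_measure as ssim

def quality_aware_loss(source, reconstruction, data_range=1.0):
    # Higher SSIM means the reconstruction preserves the source's structure,
    # so the loss is 1 - SSIM, to be minimized alongside the adversarial and
    # reconstruction terms of an unpaired translation model.
    return 1.0 - ssim(reconstruction, source, data_range=data_range)

# Example: a batch of 4 RGB images with values in [0, 1].
src = torch.rand(4, 3, 64, 64)
rec = torch.rand(4, 3, 64, 64)
loss = quality_aware_loss(src, rec)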
The NLP Engine: A Universal Turing Machine for NLP
It is commonly accepted that machine translation is a more complex task than
part of speech tagging. But how much more complex? In this paper we make an
attempt to develop a general framework and methodology for computing the
informational and/or processing complexity of NLP applications and tasks. We
define a universal framework akin to a Turing Machine that attempts to fit
(most) NLP tasks into one paradigm. We calculate the complexities of various
NLP tasks using measures of Shannon Entropy, and compare `simple' ones such as
part of speech tagging to `complex' ones such as machine translation. This
paper provides a first, though far from perfect, attempt to quantify NLP tasks
under a uniform paradigm. We point out current deficiencies and suggest some
avenues for fruitful research.
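The kind of entropy-based complexity comparison described here can be made concrete with a small example: the Shannon entropy of an output distribution (say, over POS tags for a token versus over candidate translations for a sentence) is low when the output is highly constrained and higher when many outputs are plausible. The distributions below are made up purely for illustration.

import math

def shannon_entropy(probs):
    # H(p) = -sum_i p_i * log2(p_i), in bits; zero-probability terms contribute 0.
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical output distributions (illustrative only):
pos_tags = [0.6, 0.2, 0.1, 0.05, 0.05]   # fairly predictable tag choice
translations = [1.0 / 64] * 64           # many equally plausible target outputs
print(shannon_entropy(pos_tags))         # ~1.67 bits
print(shannon_entropy(translations))     # 6.0 bits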
Neural Approaches to Conversational AI
The present paper surveys neural approaches to conversational AI that have
been developed in the last few years. We group conversational systems into
three categories: (1) question answering agents, (2) task-oriented dialogue
agents, and (3) chatbots. For each category, we present a review of
state-of-the-art neural approaches, draw the connection between them and
traditional approaches, and discuss the progress that has been made and
challenges still being faced, using specific systems and models as case
studies. Comment: Foundations and Trends in Information Retrieval (95 pages).
Mix and match networks: cross-modal alignment for zero-pair image-to-image translation
This paper addresses the problem of inferring unseen cross-modal
image-to-image translations between multiple modalities. We assume that only
some of the pairwise translations have been seen (i.e. trained) and infer the
remaining unseen translations (where training pairs are not available). We
propose mix and match networks, an approach where multiple encoders and
decoders are aligned in such a way that the desired translation can be obtained
by simply cascading the source encoder and the target decoder, even when they
have not interacted during the training stage (i.e. unseen). The main challenge
lies in the alignment of the latent representations at the bottlenecks of
encoder-decoder pairs. We propose an architecture with several tools to
encourage alignment, including autoencoders and robust side information and
latent consistency losses. We show the benefits of our approach in terms of
effectiveness and scalability compared with other pairwise image-to-image
translation approaches. We also propose zero-pair cross-modal image
translation, a challenging setting where the objective is inferring semantic
segmentation from depth (and vice-versa) without explicit segmentation-depth
pairs, and only from two (disjoint) segmentation-RGB and depth-RGB training
sets. We observe that a certain part of the shared information between unseen
modalities might not be reachable, so we further propose a variant that
leverages pseudo-pairs which allows us to exploit this shared information
between the unseen modalities. Comment: Accepted by IJC
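The core mechanism, cascading a source-modality encoder with a target-modality decoder that were never trained together, can be sketched at the interface level. The class and module names below are illustrative placeholders, not the paper's architecture, and the alignment losses are only assumed to have been applied during training.

import torch.nn as nn

class MixAndMatch(nn.Module):
    # One encoder and one decoder per modality, all mapping to/from a shared
    # latent space; alignment (autoencoding, side information, latent
    # consistency losses) is assumed to have been enforced during training.
    def __init__(self, encoders: dict, decoders: dict):
        super().__init__()
        self.encoders = nn.ModuleDict(encoders)
        self.decoders = nn.ModuleDict(decoders)

    def translate(self, x, source: str, target: str):
        # Works even for (source, target) pairs never seen together during
        # training, provided the latent representations are aligned.
        z = self.encoders[source](x)
        return self.decoders[target](z)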
The USFD Spoken Language Translation System for IWSLT 2014
The University of Sheffield (USFD) participated in the International Workshop
for Spoken Language Translation (IWSLT) in 2014. In this paper, we will
introduce the USFD SLT system for IWSLT. Automatic speech recognition (ASR) is
achieved by two multi-pass deep neural network systems with adaptation and
rescoring techniques. Machine translation (MT) is achieved by a phrase-based
system. The USFD primary system incorporates state-of-the-art ASR and MT
techniques and gives BLEU scores of 23.45 and 14.75 on the English-to-French
and English-to-German speech-to-text translation tasks with the IWSLT 2014 data.
The USFD contrastive systems explore the integration of ASR and MT by using a
quality estimation system to rescore the ASR outputs, optimising towards better
translation. This gives further improvements of 0.54 and 0.26 BLEU on the
IWSLT 2012 and 2014 evaluation data, respectively.
When Autonomous Systems Meet Accuracy and Transferability through AI: A Survey
With the widespread application of artificial intelligence (AI), the
perception, understanding, decision-making, and control capabilities of
autonomous systems have improved significantly in recent years. When
autonomous systems are evaluated in terms of accuracy and transferability,
several AI methods, such as adversarial learning, reinforcement learning (RL), and
meta-learning, demonstrate powerful performance. Here, we review the
learning-based approaches in autonomous systems from the perspectives of
accuracy and transferability. Accuracy means that a well-trained model shows
good results during the testing phase, in which the testing set shares the same
task or data distribution as the training set. Transferability means that
when a well-trained model is transferred to other testing domains, the accuracy
is still good. Firstly, we introduce some basic concepts of transfer learning
and then present some preliminaries of adversarial learning, RL and
meta-learning. Secondly, we review accuracy, transferability, or both to show
the advantages of adversarial learning, such as generative
adversarial networks (GANs), in typical computer vision tasks in autonomous
systems, including image style transfer, image superresolution, image
deblurring/dehazing/rain removal, semantic segmentation, depth estimation,
pedestrian detection and person re-identification (re-ID). Then, we further
review the performance of RL and meta-learning in terms of accuracy,
transferability, or both in autonomous systems, covering pedestrian
tracking, robot navigation and robotic manipulation. Finally, we discuss
several challenges and future topics for using adversarial learning, RL and
meta-learning in autonomous systems.