Towards Universal Representation Learning for Deep Face Recognition
Recognizing faces in the wild is extremely hard, as they appear with all kinds
of variations. Traditional methods either train on variation data specifically
annotated for the target domains, or introduce unlabeled target-variation data
to adapt from the training data. Instead, we propose a universal
representation learning framework that can deal with larger variations unseen
in the given training data, without leveraging target-domain knowledge. We
first synthesize training data with semantically meaningful variations, such
as low resolution, occlusion, and head pose. However, training directly on the
augmented data converges poorly, as the newly introduced samples are mostly
hard examples. We propose to split the feature embedding into multiple
sub-embeddings and to associate a confidence value with each sub-embedding to
smooth the training procedure. The sub-embeddings are
further decorrelated by regularizing variation classification loss and
variation adversarial loss on different partitions of them. Experiments show
that our method achieves top performance on general face recognition datasets
such as LFW and MegaFace, while performing significantly better on extreme
benchmarks such as TinyFace and IJB-S.
Comment: to appear in CVPR 2020
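To make the sub-embedding idea concrete, the sketch below is a minimal,
hypothetical PyTorch rendering (not the authors' code): the embedding is split
into K sub-embeddings, each paired with a learned confidence that down-weights
its contribution to a match score. Layer sizes, names, and the weighting
scheme are illustrative assumptions.

```python
# Hypothetical sketch of confidence-weighted sub-embeddings; all shapes
# and the weighting scheme are assumptions, not the paper's exact design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SubEmbeddingHead(nn.Module):
    def __init__(self, in_dim=512, emb_dim=256, num_splits=4):
        super().__init__()
        self.num_splits = num_splits
        self.embed = nn.Linear(in_dim, emb_dim)          # full embedding
        self.confidence = nn.Linear(in_dim, num_splits)  # one score per split

    def forward(self, feats):
        sub = self.embed(feats).view(feats.size(0), self.num_splits, -1)
        sub = F.normalize(sub, dim=-1)                   # unit-norm sub-embeddings
        conf = F.softplus(self.confidence(feats))        # non-negative confidences
        return sub, conf

def weighted_similarity(sub_a, conf_a, sub_b, conf_b):
    # Per-split cosine similarity, weighted by the paired confidences so
    # that low-confidence (hard) sub-embeddings contribute less.
    sims = (sub_a * sub_b).sum(dim=-1)                   # (batch, num_splits)
    w = conf_a * conf_b
    return (w * sims).sum(dim=-1) / w.sum(dim=-1).clamp_min(1e-8)
```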
A Deep Neural Model Of Emotion Appraisal
Emotional concepts play a huge role in our daily life, since they take part
in many cognitive processes: from the perception of the environment around us
to different learning processes and natural communication. Social robots need
to communicate with humans, which has also increased the popularity of
affective embodied models that adopt different emotional concepts in many
everyday tasks. However, there is still a gap between the development of these
solutions and the integration and development of a complex emotion appraisal
system, which is essential for true social robots. In this paper, we propose a
deep neural
model which is designed in the light of different aspects of developmental
learning of emotional concepts to provide an integrated solution for internal
and external emotion appraisal. We evaluate the performance of the proposed
model with different challenging corpora and compare it with state-of-the-art
models for external emotion appraisal. To extend the evaluation of the proposed
model, we designed and collected a novel dataset based on a Human-Robot
Interaction (HRI) scenario. We deployed the model in an iCub robot and
evaluated the capability of the robot to learn and describe the affective
behavior of different persons based on observation. The performed experiments
demonstrate that the proposed model is competitive with the state of the art in
describing emotion behavior in general. In addition, it is able to generate
internal emotional concepts that evolve through time: it continuously forms
and updates emotional concepts, which is a step towards creating an emotion
appraisal model grounded in the robot's experiences.
Adversarial Examples: Attacks and Defenses for Deep Learning
With rapid progress and significant successes in a wide spectrum of
applications, deep learning is being applied in many safety-critical
environments. However, deep neural networks have recently been found
vulnerable to well-designed input samples, called adversarial examples.
Adversarial examples are imperceptible to humans but can easily fool deep
neural networks at the testing/deployment stage. This vulnerability has become
one of the major risks of applying deep neural networks in safety-critical
environments, and attacks and defenses involving adversarial examples have
therefore drawn great attention. In this paper, we review recent findings on
adversarial
examples for deep neural networks, summarize the methods for generating
adversarial examples, and propose a taxonomy of these methods. Under the
taxonomy, applications for adversarial examples are investigated. We further
elaborate on countermeasures for adversarial examples and explore the
challenges and the potential solutions.
Comment: GitHub: https://github.com/chbrian/awesome-adversarial-examples-d
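As a flavor of the generation methods such a survey catalogues, the sketch
below implements the classic Fast Gradient Sign Method (FGSM) in PyTorch; it
is a textbook baseline, not a method specific to this paper.

```python
# Minimal FGSM sketch: perturb the input along the sign of the loss
# gradient to produce an adversarial example.
import torch

def fgsm_attack(model, loss_fn, x, y, epsilon=0.03):
    x = x.clone().detach().requires_grad_(True)
    loss_fn(model(x), y).backward()
    # Small step that increases the loss; clamp to the valid image range.
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```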
Towards Learning a Universal Non-Semantic Representation of Speech
The ultimate goal of transfer learning is to reduce labeled data requirements
by exploiting a pre-existing embedding model trained for different datasets or
tasks. The visual and language communities have established benchmarks to
compare embeddings, but the speech community has yet to do so. This paper
proposes a benchmark for comparing speech representations on non-semantic
tasks, and proposes a representation based on an unsupervised triplet-loss
objective. The proposed representation outperforms other representations on the
benchmark, and even exceeds state-of-the-art performance on a number of
transfer learning tasks. The embedding is trained on a publicly available
dataset, and it is tested on a variety of low-resource downstream tasks,
including personalization tasks and the medical domain. The benchmark, models,
and evaluation code are publicly released.
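For reference, the triplet-loss objective mentioned above has a standard form;
a minimal PyTorch sketch follows (PyTorch also ships this built in as
torch.nn.TripletMarginLoss).

```python
# Standard triplet margin loss: pull anchor/positive pairs together and
# push anchor/negative pairs apart by at least `margin`.
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=1.0):
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()
```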
The Foundations of Deep Learning with a Path Towards General Intelligence
Like any field of empirical science, AI may be approached axiomatically. We
formulate requirements for a general-purpose, human-level AI system in terms of
postulates. We review the methodology of deep learning, examining the explicit
and tacit assumptions in deep learning research. Deep Learning methodology
seeks to overcome limitations in traditional machine learning research as it
combines facets of model richness, generality, and practical applicability. The
methodology so far has produced outstanding results, owing to a productive
synergy of function approximation under plausible assumptions of
irreducibility and the efficiency of the back-propagation family of
algorithms. We examine these
winning traits of deep learning, and also observe the various known failure
modes of deep learning. We conclude by giving recommendations on how to extend
deep learning methodology to cover the postulates of general-purpose AI
including modularity, and cognitive architecture. We also relate deep learning
to advances in theoretical neuroscience research.
Comment: Submitted to AGI 2018
Modeling of Facial Aging and Kinship: A Survey
Computational facial models that capture properties of facial cues related to
aging and kinship increasingly attract the attention of the research community,
enabling the development of reliable methods for age progression, age
estimation, age-invariant facial characterization, and kinship verification
from visual data. In this paper, we review recent advances in modeling of
facial aging and kinship. In particular, we provide an up-to-date, complete
list of available annotated datasets and an in-depth analysis of geometric,
hand-crafted, and learned facial representations that are used for facial aging
and kinship characterization. Moreover, evaluation protocols and metrics are
reviewed and notable experimental results for each surveyed task are analyzed.
This survey allows us to identify challenges and discuss future research
directions for the development of robust facial models in real-world
conditions.
Auditing ImageNet: Towards a Model-driven Framework for Annotating Demographic Attributes of Large-Scale Image Datasets
The ImageNet dataset ushered in a flood of academic and industry interest in
deep learning for computer vision applications. Despite its significant impact,
there has not been a comprehensive investigation into the demographic
attributes of images contained within the dataset. Such a study could lead to
new insights on inherent biases within ImageNet, particularly important given
it is frequently used to pretrain models for a wide variety of computer vision
tasks. In this work, we introduce a model-driven framework for the automatic
annotation of apparent age and gender attributes in large-scale image datasets.
Using this framework, we conduct the first demographic audit of the 2012
ImageNet Large Scale Visual Recognition Challenge (ILSVRC) subset of ImageNet
and the "person" hierarchical category of ImageNet. We find that 41.62% of
faces in ILSVRC appear as female, 1.71% appear as individuals above the age of
60, and males aged 15 to 29 account for the largest subgroup with 27.11%. We
note that the presented model-driven framework is not fair for all
intersectional groups, so the annotations are subject to bias. We present this
work
as the starting point for future development of unbiased annotation models and
for the study of downstream effects of imbalances in the demographics of
ImageNet. Code and annotations are available at:
http://bit.ly/ImageNetDemoAudit
Comment: To appear in the Workshop on Fairness Accountability Transparency and
Ethics in Computer Vision (FATE CV) at CVPR 2019
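A hedged sketch of the kind of model-driven annotation loop such a framework
implies is shown below; detect_faces, age_model, and gender_model are
hypothetical stand-ins, not components released by the authors.

```python
# Hypothetical annotation loop: every callable passed in is a placeholder
# for a real face detector and apparent age/gender estimators.
from collections import Counter

def audit_dataset(images, detect_faces, age_model, gender_model):
    counts = Counter()
    for image in images:
        for face in detect_faces(image):
            age = age_model(face)            # apparent age, in years
            gender = gender_model(face)      # apparent gender label
            bracket = f"{15 * (age // 15)}-{15 * (age // 15) + 14}"
            counts[(gender, bracket)] += 1   # intersectional subgroup tally
    return counts
```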
Attacks on State-of-the-Art Face Recognition using Attentional Adversarial Attack Generative Network
With the broad use of face recognition, its weakness gradually emerges: it can
be attacked. It is therefore important to study how face recognition networks
are subject to attacks. In this paper, we focus on a novel kind of attack
against face recognition networks that misleads the network into identifying
someone as the target person, rather than merely causing an inconspicuous
misclassification. For this purpose, we introduce a specific attentional
adversarial attack generative network to generate fake face images. To capture
the semantic
information of the target person, this work adds a conditional variational
autoencoder and attention modules to learn the instance-level correspondences
between faces. Unlike a traditional two-player GAN, this work introduces a
face recognition network as a third player in the competition between the
generator and the discriminator, which allows the attacker to impersonate the
target person better. The generated faces, which are unlikely to attract the
notice of onlookers, can evade recognition by state-of-the-art networks, and
most of them are recognized as the target person.
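To illustrate the "third player" idea, here is a hypothetical PyTorch sketch
of a generator objective that combines the usual GAN term with an identity
term scored by a fixed face recognition network; the module names and loss
weighting are assumptions, not the paper's exact formulation.

```python
# Hypothetical generator objective: fool the discriminator while pushing
# the recognizer's embedding of the fake face toward the target identity.
import torch
import torch.nn.functional as F

def generator_loss(discriminator, recognizer, fake_face, target_face,
                   lambda_id=10.0):
    logits = discriminator(fake_face)
    gan_loss = F.binary_cross_entropy_with_logits(
        logits, torch.ones_like(logits))     # "real" target for the generator
    emb_fake = F.normalize(recognizer(fake_face), dim=-1)
    emb_target = F.normalize(recognizer(target_face), dim=-1)
    id_loss = 1.0 - (emb_fake * emb_target).sum(dim=-1).mean()  # cosine term
    return gan_loss + lambda_id * id_loss
```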
Automatic Recognition of Student Engagement using Deep Learning and Facial Expression
Engagement is a key indicator of the quality of learning experience, and one
that plays a major role in developing intelligent educational interfaces. Any
such interface requires the ability to recognise the level of engagement in
order to respond appropriately; however, there is very little existing data to
learn from, and new data is expensive and difficult to acquire. This paper
presents a deep learning model to improve engagement recognition from images
that overcomes the data sparsity challenge by pre-training on readily available
basic facial expression data, before training on specialised engagement data.
In the first of two steps, a facial expression recognition model is trained to
provide a rich face representation using deep learning. In the second step, we
use the model's weights to initialize our deep learning based model to
recognize engagement; we term this the engagement model. We train the model on
our new engagement recognition dataset with 4627 engaged and disengaged
samples. We find that the engagement model outperforms effective deep
learning architectures that we apply to engagement recognition for the first
time, as well as approaches using histograms of oriented gradients and support
vector machines.
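The two-step recipe above is plain transfer learning; a minimal, hypothetical
PyTorch sketch of the weight hand-off follows (the tiny backbone and the class
counts are placeholders, not the paper's architecture).

```python
# Hypothetical two-step transfer: pre-train a face encoder on expression
# labels, then reuse its weights to initialize the engagement model.
import torch
import torch.nn as nn

backbone = nn.Sequential(                 # placeholder face encoder
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten())
expression_head = nn.Linear(32, 7)        # step 1: basic expression classes
engagement_head = nn.Linear(32, 2)        # step 2: engaged / disengaged

# ... after training backbone + expression_head on expression data:
torch.save(backbone.state_dict(), "expression_backbone.pt")

# Initialize the engagement model from the expression-trained weights,
# then fine-tune on the (much smaller) engagement dataset.
backbone.load_state_dict(torch.load("expression_backbone.pt"))
engagement_model = nn.Sequential(backbone, engagement_head)
```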
Graphonomy: Universal Human Parsing via Graph Transfer Learning
Prior highly-tuned human parsing models tend to overfit to each dataset in a
specific domain or with a discrepant label granularity, and can hardly be
adapted to other human parsing tasks without extensive re-training. In this
paper, we
aim to learn a single universal human parsing model that can tackle all kinds
of human parsing needs by unifying label annotations from different domains or
at various levels of granularity. This poses many fundamental learning
challenges, e.g. discovering underlying semantic structures among different
label granularity, performing proper transfer learning across different image
domains, and identifying and utilizing label redundancies across related tasks.
To address these challenges, we propose a new universal human parsing agent,
named "Graphonomy", which incorporates hierarchical graph transfer learning
upon the conventional parsing network to encode the underlying label semantic
structures and propagate relevant semantic information. In particular,
Graphonomy first learns and propagates compact high-level graph representation
among the labels within one dataset via Intra-Graph Reasoning, and then
transfers semantic information across multiple datasets via Inter-Graph
Transfer. Various graph transfer dependencies (e.g., similarity, linguistic
knowledge) between different datasets are analyzed and encoded to enhance graph
transfer capability. By distilling universal semantic graph representation to
each specific task, Graphonomy is able to predict all levels of parsing labels
in one system without piling up the complexity. Experimental results show
Graphonomy effectively achieves the state-of-the-art results on three human
parsing benchmarks, as well as advantageous universal human parsing
performance.
Comment: Accepted to CVPR 2019. The code is available at
https://github.com/Gaoyiminggithub/Graphonom
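A minimal sketch of what the Intra-Graph Reasoning step could look like:
label nodes exchange features along a fixed label adjacency via one round of
graph convolution. Shapes, normalization, and names here are assumptions, not
the released implementation.

```python
# Hypothetical one-step graph convolution over label nodes; assumes the
# adjacency includes self-loops so every row sums to a nonzero value.
import torch
import torch.nn as nn

class IntraGraphReasoning(nn.Module):
    def __init__(self, feat_dim, adjacency):
        super().__init__()
        # Row-normalized label adjacency, e.g. linking "coat" and "upper-clothes".
        self.register_buffer("adj", adjacency / adjacency.sum(1, keepdim=True))
        self.transform = nn.Linear(feat_dim, feat_dim)

    def forward(self, node_feats):        # node_feats: (num_labels, feat_dim)
        # Aggregate neighboring label features, then transform and activate.
        return torch.relu(self.transform(self.adj @ node_feats))
```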