Privacy in Deep Learning: A Survey
The ever-growing advances of deep learning in many areas including vision,
recommendation systems, natural language processing, etc., have led to the
adoption of Deep Neural Networks (DNNs) in production systems. The availability
of large datasets and high computational power are the main contributors to
these advances. The datasets are usually crowdsourced and may contain sensitive
information. This poses serious privacy concerns as this data can be misused or
leaked through various vulnerabilities. Even if the cloud provider and the
communication link are trusted, there are still threats of inference attacks
where an attacker could infer properties of the data used for training, or
find the underlying model architecture and parameters. In this survey, we
review the privacy concerns brought by deep learning, and the mitigating
techniques introduced to tackle these issues. We also show that there is a gap
in the literature regarding test-time inference privacy, and propose possible
future research directions.
CA3Net: Contextual-Attentional Attribute-Appearance Network for Person Re-Identification
Person re-identification aims to identify the same pedestrian across
non-overlapping camera views. Deep learning techniques have been applied for
person re-identification recently, towards learning representations of
pedestrian appearance. This paper presents a novel Contextual-Attentional
Attribute-Appearance Network (CA3Net) for person re-identification. The CA3Net
simultaneously exploits the complementarity between semantic attributes and
visual appearance, the semantic context among attributes, visual attention on
attributes as well as spatial dependencies among body parts, leading to
discriminative and robust pedestrian representation. Specifically, an attribute
network within CA3Net is designed with an Attention-LSTM module. It
concentrates the network on latent image regions related to each attribute as
well as exploits the semantic context among attributes by an LSTM module. An
appearance network is developed to learn appearance features from the full
body, horizontal and vertical body parts of pedestrians with spatial
dependencies among body parts. The CA3Net jointly learns the attribute and
appearance features in a multi-task learning manner, generating comprehensive
representation of pedestrians. Extensive experiments on two challenging
benchmarks, i.e., Market-1501 and DukeMTMC-reID datasets, have demonstrated the
effectiveness of the proposed approach.
Content-Based Filtering for Video Sharing Social Networks
In this paper we compare the use of several features in the task of content
filtering for video social networks, a very challenging task, not only because
the unwanted content is related to very high-level semantic concepts (e.g.,
pornography, violence, etc.) but also because videos from social networks are
extremely diverse, preventing the use of constrained a priori information. We
propose a simple method, able to combine diverse evidence, coming from
different features and various video elements (entire video, shots, frames,
keyframes, etc.). We evaluate our method in three social network applications,
related to the detection of unwanted content: pornographic videos, violent
videos, and videos posted to artificially manipulate popularity scores. Using
challenging test databases, we show that this simple scheme is able to obtain
good results, provided that adequate features are chosen. Moreover, we
establish a representation using codebooks of spatiotemporal local descriptors
as critical to the success of the method in all three contexts. This is
consequential, since the state-of-the-art still relies heavily on static
features for the tasks addressed.
BridgeNet: A Continuity-Aware Probabilistic Network for Age Estimation
Age estimation is an important yet very challenging problem in computer
vision. Existing methods for age estimation usually apply a divide-and-conquer
strategy to deal with heterogeneous data caused by the non-stationary aging
process. However, the facial aging process is also a continuous process, and
the continuity relationship between different components has not been
effectively exploited. In this paper, we propose BridgeNet for age estimation,
which aims to mine the continuous relation between age labels effectively. The
proposed BridgeNet consists of local regressors and gating networks. Local
regressors partition the data space into multiple overlapping subspaces to
tackle heterogeneous data, and gating networks learn continuity-aware weights
for the results of local regressors by employing the proposed bridge-tree
structure, which introduces bridge connections into tree models to enforce the
similarity between neighbor nodes. Moreover, these two components of BridgeNet
can be jointly learned in an end-to-end way. We show experimental results on
the MORPH II, FG-NET and Chalearn LAP 2015 datasets and find that BridgeNet
outperforms the state-of-the-art methods. Comment: CVPR 201
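The abstract describes gating networks that weight the outputs of several local regressors. As a rough illustration of that idea only, the sketch below combines hypothetical local age predictions using softmax-normalized gate scores. This is an assumption-laden toy, not the paper's bridge-tree structure: the function names, the softmax gating, and all numbers are invented for the example.

```python
import math


def softmax(scores):
    """Convert raw gate scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_age_estimate(local_predictions, gate_scores):
    """Weighted sum of local regressor outputs (hypothetical gating scheme)."""
    if len(local_predictions) != len(gate_scores):
        raise ValueError("need one gate score per local regressor")
    weights = softmax(gate_scores)
    return sum(w * p for w, p in zip(weights, local_predictions))


# Hypothetical outputs of three local regressors covering overlapping age ranges.
local_predictions = [22.0, 27.0, 35.0]
# Equal gate scores give every regressor the same weight.
gate_scores = [0.0, 0.0, 0.0]

estimate = gated_age_estimate(local_predictions, gate_scores)
print(estimate)
```

With equal gate scores the estimate reduces to the plain average of the local predictions. Raising one score shifts the estimate toward that regressor's output.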
Modeling of Facial Aging and Kinship: A Survey
Computational facial models that capture properties of facial cues related to
aging and kinship increasingly attract the attention of the research community,
enabling the development of reliable methods for age progression, age
estimation, age-invariant facial characterization, and kinship verification
from visual data. In this paper, we review recent advances in modeling of
facial aging and kinship. In particular, we provide an up-to-date, complete
list of available annotated datasets and an in-depth analysis of geometric,
hand-crafted, and learned facial representations that are used for facial aging
and kinship characterization. Moreover, evaluation protocols and metrics are
reviewed and notable experimental results for each surveyed task are analyzed.
This survey allows us to identify challenges and discuss future research
directions for the development of robust facial models in real-world
conditions.
Combating Human Trafficking with Deep Multimodal Models
Human trafficking is a global epidemic affecting millions of people across
the planet. Sex trafficking, the dominant form of human trafficking, has seen a
significant rise mostly due to the abundance of escort websites, where human
traffickers can openly advertise among at-will escort advertisements. In this
paper, we take a major step in the automatic detection of advertisements
suspected to pertain to human trafficking. We present a novel dataset called
Trafficking-10k, with more than 10,000 advertisements annotated for this task.
The dataset contains two sources of information per advertisement: text and
images. For the accurate detection of trafficking advertisements, we designed
and trained a deep multimodal model called the Human Trafficking Deep Network
(HTDN). Comment: ACL 2017 Long Paper
Improve bone age assessment by learning from anatomical local regions
Skeletal bone age assessment (BAA), as an essential imaging examination, aims
at evaluating the biological and structural maturation of human bones. In
clinical practice, the Tanner and Whitehouse (TW2) method is widely used by
radiologists to perform BAA. The TW2 method splits the hands into Regions of
Interest (ROIs) and analyzes each anatomical ROI separately to estimate
the bone age. Because it analyzes local information, the TW2
method shows accurate results in practice. Following the spirit of TW2, we
propose a novel model called Anatomical Local-Aware Network (ALA-Net) for
automatic bone age assessment. In ALA-Net, an anatomical local extraction module
is introduced to learn the hand structure and extract local information.
Moreover, we design an anatomical patch training strategy to provide extra
regularization during the training process. Our model can detect the anatomical
ROIs and estimate bone age jointly in an end-to-end manner. The experimental
results show that our ALA-Net achieves a new state-of-the-art single-model
performance of 3.91 mean absolute error (MAE) on the publicly available RSNA
dataset. Since the design of our model is consistent with the
well-recognized TW2 method, it is interpretable and reliable for clinical usage. Comment: Early accepted to MICCAI202
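For readers unfamiliar with the metric reported above, mean absolute error is the average of |predicted - actual| over all test cases. The following is a minimal sketch of that computation. The function name and the sample values are hypothetical and are not taken from the paper or the RSNA dataset.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predictions and ground truth."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Hypothetical bone-age predictions and labels, in the same unit as each other.
predicted_ages = [100.0, 120.0, 90.0]
actual_ages = [104.0, 118.0, 95.0]

mae = mean_absolute_error(predicted_ages, actual_ages)
print(mae)
```

A lower MAE means the predictions are, on average, closer to the reference labels. The metric is expressed in the same unit as the labels.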
Distributed generation of privacy preserving data with user customization
Distributed devices such as mobile phones can produce and store large amounts
of data that can enhance machine learning models; however, this data may
contain private information specific to the data owner that prevents the
release of the data. We wish to reduce the correlation between user-specific
private information and data while maintaining the useful information. Rather
than learning a large model to achieve privatization from end to end, we
introduce a decoupling of the creation of a latent representation and the
privatization of data that allows user-specific privatization to occur in a
distributed setting with limited computation and minimal disturbance on the
utility of the data. We leverage a Variational Autoencoder (VAE) to create a
compact latent representation of the data; however, the VAE remains fixed for
all devices and all possible private labels. We then train a small generative
filter to perturb the latent representation based on individual preferences
regarding the private and utility information. The small filter is trained by
utilizing a GAN-type robust optimization that can take place on a distributed
device. We conduct experiments on three popular datasets: MNIST, UCI-Adult, and
CelebA, and give a thorough evaluation including visualizing the geometry of
the latent embeddings and estimating the empirical mutual information to show
the effectiveness of our approach. Comment: accepted in ICLR 2019 SafeML workshop
Mobile Multimedia Recommendation in Smart Communities: A Survey
Due to the rapid growth of internet broadband access and proliferation of
modern mobile devices, various types of multimedia (e.g. text, images, audio
and video) have become ubiquitously available anytime. Mobile device users
usually store and use multimedia content based on their personal interests and
preferences. Mobile device challenges such as limited storage have, however,
introduced the problem of mobile multimedia overload to users. In order to
tackle this problem, researchers have developed various techniques that
recommend multimedia for mobile users. In this survey paper, we examine the
importance of mobile multimedia recommendation systems from the perspective of
three smart communities, namely, mobile social learning, mobile event guide and
context-aware services. A careful analysis of existing research reveals that
the implementation of proactive, sensor-based and hybrid recommender systems
can improve mobile multimedia recommendations. Nevertheless, there are still
challenges and open issues such as the incorporation of context and social
properties, which need to be tackled in order to generate accurate and
trustworthy mobile multimedia recommendations.
Joint Attention in Driver-Pedestrian Interaction: from Theory to Practice
Today, one of the major challenges that autonomous vehicles are facing is the
ability to drive in urban environments. Such a task requires communication
between autonomous vehicles and other road users in order to resolve various
traffic ambiguities. The interaction between road users is a form of
negotiation in which the parties involved have to share their attention
regarding a common objective or a goal (e.g. crossing an intersection), and
coordinate their actions in order to accomplish it. In this literature review,
we aim to address the interaction problem between pedestrians and drivers (or
vehicles) from a joint attention point of view. More specifically, we will
discuss the theoretical background behind joint attention, its application to
traffic interaction and practical approaches to implementing joint attention
for autonomous vehicles.