32,308 research outputs found
AMC: Attention guided Multi-modal Correlation Learning for Image Search
Given a user's query, traditional image search systems rank images according
to its relevance to a single modality (e.g., image content or surrounding
text). Nowadays, an increasing number of images on the Internet are available
with associated meta data in rich modalities (e.g., titles, keywords, tags,
etc.), which can be exploited for better similarity measure with queries. In
this paper, we leverage visual and textual modalities for image search by
learning their correlation with input query. According to the intent of query,
attention mechanism can be introduced to adaptively balance the importance of
different modalities. We propose a novel Attention guided Multi-modal
Correlation (AMC) learning method which consists of a jointly learned hierarchy
of intra and inter-attention networks. Conditioned on query's intent,
intra-attention networks (i.e., visual intra-attention network and language
intra-attention network) attend on informative parts within each modality; a
multi-modal inter-attention network promotes the importance of the most
query-relevant modalities. In experiments, we evaluate AMC models on the search
logs from two real world image search engines and show a significant boost on
the ranking of user-clicked images in search results. Additionally, we extend
AMC models to caption ranking task on COCO dataset and achieve competitive
results compared with recent state-of-the-arts.Comment: CVPR 201
Few-Shot Single-View 3-D Object Reconstruction with Compositional Priors
The impressive performance of deep convolutional neural networks in
single-view 3D reconstruction suggests that these models perform non-trivial
reasoning about the 3D structure of the output space. However, recent work has
challenged this belief, showing that complex encoder-decoder architectures
perform similarly to nearest-neighbor baselines or simple linear decoder models
that exploit large amounts of per category data in standard benchmarks. On the
other hand settings where 3D shape must be inferred for new categories with few
examples are more natural and require models that generalize about shapes. In
this work we demonstrate experimentally that naive baselines do not apply when
the goal is to learn to reconstruct novel objects using very few examples, and
that in a \emph{few-shot} learning setting, the network must learn concepts
that can be applied to new categories, avoiding rote memorization. To address
deficiencies in existing approaches to this problem, we propose three
approaches that efficiently integrate a class prior into a 3D reconstruction
model, allowing to account for intra-class variability and imposing an implicit
compositional structure that the model should learn. Experiments on the popular
ShapeNet database demonstrate that our method significantly outperform existing
baselines on this task in the few-shot setting
Coupled Deep Learning for Heterogeneous Face Recognition
Heterogeneous face matching is a challenge issue in face recognition due to
large domain difference as well as insufficient pairwise images in different
modalities during training. This paper proposes a coupled deep learning (CDL)
approach for the heterogeneous face matching. CDL seeks a shared feature space
in which the heterogeneous face matching problem can be approximately treated
as a homogeneous face matching problem. The objective function of CDL mainly
includes two parts. The first part contains a trace norm and a block-diagonal
prior as relevance constraints, which not only make unpaired images from
multiple modalities be clustered and correlated, but also regularize the
parameters to alleviate overfitting. An approximate variational formulation is
introduced to deal with the difficulties of optimizing low-rank constraint
directly. The second part contains a cross modal ranking among triplet domain
specific images to maximize the margin for different identities and increase
data for a small amount of training samples. Besides, an alternating
minimization method is employed to iteratively update the parameters of CDL.
Experimental results show that CDL achieves better performance on the
challenging CASIA NIR-VIS 2.0 face recognition database, the IIIT-D Sketch
database, the CUHK Face Sketch (CUFS), and the CUHK Face Sketch FERET (CUFSF),
which significantly outperforms state-of-the-art heterogeneous face recognition
methods.Comment: AAAI 201
Deep Adaptive Feature Embedding with Local Sample Distributions for Person Re-identification
Person re-identification (re-id) aims to match pedestrians observed by
disjoint camera views. It attracts increasing attention in computer vision due
to its importance to surveillance system. To combat the major challenge of
cross-view visual variations, deep embedding approaches are proposed by
learning a compact feature space from images such that the Euclidean distances
correspond to their cross-view similarity metric. However, the global Euclidean
distance cannot faithfully characterize the ideal similarity in a complex
visual feature space because features of pedestrian images exhibit unknown
distributions due to large variations in poses, illumination and occlusion.
Moreover, intra-personal training samples within a local range are robust to
guide deep embedding against uncontrolled variations, which however, cannot be
captured by a global Euclidean distance. In this paper, we study the problem of
person re-id by proposing a novel sampling to mine suitable \textit{positives}
(i.e. intra-class) within a local range to improve the deep embedding in the
context of large intra-class variations. Our method is capable of learning a
deep similarity metric adaptive to local sample structure by minimizing each
sample's local distances while propagating through the relationship between
samples to attain the whole intra-class minimization. To this end, a novel
objective function is proposed to jointly optimize similarity metric learning,
local positive mining and robust deep embedding. This yields local
discriminations by selecting local-ranged positive samples, and the learned
features are robust to dramatic intra-class variations. Experiments on
benchmarks show state-of-the-art results achieved by our method.Comment: Published on Pattern Recognitio
- …