3,259 research outputs found
AMC: Attention guided Multi-modal Correlation Learning for Image Search
Given a user's query, traditional image search systems rank images according
to its relevance to a single modality (e.g., image content or surrounding
text). Nowadays, an increasing number of images on the Internet are available
with associated meta data in rich modalities (e.g., titles, keywords, tags,
etc.), which can be exploited for better similarity measure with queries. In
this paper, we leverage visual and textual modalities for image search by
learning their correlation with input query. According to the intent of query,
attention mechanism can be introduced to adaptively balance the importance of
different modalities. We propose a novel Attention guided Multi-modal
Correlation (AMC) learning method which consists of a jointly learned hierarchy
of intra and inter-attention networks. Conditioned on query's intent,
intra-attention networks (i.e., visual intra-attention network and language
intra-attention network) attend on informative parts within each modality; a
multi-modal inter-attention network promotes the importance of the most
query-relevant modalities. In experiments, we evaluate AMC models on the search
logs from two real world image search engines and show a significant boost on
the ranking of user-clicked images in search results. Additionally, we extend
AMC models to caption ranking task on COCO dataset and achieve competitive
results compared with recent state-of-the-arts.Comment: CVPR 201
Deep Convolutional Ranking for Multilabel Image Annotation
Multilabel image annotation is one of the most important challenges in
computer vision with many real-world applications. While existing work usually
use conventional visual features for multilabel annotation, features based on
Deep Neural Networks have shown potential to significantly boost performance.
In this work, we propose to leverage the advantage of such features and analyze
key components that lead to better performances. Specifically, we show that a
significant performance gain could be obtained by combining convolutional
architectures with approximate top- ranking objectives, as thye naturally
fit the multilabel tagging problem. Our experiments on the NUS-WIDE dataset
outperforms the conventional visual features by about 10%, obtaining the best
reported performance in the literature
- …