AMC: Attention guided Multi-modal Correlation Learning for Image Search
Given a user's query, traditional image search systems rank images according
to their relevance to a single modality (e.g., image content or surrounding
text). Nowadays, an increasing number of images on the Internet are available
with associated metadata in rich modalities (e.g., titles, keywords, tags,
etc.), which can be exploited for a better similarity measure with queries. In
this paper, we leverage visual and textual modalities for image search by
learning their correlation with the input query. Depending on the intent of
the query, an attention mechanism can be introduced to adaptively balance the
importance of different modalities. We propose a novel Attention guided
Multi-modal Correlation (AMC) learning method which consists of a jointly
learned hierarchy of intra- and inter-attention networks. Conditioned on the
query's intent, intra-attention networks (i.e., a visual intra-attention
network and a language intra-attention network) attend to informative parts
within each modality; a multi-modal inter-attention network promotes the
importance of the most query-relevant modalities. In experiments, we evaluate
AMC models on the search logs of two real-world image search engines and show
a significant boost in the ranking of user-clicked images in search results.
Additionally, we extend AMC models to the caption ranking task on the COCO
dataset and achieve competitive results compared with recent state-of-the-art
methods.
Comment: CVPR 201
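The inter-attention idea above can be illustrated with a minimal sketch: score each modality's features against the query embedding, turn the scores into softmax weights, and fuse the modalities accordingly. The toy vectors and dot-product scoring below are assumptions for illustration, not AMC's learned networks.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def inter_attention(query, modality_feats):
    """Weight each modality by its relevance to the query, then fuse
    the modality features into one vector. Illustrative stand-in for
    a learned inter-attention network."""
    # Relevance score per modality: dot product with the query embedding.
    scores = np.array([feat @ query for feat in modality_feats])
    weights = softmax(scores)                       # importance of each modality
    fused = sum(w * f for w, f in zip(weights, modality_feats))
    return weights, fused

# Toy 4-d embeddings for a query and two modalities (visual, textual).
rng = np.random.default_rng(0)
query = rng.normal(size=4)
visual, textual = rng.normal(size=4), rng.normal(size=4)
weights, fused = inter_attention(query, [visual, textual])
```

A query whose intent is better captured by one modality drives that modality's weight toward 1, which is the adaptive balancing the abstract describes.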
LogCLEF: Enabling research on multilingual log files
Interactions between users and information access systems can be analyzed and studied to gather user preferences, to learn what a user likes the most, and to use this information to adapt the search to users and personalize the presentation of results. The LogCLEF lab - "A benchmarking activity on Multilingual Log File Analysis: language identification, query classification, success of a query" - deals with information contained in the query logs of search engines and digital libraries, from which knowledge can be mined to understand search behavior in a multilingual context. LogCLEF has created the first long-term standard collection for evaluation purposes in the area of log analysis. The LogCLEF 2011 lab is the continuation of the past two editions: a pilot task at CLEF 2009 and a workshop at CLEF 2010. The Cross-Language Evaluation Forum (CLEF) promotes research and development in multilingual information access and is an activity of the PROMISE Network of Excellence.
BERT-Embedding and Citation Network Analysis based Query Expansion Technique for Scholarly Search
The enormous growth of research publications has made it challenging for
academic search engines to return the most relevant papers for a given search
query. Numerous solutions have been proposed over the years to improve the
effectiveness of academic search, including query expansion and citation
analysis. Query expansion techniques mitigate the mismatch between the
language used in a query and in indexed documents. However, these techniques
can suffer from introducing non-relevant information while expanding the
original query. Recently, applying the contextualized model BERT to document
retrieval has been quite successful for query expansion. Motivated by such
issues and inspired by the success of BERT, this paper proposes a novel
approach called QeBERT. QeBERT exploits BERT-based embeddings and Citation
Network Analysis (CNA) in query expansion to improve scholarly search.
Specifically, we use context-aware BERT embeddings and CNA for query expansion
in a Pseudo-Relevance Feedback (PRF) fashion. Initial experimental results on
the ACL dataset show that BERT embeddings can provide a valuable augmentation
to query expansion and improve search relevance when combined with CNA.
Comment: 1
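The embedding-based PRF step can be sketched as follows: take candidate terms from the top-retrieved documents and keep the ones whose embeddings are closest to the query embedding. The hand-made 3-d vectors below are a stand-in assumption for BERT embeddings, and the citation-network filtering stage of QeBERT is omitted.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def expand_query(query_vec, feedback_terms, term_vecs, k=2):
    """Pseudo-relevance-feedback expansion sketch: rank candidate terms
    from the top-retrieved documents by embedding similarity to the
    query and keep the k closest."""
    ranked = sorted(feedback_terms,
                    key=lambda t: cosine(term_vecs[t], query_vec),
                    reverse=True)
    return ranked[:k]

# Hypothetical embeddings, chosen so related terms point the same way.
term_vecs = {
    "retrieval": np.array([0.9, 0.1, 0.0]),
    "ranking":   np.array([0.8, 0.2, 0.1]),
    "chemistry": np.array([0.0, 0.1, 0.9]),
}
query_vec = np.array([1.0, 0.0, 0.0])
expansion = expand_query(query_vec, list(term_vecs), term_vecs, k=2)
```

Because selection is driven by similarity to the query itself, off-topic feedback terms ("chemistry" above) are dropped, which is how embedding-based PRF limits the drift problem the abstract mentions.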
Tree-based Text-Vision BERT for Video Search in Baidu Video Advertising
The advancement of communication technology and the popularity of smartphones
have fostered the boom of video ads. Baidu, as one of the leading search
engine companies in the world, receives billions of search queries per day.
How to pair video ads with user searches is the core task of Baidu video
advertising. Due to the modality gap, query-to-video retrieval is much more
challenging than traditional query-to-document retrieval and image-to-image
search. Traditionally, query-to-video retrieval is tackled as query-to-title
retrieval, which is not reliable when the quality of the titles is not high.
With the rapid progress achieved in computer vision and natural language
processing in recent years, content-based search methods have become promising
for query-to-video retrieval. Benefiting from pretraining on large-scale
datasets, some visionBERT methods based on cross-modal attention have achieved
excellent performance in many vision-language tasks, not only in academia but
also in industry. Nevertheless, the expensive computational cost of
cross-modal attention makes it impractical for large-scale search in
industrial applications. In this work, we present a tree-based combo-attention
network (TCAN) which has recently been launched on Baidu's dynamic video
advertising platform. It provides a practical solution for deploying the heavy
cross-modal attention for large-scale query-to-video search. After launching
the tree-based combo-attention network, the click-through rate improved by
2.29% and the conversion rate improved by 2.63%.
Comment: This revision is based on a manuscript submitted in October 2020 to
ICDE 2021. We thank the Program Committee for their valuable comment
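The core efficiency idea - prune the candidate set with a cheap tree traversal so the expensive cross-modal model only rescores a few videos - can be sketched with a single-level tree of cluster centroids. The dot-product rescoring below is an assumed stand-in for TCAN's combo-attention model.

```python
import numpy as np

def tree_retrieve(query, centroids, leaves, top_clusters=1):
    """Two-stage retrieval sketch: pick the closest cluster centroid(s),
    then score only the videos inside those clusters. This prunes the
    candidate set before any expensive cross-modal model is run."""
    c_scores = centroids @ query                    # cheap centroid scoring
    best = np.argsort(c_scores)[::-1][:top_clusters]
    candidates = [v for c in best for v in leaves[c]]
    # A heavy combo-attention model would rescore `candidates`;
    # a dot product stands in for it here.
    return max(candidates, key=lambda v: float(v[1] @ query))

# Hypothetical 2-d video embeddings grouped under two centroids.
centroids = np.array([[1.0, 0.0],
                      [0.0, 1.0]])
leaves = {
    0: [("video_a", np.array([0.9, 0.1])), ("video_b", np.array([0.7, 0.3]))],
    1: [("video_c", np.array([0.1, 0.9]))],
}
query = np.array([1.0, 0.2])
best_video = tree_retrieve(query, centroids, leaves)
```

With a deeper tree the same traversal repeats level by level, so the number of heavy rescoring calls grows roughly with the tree depth rather than the corpus size.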
Person Search with Natural Language Description
Searching for persons in large-scale image databases with a natural language
description as the query has important applications in video surveillance.
Existing methods have mainly focused on searching for persons with image-based
or attribute-based queries, which have major limitations for practical usage.
In this paper, we study the problem of person search with natural language
description. Given the textual description of a person, the person search
algorithm is required to rank all the samples in the person database and then
retrieve the most relevant sample corresponding to the queried description.
Since there is no person dataset or benchmark with textual descriptions
available, we collect a large-scale person description dataset with detailed
natural language annotations and person samples from various sources, termed
the CUHK Person Description Dataset (CUHK-PEDES). A wide range of possible
models and baselines have been evaluated and compared on the person search
benchmark. A Recurrent Neural Network with a Gated Neural Attention mechanism
(GNA-RNN) is proposed to establish the state-of-the-art performance on person
search.
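The gated-attention flavor of such a model can be sketched as follows: the text-side state produces a sigmoid gate for each visual unit, and the gated unit activations are summed into an affinity score. The projection matrix and toy dimensions below are hypothetical, standing in for the learned parameters of a GNA-RNN-style model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_unit_attention(word_state, unit_acts, W):
    """Per-word gating sketch: the text state produces one sigmoid gate
    per visual unit, and the affinity is the sum of gated unit
    activations. W is a hypothetical learned projection."""
    gates = sigmoid(W @ word_state)     # one gate in (0, 1) per visual unit
    return float(gates @ unit_acts)     # gated affinity score

rng = np.random.default_rng(1)
word_state = rng.normal(size=8)         # text-side hidden state for one word
unit_acts = rng.uniform(size=16)        # activations of 16 visual units
W = rng.normal(size=(16, 8))
score = gated_unit_attention(word_state, unit_acts, W)
```

Summing such per-word scores over the whole description yields one description-image affinity, which is what the ranking step sorts by.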
Combining Ontology Queries with Text Search in Service Discovery
We present a querying mechanism for service discovery which combines ontology queries with text search. The underlying service discovery architecture used is GloServ. GloServ uses the Web Ontology Language (OWL) to classify services in an ontology and map knowledge obtained from the ontology onto a hierarchical peer-to-peer network. Initially, an ontology-based first-order predicate logic query is issued in order to route the query to the appropriate server and to obtain exact and related service data. Text search further enhances querying by allowing services to be described not only with ontology attributes but also with plain text, so that users can query for them using keywords. Currently, querying is limited to either simple attribute-value pair searches, ontology queries, or text search. Combining ontology queries with text search enhances current service discovery mechanisms.
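The combination can be sketched as a two-stage lookup: filter services by structured (ontology-style) attribute constraints first, then rank the survivors by keyword overlap with their free-text descriptions. The service records and scoring below are illustrative assumptions, not the GloServ or OWL APIs.

```python
def hybrid_search(services, required_attrs, keywords):
    """Filter by structured attributes, then rank by keyword overlap
    with each service's free-text description."""
    def matches(svc):
        return all(svc["attrs"].get(k) == v for k, v in required_attrs.items())

    def text_score(svc):
        words = svc["description"].lower().split()
        return sum(words.count(kw) for kw in keywords)

    hits = [s for s in services if matches(s)]
    return sorted(hits, key=text_score, reverse=True)

# Hypothetical service registry entries.
services = [
    {"name": "thai-bistro", "attrs": {"category": "restaurant"},
     "description": "spicy thai food with vegetarian options"},
    {"name": "print-shop", "attrs": {"category": "printing"},
     "description": "fast document printing"},
    {"name": "noodle-bar", "attrs": {"category": "restaurant"},
     "description": "noodles and soup"},
]
results = hybrid_search(services, {"category": "restaurant"}, ["vegetarian"])
```

The structured filter guarantees type-correct matches (the ontology's job), while the text score lets details that were never modeled as attributes still influence ranking.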