18,231 research outputs found
Identifying effective translations for cross-lingual Arabic-to-English user-generated speech search
Cross Language Information Retrieval
(CLIR) systems are a valuable tool to enable speakers of one language to search for
content of interest expressed in a different
language. A group for whom this is of particular interest is bilingual Arabic speakers
who wish to search for English language
content using information needs expressed
in Arabic queries. A key challenge in
CLIR is crossing the language barrier
between the query and the documents.
The most common approach to bridging
this gap is automated query translation,
which can be unreliable for vague or short
queries. In this work, we examine the
potential for improving CLIR effectiveness
by predicting the translation effectiveness
using Query Performance Prediction (QPP)
techniques. We propose a novel QPP
method to estimate the quality of translation for an Arabic-Engish Cross-lingual
User-generated Speech Search (CLUGS)
task. We present an empirical evaluation
that demonstrates the quality of our method
on alternative translation outputs extracted
from an Arabic-to-English Machine Translation system developed for this task. Finally, we show how this framework can be
integrated in CLUGS to find relevant translations for improved retrieval performance
Validating simulated interaction for retrieval evaluation
A searcherâs interaction with a retrieval system consists of actions such as query formulation, search result list interaction and document interaction. The simulation of searcher interaction has recently gained momentum in the analysis and evaluation of interactive information retrieval (IIR). However, a key issue that has not yet been adequately addressed is the validity of such IIR simulations and whether they reliably predict the performance obtained by a searcher across the session. The aim of this paper is to determine the validity of the common interaction model (CIM) typically used for simulating multi-query sessions. We focus on search result interactions, i.e., inspecting snippets, examining documents and deciding when to stop examining the results of a single query, or when to stop the whole session. To this end, we run a series of simulations grounded by real world behavioral data to show how accurate and responsive the model is to various experimental conditions under which the data were produced. We then validate on a second real world data set derived under similar experimental conditions. We seek to predict cumulated gain across the session. We find that the interaction model with a query-level stopping strategy based on consecutive non-relevant snippets leads to the highest prediction accuracy, and lowest deviation from ground truth, around 9 to 15% depending on the experimental conditions. To our knowledge, the present study is the first validation effort of the CIM that shows that the modelâs acceptance and use is justified within IIR evaluations. We also identify and discuss ways to further improve the CIM and its behavioral parameters for more accurate simulations
Deeper Text Understanding for IR with Contextual Neural Language Modeling
Neural networks provide new possibilities to automatically learn complex
language patterns and query-document relations. Neural IR models have achieved
promising results in learning query-document relevance patterns, but few
explorations have been done on understanding the text content of a query or a
document. This paper studies leveraging a recently-proposed contextual neural
language model, BERT, to provide deeper text understanding for IR. Experimental
results demonstrate that the contextual text representations from BERT are more
effective than traditional word embeddings. Compared to bag-of-words retrieval
models, the contextual language model can better leverage language structures,
bringing large improvements on queries written in natural languages. Combining
the text understanding ability with search knowledge leads to an enhanced
pre-trained BERT model that can benefit related search tasks where training
data are limited.Comment: In proceedings of SIGIR 201
A Reinforcement Learning-driven Translation Model for Search-Oriented Conversational Systems
Search-oriented conversational systems rely on information needs expressed in
natural language (NL). We focus here on the understanding of NL expressions for
building keyword-based queries. We propose a reinforcement-learning-driven
translation model framework able to 1) learn the translation from NL
expressions to queries in a supervised way, and, 2) to overcome the lack of
large-scale dataset by framing the translation model as a word selection
approach and injecting relevance feedback in the learning process. Experiments
are carried out on two TREC datasets and outline the effectiveness of our
approach.Comment: This is the author's pre-print version of the work. It is posted here
for your personal use, not for redistribution. Please cite the definitive
version which will be published in Proceedings of the 2018 EMNLP Workshop
SCAI: The 2nd International Workshop on Search-Oriented Conversational AI -
ISBN: 978-1-948087-75-
New perspectives on Web search engine research
PurposeâThe purpose of this chapter is to give an overview of the context of Web search and search engine-related research, as well as to introduce the reader to the sections and chapters of the book. Methodology/approachâWe review literature dealing with various aspects of search engines, with special emphasis on emerging areas of Web searching, search engine evaluation going beyond traditional methods, and new perspectives on Webs earching. FindingsâThe approaches to studying Web search engines are manifold. Given the importance of Web search engines for knowledge acquisition, research from different perspectives needs to be integrated into a more cohesive perspective. Researchlimitations/implicationsâThe chapter suggests a basis for research in the field and also introduces further research directions. Originality/valueofpaperâThe chapter gives a concise overview of the topics dealt with in the book and also shows directions for researchers interested in Web search engines
Reply With: Proactive Recommendation of Email Attachments
Email responses often contain items-such as a file or a hyperlink to an
external document-that are attached to or included inline in the body of the
message. Analysis of an enterprise email corpus reveals that 35% of the time
when users include these items as part of their response, the attachable item
is already present in their inbox or sent folder. A modern email client can
proactively retrieve relevant attachable items from the user's past emails
based on the context of the current conversation, and recommend them for
inclusion, to reduce the time and effort involved in composing the response. In
this paper, we propose a weakly supervised learning framework for recommending
attachable items to the user. As email search systems are commonly available,
we constrain the recommendation task to formulating effective search queries
from the context of the conversations. The query is submitted to an existing IR
system to retrieve relevant items for attachment. We also present a novel
strategy for generating labels from an email corpus---without the need for
manual annotations---that can be used to train and evaluate the query
formulation model. In addition, we describe a deep convolutional neural network
that demonstrates satisfactory performance on this query formulation task when
evaluated on the publicly available Avocado dataset and a proprietary dataset
of internal emails obtained through an employee participation program.Comment: CIKM2017. Proceedings of the 26th ACM International Conference on
Information and Knowledge Management. 201
- âŠ