35,199 research outputs found
Identifying effective translations for cross-lingual Arabic-to-English user-generated speech search
Cross Language Information Retrieval
(CLIR) systems are a valuable tool to enable speakers of one language to search for
content of interest expressed in a different
language. A group for whom this is of particular interest is bilingual Arabic speakers
who wish to search for English language
content using information needs expressed
in Arabic queries. A key challenge in
CLIR is crossing the language barrier
between the query and the documents.
The most common approach to bridging
this gap is automated query translation,
which can be unreliable for vague or short
queries. In this work, we examine the
potential for improving CLIR effectiveness
by predicting the translation effectiveness
using Query Performance Prediction (QPP)
techniques. We propose a novel QPP
method to estimate the quality of translation for an Arabic-Engish Cross-lingual
User-generated Speech Search (CLUGS)
task. We present an empirical evaluation
that demonstrates the quality of our method
on alternative translation outputs extracted
from an Arabic-to-English Machine Translation system developed for this task. Finally, we show how this framework can be
integrated in CLUGS to find relevant translations for improved retrieval performance
Extracting Hierarchies of Search Tasks & Subtasks via a Bayesian Nonparametric Approach
A significant amount of search queries originate from some real world
information need or tasks. In order to improve the search experience of the end
users, it is important to have accurate representations of tasks. As a result,
significant amount of research has been devoted to extracting proper
representations of tasks in order to enable search systems to help users
complete their tasks, as well as providing the end user with better query
suggestions, for better recommendations, for satisfaction prediction, and for
improved personalization in terms of tasks. Most existing task extraction
methodologies focus on representing tasks as flat structures. However, tasks
often tend to have multiple subtasks associated with them and a more
naturalistic representation of tasks would be in terms of a hierarchy, where
each task can be composed of multiple (sub)tasks. To this end, we propose an
efficient Bayesian nonparametric model for extracting hierarchies of such tasks
\& subtasks. We evaluate our method based on real world query log data both
through quantitative and crowdsourced experiments and highlight the importance
of considering task/subtask hierarchies.Comment: 10 pages. Accepted at SIGIR 2017 as a full pape
A Hierarchical Recurrent Encoder-Decoder For Generative Context-Aware Query Suggestion
Users may strive to formulate an adequate textual query for their information
need. Search engines assist the users by presenting query suggestions. To
preserve the original search intent, suggestions should be context-aware and
account for the previous queries issued by the user. Achieving context
awareness is challenging due to data sparsity. We present a probabilistic
suggestion model that is able to account for sequences of previous queries of
arbitrary lengths. Our novel hierarchical recurrent encoder-decoder
architecture allows the model to be sensitive to the order of queries in the
context while avoiding data sparsity. Additionally, our model can suggest for
rare, or long-tail, queries. The produced suggestions are synthetic and are
sampled one word at a time, using computationally cheap decoding techniques.
This is in contrast to current synthetic suggestion models relying upon machine
learning pipelines and hand-engineered feature sets. Results show that it
outperforms existing context-aware approaches in a next query prediction
setting. In addition to query suggestion, our model is general enough to be
used in a variety of other applications.Comment: To appear in Conference of Information Knowledge and Management
(CIKM) 201
Reply With: Proactive Recommendation of Email Attachments
Email responses often contain items-such as a file or a hyperlink to an
external document-that are attached to or included inline in the body of the
message. Analysis of an enterprise email corpus reveals that 35% of the time
when users include these items as part of their response, the attachable item
is already present in their inbox or sent folder. A modern email client can
proactively retrieve relevant attachable items from the user's past emails
based on the context of the current conversation, and recommend them for
inclusion, to reduce the time and effort involved in composing the response. In
this paper, we propose a weakly supervised learning framework for recommending
attachable items to the user. As email search systems are commonly available,
we constrain the recommendation task to formulating effective search queries
from the context of the conversations. The query is submitted to an existing IR
system to retrieve relevant items for attachment. We also present a novel
strategy for generating labels from an email corpus---without the need for
manual annotations---that can be used to train and evaluate the query
formulation model. In addition, we describe a deep convolutional neural network
that demonstrates satisfactory performance on this query formulation task when
evaluated on the publicly available Avocado dataset and a proprietary dataset
of internal emails obtained through an employee participation program.Comment: CIKM2017. Proceedings of the 26th ACM International Conference on
Information and Knowledge Management. 201
Learned Cardinalities: Estimating Correlated Joins with Deep Learning
We describe a new deep learning approach to cardinality estimation. MSCN is a
multi-set convolutional network, tailored to representing relational query
plans, that employs set semantics to capture query features and true
cardinalities. MSCN builds on sampling-based estimation, addressing its
weaknesses when no sampled tuples qualify a predicate, and in capturing
join-crossing correlations. Our evaluation of MSCN using a real-world dataset
shows that deep learning significantly enhances the quality of cardinality
estimation, which is the core problem in query optimization.Comment: CIDR 2019. https://github.com/andreaskipf/learnedcardinalitie
- …