A Hierarchical Recurrent Encoder-Decoder For Generative Context-Aware Query Suggestion
Users may strive to formulate an adequate textual query for their information
need. Search engines assist the users by presenting query suggestions. To
preserve the original search intent, suggestions should be context-aware and
account for the previous queries issued by the user. Achieving context
awareness is challenging due to data sparsity. We present a probabilistic
suggestion model that is able to account for sequences of previous queries of
arbitrary lengths. Our novel hierarchical recurrent encoder-decoder
architecture allows the model to be sensitive to the order of queries in the
context while avoiding data sparsity. Additionally, our model can produce suggestions for
rare, or long-tail, queries. The produced suggestions are synthetic and are
sampled one word at a time, using computationally cheap decoding techniques.
This is in contrast to current synthetic suggestion models relying upon machine
learning pipelines and hand-engineered feature sets. Results show that our model
outperforms existing context-aware approaches in a next query prediction
setting. In addition to query suggestion, our model is general enough to be
used in a variety of other applications.
Comment: To appear in Conference of Information Knowledge and Management
(CIKM) 201
Learning to Attend, Copy, and Generate for Session-Based Query Suggestion
Users try to articulate their complex information needs during search
sessions by reformulating their queries. To make this process more effective,
search engines provide related queries to help users in specifying the
information need in their search process. In this paper, we propose a
customized sequence-to-sequence model for session-based query suggestion. In
our model, we employ a query-aware attention mechanism to capture the structure
of the session context. This enables us to control the scope of the session from
which we infer the suggested next query, which helps not only handle the noisy
data but also automatically detect session boundaries. Furthermore, we observe
that, based on the user query reformulation behavior, within a single session a
large portion of query terms is retained from the previously submitted queries
and consists of mostly infrequent or unseen terms that are usually not included
in the vocabulary. We therefore empower the decoder of our model to access the
source words from the session context during decoding by incorporating a copy
mechanism. Moreover, we propose evaluation metrics to assess the quality of the
generative models for query suggestion. We conduct an extensive set of
experiments and analysis. The results suggest that our model outperforms the
baselines in terms of both generating queries and scoring candidate queries
for the task of query suggestion.
Comment: Accepted to be published at The 26th ACM International Conference on
Information and Knowledge Management (CIKM2017)
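The abstract above describes a decoder that can copy source words from the session context, which matters when retained query terms are outside the vocabulary. The sketch below illustrates one common way to do this, a pointer-generator-style mixture of a vocabulary distribution and an attention-based copy distribution. This is an assumption for illustration only: the abstract does not specify the paper's exact formulation, and the function name, the gate value `p_gen`, and all probabilities are hypothetical.

```python
from collections import defaultdict


def mix_copy_and_generate(p_gen, vocab_probs, source_tokens, attention):
    """Blend generating from the vocabulary with copying from the context.

    p_gen         -- probability of generating rather than copying (hypothetical gate)
    vocab_probs   -- dict mapping vocabulary word -> generation probability
    source_tokens -- tokens of the previous queries in the session
    attention     -- attention weight for each source token (sums to 1)
    """
    if len(source_tokens) != len(attention):
        raise ValueError("one attention weight is needed per source token")

    final = defaultdict(float)
    # Generation path: scale the vocabulary distribution by p_gen.
    for word, prob in vocab_probs.items():
        final[word] += p_gen * prob
    # Copy path: each source token receives its attention mass,
    # so out-of-vocabulary tokens can still get non-zero probability.
    for token, weight in zip(source_tokens, attention):
        final[token] += (1.0 - p_gen) * weight
    return dict(final)


# Toy values, invented for this example.
vocab_probs = {"cheap": 0.5, "flights": 0.3, "<unk>": 0.2}
session_tokens = ["flights", "to", "reykjavik"]
attention = [0.2, 0.1, 0.7]

dist = mix_copy_and_generate(0.6, vocab_probs, session_tokens, attention)
```

In this toy setup "reykjavik" is absent from the vocabulary, yet it receives probability mass through the copy path alone.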
Neural Methods for Effective, Efficient, and Exposure-Aware Information Retrieval
Neural networks with deep architectures have demonstrated significant
performance improvements in computer vision, speech recognition, and natural
language processing. The challenges in information retrieval (IR), however, are
different from these other application areas. A common form of IR involves
ranking of documents--or short passages--in response to keyword-based queries.
Effective IR systems must deal with the query-document vocabulary mismatch problem,
by modeling relationships between different query and document terms and how
they indicate relevance. Models should also consider lexical matches when the
query contains rare terms--such as a person's name or a product model
number--not seen during training, and should avoid retrieving semantically related
but irrelevant results. In many real-life IR tasks, the retrieval involves
extremely large collections--such as the document index of a commercial Web
search engine--containing billions of documents. Efficient IR methods should
take advantage of specialized IR data structures, such as the inverted index, to
efficiently retrieve from large collections. Given an information need, the IR
system also mediates how much exposure an information artifact receives by
deciding whether it should be displayed, and where it should be positioned,
among other results. Exposure-aware IR systems may optimize for additional
objectives, besides relevance, such as parity of exposure for retrieved items
and content publishers. In this thesis, we present novel neural architectures
and methods motivated by the specific needs and challenges of IR tasks.
Comment: PhD thesis, Univ College London (2020)
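The abstract above mentions the inverted index as a data structure for efficient retrieval from large collections. As a minimal sketch of the idea (not the thesis's method), the code below maps each term to the set of documents containing it, so a query only touches the postings of its own terms. The function names and the toy documents are hypothetical, and whitespace tokenization is a simplifying assumption.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def retrieve(index, query):
    """Return the ids of documents that contain every query term."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return set()
    # Intersect the posting sets; a term with no postings yields no results.
    return set.intersection(*postings)


# Toy collection, invented for this example.
docs = {
    1: "neural ranking models",
    2: "inverted index for ranking",
    3: "neural inverted index",
}
index = build_inverted_index(docs)
both_terms = retrieve(index, "inverted index")
```

Exact term matching of this kind is also what lets a system honor rare query terms, such as a product model number, that a learned model has not seen during training.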
Patterns of gender-specializing query reformulation
Users of search systems often reformulate their queries by adding query terms
to reflect their evolving information need or to more precisely express their
information need when the system fails to surface relevant content. Analyzing
these query reformulations can inform us about both system and user behavior.
In this work, we study a special category of query reformulations that involve
specifying demographic group attributes, such as gender, as part of the
reformulated query (e.g., "olympic 2021 soccer results" to "olympic 2021
women's soccer results"). There are many ways a query, the search results, and
a demographic attribute such as gender may relate, leading us to hypothesize
different causes for these reformulation patterns, such as under-representation
on the original result page or based on the linguistic theory of markedness.
This paper reports on an observational study of gender-specializing query
reformulations -- their contexts and effects -- as a lens on the relationship
between system results and gender, based on large-scale search log data from
Bing. We find that these reformulations sometimes correct for and other times
reinforce gender representation on the original result page, but typically
yield better access to the ultimately-selected results. The prevalence of these
reformulations -- and which gender they skew towards -- differ by topical
context. However, we do not find evidence that either group
under-representation or markedness alone adequately explains these
reformulations. We hope that future research will use such reformulations as a
probe for deeper investigation into gender (and other demographic)
representation on the search result page.
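To make the notion of a gender-specializing reformulation from the abstract above concrete, the sketch below flags a query pair when the reformulation keeps every original term and adds only gender terms. This is an illustrative assumption, not the study's detection method, which the abstract does not describe. The term list `GENDER_TERMS` and both function names are hypothetical.

```python
# Hypothetical, deliberately small list of gender terms for illustration.
GENDER_TERMS = {"women's", "womens", "men's", "mens", "female", "male"}


def added_terms(original, reformulated):
    """Return the terms that appear in the reformulation but not the original."""
    original_terms = set(original.lower().split())
    return [t for t in reformulated.lower().split() if t not in original_terms]


def is_gender_specializing(original, reformulated):
    """True if the reformulation keeps all original terms and adds only gender terms."""
    original_terms = set(original.lower().split())
    new_terms = set(reformulated.lower().split())
    added = added_terms(original, reformulated)
    keeps_original = original_terms <= new_terms
    return keeps_original and bool(added) and all(t in GENDER_TERMS for t in added)


# The query pair below is the example given in the abstract.
flag = is_gender_specializing(
    "olympic 2021 soccer results",
    "olympic 2021 women's soccer results",
)
```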
Generic Intent Representation in Web Search
This paper presents GEneric iNtent Encoder (GEN Encoder) which learns a
distributed representation space for user intent in search. Leveraging large
scale user clicks from Bing search logs as weak supervision of user intent, GEN
Encoder learns to map queries with shared clicks into similar embeddings
end-to-end and then finetunes on multiple paraphrase tasks. Experimental
results on an intrinsic evaluation task - query intent similarity modeling -
demonstrate GEN Encoder's robust and significant advantages over previous
representation methods. Ablation studies reveal the crucial role of learning
from implicit user feedback in representing user intent and the contributions
of multi-task learning in representation generality. We also demonstrate that
GEN Encoder alleviates the sparsity of tail search traffic and cuts the number of
unseen queries in half by using an efficient approximate nearest neighbor search
to effectively identify previous queries with the same search intent. Finally,
we demonstrate that distances between GEN encodings reflect certain
information-seeking behaviors in search sessions.
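The abstract above describes matching an unseen query to a previous query with the same intent through nearest neighbor search over embeddings. The sketch below shows the lookup step under loud assumptions: the three-dimensional vectors are invented toy values rather than GEN Encoder outputs, the function names are hypothetical, and the search is an exact brute-force scan with cosine similarity rather than the approximate method the abstract refers to.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_query(new_vec, seen):
    """Return the previously seen query whose embedding is closest to new_vec.

    seen -- dict mapping query string -> embedding (list of floats)
    """
    return max(seen, key=lambda q: cosine(new_vec, seen[q]))


# Toy embeddings, invented for this example.
seen_queries = {
    "weather seattle": [0.9, 0.1, 0.0],
    "python tutorial": [0.0, 0.2, 0.9],
}
unseen_embedding = [0.8, 0.2, 0.1]  # stands in for an unseen tail query

match = nearest_query(unseen_embedding, seen_queries)
```

A brute-force scan costs time linear in the number of stored queries, which is why a production system would swap in an approximate index at this step.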