39,615 research outputs found
A Hierarchical Recurrent Encoder-Decoder For Generative Context-Aware Query Suggestion
Users may strive to formulate an adequate textual query for their information
need. Search engines assist the users by presenting query suggestions. To
preserve the original search intent, suggestions should be context-aware and
account for the previous queries issued by the user. Achieving context
awareness is challenging due to data sparsity. We present a probabilistic
suggestion model that is able to account for sequences of previous queries of
arbitrary lengths. Our novel hierarchical recurrent encoder-decoder
architecture allows the model to be sensitive to the order of queries in the
context while avoiding data sparsity. Additionally, our model can suggest for
rare, or long-tail, queries. The produced suggestions are synthetic and are
sampled one word at a time, using computationally cheap decoding techniques.
This is in contrast to current synthetic suggestion models relying upon machine
learning pipelines and hand-engineered feature sets. Results show that it
outperforms existing context-aware approaches in a next query prediction
setting. In addition to query suggestion, our model is general enough to be
used in a variety of other applications.Comment: To appear in Conference of Information Knowledge and Management
(CIKM) 201
Multi-Task Learning for Email Search Ranking with Auxiliary Query Clustering
User information needs vary significantly across different tasks, and
therefore their queries will also differ considerably in their expressiveness
and semantics. Many studies have been proposed to model such query diversity by
obtaining query types and building query-dependent ranking models. These
studies typically require either a labeled query dataset or clicks from
multiple users aggregated over the same document. These techniques, however,
are not applicable when manual query labeling is not viable, and aggregated
clicks are unavailable due to the private nature of the document collection,
e.g., in email search scenarios. In this paper, we study how to obtain query
type in an unsupervised fashion and how to incorporate this information into
query-dependent ranking models. We first develop a hierarchical clustering
algorithm based on truncated SVD and varimax rotation to obtain coarse-to-fine
query types. Then, we study three query-dependent ranking models, including two
neural models that leverage query type information as additional features, and
one novel multi-task neural model that views query type as the label for the
auxiliary query cluster prediction task. This multi-task model is trained to
simultaneously rank documents and predict query types. Our experiments on tens
of millions of real-world email search queries demonstrate that the proposed
multi-task model can significantly outperform the baseline neural ranking
models, which either do not incorporate query type information or just simply
feed query type as an additional feature.Comment: CIKM 201
Investigating Retrieval Method Selection with Axiomatic Features
We consider algorithm selection in the context of ad-hoc information retrieval. Given a query and a pair of retrieval methods, we propose a meta-learner that predicts how to combine the methods' relevance scores into an overall relevance score. Inspired by neural models' different properties with regard to IR axioms, these predictions are based on features that quantify axiom-related properties of the query and its top ranked documents. We conduct an evaluation on TREC Web Track data and find that the meta-learner often significantly improves over the individual methods. Finally, we conduct feature and query weight analyses to investigate the meta-learner's behavior
Deep Character-Level Click-Through Rate Prediction for Sponsored Search
Predicting the click-through rate of an advertisement is a critical component
of online advertising platforms. In sponsored search, the click-through rate
estimates the probability that a displayed advertisement is clicked by a user
after she submits a query to the search engine. Commercial search engines
typically rely on machine learning models trained with a large number of
features to make such predictions. This is inevitably requires a lot of
engineering efforts to define, compute, and select the appropriate features. In
this paper, we propose two novel approaches (one working at character level and
the other working at word level) that use deep convolutional neural networks to
predict the click-through rate of a query-advertisement pair. Specially, the
proposed architectures only consider the textual content appearing in a
query-advertisement pair as input, and produce as output a click-through rate
prediction. By comparing the character-level model with the word-level model,
we show that language representation can be learnt from scratch at character
level when trained on enough data. Through extensive experiments using billions
of query-advertisement pairs of a popular commercial search engine, we
demonstrate that both approaches significantly outperform a baseline model
built on well-selected text features and a state-of-the-art word2vec-based
approach. Finally, by combining the predictions of the deep models introduced
in this study with the prediction of the model in production of the same
commercial search engine, we significantly improve the accuracy and the
calibration of the click-through rate prediction of the production system.Comment: SIGIR2017, 10 page
Cross-Domain Image Retrieval with Attention Modeling
With the proliferation of e-commerce websites and the ubiquitousness of smart
phones, cross-domain image retrieval using images taken by smart phones as
queries to search products on e-commerce websites is emerging as a popular
application. One challenge of this task is to locate the attention of both the
query and database images. In particular, database images, e.g. of fashion
products, on e-commerce websites are typically displayed with other
accessories, and the images taken by users contain noisy background and large
variations in orientation and lighting. Consequently, their attention is
difficult to locate. In this paper, we exploit the rich tag information
available on the e-commerce websites to locate the attention of database
images. For query images, we use each candidate image in the database as the
context to locate the query attention. Novel deep convolutional neural network
architectures, namely TagYNet and CtxYNet, are proposed to learn the attention
weights and then extract effective representations of the images. Experimental
results on public datasets confirm that our approaches have significant
improvement over the existing methods in terms of the retrieval accuracy and
efficiency.Comment: 8 pages with an extra reference pag
- …