Multi-Task Learning for Email Search Ranking with Auxiliary Query Clustering
User information needs vary significantly across different tasks, and
therefore their queries will also differ considerably in their expressiveness
and semantics. Many studies have been proposed to model such query diversity by
obtaining query types and building query-dependent ranking models. These
studies typically require either a labeled query dataset or clicks from
multiple users aggregated over the same document. These techniques, however,
are not applicable when manual query labeling is not viable, and aggregated
clicks are unavailable due to the private nature of the document collection,
e.g., in email search scenarios. In this paper, we study how to obtain query
type in an unsupervised fashion and how to incorporate this information into
query-dependent ranking models. We first develop a hierarchical clustering
algorithm based on truncated SVD and varimax rotation to obtain coarse-to-fine
query types. Then, we study three query-dependent ranking models, including two
neural models that leverage query type information as additional features, and
one novel multi-task neural model that views query type as the label for the
auxiliary query cluster prediction task. This multi-task model is trained to
simultaneously rank documents and predict query types. Our experiments on tens
of millions of real-world email search queries demonstrate that the proposed
multi-task model can significantly outperform the baseline neural ranking
models, which either do not incorporate query type information or simply
feed the query type as an additional feature.
Comment: CIKM 201
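The abstract does not give the clustering algorithm's details, but the core idea (factor a query-term matrix with truncated SVD, rotate the loadings with varimax so each query loads strongly on few components, then assign each query to its dominant component) can be sketched as follows. The query strings, TF-IDF featurization, and component count are illustrative assumptions, not the paper's setup:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

def varimax(Phi, gamma=1.0, max_iter=100, tol=1e-6):
    """Standard iterative varimax rotation of a loading matrix."""
    p, k = Phi.shape
    R = np.eye(k)
    var = 0.0
    for _ in range(max_iter):
        L = Phi @ R
        # SVD of the varimax criterion's gradient gives the rotation update
        u, s, vt = np.linalg.svd(
            Phi.T @ (L**3 - (gamma / p) * L @ np.diag(np.sum(L**2, axis=0))))
        R = u @ vt
        new_var = np.sum(s)
        if new_var < var * (1 + tol):
            break
        var = new_var
    return Phi @ R

# Toy email-search-style queries (illustrative, not from the paper's data)
queries = ["recent flight to the US", "flight itinerary confirmation",
           "medical history", "doctor appointment reminder",
           "hotel booking receipt", "dentist appointment"]

X = TfidfVectorizer().fit_transform(queries)                  # query-term matrix
Z = TruncatedSVD(n_components=3, random_state=0).fit_transform(X)
Zr = varimax(Z)                                               # sparser loadings
clusters = np.argmax(np.abs(Zr), axis=1)                      # coarse query types
```

Running the same factorization with more components on the queries inside each coarse cluster would give the coarse-to-fine hierarchy the abstract mentions.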
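The multi-task objective described above (rank documents while also predicting the query's cluster) can be illustrated with a minimal forward pass. The shared representation, head shapes, click loss, and auxiliary weight alpha are all assumptions for the sketch; the paper's architecture and losses are not specified in the abstract:

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)     # numerically stable softmax
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Toy setup: one query with 4 candidate documents, 3 query-type clusters.
n_docs, d, n_types = 4, 8, 3
doc_feats = rng.normal(size=(n_docs, d))      # per-document features
query_repr = rng.normal(size=(d,))            # shared query encoding
clicked = 2                                   # index of the clicked document
query_type = 1                                # unsupervised cluster label

# The shared encoder output feeds two heads (weights here are illustrative).
W_rank = rng.normal(size=(d,))                # ranking head
W_type = rng.normal(size=(d, n_types))        # query-type prediction head

rank_scores = doc_feats @ W_rank              # one score per candidate document
rank_loss = -np.log(softmax(rank_scores)[clicked])   # softmax click loss

type_probs = softmax(query_repr @ W_type)
type_loss = -np.log(type_probs[query_type])   # cross-entropy on cluster label

alpha = 0.1                                   # auxiliary-task weight (assumed)
total_loss = rank_loss + alpha * type_loss    # joint objective, optimized together
```

Training on `total_loss` updates both heads and the shared encoder, which is how the auxiliary cluster-prediction task can regularize the ranker.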
Separate and Attend in Personal Email Search
In personal email search, user queries often impose different requirements on
different aspects of the retrieved emails. For example, the query "my recent
flight to the US" requires emails to be ranked based on both textual contents
and recency of the email documents, while other queries such as "medical
history" do not impose any constraints on the recency of the email. Recent deep
learning-to-rank models for personal email search often directly concatenate
dense numerical features (e.g., document age) with embedded sparse features
(e.g., n-gram embeddings). In this paper, we first show with a set of
experiments on synthetic datasets that direct concatenation of dense and sparse
features does not lead to the optimal search performance of deep neural ranking
models. To effectively incorporate both sparse and dense email features into
personal email search ranking, we propose a novel neural model, SepAttn.
SepAttn first builds two separate neural models to learn from sparse and dense
features respectively, and then applies an attention mechanism at the
prediction level to derive the final prediction from these two models. We
conduct a comprehensive set of experiments on a large-scale email search
dataset, and demonstrate that our SepAttn model consistently improves the
search quality over the baseline models.
Comment: WSDM 202
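SepAttn's two-tower-plus-attention structure can be sketched with a toy forward pass. The tower sizes, random weights, and the choice to use each tower's own score as its attention logit are assumptions for illustration; the paper's exact attention parameterization is not given in the abstract:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, W1, W2):
    """One-hidden-layer MLP producing a scalar score per example."""
    return np.maximum(x @ W1, 0.0) @ W2          # ReLU, then linear output

# Toy batch: the sparse tower sees embedded n-grams, the dense tower sees
# numeric signals such as document age (sizes here are illustrative).
n, d_sparse, d_dense, hidden = 4, 16, 3, 8
x_sparse = rng.normal(size=(n, d_sparse))
x_dense = rng.normal(size=(n, d_dense))

# Separate parameter sets: the two feature families never mix mid-network.
Ws1, Ws2 = rng.normal(size=(d_sparse, hidden)), rng.normal(size=(hidden, 1))
Wd1, Wd2 = rng.normal(size=(d_dense, hidden)), rng.normal(size=(hidden, 1))

s_sparse = mlp(x_sparse, Ws1, Ws2)               # (n, 1) sparse-model score
s_dense = mlp(x_dense, Wd1, Wd2)                 # (n, 1) dense-model score

# Prediction-level attention: a per-example softmax over the two model
# outputs weights how much each tower contributes to the final score.
logits = np.concatenate([s_sparse, s_dense], axis=1)         # (n, 2)
w = np.exp(logits - logits.max(axis=1, keepdims=True))
w /= w.sum(axis=1, keepdims=True)                            # attention weights
final = (w * logits).sum(axis=1)                             # (n,) final scores
```

Keeping the towers separate until the prediction level lets each one specialize, while the attention weights let queries like "my recent flight to the US" lean on the dense (recency) tower and queries like "medical history" lean on the sparse (textual) tower.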