Reply With: Proactive Recommendation of Email Attachments
Email responses often contain items, such as a file or a hyperlink to an
external document, that are attached to or included inline in the body of the
message. Analysis of an enterprise email corpus reveals that 35% of the time
when users include these items as part of their response, the attachable item
is already present in their inbox or sent folder. A modern email client can
proactively retrieve relevant attachable items from the user's past emails
based on the context of the current conversation, and recommend them for
inclusion, to reduce the time and effort involved in composing the response. In
this paper, we propose a weakly supervised learning framework for recommending
attachable items to the user. As email search systems are commonly available,
we constrain the recommendation task to formulating effective search queries
from the context of the conversations. The query is submitted to an existing IR
system to retrieve relevant items for attachment. We also present a novel
strategy for generating labels from an email corpus, without the need for
manual annotations, that can be used to train and evaluate the query
formulation model. In addition, we describe a deep convolutional neural network
that demonstrates satisfactory performance on this query formulation task when
evaluated on the publicly available Avocado dataset and a proprietary dataset
of internal emails obtained through an employee participation program.
Comment: CIKM2017. Proceedings of the 26th ACM International Conference on
Information and Knowledge Management. 201
Improving Quality of Training Data for Learning to Rank Using Click-Through Data
In information retrieval, the relevance of documents with respect to queries is usually judged by humans and used in the evaluation and/or learning of ranking functions. Previous work has shown that a certain level of noise in relevance judgments has little effect on evaluation, especially for comparison purposes. Recently, learning to rank has become one of the major means of creating ranking models, in which the models are automatically learned from data derived from a large number of relevance judgments. As far as we know, there has been no previous work on the quality of training data for learning to rank, and this paper studies the issue. Specifically, we address three problems. Firstly, we show that the quality of training data labeled by humans has a critical impact on the performance of learning to rank algorithms. Secondly, we propose detecting relevance judgment errors using click-through data accumulated at a search engine. Two discriminative models, referred to as the sequential dependency model and the full dependency model, are proposed to perform the detection. Both models consider the conditional dependency of relevance labels and are thus more powerful than the conditionally independent model previously proposed for other tasks. Finally, we verify that training data in which the errors have been detected and corrected by our method improves the performance of learning to rank algorithms.