"May I borrow Your Filter?" Exchanging Filters to Combat Spam in a Community
Leveraging social networks in computer systems can be effective in dealing with a number of trust and security issues. Spam is one such issue where the "wisdom of crowds" can be harnessed by mining the collective knowledge of ordinary individuals. In this paper, we present a mechanism through which members of a virtual community can exchange information to combat spam. Previous attempts at collaborative spam filtering have concentrated on digest-based indexing techniques that share digests or fingerprints of emails known to be spam. We take a different approach and allow users to share their spam filters instead, thus dramatically reducing the amount of traffic generated in the network. The resulting diversity in the filters and the cooperation within a community allow it to respond to spam in an autonomic fashion. As a test case for exchanging filters, we use the popular SpamAssassin spam filtering software and show that exchanging spam filters provides an alternative method to improve spam filtering performance.
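The abstract does not say how exchanged filters are combined, so the following is only a minimal sketch of the general idea under stated assumptions: a filter is modelled as a table mapping a regular-expression rule to a score, filters received from peers are merged by averaging the scores of shared rules, and a message is flagged when the summed score of its matching rules reaches a threshold. The names `merge_filters`, `spam_score`, and `THRESHOLD`, the averaging policy, and the example rules are all hypothetical; none of them is taken from the paper or from SpamAssassin's own rule format.

```python
import re
from collections import defaultdict

# Hypothetical decision threshold for this sketch.
THRESHOLD = 5.0


def merge_filters(own, peers):
    """Merge rule tables by averaging the score of each rule (assumed policy)."""
    scores = defaultdict(list)
    for rules in [own, *peers]:
        for pattern, score in rules.items():
            scores[pattern].append(score)
    return {pattern: sum(vals) / len(vals) for pattern, vals in scores.items()}


def spam_score(message, rules):
    """Sum the scores of all rules whose pattern matches the message."""
    return sum(
        score
        for pattern, score in rules.items()
        if re.search(pattern, message, re.IGNORECASE)
    )


# Illustrative filters: the local user knows one rule, a peer knows two.
own_filter = {r"free money": 3.0}
peer_filter = {r"free money": 2.0, r"click here": 3.0}

merged_filter = merge_filters(own_filter, [peer_filter])
message = "Click here for FREE MONEY"

own_verdict = spam_score(message, own_filter) >= THRESHOLD
merged_verdict = spam_score(message, merged_filter) >= THRESHOLD
```

In this toy case the local filter alone misses the message, while the merged filter flags it because the peer contributed a rule the local user lacked. Only the rule tables travel between users, not per-message digests.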
Multi-Task Learning for Email Search Ranking with Auxiliary Query Clustering
User information needs vary significantly across different tasks, and
therefore their queries will also differ considerably in their expressiveness
and semantics. Many approaches have been proposed to model such query diversity by
obtaining query types and building query-dependent ranking models. These
studies typically require either a labeled query dataset or clicks from
multiple users aggregated over the same document. These techniques, however,
are not applicable when manual query labeling is not viable, and aggregated
clicks are unavailable due to the private nature of the document collection,
e.g., in email search scenarios. In this paper, we study how to obtain query
types in an unsupervised fashion and how to incorporate this information into
query-dependent ranking models. We first develop a hierarchical clustering
algorithm based on truncated SVD and varimax rotation to obtain coarse-to-fine
query types. Then, we study three query-dependent ranking models, including two
neural models that leverage query type information as additional features, and
one novel multi-task neural model that views query type as the label for the
auxiliary query cluster prediction task. This multi-task model is trained to
simultaneously rank documents and predict query types. Our experiments on tens
of millions of real-world email search queries demonstrate that the proposed
multi-task model can significantly outperform the baseline neural ranking
models, which either do not incorporate query type information or simply
feed query type as an additional feature.
Comment: CIKM 201
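The clustering step named in the abstract (truncated SVD followed by varimax rotation) can be illustrated with a small sketch. This is not the authors' implementation. It assumes a toy query-by-term count matrix, uses a commonly used iterative varimax routine, assigns each query to the rotated component on which it loads most strongly, and produces only a single level of the hierarchy. The helper names `varimax` and `cluster_queries` and the toy data are hypothetical.

```python
import numpy as np


def varimax(loadings, max_iter=50, tol=1e-8):
    """Rotate a loading matrix orthogonally to maximise the varimax criterion."""
    n_rows, n_comp = loadings.shape
    rotation = np.eye(n_comp)
    objective = 0.0
    for _ in range(max_iter):
        previous = objective
        rotated = loadings @ rotation
        # Gradient of the varimax criterion with respect to the rotation.
        gradient = loadings.T @ (
            rotated**3
            - rotated @ np.diag(np.sum(rotated**2, axis=0)) / n_rows
        )
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt  # closest orthogonal matrix to the gradient
        objective = s.sum()
        if previous != 0 and objective / previous < 1 + tol:
            break
    return loadings @ rotation, rotation


def cluster_queries(query_term, n_types):
    """One clustering level: truncated SVD, varimax, then argmax assignment."""
    u, s, _ = np.linalg.svd(query_term, full_matrices=False)
    loadings = u[:, :n_types] * s[:n_types]  # keep the top n_types components
    rotated, rotation = varimax(loadings)
    labels = np.argmax(np.abs(rotated), axis=1)
    return labels, rotation


# Toy data (assumed): rows are queries, columns are terms.
# Queries 0-2 use terms 0-1; queries 3-5 use terms 2-3.
query_term = np.array(
    [
        [1.0, 1.0, 0.0, 0.0],
        [1.0, 1.0, 0.0, 0.0],
        [1.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 2.0, 2.0],
        [0.0, 0.0, 2.0, 2.0],
        [0.0, 0.0, 2.0, 2.0],
    ]
)
labels, rotation = cluster_queries(query_term, n_types=2)
```

Applying `cluster_queries` again to the rows within each resulting cluster would be one way to obtain finer-grained types, though the abstract does not specify how the coarse-to-fine hierarchy is actually built.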