Weakly-Supervised Neural Text Classification
Deep neural networks are gaining popularity for the classic text
classification task, thanks to their strong expressive power and reduced need
for feature engineering. Despite this appeal, neural text classification
models suffer from a lack of training data in many real-world applications.
Although many semi-supervised and weakly-supervised text classification
models exist, they cannot be easily applied to deep neural models and support
only limited types of supervision. In this paper, we
propose a weakly-supervised method that addresses the lack of training data in
neural text classification. Our method consists of two modules: (1) a
pseudo-document generator that leverages seed information to generate
pseudo-labeled documents for model pre-training, and (2) a self-training module
that bootstraps on real unlabeled data for model refinement. Our method has the
flexibility to handle different types of weak supervision and can be easily
integrated into existing deep neural models for text classification. We have
performed extensive experiments on three real-world datasets from different
domains. The results demonstrate that our proposed method achieves strong
performance without requiring excessive training data and significantly
outperforms baseline methods.
Comment: CIKM 2018 Full Paper
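The two modules above can be sketched with a toy bag-of-words model. All function names, the frequency-based scorer, and the confidence-margin threshold are illustrative assumptions; the paper's actual method pre-trains and refines a deep neural classifier, not this count-based stand-in.

```python
import random
from collections import Counter, defaultdict

def generate_pseudo_docs(seed_words, n_docs=20, doc_len=8, rng=None):
    """Module 1: pseudo-document generator -- sample each class's seed words."""
    rng = rng or random.Random(0)
    return [(label, [rng.choice(seeds) for _ in range(doc_len)])
            for label, seeds in seed_words.items()
            for _ in range(n_docs)]

def train(docs):
    """Fit per-class word counts (stand-in for neural model pre-training)."""
    model = defaultdict(Counter)
    for label, words in docs:
        model[label].update(words)
    return model

def predict(model, words):
    """Return (best_label, confidence_margin) under smoothed word frequencies."""
    scores = {label: sum((cnt[w] + 1) / (sum(cnt.values()) + 1) for w in words)
              for label, cnt in model.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    margin = scores[ranked[0]] - (scores[ranked[1]] if len(ranked) > 1 else 0.0)
    return ranked[0], margin

def self_train(labeled, unlabeled, threshold=0.5, rounds=3):
    """Module 2: bootstrap on unlabeled docs whose predictions are confident."""
    model = train(labeled)
    for _ in range(rounds):
        confident = [(predict(model, words)[0], words)
                     for words in unlabeled
                     if predict(model, words)[1] >= threshold]
        model = train(labeled + confident)
    return model
```

For example, starting from seeds such as `{"sports": ["game", "team", "score"], "tech": ["code", "chip", "software"]}`, the generator produces pseudo-labeled documents for pre-training, and `self_train` then refines the model on real unlabeled word lists.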
Multi-Task Learning for Email Search Ranking with Auxiliary Query Clustering
User information needs vary significantly across different tasks, and
therefore their queries will also differ considerably in their expressiveness
and semantics. Many studies have been proposed to model such query diversity by
obtaining query types and building query-dependent ranking models. These
studies typically require either a labeled query dataset or clicks from
multiple users aggregated over the same document. These techniques, however,
are not applicable when manual query labeling is not viable, and aggregated
clicks are unavailable due to the private nature of the document collection,
e.g., in email search scenarios. In this paper, we study how to obtain query
type in an unsupervised fashion and how to incorporate this information into
query-dependent ranking models. We first develop a hierarchical clustering
algorithm based on truncated SVD and varimax rotation to obtain coarse-to-fine
query types. Then, we study three query-dependent ranking models, including two
neural models that leverage query type information as additional features, and
one novel multi-task neural model that views query type as the label for the
auxiliary query cluster prediction task. This multi-task model is trained to
simultaneously rank documents and predict query types. Our experiments on tens
of millions of real-world email search queries demonstrate that the proposed
multi-task model can significantly outperform the baseline neural ranking
models, which either do not incorporate query type information or simply
feed query type as an additional feature.
Comment: CIKM 201
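The joint objective of the multi-task model can be sketched as a weighted sum of a document-ranking loss and an auxiliary query-cluster classification loss. The pairwise hinge formulation, the `alpha` weight, and all names here are assumptions for illustration; the paper's model is a neural ranker with a shared representation, not these standalone functions.

```python
import math

def pairwise_hinge_loss(scores, relevance, margin=1.0):
    """Ranking objective: hinge loss over (more-relevant, less-relevant) pairs."""
    loss, pairs = 0.0, 0
    for s_i, y_i in zip(scores, relevance):
        for s_j, y_j in zip(scores, relevance):
            if y_i > y_j:
                loss += max(0.0, margin - (s_i - s_j))
                pairs += 1
    return loss / max(pairs, 1)

def cross_entropy(logits, target):
    """Auxiliary objective: softmax cross-entropy for query-cluster prediction."""
    m = max(logits)  # shift for numerical stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]

def multitask_loss(doc_scores, relevance, cluster_logits, cluster_id, alpha=0.3):
    """Joint loss: rank documents and predict the query's cluster together."""
    return (pairwise_hinge_loss(doc_scores, relevance)
            + alpha * cross_entropy(cluster_logits, cluster_id))
```

Training minimizes this combined loss, so gradients from the cluster-prediction task regularize the shared query representation used for ranking.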
Mining Entity Synonyms with Efficient Neural Set Generation
Mining entity synonym sets (i.e., sets of terms referring to the same entity)
is an important task for many entity-leveraging applications. Previous work
either ranks terms based on their similarity to a given query term or treats
the problem as a two-phase task (i.e., detecting synonymy pairs, followed by
organizing these pairs into synonym sets). However, these approaches fail to
model the holistic semantics of a set and suffer from the error propagation
issue. Here we propose a new framework, named SynSetMine, that efficiently
generates entity synonym sets from a given vocabulary, using example sets from
external knowledge bases as distant supervision. SynSetMine consists of two
novel modules: (1) a set-instance classifier that jointly learns how to
represent a permutation invariant synonym set and whether to include a new
instance (i.e., a term) into the set, and (2) a set generation algorithm that
enumerates the vocabulary only once and applies the learned set-instance
classifier to detect all entity synonym sets in it. Experiments on three real
datasets from different domains demonstrate both effectiveness and efficiency
of SynSetMine for mining entity synonym sets.
Comment: AAAI 2019 camera-ready version
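The set generation algorithm described above can be sketched as a single pass over the vocabulary: each term is attached to the best-scoring existing set if the set-instance classifier is confident enough, and otherwise starts a new singleton set. The function names, the threshold value, and the toy oracle scorer below are illustrative assumptions; in SynSetMine the scorer is a learned classifier over a permutation-invariant set representation.

```python
def generate_synonym_sets(vocab, set_instance_score, threshold=0.5):
    """One pass over the vocabulary: add each term to the best-scoring
    existing set if the classifier is confident, else start a new set."""
    synonym_sets = []
    for term in vocab:
        best_set, best_score = None, threshold
        for candidate in synonym_sets:
            # set_instance_score(set, term) -> probability the term belongs
            score = set_instance_score(candidate, term)
            if score > best_score:
                best_set, best_score = candidate, score
        if best_set is not None:
            best_set.append(term)
        else:
            synonym_sets.append([term])
    return synonym_sets
```

With a toy oracle that scores 1.0 when two terms name the same entity and 0.0 otherwise, the pass groups a vocabulary like `["usa", "u.s.", "america", "nyc", "new york"]` into two synonym sets while scoring each term against each existing set only once.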