Text Classification Algorithms: A Survey
In recent years, the number of complex documents and texts has grown
exponentially, and classifying them accurately in many applications requires a
deeper understanding of machine learning methods. Many machine learning
approaches have achieved strong results in natural
language processing. The success of these learning algorithms relies on their
capacity to understand complex models and non-linear relationships within data.
However, finding suitable structures, architectures, and techniques for text
classification is a challenge for researchers. This paper gives a brief
overview of text classification algorithms. The overview covers different
text feature extraction methods, dimensionality reduction methods, existing
algorithms and techniques, and evaluation methods. Finally, the limitations of
each technique and their applications to real-world problems are discussed.
Explicit Interaction Model towards Text Classification
Text classification is one of the fundamental tasks in natural language
processing. Recently, deep neural networks have achieved promising performance
in the text classification task compared to shallow models. Despite their
significance, deep models ignore fine-grained classification clues (matching
signals between words and classes), since their classifications rely mainly on
text-level representations. To address this problem, we
introduce the interaction mechanism to incorporate word-level matching signals
into the text classification task. In particular, we design a novel framework,
EXplicit interAction Model (dubbed as EXAM), equipped with the interaction
mechanism. We validate the proposed approach on several benchmark datasets
covering both multi-label and multi-class text classification tasks. Extensive
experimental results demonstrate the superiority of the proposed method. As a
byproduct, we have released the code and parameter settings to facilitate
further research.
Comment: 8 pages
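The word-level matching idea in the abstract above can be illustrated with a
minimal sketch. It assumes that each word and each class has an embedding
vector and that a matching signal is their dot product. The function names,
the toy 2-d vectors, and the mean aggregation are all hypothetical; the
abstract does not say how EXAM aggregates the signals, so this is not the
authors' implementation.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def interaction_matrix(word_vecs, class_vecs):
    """One row per class, one column per word; each entry is a matching signal."""
    return [[dot(c, w) for w in word_vecs] for c in class_vecs]

def class_scores(word_vecs, class_vecs):
    """Aggregate word-level signals into one score per class.

    Assumption: a plain mean over words stands in for whatever learned
    aggregation the real model uses.
    """
    if not word_vecs:
        raise ValueError("need at least one word vector")
    matrix = interaction_matrix(word_vecs, class_vecs)
    return [sum(row) / len(row) for row in matrix]

# Toy embeddings, made up for illustration only.
words = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]
classes = [[1.0, 0.0], [0.0, 1.0]]

scores = class_scores(words, classes)
predicted = max(range(len(scores)), key=scores.__getitem__)
```

The contrast with a text-level model is that each class score here is built
from per-word signals rather than from one pooled document vector.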
Weakly-Supervised Neural Text Classification
Deep neural networks are gaining popularity for the classic text
classification task because of their strong expressive power and reduced need
for feature engineering. Despite this appeal, neural text classification
models suffer from a lack of training data in many real-world applications.
Although many semi-supervised and weakly-supervised text classification models
exist, they cannot be easily applied to deep neural models and support only
limited types of supervision. In this paper, we
propose a weakly-supervised method that addresses the lack of training data in
neural text classification. Our method consists of two modules: (1) a
pseudo-document generator that leverages seed information to generate
pseudo-labeled documents for model pre-training, and (2) a self-training module
that bootstraps on real unlabeled data for model refinement. Our method has the
flexibility to handle different types of weak supervision and can be easily
integrated into existing deep neural models for text classification. We have
performed extensive experiments on three real-world datasets from different
domains. The results demonstrate that our proposed method achieves strong
performance without requiring excessive training data and outperforms baseline
methods significantly.
Comment: CIKM 2018 Full Paper
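The two-module structure described in the abstract above (pre-train on
generated pseudo-labeled documents, then self-train on unlabeled data) can be
sketched roughly as follows. Everything here is an assumption made for
illustration: the seed keywords, the random sampling that stands in for the
pseudo-document generator, the tiny smoothed word-count classifier that stands
in for a deep neural model, and the confidence threshold used for
self-training. It is not the authors' procedure.

```python
import math
import random
from collections import Counter

def make_pseudo_docs(seeds, docs_per_class=5, doc_len=4, rng=None):
    """Stand-in generator: sample seed keywords to build pseudo-labeled docs."""
    rng = rng or random.Random(0)
    docs = []
    for label, words in seeds.items():
        for _ in range(docs_per_class):
            docs.append(([rng.choice(words) for _ in range(doc_len)], label))
    return docs

def train(labeled_docs):
    """Stand-in model: per-class word counts."""
    counts = {}
    for tokens, label in labeled_docs:
        counts.setdefault(label, Counter()).update(tokens)
    return counts

def predict_proba(counts, tokens):
    """Class probabilities from add-one-smoothed word likelihoods (uniform prior)."""
    vocab_size = len(set().union(*counts.values()))
    log_scores = {}
    for label, c in counts.items():
        total = sum(c.values())
        log_scores[label] = sum(
            math.log((c[t] + 1) / (total + vocab_size)) for t in tokens
        )
    top = max(log_scores.values())
    exp = {k: math.exp(v - top) for k, v in log_scores.items()}
    z = sum(exp.values())
    return {k: v / z for k, v in exp.items()}

def self_train(seeds, unlabeled, rounds=3, threshold=0.9):
    """Pre-train on pseudo docs, then retrain with confident predictions added."""
    pseudo = make_pseudo_docs(seeds)
    counts = train(pseudo)
    for _ in range(rounds):
        confident = []
        for tokens in unlabeled:
            probs = predict_proba(counts, tokens)
            label = max(probs, key=probs.get)
            if probs[label] >= threshold:
                confident.append((tokens, label))
        counts = train(pseudo + confident)
    return counts

# Hypothetical seed keywords and unlabeled documents.
seeds = {"sports": ["game", "team"], "tech": ["software", "chip"]}
unlabeled = [
    ["team", "coach", "game"],
    ["chip", "laptop", "software"],
    ["coach", "match"],
    ["laptop", "update"],
]

baseline = predict_proba(train(make_pseudo_docs(seeds)), ["coach"])
model = self_train(seeds, unlabeled)
probs = predict_proba(model, ["coach"])
```

In this toy run the word "coach" never appears among the seeds, so the
pre-trained model cannot place it; after self-training it leans toward
"sports" because "coach" co-occurred with seed words in a confidently labeled
document.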
