
1,393,497 research outputs found

Text Classification Algorithms: A Survey

Author: Barnes Laura E.
Brown Donald E.
Heidarysafa Mojtaba
Kowsari Kamran
Meimandi Kiana Jafari
Mendu Sanjana
Publication venue: 'MDPI AG'
Publication date: 01/04/2019

In recent years, there has been exponential growth in the number of complex documents and texts, and classifying them accurately in many applications requires a deeper understanding of machine learning methods. Many machine learning approaches have achieved outstanding results in natural language processing. The success of these learning algorithms relies on their capacity to understand complex models and non-linear relationships within data. However, finding suitable structures, architectures, and techniques for text classification remains a challenge for researchers. This paper gives a brief overview of text classification algorithms. The overview covers text feature extraction, dimensionality reduction methods, existing algorithms and techniques, and evaluation methods. Finally, the limitations of each technique and their application to real-world problems are discussed.

Multidisciplinary Digital Publishing Institute

arXiv.org e-Print Archive

Directory of Open Access Journals

Explicit Interaction Model towards Text Classification

Author: Chin Zhaozheng
Du Cunxiao
Feng Fuli
Gan Tian
Nie Liqiang
Zhu Lei
Publication date: 23/11/2018

Text classification is one of the fundamental tasks in natural language processing. Recently, deep neural networks have achieved promising performance in the text classification task compared to shallow models. Despite the significance of deep models, they ignore fine-grained classification clues (matching signals between words and classes), since their classifications mainly rely on text-level representations. To address this problem, we introduce an interaction mechanism that incorporates word-level matching signals into the text classification task. In particular, we design a novel framework, EXplicit interAction Model (dubbed EXAM), equipped with the interaction mechanism. We validate the proposed approach on several benchmark datasets covering both multi-label and multi-class text classification tasks. Extensive experimental results demonstrate the superiority of the proposed method. As a byproduct, we have released the code and parameter settings to facilitate further research. Comment: 8 pages

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Weakly-Supervised Neural Text Classification

Author: Han Jiawei
Meng Yu
Shen Jiaming
Zhang Chao
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 12/09/2018

Deep neural networks are gaining increasing popularity for the classic text classification task, due to their strong expressive power and reduced need for feature engineering. Despite such attractiveness, neural text classification models suffer from the lack of training data in many real-world applications. Although many semi-supervised and weakly-supervised text classification models exist, they cannot be easily applied to deep neural models, and they support only limited supervision types. In this paper, we propose a weakly-supervised method that addresses the lack of training data in neural text classification. Our method consists of two modules: (1) a pseudo-document generator that leverages seed information to generate pseudo-labeled documents for model pre-training, and (2) a self-training module that bootstraps on real unlabeled data for model refinement. Our method has the flexibility to handle different types of weak supervision and can be easily integrated into existing deep neural models for text classification. We have performed extensive experiments on three real-world datasets from different domains. The results demonstrate that our proposed method achieves strong performance without requiring excessive training data and significantly outperforms baseline methods. Comment: CIKM 2018 Full Paper
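The two-module idea in the abstract above can be illustrated with a toy sketch. This is not the authors' implementation: the generator below simply mixes seed keywords with background words, a small Naive Bayes classifier stands in for the deep neural model, and self-training uses hard labels above a confidence threshold. All names (generate_pseudo_docs, NaiveBayes, self_train), the seed words, and the parameter values are assumptions made for illustration.

```python
import math
import random
from collections import Counter, defaultdict


def generate_pseudo_docs(seeds, background, n_docs=20, doc_len=8,
                         seed_ratio=0.6, rng=None):
    """Module 1 (simplified assumption): build pseudo-labeled documents by
    mixing each class's seed keywords with generic background words."""
    rng = rng or random.Random(0)
    docs = []
    for label, words in seeds.items():
        for _ in range(n_docs):
            doc = [rng.choice(words) if rng.random() < seed_ratio
                   else rng.choice(background) for _ in range(doc_len)]
            docs.append((doc, label))
    return docs


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (stand-in classifier)."""

    def fit(self, labeled_docs):
        self.word_counts = defaultdict(Counter)
        self.doc_counts = Counter()
        self.vocab = set()
        for tokens, label in labeled_docs:
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict_proba(self, tokens):
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in self.doc_counts:
            n_tokens = sum(self.word_counts[label].values())
            denom = n_tokens + len(self.vocab) + 1  # +1 covers unseen words
            score = math.log(self.doc_counts[label] / total_docs)
            for t in tokens:
                score += math.log((self.word_counts[label][t] + 1) / denom)
            log_scores[label] = score
        # Normalize in a numerically stable way.
        top = max(log_scores.values())
        exp = {l: math.exp(s - top) for l, s in log_scores.items()}
        z = sum(exp.values())
        return {l: v / z for l, v in exp.items()}


def self_train(model, pseudo_docs, unlabeled, threshold=0.9, rounds=3):
    """Module 2 (simplified assumption): pre-train on pseudo documents, then
    repeatedly add confidently predicted unlabeled documents and refit."""
    labeled = list(pseudo_docs)
    remaining = list(unlabeled)
    for _ in range(rounds):
        model.fit(labeled)
        confident, rest = [], []
        for tokens in remaining:
            probs = model.predict_proba(tokens)
            label = max(probs, key=probs.get)
            if probs[label] >= threshold:
                confident.append((tokens, label))
            else:
                rest.append(tokens)
        if not confident:
            break
        labeled.extend(confident)
        remaining = rest
    return model.fit(labeled)


# Hypothetical seed keywords and unlabeled documents.
seeds = {"sports": ["game", "team", "score"],
         "politics": ["vote", "election", "senate"]}
background = ["the", "a", "today", "news", "report"]
unlabeled = [["team", "won", "the", "game", "today"],
             ["senate", "vote", "delayed", "report"],
             ["coach", "praised", "team", "score"],
             ["election", "results", "news"]]

pseudo = generate_pseudo_docs(seeds, background)
model = self_train(NaiveBayes(), pseudo, unlabeled)
probs = model.predict_proba(["the", "team", "score"])
```

Here probs maps each class name to a probability. Swapping NaiveBayes for a neural classifier with the same fit and predict_proba interface would leave self_train unchanged, which loosely mirrors the abstract's claim that the method can be integrated into existing models.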

arXiv.org e-Print Archive

Crossref