5,936 research outputs found
On the Role of Text Preprocessing in Neural Network Architectures: An Evaluation Study on Text Categorization and Sentiment Analysis
Text preprocessing is often the first step in the pipeline of a Natural
Language Processing (NLP) system, with potential impact in its final
performance. Despite its importance, text preprocessing has not received much
attention in the deep learning literature. In this paper we investigate the
impact of simple text preprocessing decisions (particularly tokenizing,
lemmatizing, lowercasing and multiword grouping) on the performance of a
standard neural text classifier. We perform an extensive evaluation on standard
benchmarks from text categorization and sentiment analysis. While our
experiments show that a simple tokenization of input text is generally
adequate, they also highlight significant degrees of variability across
preprocessing techniques. This reveals the importance of paying attention to
this usually-overlooked step in the pipeline, particularly when comparing
different models. Finally, our evaluation provides insights into the best
preprocessing practices for training word embeddings.Comment: Blackbox EMNLP 2018. 7 page
Character-level Convolutional Networks for Text Classification
This article offers an empirical exploration on the use of character-level
convolutional networks (ConvNets) for text classification. We constructed
several large-scale datasets to show that character-level convolutional
networks could achieve state-of-the-art or competitive results. Comparisons are
offered against traditional models such as bag of words, n-grams and their
TFIDF variants, and deep learning models such as word-based ConvNets and
recurrent neural networks.Comment: An early version of this work entitled "Text Understanding from
Scratch" was posted in Feb 2015 as arXiv:1502.01710. The present paper has
considerably more experimental results and a rewritten introduction, Advances
in Neural Information Processing Systems 28 (NIPS 2015
The Role of Text Pre-processing in Sentiment Analysis
It is challenging to understand the latest trends and summarise the state or general opinions about products due to the big diversity and size of social media data, and this creates the need of automated and real time opinion extraction and mining. Mining online opinion is a form of sentiment analysis that is treated as a difficult text classification task. In this paper, we explore the role of text pre-processing in sentiment analysis, and report on experimental results that demonstrate that with appropriate feature selection and representation, sentiment analysis accuracies using support vector machines (SVM) in this area may be significantly improved. The level of accuracy achieved is shown to be comparable to the ones achieved in topic categorisation although sentiment analysis is considered to be a much harder problem in the literature
- …