Improving Classification of Tweets Using Linguistic Information from a Large External Corpus
The bag-of-words representation of documents is often unsatisfactory
because it ignores relationships between important terms that do not
co-occur literally. Improvements might be achieved by expanding the
vocabulary with other relevant words, such as synonyms.
In this paper we use word-word co-occurrence information from a large
corpus to expand the vocabulary of another corpus consisting of tweets.
Several methods for including the co-occurrence information
are constructed and tested on the classification of real Twitter data.
Our results show that we are able to reduce the number of erroneous
classifications by 14% using co-occurrence information.
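
The abstract does not say how the co-occurrence information is folded into the tweet representation, so the following is only a minimal sketch of one plausible variant, not the authors' method. It assumes a fixed token window for counting co-occurrences in the external corpus and adds the top co-occurring words to a tweet's bag of words with a reduced weight. The function names, the toy corpus, and the parameters `window`, `top_k`, and `weight` are all hypothetical.

```python
from collections import Counter, defaultdict


def cooccurrence_counts(corpus, window=2):
    """Count how often two different words appear within `window` tokens of each other."""
    counts = defaultdict(Counter)
    for doc in corpus:
        tokens = doc.lower().split()
        for i, word in enumerate(tokens):
            # Look at neighbours on both sides of position i.
            for other in tokens[max(0, i - window): i + window + 1]:
                if other != word:
                    counts[word][other] += 1
    return counts


def expand_bag_of_words(tweet, counts, top_k=1, weight=0.5):
    """Add each word's top_k co-occurring words to the tweet's bag, down-weighted."""
    bag = Counter(tweet.lower().split())
    expanded = Counter(bag)
    for word, freq in bag.items():
        # Assumption: neighbours are ranked by raw co-occurrence count.
        for neighbour, _ in counts.get(word, Counter()).most_common(top_k):
            expanded[neighbour] += weight * freq
    return expanded


# Toy stand-in for the large external corpus (hypothetical data).
external_corpus = ["football goal", "football goal", "football match"]
counts = cooccurrence_counts(external_corpus)

expanded = expand_bag_of_words("great football", counts)
print(dict(expanded))
```

In this toy run the tweet gains the term "goal" with weight 0.5, so it can now match other tweets that mention "goal" but not "football". The expanded counts could then be passed to any standard classifier in place of the raw bag of words.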