Enhancing a Portuguese text classifier using part-of-speech tags

Abstract

Support Vector Machines have been applied to text classification with great success. In this paper, we apply part-of-speech tags (nouns, proper nouns, adjectives and verbs) as a feature selection procedure and evaluate their impact on a dataset written in European Portuguese: the Portuguese Attorney General’s Office documents. From the results, we can conclude that verbs alone do not carry enough information to produce good learners. On the other hand, we obtain learners with equivalent performance and a reduced number of features (at least half) if we use specific part-of-speech tags instead of all words.
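
The feature selection idea can be illustrated with a minimal sketch, which is not the implementation used in the paper. It assumes the documents have already been tagged by some part-of-speech tagger; the tag names (NOUN, PROPN, ADJ, VERB, DET), the function name select_features and the toy sentence are all hypothetical. Only words carrying one of the chosen tags are kept as bag-of-words features, which could then be passed to any classifier such as a Support Vector Machine.

```python
from collections import Counter

# Hypothetical tag names; the tagset of a real tagger may differ.
KEPT_TAGS = {"NOUN", "PROPN", "ADJ"}


def select_features(tagged_tokens, kept_tags=KEPT_TAGS):
    """Count lowercased words whose part-of-speech tag is in kept_tags."""
    return Counter(
        word.lower() for word, tag in tagged_tokens if tag in kept_tags
    )


# Hand-tagged toy sentence, for illustration only (not taken from the dataset).
doc = [
    ("O", "DET"),
    ("tribunal", "NOUN"),
    ("decidiu", "VERB"),
    ("o", "DET"),
    ("recurso", "NOUN"),
    ("urgente", "ADJ"),
]

# Baseline: keep every word, whatever its tag.
all_words = select_features(doc, kept_tags={"DET", "NOUN", "VERB", "ADJ"})

# Selection: keep only nouns, proper nouns and adjectives.
selected = select_features(doc)

print(len(all_words), len(selected))
```

In this toy case the selected vocabulary is smaller than the full one because determiners and verbs are dropped; how much the feature space shrinks on a real corpus depends on the data and on the tags retained.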
