slides

Multi Faceted Text Classification using Supervised Machine Learning Models

Abstract

In recent year’s document management tasks (known as information retrieval) increased a lot due to availability of digital documents everywhere. The need of automatic methods for extracting document information became a prominent method for organizing information and knowledge discovery. Text Classification is one such solution, where in the natural language text is assigned to one or more predefined categories based on the content. In my research classification of text is mainly focused on sentiment label classification. The idea proposed for sentiment analysis is multi-class classification of online movie reviews. Many research papers discussed the classification of sentiment either positive or negative, but in this approach the user reviews are classified based on their sentiment to multi classes like positive, negative, neutral, very positive and very negative. This classification task would help the business to classify the user reviews same as star ratings, which are manually given by users. This paper also proposes a better classification approach with multi-tier prediction model. The goal of this research is to provide a better understanding classification for sentiment analysis by applying different preprocessing techniques and selecting suitable features like bag of words, stemming and removing stop words, POS Tagging etc. These features are adjusted to fit with some of the machine learning text classification algorithms such as Naïve Bayes, SVM, sand SGD on frameworks like WEKA, SVMLight & Scikit Learn

    Similar works