Journal of Data Mining and Management (e-ISSN: 2456-9437)
Abstract
Document classification, also known as document categorization, is one of the most important topics in computer science because the number of electronic documents is increasing rapidly every day. Classification means training a model on examples with known labels so that it can predict the labels of unseen examples; document classification is the process of assigning a given document to predefined categories. In this paper, we apply machine learning methods to the classification of documents. Long Short-Term Memory (LSTM) networks are among the most successful recurrent neural networks and have been applied to robot control, natural language text compression, automatic speech recognition, time series prediction, handwriting recognition, and many other tasks. LSTM can also be used for document classification. Document classification comprises text processing, feature extraction, feature vector construction, and label prediction (the final classification). We first preprocess the 20 Newsgroups dataset and then extract features using feature weighting and feature selection algorithms. The extracted features are then passed to an LSTM neural network for label prediction. The documents are thereby classified into different categories according to their context.
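As a minimal sketch of the feature weighting, feature selection, and feature vector construction steps, the following Python code assumes TF-IDF as the weighting scheme and a top-k ranking by total weight as the selection rule. Neither choice is specified in this abstract, and all function names are hypothetical. The LSTM classifier itself is omitted; the resulting vectors stand in for the input that would be passed to it.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_weights(docs):
    """Return one {term: TF-IDF weight} dict per document (assumed weighting)."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weighted = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        weighted.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weighted


def select_features(weighted, k):
    """Keep the k terms with the highest total weight (assumed selection rule)."""
    totals = {}
    for doc in weighted:
        for term, weight in doc.items():
            totals[term] = totals.get(term, 0.0) + weight
    # Sort by descending weight; break ties alphabetically for determinism.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def to_vectors(weighted, vocabulary):
    """Build one fixed-length feature vector per document."""
    return [[doc.get(term, 0.0) for term in vocabulary] for doc in weighted]
```

In this sketch, a term that occurs in every document receives a weight of zero, so it ranks last and is dropped as soon as k is smaller than the number of distinct terms, while terms concentrated in a few documents rank highest.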