An enhanced gated recurrent unit with auto-encoder for solving text classification problems
Classification has become an important task for automatically categorizing
documents into their respective groups. The Gated Recurrent Unit (GRU) is a type
of Recurrent Neural Network (RNN), a deep learning architecture that contains an
update gate and a reset gate. It is considered one of the most efficient text
classification techniques, particularly on sequential datasets. However, GRU
suffers from three major issues when applied to text classification problems.
The first drawback is its failure to reduce data dimensionality, which leads to
low-quality solutions for the classification problems. Secondly, GRU remains
difficult to train because of redundancy between the update and reset gates; the
reset gate adds complexity and requires high processing time. Thirdly, GRU loses
informative features in each recurrence during the training phase and incurs
high computational cost. This failure is due to a random selection of features
from the datasets (or previous outputs) when GRU is applied in its standard
form. Therefore, this research proposes a new model, the Encoder Simplified GRU
(ES-GRU), which reduces the dimensionality of the data using an Auto-Encoder
(AE). Accordingly, the reset gate is replaced with an update gate to reduce the
redundancy and complexity of the standard GRU. Finally, a Batch Normalization
method is incorporated into the GRU and AE to improve the performance of the
proposed ES-GRU model. The proposed model was evaluated on seven benchmark text
datasets and compared with six well-known baseline multiclass text
classification approaches: standard GRU, AE, Long Short-Term Memory,
Convolutional Neural Network, Support Vector Machine, and Naïve Bayes. Across
various performance evaluation measures, the proposed model showed considerable
improvement over the other standard classification techniques, demonstrating the
better effectiveness and efficiency of the developed model.
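The gating simplification described above (dropping the reset gate and keeping only the update gate) can be sketched as a single recurrence step. The following is an illustrative NumPy sketch of that general idea, not the authors' exact ES-GRU formulation; the function and weight names are placeholders:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def simplified_gru_step(x_t, h_prev, Wz, Uz, bz, Wh, Uh, bh):
    """One step of a single-gate (update-only) GRU cell.

    Illustrative sketch of the idea described in the abstract --
    removing the reset gate so the candidate state reads the full
    previous hidden state -- not the published ES-GRU equations.
    """
    z = sigmoid(Wz @ x_t + Uz @ h_prev + bz)        # update gate in (0, 1)
    h_cand = np.tanh(Wh @ x_t + Uh @ h_prev + bh)   # candidate state, no reset gate
    return (1.0 - z) * h_prev + z * h_cand          # gated interpolation
```

With the reset gate removed, one gate's worth of parameters and matrix products is saved at every time step, which matches the abstract's claim of reduced redundancy and processing time.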
Improved relative discriminative criterion using rare and informative terms and ringed seal search-support vector machine techniques for text classification
Classification has become an important task for automatically assigning documents to their respective categories. For text classification, feature selection techniques are normally used to identify important features and to remove irrelevant and noisy ones, minimizing the dimensionality of the feature space. These techniques are expected to improve the efficiency, accuracy, and comprehensibility of classification models in text labeling problems. Most feature selection techniques use document and term frequencies to rank a term. Existing feature selection techniques (e.g. RDC, NRDC) consider frequently occurring terms and ignore the counts of rarely occurring terms in a class. This study therefore proposes the Improved Relative Discriminative Criterion (IRDC), a technique that takes rarely occurring term counts into account. It is argued that rarely occurring terms are as meaningful and important as frequently occurring terms in a class. The proposed IRDC is compared to the most recent feature selection techniques, RDC and NRDC. The results reveal significant improvement by the proposed IRDC for feature selection: precision by 27%, recall by 30%, macro-average by 35%, and micro-average by 30%. Additionally, this study proposes a hybrid algorithm, the Ringed Seal Search-Support Vector Machine (RSS-SVM), to improve the generalization and learning capability of the SVM. The proposed RSS-SVM optimizes the kernel and penalty parameters with the help of the RSS algorithm. It is compared to the most recent techniques, GA-SVM and CS-SVM. The results show significant improvement by the proposed RSS-SVM for classification: accuracy by 18.8%, recall by 15.68%, precision by 15.62%, and specificity by 13.69%. In conclusion, the proposed IRDC has shown better performance than existing techniques because of its capability to consider rare and informative terms.
Additionally, the proposed RSS-SVM has shown better performance than existing techniques because of its capability to improve the balance between exploration and exploitation.
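The core idea behind IRDC, that rarely occurring terms carry class information which frequency-only criteria discard, can be illustrated with a toy term-ranking function. This is a hypothetical sketch of that idea only; it does not reproduce the published RDC or IRDC formulas, and the scoring rule and `rare_threshold` parameter are assumptions:

```python
from collections import Counter

def rank_terms(docs_pos, docs_neg, rare_threshold=2):
    """Rank terms by a toy class-discrimination score that, unlike a
    purely frequency-based criterion, also credits rarely occurring
    terms. Illustrative only -- not the published IRDC formula.
    """
    tf_pos = Counter(t for d in docs_pos for t in d)  # term counts in class +
    tf_neg = Counter(t for d in docs_neg for t in d)  # term counts in class -
    scores = {}
    for term in set(tf_pos) | set(tf_neg):
        fp, fn = tf_pos[term], tf_neg[term]
        score = abs(fp - fn)                 # frequent-term discrimination
        if 0 < fp + fn <= rare_threshold:    # rare term: still informative
            score += 1                       # assumed bonus so rare terms are not ignored
        scores[term] = score
    return sorted(scores, key=scores.get, reverse=True)
```

A frequency-only criterion would push singleton terms to the bottom of the ranking; the rare-term bonus here stands in for IRDC's claim that such terms can still separate classes.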
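The RSS-SVM's parameter search, tuning the SVM penalty parameter C and kernel parameter while balancing exploration against exploitation, can be sketched with a generic randomized search. The exact Ringed Seal Search update rules are not reproduced here; the 50/50 exploration split, the search ranges, and the `evaluate` callback are all assumptions for illustration:

```python
import random

def tune_svm_params(evaluate, n_iter=50, seed=0):
    """Randomized search over SVM penalty C and RBF kernel width gamma,
    alternating global exploration with local exploitation around the
    incumbent -- a generic stand-in for the RSS step, not its actual
    update rules.

    `evaluate(C, gamma)` is assumed to return a validation score to
    maximize (e.g. cross-validated accuracy of an SVM).
    """
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(n_iter):
        if best is None or rng.random() < 0.5:
            # exploration: sample log-uniformly over wide (assumed) ranges
            C = 10 ** rng.uniform(-2, 3)
            gamma = 10 ** rng.uniform(-4, 1)
        else:
            # exploitation: perturb the incumbent multiplicatively
            C = best[0] * 10 ** rng.uniform(-0.3, 0.3)
            gamma = best[1] * 10 ** rng.uniform(-0.3, 0.3)
        score = evaluate(C, gamma)
        if score > best_score:
            best, best_score = (C, gamma), score
    return best, best_score
```

The exploration branch prevents the search from stalling in a local optimum, while the exploitation branch refines the best parameters found so far; tuning that balance is the property the abstract credits for RSS-SVM's advantage over GA-SVM and CS-SVM.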