Text classification of traditional and national songs using the naïve Bayes algorithm
In this research, we investigate the effectiveness of the multinomial naïve Bayes algorithm for text classification, focusing on distinguishing folk songs from national songs. Naïve Bayes was chosen because it weighs word frequencies both within individual documents and across the entire dataset, which supports accurate and stable classification. Our dataset comprises 480 folk songs and 90 national songs, organized into six scenarios spanning two, four, and 31 labels, each with and without the Synthetic Minority Over-sampling Technique (SMOTE). The pipeline begins with pre-processing: case folding, punctuation removal, tokenization, and TF-IDF transformation. Classification is then performed with the multinomial naïve Bayes algorithm and evaluated using k-fold cross-validation and SMOTE resampling. The best-performing scenario applies SMOTE in the two-label setting, achieving an accuracy of 93.75%. These findings demonstrate that the multinomial naïve Bayes algorithm can effectively classify categories with few labeled examples.
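The core classification step described above can be sketched as a minimal multinomial naïve Bayes with Laplace smoothing. This is a simplified, self-contained illustration: the toy song-lyric snippets and labels below are hypothetical, and the paper's full pipeline (case folding, punctuation removal, TF-IDF weighting, SMOTE, and k-fold cross-validation) is omitted in favor of plain bag-of-words counts.

```python
import math
from collections import Counter

class MultinomialNB:
    """Minimal multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.word_counts[y].update(doc.lower().split())
        # Shared vocabulary across all classes, used for smoothing
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        n = len(docs)
        self.log_priors = {c: math.log(labels.count(c) / n) for c in self.classes}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        best, best_lp = None, -math.inf
        v = len(self.vocab)
        for c in self.classes:
            lp = self.log_priors[c]
            for w in doc.lower().split():
                # P(w | c) with add-one smoothing over the shared vocabulary
                lp += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            if lp > best_lp:
                best, best_lp = c, lp
        return best

# Hypothetical toy corpus standing in for the folk/national song lyrics
docs = ["rasa sayang rasa sayang eh", "indonesia raya merdeka merdeka",
        "ampar ampar pisang", "garuda pancasila aku mendukungmu"]
labels = ["folk", "national", "folk", "national"]
clf = MultinomialNB().fit(docs, labels)
print(clf.predict("rasa sayang eh"))  # → folk
```

In practice one would use an established implementation (e.g. scikit-learn's `MultinomialNB` on TF-IDF features) rather than hand-rolling the classifier; the sketch only makes the frequency-counting idea in the abstract concrete.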
Imbalanced Data Learning by Minority Class Augmentation Using Capsule Adversarial Networks
The fact that image datasets are often imbalanced poses an intense challenge
for deep learning techniques. In this paper, we propose a method to restore the
balance in imbalanced images, by coalescing two concurrent methods, generative
adversarial networks (GANs) and capsule network. In our model, generative and
discriminative networks play a novel competitive game, in which the generator
generates samples towards specific classes from a multivariate probability
distribution. The discriminator of our model is designed so that, while
distinguishing real from fake samples, it is also required to assign classes to
the inputs. Since GAN approaches require fully observed data during training,
they tend to generate near-duplicate samples when the training set is
imbalanced, which leads to overfitting. This problem is addressed by
providing all the available information from both the class components jointly
in the adversarial training. It improves learning from imbalanced data by
incorporating the majority distribution structure in the generation of new
minority samples. Furthermore, the generator is trained with a feature-matching
loss function to improve training convergence; this also prevents the
generation of outliers and leaves the majority-class space unaffected. The
evaluations show the effectiveness of our proposed methodology; in particular,
the coalesced capsule-GAN is effective at recognizing highly overlapping
classes with far fewer parameters than the convolutional GAN.
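The feature-matching loss mentioned above can be sketched briefly. The idea is to train the generator to match the mean activations of an intermediate discriminator layer on real versus generated batches, rather than directly maximizing the discriminator's confusion. The capsule discriminator itself is beyond a short example, so the feature batches below are stand-in arrays, not outputs of a real network.

```python
import numpy as np

def feature_matching_loss(real_feats, fake_feats):
    """Squared L2 distance between mean discriminator features of the
    real batch and the generated batch (feature-matching objective)."""
    # real_feats, fake_feats: (batch, feature_dim) activations from an
    # intermediate discriminator layer (stand-in arrays in this sketch)
    mu_real = real_feats.mean(axis=0)
    mu_fake = fake_feats.mean(axis=0)
    return float(np.sum((mu_real - mu_fake) ** 2))

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(64, 128))   # hypothetical real-batch features
fake = rng.normal(0.5, 1.0, size=(64, 128))   # hypothetical generated-batch features
print(feature_matching_loss(real, fake))  # large when feature statistics differ
print(feature_matching_loss(real, real))  # exactly 0.0 for identical batches
```

Matching batch statistics instead of fooling the discriminator sample-by-sample is what stabilizes convergence and discourages the generator from collapsing onto a few near-duplicate minority samples.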