2 research outputs found
Recommended from our members
Ensemble methods for instance-based Arabic language authorship attribution
The Authorship Attribution (AA) is considered as a subfield of authorship analysis and it is an important problem as the range of anonymous information increased with fast growing of internet usage worldwide. In other languages such as English, Spanish and Chinese, such issue is quite well studied. However, in Arabic language, the AA problem has received less attention from the research community due to complexity and nature of Arabic sentences. The paper presented an intensive review on previous studies for Arabic language. Based on that, this study has employed the Technique for Order Preferences by Similarity to Ideal Solution (TOPSIS) method to choose the base classifier of the ensemble methods. In terms of attribution features, hundreds of stylometric features and distinct words using several tools have been extracted. Then, Adaboost and Bagging ensemble methods have been applied on Arabic enquires (Fatwa) dataset. The findings showed an improvement of the effectiveness of the authorship attribution task in the Arabic language
Deep Learning-based Method for Enhancing the Detection of Arabic Authorship Attribution using Acoustic and Textual-based Features
Authorship attribution (AA) is defined as the identification of the original author of an unseen text. It is found that the style of the author’s writing can change from one topic to another, but the author’s habits are still the same in different texts. The authorship attribution has been extensively studied for texts written in different languages such as English. However, few studies investigated the Arabic authorship attribution (AAA) due to the special challenges faced with the Arabic scripts. Additionally, there is a need to identify the authors of texts extracted from livestream broadcasting and the recorded speeches to protect the intellectual property of these authors. This paper aims to enhance the detection of Arabic authorship attribution by extracting different features and fusing the outputs of two deep learning models. The dataset used in this study was collected from the weekly livestream and recorded Arabic sermons that are available publicly on the official website of Al-Haramain in Saudi Arabia. The acoustic, textual and stylometric features were extracted for five authors. Then, the data were pre-processed and fed into the deep learning-based models (CNN architecture and its pre-trained ResNet34). After that the hard and soft voting ensemble methods were applied for combining the outputs of the applied models and improve the overall performance. The experimental results showed that the use of CNN with textual data obtained an acceptable performance using all evaluation metrics. Then, the performance of ResNet34 model with acoustic features outperformed the other models and obtained the accuracy of 90.34%. Finally, the results showed that the soft voting ensemble method enhanced the performance of AAA and outperformed the other method in terms of accuracy and precision, which obtained 93.19% and 0.9311 respectively