884 research outputs found

Authorship Identification in Bengali Literature: a Comparative Analysis

Author: Chakraborty Tanmoy
Publication date: 01/01/2012

Stylometry is the study of the unique linguistic styles and writing behaviors of individuals. It belongs to the core tasks of text categorization, such as authorship identification and plagiarism detection. Although a reasonable number of studies have been conducted for English, no major work has been done so far for Bengali. In this work, we present a demonstration of authorship identification for documents written in Bengali. We adopt a set of fine-grained stylistic features for text analysis and use them to develop two different models: a statistical similarity model consisting of three measures and their combination, and a machine learning model with Decision Tree, Neural Network and SVM. Experimental results show that SVM outperforms other state-of-the-art methods under 10-fold cross-validation. We also validate the relative importance of each stylistic feature to show that some of them remain consistently significant in every model used in this experiment.

Comment: 9 pages, 5 tables, 4 pictures

arXiv.org e-Print Archive

CiteSeerX

Authorship Classification in a Resource Constraint Language Using Convolutional Neural Networks

Author: Dewan M. Ali Akber
Hoque Mohammed Moshiul
Hossain Md. Rajib
Islam Md. Nazmul
Sarker Iqbal H.
Siddique Nazmul
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2021

Authorship classification is a method of automatically determining the appropriate author of an unknown linguistic text. Although research on authorship classification has progressed significantly in high-resource languages, it remains at an early stage for resource-constrained languages such as Bengali. This paper presents an authorship classification approach based on convolutional neural networks (CNN) comprising four modules: embedding model generation, feature representation, classifier training and classifier testing. For this purpose, this work develops a new embedding corpus (named WEC) and a Bengali authorship classification corpus (called BACC-18), which are more robust in terms of authors’ classes and unique words. Using three text embedding techniques (Word2Vec, GloVe and FastText) and combinations of different hyperparameters, 90 embedding models are created in this study. All the embedding models are assessed by intrinsic evaluators, and the 9 best-performing models out of the 90 are selected for authorship classification. In total, 36 classification models, covering four classifiers (CNN, LSTM, SVM, SGD) and three embedding techniques with 100, 200 and 250 embedding dimensions, are trained with optimized hyperparameters and tested on three benchmark datasets (BACC-18, BAAD16 and LD). Among these, the optimized CNN with the GloVe model achieved the highest classification accuracies of 93.45%, 95.02%, and 98.67% on BACC-18, BAAD16, and LD, respectively.

Directory of Open Access Journals

Ulster University's Research Portal

The Word2vec Graph Model for Author Attribution and Genre Detection in Literary Analysis

Author: Ali Mohammed Eunus
Tripto Nafis Irtiza
Publication date: 25/10/2023

Analyzing the writing styles of authors and articles is key to supporting various literary analyses such as author attribution and genre detection. Over the years, rich sets of features, including stylometry, bag-of-words, and n-grams, have been widely used to perform such analysis. However, the effectiveness of these features largely depends on the linguistic aspects of a particular language and on dataset-specific characteristics. Consequently, techniques based on these feature sets cannot give the desired results across domains. In this paper, we propose a novel Word2vec graph-based modeling of a document that can capture both the context and the style of the document. Using these Word2vec graph-based features, we train classifiers for the author attribution and genre detection tasks. Our detailed experimental study with a comprehensive set of literary writings shows the effectiveness of this method over traditional feature-based approaches. Our code and data are publicly available at https://cutt.ly/svLjSgk

Comment: 12 pages, 6 figures

arXiv.org e-Print Archive

Author Identification from Literary Articles with Visual Features: A Case Study with Bangla Documents

Author: Biswas Amitabha
Dhar Ankita
Mukherjee Himadri
Roy Kaushik
Sen Shibaprasad
Sk Md Obaidullah
Teresa Gonçalves
Publication venue: MDPI AG
Publication date: 01/01/2022

Author identification is an important aspect of literary analysis, studied in natural language processing (NLP). It helps identify the most probable author of articles, news texts, or social media comments and tweets, for example. It can be applied to other domains such as criminal and civil cases, cybersecurity, forensics, identification of plagiarizers, and many more. An automated system in this context can thus be very beneficial for society. In this paper, we propose a convolutional neural network (CNN)-based author identification system for literary articles. This system uses visual features along with a five-layer convolutional neural network for the identification of authors. The prime motivation behind this approach was the feasibility of identifying distinct writing styles through a visualization of the writing patterns. Experiments were performed on 1200 articles from 50 authors, achieving a maximum accuracy of 93.58%. Furthermore, to see how the system performed on different volumes of data, the experiments were repeated on partitions of the dataset. The system outperformed standard handcrafted feature-based techniques as well as established works on publicly available datasets.

Directory of Open Access Journals

Repositório Científico da Universidade de Évora

Resolving the confusion of the authorship attribution of a Bengali book

Author: Jana Siladitya
Mazumder Satyaki
Publication venue: CSIR-National Institute of Science Communication and Policy Research (NIScPR)
Publication date: 15/12/2021

The present paper aims to determine whether the Bengali book Londoner Naksa ebong France Bhraman (Wondrous Capers at London and Travelling in France) was written by the geologist Pramathanath Bose (P.N. Bose). To this end, two well-established style markers often used in authorship attribution studies, namely function words and punctuation marks, are used. The results suggest that this book was possibly penned by the geologist P.N. Bose. As a corollary, this approach may also be used in future authorship attribution studies involving Bengali writings.

Online Publishing @ NISCAIR
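
The abstract above names function words and punctuation marks as style markers but does not describe how they are compared. A minimal sketch of one common way to use such markers follows: build a relative-frequency profile per text and pick the candidate author whose profile is closest by cosine similarity. This is not the authors' procedure. The marker lists, the cosine measure, and the names `style_profile`, `cosine_similarity` and `closest_author` are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical marker sets, chosen only for illustration; a real study
# would use a much larger, linguistically motivated list.
FUNCTION_WORDS = ["এবং", "কিন্তু", "যে", "না", "ও"]
PUNCTUATION = ["।", ",", ";", "?", "!"]


def style_profile(text):
    """Relative frequencies of the marker function words and punctuation marks."""
    strip_chars = "".join(PUNCTUATION)
    # Split on whitespace and trim punctuation from token edges.
    tokens = [t.strip(strip_chars) for t in text.split()]
    tokens = [t for t in tokens if t]
    word_counts = Counter(tokens)
    n_tokens = max(len(tokens), 1)  # avoid division by zero on empty text
    n_chars = max(len(text), 1)
    words = [word_counts[w] / n_tokens for w in FUNCTION_WORDS]
    marks = [text.count(p) / n_chars for p in PUNCTUATION]
    return words + marks


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def closest_author(disputed, samples):
    """samples maps an author name to a text known to be written by that author."""
    target = style_profile(disputed)
    return max(
        samples,
        key=lambda name: cosine_similarity(target, style_profile(samples[name])),
    )
```

A nearest-profile match of this kind only ranks candidates; it does not by itself establish authorship, which is consistent with the cautious wording of the abstract.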

Resolving the confusion of the authorship attribution of a Bengali book

Author: Jana Siladitya
Mazumder Satyaki
Publication venue: NIScPR-CSIR, India
Publication date: 01/12/2021

Pages: 406-410

The present paper aims to determine whether the Bengali book Londoner Naksa ebong France Bhraman (Wondrous Capers at London and Travelling in France) was written by the geologist Pramathanath Bose (P.N. Bose). To this end, two well-established style markers often used in authorship attribution studies, namely function words and punctuation marks, are used. The results suggest that this book was possibly penned by the geologist P.N. Bose. As a corollary, this approach may also be used in future authorship attribution studies involving Bengali writings.

NOPR