4,309 research outputs found

    Letter counting: a stem cell for Cryptology, Quantitative Linguistics, and Statistics

    Counting letters in written texts is a very ancient practice. It has accompanied the development of Cryptology, Quantitative Linguistics, and Statistics. In Cryptology, counting the frequencies of the different characters in an encrypted message is the basis of the so-called frequency analysis method. In Quantitative Linguistics, the proportion of vowels to consonants in different languages was studied long before authorship attribution. In Statistics, the vowel–consonant alternation was the only example that Markov ever gave of his theory of chained events. A short history of letter counting is presented. The three domains, Cryptology, Quantitative Linguistics, and Statistics, are then examined, focusing on how each interacted with the other two through letter counting. In conclusion, the eclecticism of past centuries' scholars, their background in the humanities, and their familiarity with cryptograms are identified as contributing factors to the mutual enrichment process described here.
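The frequency analysis method mentioned in this abstract can be sketched in a few lines. The following is an illustrative Python sketch, not code from the paper: it tallies letter frequencies in a Caesar-shifted ciphertext and guesses the shift by assuming the most frequent ciphertext letter stands for 'e' (a heuristic that only works for text rich in 'e').

```python
from collections import Counter
import string

def letter_frequencies(text):
    """Relative frequency of each letter, ignoring non-letter characters."""
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    counts = Counter(letters)
    total = len(letters)
    return {c: counts[c] / total for c in string.ascii_lowercase}

def guess_caesar_shift(ciphertext):
    """Guess the shift by assuming the most frequent letter decrypts to 'e'."""
    freqs = letter_frequencies(ciphertext)
    top = max(freqs, key=freqs.get)
    return (ord(top) - ord("e")) % 26

# "wkh ehh vhhv wkh wuhh" is "the bee sees the tree" shifted by 3,
# so 'h' dominates and the heuristic recovers the shift.
print(guess_caesar_shift("wkh ehh vhhv wkh wuhh"))  # 3
```

Real frequency analysis compares the whole observed distribution against a reference language profile rather than trusting a single letter, but the counting step is the same.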

    The ‘PAThs’ Project: An Effort to Represent the Physical Dimension of Coptic Literary Production (Third–Eleventh centuries)

    PAThs – Tracking Papyrus and Parchment Paths: An Archaeological Atlas of Coptic Literature. Literary Texts in their Geographical Context. Production, Copying, Usage, Dissemination and Storage is an ambitious digital project based in Rome, working towards a new historical and archaeological geography of the Coptic literary tradition. This aim implies a number of auxiliary tasks and challenges, including the classification of authors, works, titles, colophons, and codicological units, as well as the study and, wherever possible, exact mapping of the relevant geographical sites related to the production, circulation, and storage of manuscripts.

    Conditional Complexity of Compression for Authorship Attribution

    We introduce new stylometry tools based on the sliced conditional compression complexity of literary texts, inspired by the nearly optimal application of the incomputable Kolmogorov conditional complexity (which it presumably approximates). Whereas other stylometry statistics can occasionally be very close for different authors, ours is apparently strictly minimal for the true author, provided the query and training texts are sufficiently large, the compressor is sufficiently good, and sampling bias is avoided (as in poll sampling). We tune the method and test its performance by attributing the Federalist Papers (Madison vs. Hamilton). Our results confirm the attribution of the disputed Federalist Papers to Madison by Mosteller and Wallace (1964) using the Naive Bayes classifier, as well as the same attribution based on alternative classifiers such as SVM and a second-order Markov model of language. We then apply our method to the attribution of the early poems of the Shakespeare canon and of the continuation of Marlowe's poem 'Hero and Leander' ascribed to G. Chapman. Keywords: compression complexity, authorship attribution.
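As a rough illustration of the idea behind this abstract (not the authors' sliced conditional complexity statistic), conditional compression complexity can be approximated with an off-the-shelf compressor: C(query | training) ≈ C(training + query) − C(training), attributing the query to the author whose training text yields the smallest increment. The author names and toy corpora below are invented for the example.

```python
import zlib

def compressed_size(text: str) -> int:
    """Length of the zlib-compressed text, a crude stand-in for Kolmogorov complexity."""
    return len(zlib.compress(text.encode("utf-8"), 9))

def conditional_size(query: str, training: str) -> int:
    """Approximate C(query | training) as C(training + query) - C(training)."""
    return compressed_size(training + query) - compressed_size(training)

def attribute(query: str, candidates: dict) -> str:
    """Attribute the query to the author whose corpus compresses it best."""
    return min(candidates, key=lambda author: conditional_size(query, candidates[author]))

# Toy corpora: the query reuses author_a's vocabulary, so appending it to
# author_a's text adds fewer compressed bytes than appending it to author_b's.
corpus = {
    "author_a": "the whale hunted the sea and the ship sailed on " * 40,
    "author_b": "my love is like a red red rose in early summer " * 40,
}
print(attribute("the whale and the sea and the ship", corpus))
```

Compression-based attribution in this spirit goes back to earlier work on compression distances; the paper's contribution is a refined, sliced conditional statistic, which this sketch does not reproduce.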

    Investigating features and techniques for Arabic authorship attribution

    Authorship attribution is the problem of identifying the true author of a disputed text. Throughout history, there have been many examples of this problem concerned with revealing the genuine authors of works of literature that were published anonymously, and in some cases where more than one author claimed authorship of the disputed text. There has been considerable research effort devoted to solving this problem. Initially these efforts were based on statistical patterns; more recently they have centred on a range of techniques from artificial intelligence. An important early breakthrough was achieved by Mosteller and Wallace in 1964 [15], who pioneered the use of 'function words' – typically pronouns, conjunctions, and prepositions – as the features on which to base the discovery of patterns of usage characteristic of specific authors. The authorship attribution problem has been tackled in many languages, but predominantly in English. In this thesis the problem is addressed for the first time for the Arabic language. We therefore investigate whether the concept of function words in English can be used in the same way for authorship attribution in Arabic. We also describe and evaluate a hybrid of evolutionary algorithms and linear discriminant analysis as an approach to learning a model that classifies the author of a text, based on features derived from Arabic function words. The main target of the hybrid algorithm is to find a subset of features that can robustly and accurately classify disputed texts in unseen data, and to do so with relatively small subsets of features. A specialised dataset was produced for this work, based on a collection of 14 Arabic books of diverse natures, representing six authors. This dataset was processed into training and test partitions in a way that provides a diverse collection of challenges for any authorship attribution approach.
    The combination of the successful list of Arabic function words and the hybrid algorithm for classification led to satisfying levels of accuracy in determining the author of portions of the texts in the test data. The work described here is the first (to our knowledge) to investigate authorship attribution in the Arabic language using computational methods. Among its contributions are: the first set of Arabic function words, the first specialised dataset aimed at testing Arabic authorship attribution methods, a new hybrid algorithm for classifying authors based on patterns derived from these function words, and, finally, a number of ideas and variants regarding how to use function words in combination with character-level features, leading in some cases to more accurate results.
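To illustrate how function words become classifier features, the sketch below maps a text to relative frequencies of each function word; a classifier such as the linear discriminant analysis used in the thesis can then operate on these vectors. A tiny English list is used as a stand-in, since the thesis's Arabic function-word list is not reproduced in this abstract.

```python
import re
from collections import Counter

# Illustrative English function words; the thesis compiles an analogous,
# much larger list for Arabic (not shown here).
FUNCTION_WORDS = ["the", "and", "of", "to", "in", "that", "with", "as", "for", "it"]

def feature_vector(text: str) -> list:
    """Map a text to function-word rates per 1,000 tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    n = max(len(tokens), 1)  # avoid division by zero on empty input
    return [1000.0 * counts[w] / n for w in FUNCTION_WORDS]

# 9 tokens, of which "the" occurs 3 times and "and"/"in" once each.
print(feature_vector("the cat and the dog sat in the garden"))
```

Because the rates are normalised by text length, vectors from texts of different sizes remain comparable, which is what lets the same feature space cover both long training books and short disputed passages.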