159 research outputs found
Text Classification for Authorship Attribution Using Naive Bayes Classifier with Limited Training Data
Authorship attribution (AA) is the task of identifying the authors of disputed or anonymous texts. It can be seen as a single-label, multi-class text classification task. It is concerned with writing style rather than topic matter. The scalability issue in traditional AA studies concerns the effect of data size, that is, the amount of data per candidate author. This has not yet been probed in much depth, since most stylometry research tends to focus on long texts per author or multiple short texts, because stylistic choices occur less frequently in short texts. This paper investigates authorship attribution on short historical Arabic texts written by 10 different authors. Several experiments are conducted on these texts by extracting various lexical and character features of each author’s writing style, using word-level n-grams (n = 1, 2, 3, and 4) and character-level n-grams (n = 1, 2, 3, and 4) as the text representation. A Naive Bayes (NB) classifier is then employed to assign the texts to their authors. The aim is to show the robustness of the NB classifier for AA on very short texts when compared to Support Vector Machines (SVMs). Using a dataset (called AAAT) that consists of 3 short texts per author’s book, it is shown that our method is at least as effective as Information Gain (IG) for selecting the most significant n-grams. Moreover, the significance of punctuation marks for distinguishing between authors is explored, showing that an increase in performance can be achieved. The NB classifier also achieved high accuracy: the AA experiments on the AAAT dataset reached a best classification accuracy of up to 96% using word-level 1-grams. Keywords: Authorship attribution, Text classification, Naive Bayes classifier, Character n-grams features, Word n-grams features
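The pipeline this abstract describes (n-gram extraction followed by Naive Bayes classification) can be sketched as below. This is a minimal illustration and not the authors' implementation: the multinomial model with add-one smoothing, the names `ngrams` and `NaiveBayesAttributor`, and the English toy texts are all assumptions made for the example, and the AAAT dataset is not used.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n, level="word"):
    """Return the word-level or character-level n-grams of a text as tuples."""
    units = text.split() if level == "word" else list(text)
    return [tuple(units[i:i + n]) for i in range(len(units) - n + 1)]


class NaiveBayesAttributor:
    """Multinomial Naive Bayes with add-one smoothing (an assumed variant)."""

    def __init__(self, n=1, level="word"):
        self.n, self.level = n, level

    def fit(self, texts, authors):
        self.counts = defaultdict(Counter)  # author -> n-gram counts
        self.docs = Counter(authors)        # author -> number of training texts
        for text, author in zip(texts, authors):
            self.counts[author].update(ngrams(text, self.n, self.level))
        self.vocab = {g for c in self.counts.values() for g in c}
        return self

    def predict(self, text):
        grams = ngrams(text, self.n, self.level)
        total_docs = sum(self.docs.values())
        best_author, best_score = None, -math.inf
        for author, counter in self.counts.items():
            # Smoothed denominator: author's n-gram total plus vocabulary size.
            denom = sum(counter.values()) + len(self.vocab)
            score = math.log(self.docs[author] / total_docs)  # log prior
            for g in grams:
                # Unseen n-grams have count 0 and get the smoothed minimum.
                score += math.log((counter[g] + 1) / denom)
            if score > best_score:
                best_author, best_score = author, score
        return best_author


# Hypothetical toy corpus: two "authors" with two short texts each.
texts = [
    "the sea is calm and the sea is wide",
    "the sea is deep and the sea is cold",
    "markets rise when traders buy early",
    "markets fall when traders sell late",
]
authors = ["A", "A", "B", "B"]
model = NaiveBayesAttributor(n=1, level="word").fit(texts, authors)
print(model.predict("the sea is wide and cold"))
```

Passing `level="char"` switches the same classifier to character n-grams, which is one way to compare the two representations mentioned in the abstract.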
Recommended from our members
Ensemble methods for instance-based Arabic language authorship attribution
Authorship Attribution (AA) is considered a subfield of authorship analysis, and it has become an important problem as the volume of anonymous information has grown with the rapid increase in internet usage worldwide. In other languages, such as English, Spanish and Chinese, the problem is quite well studied. In Arabic, however, the AA problem has received less attention from the research community due to the complexity and nature of Arabic sentences. The paper presents an intensive review of previous studies for the Arabic language. Based on that review, this study employs the Technique for Order Preferences by Similarity to Ideal Solution (TOPSIS) method to choose the base classifier of the ensemble methods. In terms of attribution features, hundreds of stylometric features and distinct words have been extracted using several tools. Then, the Adaboost and Bagging ensemble methods have been applied to an Arabic enquiries (Fatwa) dataset. The findings showed an improvement in the effectiveness of the authorship attribution task in the Arabic language.
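For readers unfamiliar with TOPSIS, the general ranking procedure can be sketched as follows. The abstract does not state which criteria or weights were used to choose the base classifier, so the criteria here (accuracy and training time), the weights, and the candidate values are purely hypothetical; the function name `topsis` is also an assumption.

```python
import math


def topsis(matrix, weights, benefit):
    """Return closeness scores in [0, 1] for each alternative (higher is better).

    matrix  -- one row per alternative, one column per criterion
    weights -- one weight per criterion
    benefit -- True if larger is better for that criterion, False if smaller is
    Assumes no criterion column is all zeros and the alternatives are not all identical.
    """
    n_crit = len(weights)
    # Vector-normalise each column, then apply the criterion weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    cols = list(zip(*v))
    # Ideal and anti-ideal solutions, per criterion.
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in v:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Hypothetical candidates: [accuracy, training time in seconds].
candidates = [[0.90, 12.0], [0.85, 3.0], [0.70, 1.0]]
scores = topsis(candidates, weights=[0.7, 0.3], benefit=[True, False])
print(scores)
```

The alternative with the highest closeness score would be taken as the base classifier under these assumed criteria.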
The Emergence of Multiple-Text Manuscripts
Selecting and excerpting, summarizing and canonizing, and arranging texts and visual signs in manuscripts appear to be universal practices. This volume analyses the fascinating vicissitudes of the birth and development, growth and decrease, of manuscripts consisting of more than one text (‘multiple-text manuscripts’), using the example of a vast array of manuscript cultures from the Indian, African, Christian, Islamic, and European domains.
The Abode of Water: Shipwreck Evidence and the Maritime Circulation of Medicine Between Iran and China in the 9th Through 14th Centuries
This dissertation traces the role of Persian travelers and physicians in the maritime exchange of medical goods and knowledge between Iran and China from the ninth through the fourteenth centuries, and the afterlife of that exchange in modern museums. The Maritime Silk Road was a cosmopolitan network of premodern trade arteries linking the Far East and Southeast Asia to the Middle East by sea. The long-standing cultural and economic exchange across these thoroughfares dramatically expanded the pharmaceutical ingredients and medicinal recipes available to physicians practicing across the littorals of the Indian Ocean and South China Sea and facilitated the intellectual engagement of scholars with medical theories from afar. Drawing from an archive of shipwreck artifacts that includes medical goods, herbs, trade wares, and the personal effects of seafarers, interpreted alongside written accounts of sea travel, medical and philosophical texts, tomb inscriptions, and architecture in port cities, this dissertation explores how the maritime journey of Persian travelers to China influenced the epistemology and practice of Persian medicine.
The first chapter addresses the current state of Southeast Asian shipwreck archaeology and traces the trajectories of medical, scientific, and related shipwreck and navigational artifacts within Western museum collections. Chapter two introduces the historical context in which Persian merchants moved in Middle Period China and initiates a discussion of hybridity and resilience. The burning and reconstruction of the Hangzhou Phoenix mosque provides the narrative frame in which repeated outbreaks of violence in Tang and Song port cities are discussed as an analog to theories of the body and disease. Migration, hybridization, and medical collecting are examined as social and medical practices of resilience. The chapter uses archaeological evidence from port cities and the Belitung shipwreck, together with a narrative account of the massacre of foreign merchants in Guangzhou, to situate the early maritime migration of Persian merchants to China within the changing tides of the Tang and Song periods. The third chapter analyzes the maritime trade routes as sites of spiritual and physical risk, humoral vulnerability, and initiation by examining Zoroastrian, Buddhist, and Islamic cosmologies of water and migration evident in religious rituals, medical instructions for seafarers, and the personal effects and crew supplies recovered from the Belitung, Intan, Java Sea, and Pulau Buaya wrecks. These materials are interpreted in light of reflections on the maritime life by travelers who survived the journey to China, leaving behind a ninth-century artistic depiction of a shipwreck, a narrative account, and inscriptions on the tombstones of merchants. Chapter four analyzes the medicines and medical material culture recovered from the Belitung, Java Sea, Intan, Pulau Buaya, and Quanzhou wreck sites within the framework of Persian humoral medicine. The final chapter examines the Tansūqnāma, a fourteenth-century Persian translation of Chinese medical texts, in light of the longue durée of medical exchange between China and Iran and changes to social hierarchies throughout the Mongol Empire that drastically changed the position of Persian merchants in China.
PhD. Anthropology and History. University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/163150/1/arespess_1.pd
Continuous User Authentication Using Multi-Modal Biometrics
It is commonly acknowledged that mobile devices now form an integral part of an individual’s everyday life. Modern mobile handheld devices are capable of providing a wide range of services and applications over multiple networks. With this increasing capability and accessibility, they introduce additional demands in terms of security.
This thesis explores the need for authentication on mobile devices and proposes a novel mechanism to improve the current techniques. The research begins with an intensive review of mobile technologies and the current security challenges that mobile devices experience, to illustrate the imperative of authentication on mobile devices. The research then highlights the existing authentication mechanisms and their wide range of weaknesses. To this end, biometric approaches are identified as an appropriate solution and an opportunity for security to be maintained beyond point-of-entry. Indeed, by utilising behavioural biometric techniques, authentication can be performed in a continuous and transparent fashion.
This research investigated three behavioural biometric techniques based on SMS texting activities and messages, looking to apply these techniques as a multi-modal biometric authentication method for mobile devices. The results showed that linguistic profiling, keystroke dynamics and behaviour profiling can be used to discriminate users with overall Equal Error Rates (EER) of 12.8%, 20.8% and 9.2% respectively. The results also showed clearly that a combination of biometrics performs better than any single biometric technique, achieving an EER of 3.3%. Based on these findings, a novel architecture for multi-modal biometric authentication on mobile devices is proposed. The framework is able to provide robust, continuous and transparent authentication in standalone and server-client modes regardless of mobile hardware configuration. The framework is able to continuously maintain the security status of the device. With a high level of security status, users are permitted to access sensitive services and data. With a low level of security status, on the other hand, users are required to re-authenticate before accessing sensitive services or data.
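As background for the Equal Error Rate figures quoted above, the following is a minimal sketch of how an EER can be estimated from match scores. It is not the thesis's evaluation code: the function name `equal_error_rate`, the convention that higher scores mean "more likely the genuine user", and the sample scores are all assumptions for illustration.

```python
def equal_error_rate(genuine, impostor):
    """Estimate the EER from genuine-user and impostor match scores.

    A sample is accepted when its score is >= the threshold. The EER is taken
    at the threshold where the false acceptance rate (FAR) and the false
    rejection rate (FRR) are closest, reported as their average.
    """
    thresholds = sorted(set(genuine) | set(impostor))
    best_gap, best_eer = None, None
    for t in thresholds:
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)     # genuine users rejected
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


# Hypothetical scores for one biometric technique.
genuine_scores = [0.4, 0.6, 0.8, 0.9]
impostor_scores = [0.1, 0.3, 0.5, 0.7]
print(equal_error_rate(genuine_scores, impostor_scores))
```

A lower EER indicates better separation between genuine users and impostors, which is the sense in which the combined result of 3.3% improves on the single-technique rates.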