Drawing Elena Ferrante's Profile. Workshop Proceedings, Padova, 7 September 2017
Elena Ferrante is an internationally acclaimed Italian novelist whose real identity has been kept secret by E/O publishing house for more than 25 years. Owing to her popularity, major Italian and foreign newspapers have long tried to discover her real identity. However, only a few attempts have been made to foster a scientific debate on her work.
In 2016, Arjuna Tuzzi and Michele Cortelazzo led an Italian research team that conducted a preliminary study and collected a well-founded, large corpus of Italian novels comprising 150 works published in the last 30 years by 40 different authors. Moreover, they shared their data with a select group of international experts on authorship attribution, profiling, and analysis of textual data: Maciej Eder and Jan Rybicki (Poland), Patrick Juola (United States), Vittorio Loreto and his research team, Margherita Lalli and Francesca Tria (Italy), George Mikros (Greece), Pierre Ratinaud (France), and Jacques Savoy (Switzerland).
The chapters of this volume report the results of this endeavour that were first presented during the international workshop Drawing Elena Ferrante's Profile in Padua on 7 September 2017 as part of the 3rd IQLA-GIAT Summer School in Quantitative Analysis of Textual Data. The fascinating research findings suggest that Elena Ferrante’s work definitely deserves “many hands” as well as an extensive effort to understand her distinct writing style and the reasons for her worldwide success.
Quantification and the language of later Shakespeare
In this paper we consider the status of quantitative evidence in literary studies, with an example from our own work using the software package Docuscope to investigate chronological ‘periods’ in Shakespeare’s career. We argue that quantitative evidence has a function in literary studies, not as an end in itself, but as a starting point for traditional interpretative literary analysis. In our example, we show that linguistic analysis suggests three periods in Shakespeare’s career, defining a ‘period’ as a group of plays with similar linguistic features. We focus on the latest period, as this is the largest, and suggest that the ‘late style’ of Shakespeare may begin much earlier than traditionally thought. We analyse the features that the later plays share, and argue that from the late 1590s Shakespeare can be seen to be adopting features which are (a) closer to speech, and (b) indicate a shift from real-world denotation to a focus on communicating the subjectivity of the speaker.
CEAI: CCM based Email Authorship Identification Model
In this paper we present a model for email authorship identification (EAI)
employing a Cluster-based Classification Model (CCM). Stylometric features have
traditionally been employed with success in various authorship analysis tasks;
we extend the traditional feature set with further features that are effective
for email authorship identification (e.g. the last punctuation mark used in an
email, an author's tendency to capitalize the start of an email, or the
punctuation after a greeting or farewell). We also include content-based
features selected using Information Gain. The use of such features has a
positive impact on the accuracy of the authorship identification task. We
performed experiments to justify our arguments and compared the results with
other baseline models. Experimental results reveal that the proposed CCM-based
email authorship identification model, along with the proposed feature set,
outperforms the state-of-the-art support vector machine (SVM)-based models, as
well as the models proposed by Iqbal et al. [1, 2]. The proposed model attains
accuracy rates of 94% for 10 authors, 89% for 25 authors, and 81% for 50
authors on the Enron dataset, while 89.5% accuracy is achieved on a real email
dataset constructed by the authors. The results on the Enron dataset are
obtained over a considerably larger number of authors than in the models
proposed by Iqbal et al. [1, 2].
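Style markers of the kind the abstract lists are straightforward to compute. A minimal illustrative sketch (the function and feature names here are our own, not the paper's exact feature set):

```python
import re

def stylometric_features(email: str) -> dict:
    """A few simple stylometric features of the kind the abstract
    describes (illustrative only, not the paper's exact feature set)."""
    text = email.strip()
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    punctuation = re.findall(r"[.!?,;:]", text)
    return {
        # the last punctuation mark used in the email
        "last_punct": punctuation[-1] if punctuation else "",
        # whether the email starts with a capital letter
        "starts_capitalized": text[:1].isupper(),
        # mean sentence length in words, a classic stylometric cue
        "avg_sentence_len": sum(len(s.split()) for s in sentences)
        / max(1, len(sentences)),
    }

features = stylometric_features("Hi Bob,\nPlease send the report today!")
```

Vectors of such features, one per email, would then be fed to the cluster-based classifier.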
Probing the topological properties of complex networks modeling short written texts
In recent years, graph theory has been widely employed to probe several
language properties. More specifically, the so-called word adjacency model has
been proven useful for tackling several practical problems, especially those
relying on textual stylistic analysis. The most common approach to treat texts
as networks has simply considered either large pieces of texts or entire books.
This approach has certainly worked well -- many informative discoveries have
been made this way -- but it raises an uncomfortable question: could there be
important topological patterns in small pieces of texts? To address this
problem, the topological properties of subtexts sampled from entire books were
probed. Statistical analyses performed on a dataset comprising 50 novels
revealed that most of the traditional topological measurements are stable for
short subtexts. When the performance of the authorship recognition task was
analyzed, it was found that a proper sampling yields a discriminability similar
to the one found with full texts. Surprisingly, the support vector machine
classification based on the characterization of short texts outperformed the
one performed with entire books. These findings suggest that a local
topological analysis of large documents might improve their global
characterization. Most importantly, it was verified, as a proof of principle,
that short texts can be analyzed with the methods and concepts of complex
networks. As a consequence, the techniques described here can be extended in a
straightforward fashion to analyze texts as time-varying complex networks.
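The word adjacency model the abstract refers to can be sketched in a few lines: consecutive words are linked, and traditional topological measurements (here, mean degree) can then be computed even for short subtexts. The functions below are an illustrative stand-in, not the paper's implementation:

```python
from collections import defaultdict

def adjacency_network(tokens):
    """Build an undirected word adjacency network: each pair of
    consecutive tokens in the text is linked by an edge."""
    edges = defaultdict(set)
    for a, b in zip(tokens, tokens[1:]):
        if a != b:
            edges[a].add(b)
            edges[b].add(a)
    return edges

def mean_degree(edges):
    """Average node degree, one of the traditional topological
    measurements reported as stable for short subtexts."""
    return sum(len(nbrs) for nbrs in edges.values()) / len(edges)

net = adjacency_network("the cat sat on the mat".split())
```

Sampling fixed-size subtexts and computing such measurements on each sample yields the local characterization the abstract describes.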
Text authorship identified using the dynamics of word co-occurrence networks
The identification of authorship in disputed documents still requires human
expertise, which is now unfeasible for many tasks owing to the large volumes of
text and authors in practical applications. In this study, we introduce a
methodology based on the dynamics of word co-occurrence networks representing
written texts to classify a corpus of 80 texts by 8 authors. The texts were
divided into sections containing equal numbers of linguistic tokens, from which
time series were created for 12 topological metrics. The series were shown to be
stationary (p-value > 0.05), which permits the use of distribution moments as
learning attributes. With an optimized supervised learning procedure using a
Radial Basis Function Network, 68 out of 80 texts were correctly classified,
i.e. a remarkable 85% author matching success rate. Therefore, fluctuations in
purely dynamic network metrics were found to characterize authorship, thus
opening the way for the description of texts in terms of small evolving
networks. Moreover, the approach introduced allows for comparison of texts with
diverse characteristics in a simple, fast fashion.
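A minimal sketch of the section-wise approach, with a simple stand-in metric (type-token ratio) in place of the paper's 12 topological metrics; the distribution moments of the resulting series then serve as learning attributes:

```python
import statistics

def metric_series(tokens, window, metric):
    """Slice the text into consecutive sections of `window` tokens and
    compute one metric value per section, yielding a time series."""
    return [
        metric(tokens[i:i + window])
        for i in range(0, len(tokens) - window + 1, window)
    ]

def type_token_ratio(section):
    # stand-in metric: distinct words divided by section length
    return len(set(section)) / len(section)

tokens = ("to be or not to be that is the question " * 5).split()
series = metric_series(tokens, window=10, metric=type_token_ratio)
# distribution moments used as learning attributes for the classifier
attributes = (statistics.mean(series), statistics.pvariance(series))
```

In the paper each text contributes one such moment vector per metric, which is then classified with a Radial Basis Function Network.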
DeepAPT: Nation-State APT Attribution Using End-to-End Deep Neural Networks
In recent years, numerous advanced malware samples, known as advanced
persistent threats (APTs), have allegedly been developed by nation-states. The
task of attributing an APT to a specific nation-state is extremely challenging
for several reasons. Each nation-state usually has more than a single cyber
unit developing such advanced malware, rendering traditional authorship
attribution algorithms useless. Furthermore, these APTs use state-of-the-art
evasion techniques, making feature extraction challenging. Finally, the dataset
of available APT samples is extremely small.
In this paper we describe how deep neural networks (DNNs) can be successfully
employed for nation-state APT attribution. We use sandbox reports (recording
the behavior of an APT when run dynamically) as raw input for the neural
network, allowing the DNN to learn high-level feature abstractions of the APTs
themselves. Using a test set of 1,000 Chinese- and Russian-developed APTs, we
achieved an accuracy rate of 94.6%.
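The key preprocessing step implied above is turning a dynamic-analysis report into a numeric input a network can consume. A hedged sketch, reducing a sandbox report to a sequence of API-call names over a made-up vocabulary (the paper itself learns features end-to-end from the full reports):

```python
from collections import Counter

# hypothetical vocabulary of sandbox-observed API calls; any real
# system would derive this from the reports themselves
VOCAB = ["CreateFile", "RegSetValue", "InternetOpen", "WriteProcessMemory"]

def report_to_vector(report_calls):
    """Turn a sandbox report (here, a sequence of API-call names) into
    a fixed-length count vector suitable as raw DNN input."""
    counts = Counter(report_calls)
    return [counts.get(name, 0) for name in VOCAB]

vec = report_to_vector(
    ["CreateFile", "CreateFile", "InternetOpen", "UnknownCall"]
)
```

Calls outside the vocabulary are simply dropped; the DNN then learns higher-level abstractions from such vectors rather than from hand-crafted features.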