3,673 research outputs found
Drawing Elena Ferrante's Profile. Workshop Proceedings, Padova, 7 September 2017
Elena Ferrante is an internationally acclaimed Italian novelist whose real identity has been kept secret by E/O publishing house for more than 25 years. Owing to her popularity, major Italian and foreign newspapers have long tried to discover her real identity. However, only a few attempts have been made to foster a scientific debate on her work.
In 2016, Arjuna Tuzzi and Michele Cortelazzo led an Italian research team that conducted a preliminary study and collected a well-founded, large corpus of Italian novels comprising 150 works published in the last 30 years by 40 different authors. Moreover, they shared their data with a select group of international experts on authorship attribution, profiling, and analysis of textual data: Maciej Eder and Jan Rybicki (Poland), Patrick Juola (United States), Vittorio Loreto and his research team, Margherita Lalli and Francesca Tria (Italy), George Mikros (Greece), Pierre Ratinaud (France), and Jacques Savoy (Switzerland).
The chapters of this volume report the results of this endeavour that were first presented during the international workshop Drawing Elena Ferrante's Profile in Padua on 7 September 2017 as part of the 3rd IQLA-GIAT Summer School in Quantitative Analysis of Textual Data. The fascinating research findings suggest that Elena Ferrante\u2019s work definitely deserves \u201cmany hands\u201d as well as an extensive effort to understand her distinct writing style and the reasons for her worldwide success
Artificial Sequences and Complexity Measures
In this paper we exploit concepts of information theory to address the
fundamental problem of identifying and defining the most suitable tools to
extract, in a automatic and agnostic way, information from a generic string of
characters. We introduce in particular a class of methods which use in a
crucial way data compression techniques in order to define a measure of
remoteness and distance between pairs of sequences of characters (e.g. texts)
based on their relative information content. We also discuss in detail how
specific features of data compression techniques could be used to introduce the
notion of dictionary of a given sequence and of Artificial Text and we show how
these new tools can be used for information extraction purposes. We point out
the versatility and generality of our method that applies to any kind of
corpora of character strings independently of the type of coding behind them.
We consider as a case study linguistic motivated problems and we present
results for automatic language recognition, authorship attribution and self
consistent-classification.Comment: Revised version, with major changes, of previous "Data Compression
approach to Information Extraction and Classification" by A. Baronchelli and
V. Loreto. 15 pages; 5 figure
Language Trees and Zipping
In this letter we present a very general method to extract information from a
generic string of characters, e.g. a text, a DNA sequence or a time series.
Based on data-compression techniques, its key point is the computation of a
suitable measure of the remoteness of two bodies of knowledge. We present the
implementation of the method to linguistic motivated problems, featuring highly
accurate results for language recognition, authorship attribution and language
classification.Comment: 5 pages, RevTeX4, 1 eps figure. In press in Phys. Rev. Lett. (January
2002
A complex network approach to stylometry
Statistical methods have been widely employed to study the fundamental
properties of language. In recent years, methods from complex and dynamical
systems proved useful to create several language models. Despite the large
amount of studies devoted to represent texts with physical models, only a
limited number of studies have shown how the properties of the underlying
physical systems can be employed to improve the performance of natural language
processing tasks. In this paper, I address this problem by devising complex
networks methods that are able to improve the performance of current
statistical methods. Using a fuzzy classification strategy, I show that the
topological properties extracted from texts complement the traditional textual
description. In several cases, the performance obtained with hybrid approaches
outperformed the results obtained when only traditional or networked methods
were used. Because the proposed model is generic, the framework devised here
could be straightforwardly used to study similar textual applications where the
topology plays a pivotal role in the description of the interacting agents.Comment: PLoS ONE, 2015 (to appear
Relative entropy via non-sequential recursive pair substitutions
The entropy of an ergodic source is the limit of properly rescaled 1-block
entropies of sources obtained applying successive non-sequential recursive
pairs substitutions (see P. Grassberger 2002 ArXiv:physics/0207023 and D.
Benedetto, E. Caglioti and D. Gabrielli 2006 Jour. Stat. Mech. Theo. Exp. 09
doi:10.1088/1742.-5468/2006/09/P09011). In this paper we prove that the cross
entropy and the Kullback-Leibler divergence can be obtained in a similar way.Comment: 13 pages , 2 figure
In search of an evolutionary coding style
In the near future, all the human genes will be identified. But understanding
the functions coded in the genes is a much harder problem. For example, by
using block entropy, one has that the DNA code is closer to a random code then
written text, which in turn is less ordered then an ordinary computer code; see
\cite{schmitt}.
Instead of saying that the DNA is badly written, using our programming
standards, we might say that it is written in a different style -- an
evolutionary style.
We will suggest a way to search for such a style in a quantified manner by
using an artificial life program, and by giving a definition of general codes
and a definition of style for such codes.Comment: 14 pages, 7 postscript figure
- …