Can You Fool AI by Doing a 180? – A Case Study on Authorship Analysis of Texts by Arata Osada
This paper is our attempt at answering a twofold question covering the areas of ethics and authorship analysis. First, since the methods used for authorship analysis imply that an author can be recognized by the content he or she creates, we were interested in whether an author identification system could still correctly attribute works to their author if, over the course of years, that author has undergone a major psychological transition. Second, from the point of view of the evolution of an author's ethical values, we examined what it would mean if an authorship attribution system encountered difficulties in detecting single authorship. We set out to answer these questions by performing a binary authorship analysis task using a text classifier based on a pre-trained transformer model and a baseline method relying on conventional similarity metrics. For the test set, we chose works of Arata Osada, a Japanese educator and specialist in the history of education: half of them books written before World War II, the other half written in the 1950s, between which he underwent a transformation in terms of political opinions. We were able to confirm that, for texts authored by Arata Osada over a time span of more than 10 years, classification accuracy drops by a large margin and is substantially lower than for texts by other non-fiction writers, while the confidence scores of the predictions remain at a level similar to that observed for a shorter time span. This indicates that the classifier was in many instances tricked into deciding that texts written over a span of multiple years were actually written by two different people. It leads us to believe that such a change can affect authorship analysis, and that historical events have a great impact on a person's ethical outlook as expressed in their writings.
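The baseline relying on conventional similarity metrics can be illustrated with a minimal sketch, not the paper's exact setup: character n-gram frequency profiles compared by cosine similarity, a common authorship-verification baseline. The n-gram order and the decision threshold below are illustrative assumptions, not values from the study.

```python
from collections import Counter
from math import sqrt

def char_ngrams(text, n=3):
    """Character n-gram frequency profile of a text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine_similarity(a, b):
    """Cosine similarity between two frequency profiles."""
    shared = set(a) & set(b)
    dot = sum(a[g] * b[g] for g in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def same_author(known_text, disputed_text, threshold=0.5):
    """Attribute the disputed text to the known author when the
    profile similarity exceeds a (corpus-tuned) threshold."""
    sim = cosine_similarity(char_ngrams(known_text), char_ngrams(disputed_text))
    return sim >= threshold
```

In practice the threshold would be calibrated on held-out texts by the same and by different authors; a drop in similarity across the two periods of an author's life is exactly the effect the paper probes.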
Review of personal identification systems
The growth in the use of biometric personal identification systems has been relatively steady over the last 20 years. The biometric revolution that had been forecast since the mid-1970s has not yet occurred. The main factors behind the lower-than-expected growth have been the cost and user acceptance of the systems. During the last few years, however, a new generation of more reliable, less expensive and better designed biometric devices has come onto the market. This, combined with the anticipated expansion of new reliable, user-friendly, inexpensive systems, signals that the revolution is about to begin. This paper provides a glimpse into the future of personal identification systems and focuses on research directions, emerging applications and significant issues of the future.
Authorship attribution using co-occurrence networks
This thesis approaches the task of Authorship Attribution as a classification task. This is done using methodologies that represent text documents as graphs, from which several measures are extracted to be used as samples for the classifier. Some previous works also focus on this methodology. This thesis focuses on a methodology which splits the text into multiple parts and treats each as a separate graph, from which measures are extracted. Each graph's measures are treated as a time series, and moments are extracted from it. These moments make up the final vector, representative of the entire text. This methodology is explored and extended with two variations. The first variation skips the time-series step, so the various measures from each graph are used directly as samples. The second variation models the entire text as one graph. The methodologies are tested on corpora in both English and Portuguese, with varying numbers of texts.
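The pipeline described above (split the text into parts, build one graph per part, read the per-part measures as a time series, keep its moments as the final vector) can be sketched roughly as follows. Using mean node degree as the single graph measure, and keeping only the first two moments, are simplifying assumptions for illustration; the thesis extracts several measures per graph.

```python
from collections import defaultdict
from statistics import mean, pvariance

def cooccurrence_graph(tokens, window=2):
    """Undirected co-occurrence graph: an edge links two words that
    appear within `window` positions of each other."""
    adj = defaultdict(set)
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[j] != w:          # skip self-loops
                adj[w].add(tokens[j])
                adj[tokens[j]].add(w)
    return adj

def graph_measure(adj):
    """One network measure per graph; here, mean node degree."""
    if not adj:
        return 0.0
    return mean(len(neighbours) for neighbours in adj.values())

def text_vector(text, parts=4, window=2):
    """Split the text into parts, build one graph per part, treat the
    per-part measures as a time series, and return its moments."""
    tokens = text.lower().split()
    size = max(1, len(tokens) // parts)
    series = [graph_measure(cooccurrence_graph(tokens[i:i + size], window))
              for i in range(0, len(tokens), size)]
    return [mean(series), pvariance(series)]  # first two moments
```

The first variation in the thesis would feed the per-graph measures to the classifier directly, skipping the moment extraction; the second would call `cooccurrence_graph` once on the whole token list.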
Making Machines Learn. Applications of Cultural Analytics to the Humanities
The digitization of several million books by Google in 2011 popularized a new kind of humanities research powered by the treatment of cultural objects as data. Culturomics, as it is called, was born, and other initiatives resonated with this methodological approach, as is the case with the recently formed Digital Humanities or Cultural Analytics. Intrinsically, these new quantitative approaches to culture all borrow techniques and methods developed under the wing of the exact sciences, such as computer science, machine learning and statistics. There are numerous examples of studies that take advantage of the possibilities that treating objects as data offers for the understanding of the human. This new data science, now applied to current trends in culture, can also be replicated to study the more traditional humanities. Led by proper intellectual inquiry, an adequate use of technology may bring answers to questions intractable by other means, or add evidence to long-held assumptions based on a canon built from few examples. This dissertation argues in favor of such an approach. Three different case studies are considered. First, in the more general sense of big and smart data, we collected and analyzed more than 120,000 pictures of paintings from all periods of art history, to gain clear insight into how the beauty of depicted faces, in the framework of neuroscience and evolutionary theory, has changed over time. A second study covers the nuances of the modes of emotion employed by the Spanish Golden Age playwright Calderón de la Barca to empathize with his audience. By means of sentiment analysis, a technique strongly supported by machine learning, we shed some light on the different fictional characters, and on how they interact and convey messages otherwise invisible to the public. The last case is a study of non-traditional authorship attribution techniques applied to the forefather of the modern novel, the Lazarillo de Tormes.
In the end, we conclude that the successful application of cultural analytics and computer science techniques to traditional humanistic endeavours has been enriching and validating.
Syllabic quantity patterns as rhythmic features for Latin authorship attribution
It is well known that, within the Latin production of written text, peculiar metric schemes were followed not only in poetic compositions but also in many prose works. Such metric patterns were based on so-called syllabic quantity, that is, on the length of the involved syllables, and there is substantial evidence suggesting that certain authors preferred certain metric patterns over others. In this research we investigate the possibility of employing syllabic quantity as a basis for deriving rhythmic features for the task of computational authorship attribution of Latin prose texts. We test the impact of these features on the authorship attribution task when combined with other topic-agnostic features. Our experiments, carried out on three different datasets using support vector machines (SVMs), show that rhythmic features based on syllabic quantity are beneficial in discriminating among Latin prose authors.
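A hedged sketch of how syllabic-quantity rhythm might be turned into a fixed-length feature vector for an SVM: encode each syllable as long ('L') or short ('S'), then take relative frequencies of quantity n-grams. The scansion step that produces the quantity sequence is assumed to have happened upstream and is not shown; the n-gram order is an illustrative choice, not the paper's configuration.

```python
from collections import Counter

def quantity_ngrams(quantities, n=3):
    """Frequency profile of length-n patterns over a syllable-quantity
    string, where 'L' marks a long syllable and 'S' a short one."""
    return Counter(quantities[i:i + n] for i in range(len(quantities) - n + 1))

def rhythmic_features(quantities, n=3):
    """Relative frequencies of all 2**n possible quantity patterns, in a
    fixed order, usable as a feature vector for an SVM classifier."""
    counts = quantity_ngrams(quantities, n)
    total = sum(counts.values()) or 1
    patterns = sorted({f"{i:0{n}b}".replace("0", "S").replace("1", "L")
                       for i in range(2 ** n)})
    return [counts[p] / total for p in patterns]
```

Because the pattern order is fixed, vectors from different texts are directly comparable and can be concatenated with other topic-agnostic features before training the SVM.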
Astronomy and Literature | Canon and Stylometrics
This eighth issue of Interfaces contains two thematic clusters: the first, entitled The Astronomical Imagination in Literature through the Ages, is edited by Dale Kedwards; the second, entitled Medieval Authorship and Canonicity in the Digital Age, is edited by Jeroen De Gussem and Jeroen Deploige.