Can You Fool AI by Doing a 180? – A Case Study on Authorship Analysis of Texts by Arata Osada
This paper is our attempt at answering a twofold question covering the areas of ethics and authorship analysis. First, since the methods used for authorship analysis imply that an author can be recognized by the content he or she creates, we were interested in whether an author identification system could still correctly attribute works to their author if, over the course of years, that author has undergone a major psychological transition. Second, from the point of view of the evolution of an author's ethical values, we examined what it would mean if an authorship attribution system encountered difficulties in detecting single authorship. We set out to answer these questions by performing a binary authorship analysis task using a text classifier based on a pre-trained transformer model and a baseline method relying on conventional similarity metrics. For the test set, we chose works of Arata Osada, a Japanese educator and specialist in the history of education: half of them books written before World War II, the other half written in the 1950s, between which he underwent a transformation in terms of political opinions. We were able to confirm that, for texts authored by Arata Osada over a time span of more than 10 years, classification accuracy drops by a large margin and is substantially lower than for texts by other non-fiction writers, while the confidence scores of the predictions remain at a level similar to that observed for a shorter time span. This indicates that the classifier was in many instances tricked into deciding that texts written over a span of multiple years were actually written by two different people. It leads us to believe that such a change can affect authorship analysis, and that historical events have a great impact on a person's ethical outlook as expressed in their writings.
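The baseline relying on conventional similarity metrics can be illustrated with a minimal sketch, not the paper's exact setup: character n-gram frequency profiles compared by cosine similarity, a common authorship-verification baseline. The n-gram order and the decision threshold below are illustrative assumptions, not values from the study.

```python
from collections import Counter
from math import sqrt

def char_ngrams(text, n=3):
    """Character n-gram frequency profile of a text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine_similarity(a, b):
    """Cosine similarity between two frequency profiles."""
    shared = set(a) & set(b)
    dot = sum(a[g] * b[g] for g in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def same_author(known_text, disputed_text, threshold=0.5):
    """Attribute the disputed text to the known author when the
    profile similarity exceeds a (corpus-tuned) threshold."""
    sim = cosine_similarity(char_ngrams(known_text), char_ngrams(disputed_text))
    return sim >= threshold
```

In practice the threshold would be calibrated on held-out texts by the same and by different authors; a drop in similarity across the two periods of an author's life is exactly the effect the paper probes.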
Review of personal identification systems
The growth in the use of biometric personal identification systems has been relatively steady over the last 20 years. The biometric revolution that had been forecast since the mid-1970s has not yet occurred. The main factors behind the lower-than-expected growth have been the cost and user acceptance of the systems. During the last few years, however, a new generation of more reliable, less expensive and better designed biometric devices has come onto the market. This, combined with the anticipated expansion of new reliable, user-friendly, inexpensive systems, signals that the revolution is about to begin. This paper provides a glimpse into the future of personal identification systems and focuses on research directions, emerging applications and significant issues of the future.
Authorship attribution using co-occurrence networks
This thesis approaches the task of Authorship Attribution as a classification task. This is done using methodologies that represent text documents as graphs, from which several measures are extracted to be used as samples for the classifier. Some previous works also focus on this methodology. This thesis focuses on a methodology which splits the text into multiple parts and treats each as a separate graph, from which measures are extracted. Each graph's measures are treated as a time series, and moments are extracted from it. These moments make up the final vector, representative of the entire text. This methodology is explored and extended with two variations. The first variation skips the time-series step, so the various measures from each graph are used directly as samples. The second variation models the entire text as one graph. The methodologies are tested on corpora in both English and Portuguese, with varying numbers of texts.
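The pipeline described above (split the text into parts, build one graph per part, read the per-part measures as a time series, keep its moments as the final vector) can be sketched roughly as follows. Using mean node degree as the single graph measure, and keeping only the first two moments, are simplifying assumptions for illustration; the thesis extracts several measures per graph.

```python
from collections import defaultdict
from statistics import mean, pvariance

def cooccurrence_graph(tokens, window=2):
    """Undirected co-occurrence graph: an edge links two words that
    appear within `window` positions of each other."""
    adj = defaultdict(set)
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[j] != w:          # skip self-loops
                adj[w].add(tokens[j])
                adj[tokens[j]].add(w)
    return adj

def graph_measure(adj):
    """One network measure per graph; here, mean node degree."""
    if not adj:
        return 0.0
    return mean(len(neighbours) for neighbours in adj.values())

def text_vector(text, parts=4, window=2):
    """Split the text into parts, build one graph per part, treat the
    per-part measures as a time series, and return its moments."""
    tokens = text.lower().split()
    size = max(1, len(tokens) // parts)
    series = [graph_measure(cooccurrence_graph(tokens[i:i + size], window))
              for i in range(0, len(tokens), size)]
    return [mean(series), pvariance(series)]  # first two moments
```

The first variation in the thesis would feed the per-graph measures to the classifier directly, skipping the moment extraction; the second would call `cooccurrence_graph` once on the whole token list.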
Making Machines Learn. Applications of Cultural Analytics to the Humanities
The digitization of several million books by Google in 2011 popularized a new kind of humanities research powered by the treatment of cultural objects as data. Culturomics, as it is called, was born, and other initiatives resonated with this methodological approach, as is the case with the recently formed Digital Humanities or Cultural Analytics. Intrinsically, these new quantitative approaches to culture all borrow techniques and methods developed under the wing of the exact sciences, such as computer science, machine learning and statistics. There are numerous examples of studies that take advantage of the possibilities that treating objects as data offers for the understanding of the human. This new data science, now applied to current trends in culture, can also be replicated to study the more traditional humanities. Led by proper intellectual inquiry, an adequate use of technology may bring answers to questions intractable by other means, or add evidence to long-held assumptions based on a canon built from few examples. This dissertation argues in favor of such an approach. Three different case studies are considered. First, in the more general sense of big and smart data, we collected and analyzed more than 120,000 pictures of paintings from all periods of art history, to gain clear insight into how the beauty of depicted faces, in the framework of neuroscience and evolutionary theory, has changed over time. A second study covers the nuances of the modes of emotion employed by the Spanish Golden Age playwright Calderón de la Barca to empathize with his audience. By means of sentiment analysis, a technique strongly supported by machine learning, we shed some light on the different fictional characters, and on how they interact and convey messages otherwise invisible to the public. The last case is a study of non-traditional authorship attribution techniques applied to the forefather of the modern novel, the Lazarillo de Tormes.
In the end, we conclude that the successful application of cultural analytics and computer science techniques to traditional humanistic endeavours has been enriching and validating.
Syllabic quantity patterns as rhythmic features for Latin authorship attribution
It is well known that, within the Latin production of written text, peculiar metric schemes were followed not only in poetic compositions but also in many prose works. Such metric patterns were based on so-called syllabic quantity, that is, on the length of the involved syllables, and there is substantial evidence suggesting that certain authors preferred certain metric patterns over others. In this research we investigate the possibility of employing syllabic quantity as a basis for deriving rhythmic features for the task of computational authorship attribution of Latin prose texts. We test the impact of these features on the authorship attribution task when combined with other topic-agnostic features. Our experiments, carried out on three different datasets using support vector machines (SVMs), show that rhythmic features based on syllabic quantity are beneficial in discriminating among Latin prose authors.
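A hedged sketch of how syllabic-quantity rhythm might be turned into a fixed-length feature vector for an SVM: encode each syllable as long ('L') or short ('S'), then take relative frequencies of quantity n-grams. The scansion step that produces the quantity sequence is assumed to have happened upstream and is not shown; the n-gram order is an illustrative choice, not the paper's configuration.

```python
from collections import Counter

def quantity_ngrams(quantities, n=3):
    """Frequency profile of length-n patterns over a syllable-quantity
    string, where 'L' marks a long syllable and 'S' a short one."""
    return Counter(quantities[i:i + n] for i in range(len(quantities) - n + 1))

def rhythmic_features(quantities, n=3):
    """Relative frequencies of all 2**n possible quantity patterns, in a
    fixed order, usable as a feature vector for an SVM classifier."""
    counts = quantity_ngrams(quantities, n)
    total = sum(counts.values()) or 1
    patterns = sorted({f"{i:0{n}b}".replace("0", "S").replace("1", "L")
                       for i in range(2 ** n)})
    return [counts[p] / total for p in patterns]
```

Because the pattern order is fixed, vectors from different texts are directly comparable and can be concatenated with other topic-agnostic features before training the SVM.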
Astronomy and Literature | Canon and Stylometrics
This eighth issue of Interfaces contains two thematic clusters: the first, entitled The Astronomical Imagination in Literature through the Ages, is edited by Dale Kedwards; the second, entitled Medieval Authorship and Canonicity in the Digital Age, is edited by Jeroen De Gussem and Jeroen Deploige.