
    Can You Fool AI by Doing a 180? – A Case Study on Authorship Analysis of Texts by Arata Osada

    This paper is our attempt at answering a twofold question covering the areas of ethics and authorship analysis. Firstly, since the methods used for authorship analysis imply that an author can be recognized by the content he or she creates, we were interested in finding out whether an author identification system could correctly attribute works to an author who, over the course of years, has undergone a major psychological transition. Secondly, from the point of view of the evolution of an author's ethical values, we examined what it would mean if an authorship attribution system encountered difficulties in detecting single authorship. We set out to answer these questions by performing a binary authorship analysis task using a text classifier based on a pre-trained transformer model and a baseline method relying on conventional similarity metrics. For the test set, we chose works of Arata Osada, a Japanese educator and specialist in the history of education: half of them are books written before World War II and the other half in the 1950s, and in the intervening years he underwent a transformation in his political opinions. As a result, we were able to confirm that for texts authored by Arata Osada over a time span of more than 10 years, classification accuracy drops by a large margin and is substantially lower than for texts by other non-fiction writers, while the confidence scores of the predictions remain at a level similar to that observed for a shorter time span. This indicates that the classifier was in many instances tricked into deciding that texts written over a span of multiple years were actually written by two different people. This in turn leads us to believe that such a change can affect authorship analysis, and that historical events have a great impact on a person's ethical outlook as expressed in their writings.
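    To make the baseline concrete, here is a minimal sketch of same-author verification with a conventional similarity metric of the kind the abstract mentions. The character n-gram representation, the 0.5 decision threshold, and the placeholder texts are illustrative assumptions, not the paper's actual configuration.

```python
# Hedged sketch: same-author verification via cosine similarity over
# character n-gram TF-IDF vectors (an assumed "conventional similarity
# metric", not necessarily the one used in the paper).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def same_author_score(text_a: str, text_b: str) -> float:
    """Return a similarity score in [0, 1] between two documents."""
    vectorizer = TfidfVectorizer(analyzer="char", ngram_range=(3, 5))
    vectors = vectorizer.fit_transform([text_a, text_b])
    return float(cosine_similarity(vectors[0], vectors[1])[0, 0])

pre_war_text = "..."   # placeholder for a pre-WWII book by the author
post_war_text = "..."  # placeholder for a 1950s book by the same author
score = same_author_score(pre_war_text, post_war_text)
print("same author" if score > 0.5 else "different authors", score)  # 0.5 is an assumed threshold
```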

    Review of personal identification systems

    The growth of the use of biometric personal identification systems has been relatively steady over the last 20 years. The biometric revolution that has been forecast since the mid-1970s has not yet occurred. The main factors behind lower-than-expected growth have been the cost and user acceptance of the systems. During the last few years, however, a new generation of more reliable, less expensive and better designed biometric devices has come onto the market. This, combined with the anticipated expansion of new reliable, user-friendly, inexpensive systems, signals that the revolution is about to begin. This paper provides a glimpse into the future of personal identification systems and focuses on research directions, emerging applications and significant issues of the future.

    Authorship attribution using co-occurrence networks

    Authorship Attribution using Co-Occurrence Networks. This thesis approaches the task of Authorship Attribution as a classification task, using methodologies that represent text documents as graphs, from which several measures are extracted to be used as samples for the classifier. Some previous works also focus on this methodology. This thesis focuses on a method that splits a text into multiple parts and treats each part as a separate graph, from which measures are extracted. Each graph's measures are treated as a time series, from which moments are extracted; these moments make up the final vector representing the entire text. This methodology is explored and extended with two variations. The first variation skips the time-series step, so the various measures from each graph are used directly as samples. The second variation models the entire text as a single graph. The methodologies are tested on corpora in both English and Portuguese, with varying numbers of texts.
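    A minimal sketch, under assumed settings, of the chunked pipeline this abstract describes: split a text into parts, build a co-occurrence graph per part, compute graph measures, treat each measure across parts as a time series, and keep its moments as the final feature vector. The window size, the choice of measures (mean degree and transitivity) and the moments kept (mean and standard deviation) are illustrative assumptions, not the thesis's exact configuration.

```python
# Hedged sketch of per-chunk co-occurrence networks -> measure time series -> moments.
import networkx as nx
import numpy as np

def cooccurrence_graph(tokens, window=2):
    """Link words that appear within `window` positions of each other."""
    g = nx.Graph()
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if word != other:
                g.add_edge(word, other)
    return g

def graph_measures(g):
    """A few global measures per graph (assumed choice of measures)."""
    degrees = [d for _, d in g.degree()]
    return [float(np.mean(degrees)) if degrees else 0.0, nx.transitivity(g)]

def text_features(text, n_parts=10):
    """Split the text, compute measures per chunk, return moments of each series."""
    tokens = text.lower().split()
    chunks = np.array_split(np.array(tokens, dtype=object), n_parts)
    series = np.array([graph_measures(cooccurrence_graph(list(c))) for c in chunks])
    # Moments (mean and standard deviation) of each measure's time series.
    return np.concatenate([series.mean(axis=0), series.std(axis=0)])
```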

    Making Machines Learn. Applications of Cultural Analytics to the Humanities

    The digitization of several million books by Google in 2011 meant the popularization of a new kind of humanities research powered by the treatment of cultural objects as data. Culturomics, as it is called, was born, and other initiatives resonated with this methodological approach, as is the case with the recently formed Digital Humanities or Cultural Analytics. Intrinsically, these new quantitative approaches to culture all borrow techniques and methods developed under the wing of the exact sciences, such as computer science, machine learning or statistics. There are numerous examples of studies that take advantage of the possibilities that treating objects as data has to offer for the understanding of the human. This new data science, now applied to current trends in culture, can also be replicated to study the more traditional humanities. Led by proper intellectual inquiry, an adequate use of technology may bring answers to questions intractable by other means, or add evidence to long-held assumptions based on a canon built from few examples. This dissertation argues in favor of such an approach. Three different case studies are considered. First, in the more general sense of big and smart data, we collected and analyzed more than 120,000 pictures of paintings from all periods of art history, to gain a clear insight into how the beauty of depicted faces, in the framework of neuroscience and evolutionary theory, has changed over time. A second study covers the nuances of the modes of emotion employed by the Spanish Golden Age playwright Calderón de la Barca to empathize with his audience. By means of sentiment analysis, a technique strongly supported by machine learning, we shed some light on the different fictional characters, and on how they interact and convey messages otherwise invisible to the public. The last case is a study of non-traditional authorship attribution techniques applied to the forefather of the modern novel, the Lazarillo de Tormes. In the end, we conclude that the successful application of cultural analytics and computer science techniques to traditional humanistic endeavours has been enriching and validating.

    Syllabic quantity patterns as rhythmic features for Latin authorship attribution

    It is well known that, within the Latin production of written text, peculiar metric schemes were followed not only in poetic compositions but also in many prose works. Such metric patterns were based on so-called syllabic quantity, that is, on the length of the syllables involved, and there is substantial evidence suggesting that certain authors had a preference for certain metric patterns over others. In this research we investigate the possibility of employing syllabic quantity as a basis for deriving rhythmic features for the task of computational authorship attribution of Latin prose texts. We test the impact of these features on the authorship attribution task when combined with other topic-agnostic features. Our experiments, carried out on three different datasets using support vector machines (SVMs), show that rhythmic features based on syllabic quantity are beneficial in discriminating among Latin prose authors.
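    As a hedged illustration of how rhythmic features of this kind might feed an SVM, the sketch below encodes each document as a string of long/short syllable marks and classifies character n-grams of that rhythm sequence. The encoding, n-gram range, placeholder data and classifier settings are assumptions, and the syllabic scansion itself is taken as given rather than implemented.

```python
# Hedged sketch: syllabic-quantity sequences as rhythmic n-gram features for an SVM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Each document reduced to its sequence of syllable quantities,
# e.g. 'L' = long, 'S' = short (placeholder data, not from the paper).
train_rhythms = ["LSLLSSLSLL", "SSLSLSLLSS", "LLSLSLSSLL", "SLSSLLSLSL"]
train_authors = ["Cicero", "Seneca", "Cicero", "Seneca"]

model = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(2, 4)),  # rhythmic n-grams
    LinearSVC(),
)
model.fit(train_rhythms, train_authors)
print(model.predict(["LSLLSLSSLL"]))  # attribute an unseen rhythm sequence
```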

    Astronomy and Literature | Canon and Stylometrics

    This eighth issue of Interfaces contains two thematic clusters: the first cluster, entitled The Astronomical Imagination in Literature through the Ages, is edited by Dale Kedwards; the second cluster, entitled Medieval Authorship and Canonicity in the Digital Age, is edited by Jeroen De Gussem and Jeroen Deploige.