271 research outputs found

    English Bards and Unknown Reviewers: a Stylometric Analysis of Thomas Moore and the Christabel Review

    Get PDF
    Fraught relations between authors and critics are a commonplace of literary history. The particular case that we discuss in this article, a negative review of Samuel Taylor Coleridge's Christabel (1816), has an additional point of interest beyond the usual mixture of amusement and resentment that surrounds a critical rebuke: the authorship of the review remains, to this day, uncertain. The purpose of this article is to investigate the possible candidacy of Thomas Moore as the author of the provocative review. It seeks to solve a puzzle of almost two hundred years, and in the process clear a valuable scholarly path in Irish Studies, Romanticism, and in our understanding of Moore's role in a prominent literary controversy of the age

    CAG : stylometric authorship attribution of multi-author documents using a co-authorship graph

    Get PDF
    © 2020 The Authors. Published by IEEE. This is an open access article available under a Creative Commons licence. The published version can be accessed at the following link on the publisher’s website: https://ieeexplore.ieee.org/document/8962080Stylometry has been successfully applied to perform authorship identification of single-author documents (AISD). The AISD task is concerned with identifying the original author of an anonymous document from a group of candidate authors. However, AISD techniques are not applicable to the authorship identification of multi-author documents (AIMD). Unlike AISD, where each document is written by one single author, AIMD focuses on handling multi-author documents. Due to the combinatoric nature of documents, AIMD lacks the ground truth information - that is, information on writing and non-writing authors in a multi-author document - which makes this problem more challenging to solve. Previous AIMD solutions have a number of limitations: (i) the best stylometry-based AIMD solution has a low accuracy, less than 30%; (ii) increasing the number of co-authors of papers adversely affects the performance of AIMD solutions; and (iii) AIMD solutions were not designed to handle the non-writing authors (NWAs). However, NWAs exist in real-world cases - that is, there are papers for which not every co-author listed has contributed as a writer. This paper proposes an AIMD framework called the Co-Authorship Graph that can be used to (i) capture the stylistic information of each author in a corpus of multi-author documents and (ii) make a multi-label prediction for a multi-author query document. We conducted extensive experimental studies on one synthetic and three real-world corpora. Experimental results show that our proposed framework (i) significantly outperformed competitive techniques; (ii) can effectively handle a larger number of co-authors in comparison with competitive techniques; and (iii) can effectively handle NWAs in multi-author documents.This work was supported in part by the Digital Economy Promotion Agency under Project MP-62-0003, and in part by the Thailand Research Fund and Office of the Higher Education Commission under Grant MRG6180266.Published versio

    Drawing Elena Ferrante's Profile. Workshop Proceedings, Padova, 7 September 2017

    Get PDF
    Elena Ferrante is an internationally acclaimed Italian novelist whose real identity has been kept secret by E/O publishing house for more than 25 years. Owing to her popularity, major Italian and foreign newspapers have long tried to discover her real identity. However, only a few attempts have been made to foster a scientific debate on her work. In 2016, Arjuna Tuzzi and Michele Cortelazzo led an Italian research team that conducted a preliminary study and collected a well-founded, large corpus of Italian novels comprising 150 works published in the last 30 years by 40 different authors. Moreover, they shared their data with a select group of international experts on authorship attribution, profiling, and analysis of textual data: Maciej Eder and Jan Rybicki (Poland), Patrick Juola (United States), Vittorio Loreto and his research team, Margherita Lalli and Francesca Tria (Italy), George Mikros (Greece), Pierre Ratinaud (France), and Jacques Savoy (Switzerland). The chapters of this volume report the results of this endeavour that were first presented during the international workshop Drawing Elena Ferrante's Profile in Padua on 7 September 2017 as part of the 3rd IQLA-GIAT Summer School in Quantitative Analysis of Textual Data. The fascinating research findings suggest that Elena Ferrante\u2019s work definitely deserves \u201cmany hands\u201d as well as an extensive effort to understand her distinct writing style and the reasons for her worldwide success

    A scalable framework for stylometric analysis of multi-author documents

    Get PDF
    This is an accepted manuscript of a chapter published by Springer in Database Systems for Advanced Applications. DASFAA 2018. Lecture Notes in Computer Science, vol 10827 on 13/05/2018, available online: https://doi.org/10.1007/978-3-319-91452-7_52 The accepted version of the publication may differ from the final published version.Stylometry is a statistical technique used to analyze the variations in the author’s writing styles and is typically applied to authorship attribution problems. In this investigation, we apply stylometry to authorship identification of multi-author documents (AIMD) task. We propose an AIMD technique called Co-Authorship Graph (CAG) which can be used to collaboratively attribute different portions of documents to different authors belonging to the same community. Based on CAG, we propose a novel AIMD solution which (i) significantly outperforms the existing state-of-the-art solution; (ii) can effectively handle a larger number of co-authors; and (iii) is capable of handling the case when some of the listed co-authors have not contributed to the document as a writer. We conducted an extensive experimental study to compare the proposed solution and the best existing AIMD method using real and synthetic datasets. We show that the proposed solution significantly outperforms existing state-of-the-art method

    Larger than Life? A Stylometric Analysis of the Multi-Authored 'Vita' of Hildegard of Bingen

    Get PDF
    This article explores by aid of stylometric methods the collaborative authorship of the Vita Hildegardis, Hildegard of Bingen's (auto-?)biography. Both Hildegard and her biographers gradually contributed to the text in the course of the last years of Hildegard's life, and it was posthumously completed in the mid-1180s by end redactor Theoderic of Echternach. In between these termini a quo and ante quem the work was allegedly taken up but left unfinished by secretaries Godfrey of Disibodenberg and Guibert of Gembloux. In light of the fact that the Vita is an indispensable source in gaining historical knowledge on Hildegard's life, the question has often been raised whether the Life of Hildegard is – by dint of contributions by multiple stakeholders – a larger-than-life depiction of the visionary's life course. Specifically the 'autobiographical' passages included in the Vita, in which Hildegard is allegedly cited directly and is taken to recount biographical information in the first-person singular, have been approached with suspicion. By applying state-of-the-art computional methods for the automatic detection of writing style (stylometry), the delicate questions of authenticity and collaborative authorship of this (auto?)hagiographical text are addressed

    SMAuC -- The Scientific Multi-Authorship Corpus

    Full text link
    With an ever-growing number of new publications each day, scientific writing poses an interesting domain for authorship analysis of both single-author and multi-author documents. Unfortunately, most existing corpora lack either material from the science domain or the required metadata. Hence, we present SMAuC, a new metadata-rich corpus designed specifically for authorship analysis in scientific writing. With more than three million publications from various scientific disciplines, SMAuC is the largest openly available corpus for authorship analysis to date. It combines a wide and diverse range of scientific texts from the humanities and natural sciences with rich and curated metadata, including unique and carefully disambiguated author IDs. We hope SMAuC will contribute significantly to advancing the field of authorship analysis in the science domain

    Stylometry in a bilingual setup

    Get PDF
    The method of stylometry by most frequent words does not allow direct comparison of original texts and their translations, i.e. across languages. For instance, in a bilingual Czech-German text collection containing parallel texts (originals and translations in both directions, along with Czech and German translations from other languages), authors would not cluster across languages, since frequency word lists for any Czech texts are obviously going to be more similar to each other than to a German text, and the other way round. We have tried to come up with an interlingua that would remove the language-specific features and possibly keep the linguistically independent features of individual author signal, if they exist. We have tagged, lemmatized, and parsed each language counterpart with the corresponding language model in UDPipe, which provides a linguistic markup that is cross-lingual to a significant extent. We stripped the output of language-dependent items, but that alone did not help much. As a next step, we transformed the lemmas of both language counterparts into shared pseudolemmas based on a very crude Czech-German glossary, with a 95.6% success. We show that, for stylometric methods based on the most frequent words, we can do without translations

    Mining and Analyzing the Academic Network

    Get PDF
    Social Network research has attracted the interests of many researchers, not only in analyzing the online social networking applications, such as Facebook and Twitter, but also in providing comprehensive services in scientific research domain. We define an Academic Network as a social network which integrates scientific factors, such as authors, papers, affiliations, publishing venues, and their relationships, such as co-authorship among authors and citations among papers. By mining and analyzing the academic network, we can provide users comprehensive services as searching for research experts, published papers, conferences, as well as detecting research communities or the evolutions hot research topics. We can also provide recommendations to users on with whom to collaborate, whom to cite and where to submit.In this dissertation, we investigate two main tasks that have fundamental applications in the academic network research. In the first, we address the problem of expertise retrieval, also known as expert finding or ranking, in which we identify and return a ranked list of researchers, based upon their estimated expertise or reputation, to user-specified queries. In the second, we address the problem of research action recommendation (prediction), specifically, the tasks of publishing venue recommendation, citation recommendation and coauthor recommendation. For both tasks, to effectively mine and integrate heterogeneous information and therefore develop well-functioning ranking or recommender systems is our principal goal. For the task of expertise retrieval, we first proposed or applied three modified versions of PageRank-like algorithms into citation network analysis; we then proposed an enhanced author-topic model by simultaneously modeling citation and publishing venue information; we finally incorporated the pair-wise learning-to-rank algorithm into traditional topic modeling process, and further improved the model by integrating groups of author-specific features. For the task of research action recommendation, we first proposed an improved neighborhood-based collaborative filtering approach for publishing venue recommendation; we then applied our proposed enhanced author-topic model and demonstrated its effectiveness in both cited author prediction and publishing venue prediction; finally we proposed an extended latent factor model that can jointly model several relations in an academic environment in a unified way and verified its performance in four recommendation tasks: the recommendation on author-co-authorship, author-paper citation, paper-paper citation and paper-venue submission. Extensive experiments conducted on large-scale real-world data sets demonstrated the superiority of our proposed models over other existing state-of-the-art methods
    corecore