14 research outputs found

    A Profile-Based Method for Authorship Verification

    Get PDF
    Abstract. Authorship verification is one of the most challenging tasks in stylebased text categorization. Given a set of documents, all by the same author, and another document of unknown authorship the question is whether or not the latter is also by that author. Recently, in the framework of the PAN-2013 evaluation lab, a competition in authorship verification was organized and the vast majority of submitted approaches, including the best performing models, followed the instance-based paradigm where each text sample by one author is treated separately. In this paper, we show that the profile-based paradigm (where all samples by one author are treated cumulatively) can be very effective surpassing the performance of PAN-2013 winners without using any information from external sources. The proposed approach is fully-trainable and we demonstrate an appropriate tuning of parameter settings for PAN-2013 corpora achieving accurate answers especially when the cost of false negatives is high.

    Cross-domain authorship attribution combining instance-based and profile-based features notebook for PAN at CLEF 2019

    Get PDF
    Being able to identify the author of an unknown text is crucial. Although it is a well-studied field, it is still an open problem, since a standard approach has yet to be found. In this notebook, we propose our model for the Authorship Attribution task of PAN 2019, that focuses on cross-domain setting covering 4 different languages: French, Italian, English, and Spanish. We use n-grams of characters, words, stemmed words, and distorted text. Our model has an SVM for each feature and an ensemble architecture. Our final results outperform the baseline given by PAN in almost every problem. With this model, we reach the second place in the task with an F1-score of 68%

    Authenticating the writings of Julius Caesar

    Get PDF
    In this paper, we shed new light on the authenticity of the Corpus Caesarianum, a group of five commentaries describing the campaigns of Julius Caesar (100–44 BC), the founder of the Roman empire. While Caesar himself has authored at least part of these commentaries, the authorship of the rest of the texts remains a puzzle that has persisted for nineteen centuries. In particular, the role of Caesar’s general Aulus Hirtius, who has claimed a role in shaping the corpus, has remained in contention. Determining the authorship of documents is an increasingly important authentication problem in information and computer science, with valuable applications, ranging from the domain of art history to counter-terrorism research. We describe two state-of-the-art authorship verification systems and benchmark them on 6 present-day evaluation corpora, as well as a Latin benchmark dataset. Regarding Caesar’s writings, our analyses allow us to establish that Hirtius’s claims to part of the corpus must be considered legitimate. We thus demonstrate how computational methods constitute a valuable methodological complement to traditional, expert-based approaches to document authentication
    corecore