
    Baseline Approaches for the Authorship Identification Task: Notebook for PAN at CLEF 2011

    Abstract: In this paper we present an evaluation of three different classifiers (Rocchio, Naïve Bayes and Greedy) with the aim of obtaining a baseline for the task of authorship identification. We decided to use as features the original words contained in each document of the test set, with minimal preprocessing that consisted of removing stopwords, punctuation symbols and XML tags. As may be seen in this paper, the obtained results are adequate, reflecting the aim of the experiments. On average, Rocchio slightly outperformed the Naïve Bayes and Greedy classifiers. However, we recommend using both Rocchio and Naïve Bayes in future evaluations of the PAN competition as baselines against which other teams may compare their own approaches.
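As a rough illustration of the kind of pipeline the abstract describes, the following sketch strips XML tags, punctuation and stopwords, then classifies with a Rocchio-style nearest-centroid rule. It is a minimal sketch under stated assumptions, not the authors' implementation: the stopword list, the raw term-frequency weighting, the cosine similarity measure, and all function names (`preprocess`, `train_rocchio`, `predict`) are hypothetical choices made here for illustration only.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption; the paper's list is not given).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "was", "on"}


def preprocess(text):
    """Strip XML tags and punctuation, lowercase, and drop stopwords."""
    text = re.sub(r"<[^>]+>", " ", text)   # remove XML tags
    text = re.sub(r"[^\w\s]", " ", text)   # remove punctuation symbols
    return [w for w in text.lower().split() if w not in STOPWORDS]


def train_rocchio(docs):
    """Build one centroid (mean term-frequency vector) per author.

    docs: iterable of (author, raw_text) pairs.
    """
    sums = defaultdict(Counter)
    counts = Counter()
    for author, text in docs:
        sums[author].update(preprocess(text))
        counts[author] += 1
    return {
        author: {term: freq / counts[author] for term, freq in vec.items()}
        for author, vec in sums.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict(centroids, text):
    """Return the author whose centroid is most similar to the text."""
    query = Counter(preprocess(text))
    return max(centroids, key=lambda author: cosine(query, centroids[author]))
```

A usage example with made-up training texts: calling `train_rocchio([("alice", "<doc>Stocks and bonds rose.</doc>"), ("bob", "<doc>The team scored a goal!</doc>")])` yields two centroids, and `predict` then assigns an unseen text to whichever author's centroid it most resembles.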