15 research outputs found

    Finding Characteristic Features in Stylometric Analysis

    The usual focus in authorship studies is on authorship attribution, i.e. determining which author (of a given set) wrote a piece of unknown provenance. The usual setting involves a small number of candidate authors, which means that the focus quickly revolves around a search for features that discriminate among the candidates. Whether the features that serve to discriminate among the authors are characteristic is then not of primary importance. We respectfully suggest an alternative in this article, namely a focus on seeking features that are characteristic of an author with respect to others. To determine an author's characteristic features, we first seek elements that he or she uses consistently, which we therefore regard as 'representative', but we likewise seek elements which the author uses 'distinctively' in comparison to an opposing author. We test the idea on a recently proposed task that compares Charles Dickens to both Wilkie Collins and a larger reference set comprising several authors' works from the 18th and 19th centuries. We then compare the use of representative and distinctive features to Burrows' 'Delta' and Hoover's 'CoV Tuning'; we find that our method bears little similarity to either method in terms of characteristic feature selection. We show that our method achieves reliable and consistent results in the two-author comparison and fair results in the multi-author one, measured by separation ability in clustering.

    CoNLL-UL: Universal Morphological Lattices for Universal Dependency Parsing

    Following the development of the universal dependencies (UD) framework and the CoNLL 2017 Shared Task on end-to-end UD parsing, we address the need for a universal representation of morphological analysis which on the one hand can capture a range of different alternative morphological analyses of surface tokens, and on the other hand is compatible with the segmentation and morphological annotation guidelines prescribed for UD treebanks. We propose the CoNLL universal lattices (CoNLL-UL) format, a new annotation format for word lattices that represent morphological analyses, and provide resources that obey this format for a range of typologically different languages. The resources we provide are harmonized with the two-level representation and morphological annotation in their respective UD v2 treebanks, thus enabling research on universal models for morphological and syntactic parsing, in both pipeline and joint settings, and presenting new opportunities in the development of UD resources for low-resource languages.

    A grammar-book treebank of Turkish
