28 research outputs found

    A quantitative and typological study of Early Slavic participle clauses and their competition

    This thesis investigates the semantic and pragmatic properties of Early Slavic participle constructions (conjunct participles and dative absolutes) to understand the principles motivating their selection over one another and over their main finite competitor (jegda-clauses). The issue is tackled by adopting two broadly different approaches, which inform the division of the thesis into two parts. The first part of the thesis uses detailed linguistic annotation on Early Slavic corpora at the morphosyntactic, dependency, information-structural, and lexical levels to obtain indirect evidence for different potential functions of participle clauses and their main finite competitor. The goal of this part of the thesis is to understand the roles of compositionality and default discourse reasoning as explanations for the distribution of participle constructions and jegda-clauses in the Early Slavic corpus. The investigation shows that the competition between conjunct participles, absolute constructions, and jegda-clauses occurs at the level of discourse organization, where the main determining factor in their distribution is the distinction between background and foreground content of an (elementary or complex) discourse unit. The analysis also shows that the major common denominator between the three constructions is that all of them can function as frame-setting devices (i.e. background clauses), albeit to very different extents. In fact, conjunct participles are more typically associated with the foreground constituent of a discourse unit, whereas dative absolutes and jegda-clauses are typically associated with the background content. The second part of the thesis uses massively parallel data, including Old Church Slavonic and Ancient Greek, and analyses typological variation in how languages express the semantic space of English when, whose scope encompasses that of Early Slavic participle constructions and jegda-clauses. 
To do so, probabilistic semantic maps are generated and statistical methods (including Kriging, Gaussian Mixture Modelling, precision and recall analysis) are used to induce cross-linguistically salient dimensions from the parallel corpus and to study conceptual variation within the semantic space of the hypothetical concept when. Clear typological correspondences and differences between Early Slavic and linguistic phenomena in other languages are then exploited to corroborate and refine observations made on the core semantic-pragmatic properties of participle constructions and jegda-clauses on the basis of annotated Early Slavic data. The analysis shows that 'null' constructions (juxtaposed clauses such as participles and converbs, or independent clauses) consistently cluster in particular regions of the semantic map cross-linguistically, which clearly indicates that participle clauses are not equally viable as alternatives to any use of when, but carry particular meanings that make them less suitable for some of its functions. The investigation helped identify genealogically and areally unrelated languages that seem typologically very similar to Old Church Slavonic in the way they divide the semantic space of when between overtly subordinated and 'null' constructions. Comparison with these languages reveals great similarities between the functions of Early Slavic participle constructions and of linguistic phenomena in some of these languages (particularly clause chaining, bridging, insubordination, and switch reference). Crucially, new clear correspondences are found between these phenomena and 'non-canonical' usages of participle constructions (i.e. coreferential dative absolutes, syntactically independent absolutes and conjunct participles, and participle constructions with no apparent matrix clause), which had often been written off as 'aberrations' by previous literature on Early Slavic.

    Exploiting Cross-Dialectal Gold Syntax for Low-Resource Historical Languages: Towards a Generic Parser for Pre-Modern Slavic

    This paper explores the possibility of improving the performance of specialized parsers for pre-modern Slavic by training them on data from different related varieties. Because of their linguistic heterogeneity, pre-modern Slavic varieties are treated as low-resource historical languages, whereby cross-dialectal treebank data may be exploited to overcome data scarcity and attempt the training of a variety-agnostic parser. Previous experiments on early Slavic dependency parsing are discussed, particularly with regard to their ability to tackle different orthographic, regional and stylistic features. A generic pre-modern Slavic parser and two specialized parsers – one for East Slavic and one for South Slavic – are trained using jPTDP [8], a neural network model for joint part-of-speech (POS) tagging and dependency parsing which had shown promising results on a number of Universal Dependencies (UD) treebanks, including Old Church Slavonic (OCS). With these experiments, a new state of the art is obtained for both OCS (83.79% unlabelled attachment score (UAS) and 78.43% labelled attachment score (LAS)) and Old East Slavic (OES) (85.7% UAS and 80.16% LAS).
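The two attachment scores reported above follow the standard definitions: UAS is the share of tokens whose predicted head is correct, and LAS is the share whose predicted head and dependency label are both correct. The sketch below is a minimal illustration of those definitions only. The function name `attachment_scores`, the tuple layout, and the toy sentence are assumptions made for this example and are not taken from the paper or from jPTDP.

```python
from typing import List, Tuple

# Each token is (head_index, dependency_label); head index 0 stands for the root.
# This layout is an assumption for the example, not a treebank file format.
Token = Tuple[int, str]


def attachment_scores(gold: List[Token], pred: List[Token]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages for two aligned token sequences."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences must have the same length")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    # UAS counts tokens with the correct head, regardless of the label.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS counts tokens where both the head and the label are correct.
    full_hits = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return 100.0 * head_hits / n, 100.0 * full_hits / n


# Toy four-token sentence, invented for illustration.
gold = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "det"), (2, "obl")]
uas, las = attachment_scores(gold, pred)
print(f"UAS = {uas:.2f}%, LAS = {las:.2f}%")  # prints "UAS = 75.00%, LAS = 50.00%"
```

In the toy sentence, the third token has the wrong head (it counts against both scores) and the fourth has the right head but the wrong label (it counts against LAS only), which is why LAS can never exceed UAS.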

    OldSlavNet: A scalable Early Slavic dependency parser trained on modern language data

    Historical languages are increasingly being modelled computationally. Syntactically annotated texts are often a sine qua non in their modelling, but parsing of pre-modern language varieties faces great data sparsity, intensified by high levels of orthographic variation. In this paper we present a good-quality Early Slavic dependency parser, attained via manipulation of modern Slavic data to resemble the orthography and morphosyntax of pre-modern varieties. The tool can be deployed to expand historical treebanks, which are crucial for data collection and quantification, and beneficial to downstream NLP tasks and historical text mining.

    Evaluation of Distributional Semantic Models of Ancient Greek: Preliminary Results and a Road Map for Future Work

    We evaluate four count-based and predictive distributional semantic models of Ancient Greek against AGREE, a composite benchmark of human judgements, to assess their ability to retrieve semantic relatedness. On the basis of the observations deriving from the analysis of the results, we design a procedure for a larger-scale intrinsic evaluation of count-based and predictive language models, including syntactic embeddings. We also propose possible ways of exploiting the different layers of the whole AGREE benchmark (including both human- and machine-generated data) and different evaluation metrics.
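One common way to run an intrinsic evaluation of this kind is to compare a model's cosine similarities for word pairs with human relatedness scores using Spearman's rank correlation. The abstract does not say which metric the authors used, so the sketch below is an assumed setup only. The vectors, the placeholder words `w1` to `w3`, and the human scores are invented for illustration and are not drawn from AGREE.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 upwards, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical embeddings and human relatedness scores (placeholders only).
vectors: Dict[str, List[float]] = {
    "w1": [1.0, 0.0],
    "w2": [0.9, 0.1],
    "w3": [0.0, 1.0],
}
benchmark: List[Tuple[str, str, float]] = [
    ("w1", "w2", 0.9),
    ("w1", "w3", 0.1),
    ("w2", "w3", 0.2),
]

model_scores = [cosine(vectors[a], vectors[b]) for a, b, _ in benchmark]
human_scores = [score for _, _, score in benchmark]
rho = spearman(model_scores, human_scores)
print(f"Spearman rho = {rho:.2f}")
```

A rank correlation is used here rather than a linear one because human relatedness ratings and cosine similarities are on different scales, so only the relative ordering of the pairs is compared.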

    One question, different annotation depths: A case study in Early Slavic

    This paper addresses some of the challenges of carrying out corpus-based linguistic analyses on historical corpora of different sizes and annotation depths. Data from the TOROT Treebank is collected to carry out a case study on Early Slavic dative absolutes, showing the extent to which methodology and results may change depending on the amount of data and the levels of linguistic annotation available. The analysis indicates that deeply-annotated treebanks of limited size can be exploited to establish a solid guideline to analyze a phenomenon in shallowly-annotated corpora and even new, unannotated texts. This is particularly encouraging for historical languages, such as Early Slavic, showing very high diatopic and diachronic variation, which significantly undermines corpus-annotation automation and therefore calls for alternative strategies to counteract data scarcity.