17 research outputs found

    Using a parallel corpus to study patterns of word order variation: determiners and quantifiers within the noun phrase in European languages

    Despite the wealth of studies on word order, there have been very few studies on the order of minor word categories such as determiners and quantifiers. This is likely due to the difficulty of formulating valid cross-linguistic definitions for these categories, which also appear problematic from a computational perspective. A solution lies in the formulation of comparative concepts and in their computational implementation by combining different layers of annotation with manually compiled lists of lexemes; the proposed methodology is exemplified by a study on the position of these categories with respect to the nominal head, which is conducted on a parallel corpus of 17 European languages and uses Shannon’s entropy to quantify word order variation. Whereas the entropy for the article-noun pattern is, as expected, extremely low, the proposed methodology sheds light on the variation of the demonstrative-noun and the quantifier-noun patterns in three languages of the sample.
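
    As a rough illustration of how Shannon's entropy can quantify word order variation, the sketch below computes the entropy (in bits) of a list of order labels. The function name `order_entropy`, the labels `"pre"` and `"post"`, and the counts are hypothetical and are not taken from the study; the abstract does not describe the authors' actual implementation.

```python
import math
from collections import Counter


def order_entropy(observations):
    """Shannon entropy, in bits, of a sequence of word order labels.

    Each label records where a dependent (e.g. a demonstrative) occurs
    relative to its nominal head in one corpus instance.
    """
    counts = Counter(observations)
    total = sum(counts.values())
    # H = sum over labels of p * log2(1 / p)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical data, invented for illustration only.
article_noun = ["pre"] * 100                        # rigid order
demonstrative_noun = ["pre"] * 50 + ["post"] * 50   # fully variable order

print(order_entropy(article_noun))        # no variation: entropy is 0
print(order_entropy(demonstrative_noun))  # two equally likely orders: 1 bit
```

    With two possible orders, the entropy ranges from 0 (one order only) to 1 bit (both orders equally frequent), so a value near 0 corresponds to the "extremely low" variation reported for the article-noun pattern.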

    The Evolutionary Dynamics of Negative Existentials in Indo-European

    Where in earlier work diachronic change is used to explain away exceptions to typologies, linguistic typologists have started to make use of explicit diachronic models as explanations for typological distributions. A topic that lends itself especially well to this approach is that of negation. In this article, we assess the explanatory value of a specific hypothesis, the Negative Existential Cycle (NEC), on the distribution of negative existential strategies (“types”) in 106 Indo-European languages. We use Bayesian phylogenetic comparative methods to infer posterior distributions of transition rates and parameters, thus applying rational methods to construct and evaluate a set of different models under which the attested typological distribution could have evolved. We find that the frequency of diachronic processes that affect negative existentials outside of the NEC cannot be ignored: the unidirectional NEC alone cannot explain the evolution of negative existential strategies in our sample. We show that non-unidirectional evolutionary models, especially those that allow for different and multiple transitions between strategies, provide better fit. In addition, the phylogenetic modeling is impacted by the expected skewed distribution of negative existential strategies in our sample, pointing out the need for densely sampled and family-based typological research.

    Tracing Syntactic Change in the Scientific Genre: Two Universal Dependency-parsed Diachronic Corpora of Scientific English and German

    We present two comparable diachronic corpora of scientific English and German from the Late Modern Period (17th c.–19th c.) annotated with Universal Dependencies. We describe several steps of data pre-processing and evaluate the resulting parsing accuracy showing how our pre-processing steps significantly improve output quality. As a sanity check for the representativity of our data, we conduct a case study comparing previously gained insights on grammatical change in the scientific genre with our data. Our results reflect the often reported trend of English scientific discourse towards heavy noun phrases and a simplification of the sentence structure (Halliday, 1988; Halliday and Martin, 1993; Biber and Gray, 2011; Biber and Gray, 2016). We also show that this trend applies to German scientific discourse as well. The presented corpora are valuable resources suitable for the contrastive analysis of syntactic diachronic change in the scientific genre between 1650 and 1900. The presented pre-processing procedures and their evaluations are applicable to other languages and can be useful for a variety of Natural Language Processing tasks such as syntactic parsing. This work is supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 232722074 – SFB 1102.

    Nominalizations of property concepts: Evidence from Italian

    This research aims to shed light on the poorly investigated issue of deadjectival nominalization, focusing on Italian and taking a corpus-based approach. The adopted framework is grounded in Croft’s Radical Construction Grammar, which considers lexical categories as constructions of pragmatic functions and semantic concepts; in Croft’s view, de-adjectival nouns are then references of property concepts, which however gives only a generic understanding of this type of construction. Radical Construction Grammar is then integrated with Malchukov’s perspective on nominalization, which describes the phenomenon as further articulated into the two parallel processes of the acquirement of nominal parameters and the loss of verbal parameters. The application of the elaborated framework to thousands of occurrences of Italian de-adjectival nouns sees the emergence of a series of grammatical patterns describing a number of minor lexical categories, which are found between the two major categories of Italian adjectives and nouns.

    EModSar: A Corpus of Early Modern Sardinian Texts

    The article introduces the Early Modern Sardinian Corpus (EModSar), a corpus featuring nine manuscripts from the Early Modern Period (16th-17th centuries) written in Sardinian with passages in Catalan and Latin. Manuscripts are encoded according to the TEI-P5 guidelines, annotated for bibliographic, philological and linguistic features and published on-line using TEITOK, a software aimed at combining digital philology and corpus linguistics.

    DerIvaTario: An annotated lexicon of Italian derivatives

    We propose an annotation schema for derivational morphology featuring morphological, morphotactic and morphosemantic information concerning the base of the derivative as well as each derivational cycle. This schema was employed in the manual annotation of about 11,000 Italian derivatives, extracted from the CoLFIS corpus. The outcome is DerIvaTario, an annotated lexicon of Italian derivatives. The inter-annotator agreement was assessed over several variables of the annotation schema. DerIvaTario is available as an interactive database to be used for theoretical morphology and psycholinguistic research, and as a resource for automatic tagging of large Italian corpora.

    Exploring linguistic representations of identity through the DiSCIS corpus: evidence from Directive acts in Plautus and Goldoni

    This paper illustrates the theoretical background and structure of a sociopragmatically annotated corpus based on Plautus’ and Goldoni’s comedies, named DiSCIS (Diachronic Socio-pragmatic Corpus of Imaginary Speech). After presenting the corpus, we illustrate how it can be used in order to explore linguistic strategies representing identity. More precisely, we focus on a specific type of Speech Act, i.e., Directives, which constitute a fruitful laboratory to explore the dynamics of identity expression and negotiation. Directives are by definition potentially impolite acts that threaten the interlocutor’s negative face and tend to be modulated under certain circumstances through pragmatic strategies. These strategies are analyzed through a pragmatic and historical-variationist case study that compares the use of Directives in two languages, namely Latin and Italian, in two different historical periods, across different social classes of speakers differing by gender, age and social rank.