Using a parallel corpus to study patterns of word order variation: determiners and quantifiers within the noun phrase in European languages
Despite the wealth of studies on word order, there have been very few studies on the order of minor word categories such as determiners and quantifiers. This is likely due to the difficulty of formulating valid cross-linguistic definitions for these categories, which also appear problematic from a computational perspective. A solution lies in the formulation of comparative concepts and in their computational implementation by combining different layers of annotation with manually compiled lists of lexemes. The proposed methodology is exemplified by a study on the position of these categories with respect to the nominal head, which is conducted on a parallel corpus of 17 European languages and uses Shannon’s entropy to quantify word order variation. Whereas the entropy for the article-noun pattern is, as expected, extremely low, the proposed methodology sheds light on the variation of the demonstrative-noun and the quantifier-noun patterns in three languages of the sample.
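As a rough illustration of how Shannon's entropy can quantify word order variation, the sketch below computes H = Σ p · log2(1/p) over the relative frequencies of the observed orders. The function name, the order labels and the counts are hypothetical and are not taken from the study, whose actual implementation is not described here.

```python
import math
from collections import Counter


def order_entropy(orders):
    """Shannon entropy (in bits) of a sequence of observed word-order labels."""
    counts = Counter(orders)
    total = sum(counts.values())
    # H = sum over orders of p * log2(1 / p), with p the relative frequency
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical observations (assumption, not data from the paper):
# "pre" = the modifier precedes the noun, "post" = it follows the noun.
article_noun = ["pre"] * 99 + ["post"] * 1
demonstrative_noun = ["pre"] * 50 + ["post"] * 50

print(order_entropy(article_noun))        # close to 0: almost no variation
print(order_entropy(demonstrative_noun))  # 1 bit: both orders equally frequent
```

With two possible orders the value ranges from 0 (one order only) to 1 bit (both orders equally frequent), so a very low value corresponds to a rigid pattern.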
The Evolutionary Dynamics of Negative Existentials in Indo-European
Whereas in earlier work diachronic change was used to explain away exceptions to
typologies, linguistic typologists have started to make use of explicit diachronic
models as explanations for typological distributions. A topic that lends itself to this
approach especially well is that of negation. In this article, we assess the explanatory
value of a specific hypothesis, the Negative Existential Cycle (NEC), on the distribution
of negative existential strategies (“types”) in 106 Indo-European languages. We use
Bayesian phylogenetic comparative methods to infer posterior distributions of
transition rates and parameters, thus applying rational methods to construct and
evaluate a set of different models under which the attested typological distribution
could have evolved. We find that the frequency of diachronic processes that affect
negative existentials outside of the NEC cannot be ignored—the unidirectional NEC
alone cannot explain the evolution of negative existential strategies in our sample. We
show that non-unidirectional evolutionary models, especially those that allow for
different and multiple transitions between strategies, provide better fit. In addition,
the phylogenetic modeling is affected by the expected skewed distribution of negative
existential strategies in our sample, pointing to the need for densely sampled and
family-based typological research.
Tracing Syntactic Change in the Scientific Genre: Two Universal Dependency-parsed Diachronic Corpora of Scientific English and German
We present two comparable diachronic corpora of scientific English and German from the Late Modern Period (17th-19th c.) annotated with Universal Dependencies. We describe several steps of data pre-processing and evaluate the resulting parsing accuracy, showing that our pre-processing steps significantly improve output quality. As a sanity check for the representativity of our data, we conduct a case study comparing previously gained insights on grammatical change in the scientific genre with our data. Our results reflect the often reported trend of English scientific discourse towards heavy noun phrases and a simplification of the sentence structure (Halliday, 1988; Halliday and Martin, 1993; Biber and Gray, 2011; Biber and Gray, 2016). We also show that this trend applies to German scientific discourse. The presented corpora are valuable resources suitable for the contrastive analysis of syntactic diachronic change in the scientific genre between 1650 and 1900. The presented pre-processing procedures and their evaluations are applicable to other languages and can be useful for a variety of Natural Language Processing tasks such as syntactic parsing. This work is supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 232722074 – SFB 1102.
Nominalizations of property concepts: Evidence from Italian
This research aims to shed light on the poorly investigated issue of de-adjectival
nominalization, focusing on Italian and taking a corpus-based
approach. The adopted framework is grounded in Croft’s Radical
Construction Grammar, which considers lexical categories as constructions
of pragmatic functions and semantic concepts; in Croft’s view, de-adjectival
nouns are then references to property concepts, which however gives only a
generic understanding of this type of construction. Radical Construction
Grammar is then integrated with Malchukov’s perspective on
nominalization, which describes the phenomenon as further articulated into
two parallel processes: the acquisition of nominal parameters and the
loss of verbal parameters. Applying the resulting framework to
thousands of occurrences of Italian de-adjectival nouns reveals
a series of grammatical patterns describing a number of minor lexical
categories, which are found between the two major categories of Italian
adjectives and nouns.
EModSar: A Corpus of Early Modern Sardinian Texts
The article introduces the Early Modern Sardinian Corpus (EModSar), a corpus featuring nine manuscripts from the Early Modern Period (16th-17th centuries) written in Sardinian with passages in Catalan and Latin. The manuscripts are encoded according to the TEI-P5 guidelines, annotated for bibliographic, philological and linguistic features, and published online using TEITOK, a software tool aimed at combining digital philology and corpus linguistics.
DerIvaTario: An annotated lexicon of Italian derivatives
We propose an annotation schema for derivational morphology featuring morphological, morphotactic and morphosemantic information concerning the base of the derivative as well as each derivational cycle. This schema was employed in the manual annotation of about 11,000 Italian derivatives, extracted from the CoLFIS corpus. The outcome is DerIvaTario, an annotated lexicon of Italian derivatives. The inter-annotator agreement was assessed over several variables of the annotation schema. DerIvaTario is available as an interactive database to be used for theoretical morphology and psycholinguistic research, and as a resource for automatic tagging of large Italian corpora.
Exploring linguistic representations of identity through the DiSCIS corpus: evidence from Directive acts in Plautus and Goldoni
This paper illustrates the theoretical background and structure of a sociopragmatically annotated corpus based on Plautus’ and Goldoni’s comedies, named DiSCIS (Diachronic Socio-pragmatic Corpus of Imaginary Speech). After presenting the corpus, we illustrate how it can be used to explore linguistic strategies representing identity. More precisely, we focus on a specific type of Speech Act, i.e., Directives, which constitute a fruitful laboratory to explore the dynamics of identity expression and negotiation. Directives are by definition potentially impolite acts that threaten the interlocutor’s negative face and tend to be modulated under certain circumstances through pragmatic strategies. These strategies are analyzed through a pragmatic and historical-variationist case study that compares the use of Directives in two languages, namely Latin and Italian, in two different historical periods, across social classes of speakers differing by gender, age and social rank.