27 research outputs found

    A story of the American -self: A case study in morphological variation

    Antisymmetry and the Conservation of C-Command: Scrambling and Phrase Structure in Synchronic and Diachronic Perspective

    Holmberg’s Generalization (Holmberg 1986) was originally stated to describe the “object shift” phenomena found in the modern Scandinavian languages. This dissertation argues that object shift is merely a subcase of scrambling, a type of adjunction, and that Holmberg’s Generalization is a subcase of a universal constraint, the “Generalized Holmberg Constraint” (GHC), which prohibits leftward scrambling across c-commanding functional heads. The existence of such a constraint turns out to have ramifications far beyond the analysis of scrambling itself, and the predictions it makes ultimately form an extended argument in favor of a universal antisymmetric approach to phrase structure (Kayne 1994). The most important evidence for the GHC comes from diachronic data. The study presents quantitative data from the history of Yiddish and English to show that, in cases where a language undergoes major changes in its clause structure, the GHC remains an active and stable constraint in the language, indicating its status as a universal. Once a phrase structure change begins, the resulting variation within a single speech community, and even within individuals, immediately shows the effect of the GHC on scrambling. The latter portion of the study argues that the GHC is not merely a constraint on scrambling, but rather a much more general constraint on the way syntactic computations progress, the “Conservation of C-Command.” The Conservation of C-Command finds a natural cross-linguistic formulation only if we adopt an antisymmetric approach to languages with head-final phrase structures. This approach turns out to have consequences for a variety of other problems of syntactic analysis, including the West Germanic Verb (Projection) Raising construction and Heavy NP Shift. 
    This dissertation accounts for the typology of scrambling found in the world’s languages and during periods of language change, and shows that the way in which scrambling is constrained provides insight into basic properties of phrase structure. In addition, it constitutes an extended argument for the autonomy of syntax: while prosodic and pragmatic considerations favor leftward scrambling in a number of contexts, a language’s inventory of functional heads puts a strict upper bound on whether scrambling can respond to these considerations.

    Further Results and Analysis of Icelandic Part of Speech Tagging

    Data-driven POS tagging has achieved good performance for English, but can still lag behind linguistic rule-based taggers for morphologically complex languages, such as Icelandic. We extend a statistical tagger to handle fine-grained tagsets and improve over the best Icelandic POS tagger. Additionally, we develop a case tagger for non-local case and gender decisions. An error analysis of our system suggests future directions. This paper presents further results and analysis of the original work (Dredze and Wallenberg, 2008).

    Attention To People Like You: A Proposal Regarding Neuroendocrine Effects on Linguistic Variation

    Although the literature on language change has often replicated and discussed a pattern in which female speakers lead in changes that occur below the level of awareness, there is no consensus on why this pattern should arise. Interestingly, recent findings in endocrinology show that differences in prenatal testosterone exposure can impact learning patterns. In the light of these findings, we first present preliminary results consistent with the hypothesis that a biological factor, prenatal exposure to androgens, can have a small, continuous biasing effect on linguistic variation, namely the variable duration of pre-aspiration conditioned by voiceless obstruents in Tyneside English. Second, we propose an explanatory model in which the biological factor, prenatal testosterone exposure, creates a subtle bias in how speakers learn linguistic variants, and we suggest that some reported sex effects are derivative. This model is compatible with the high tendency for females to lead in language change from below (Labov 1990: 206).

    A Part-of-Speech Tagger for Yiddish: First Steps in Tagging the Yiddish Book Center Corpus

    We describe the construction and evaluation of a part-of-speech tagger for Yiddish (the first one, to the best of our knowledge). This is the first step in a larger project of automatically assigning part-of-speech tags and syntactic structure to Yiddish text for purposes of linguistic research. We combine two resources for the current work: an 80K-word subset of the Penn Parsed Corpus of Historical Yiddish (PPCHY) (Santorini, 2021) and 650 million words of OCR'd Yiddish text from the Yiddish Book Center (YBC). We compute word embeddings on the YBC corpus, and these embeddings are used with a tagger model trained and evaluated on the PPCHY. Yiddish orthography in the YBC corpus has many spelling inconsistencies, and we present some evidence that even simple non-contextualized embeddings are able to capture the relationships among spelling variants without the need to first "standardize" the corpus. We evaluate the tagger performance on a 10-fold cross-validation split, with and without the embeddings, showing that the embeddings improve tagger performance. However, a great deal of work remains to be done, and we conclude by discussing some next steps, including the need for additional annotated training and test data.

    Conditioned Variation: Children Replicate Contrasts, not Parental Variant Rate

    One of the fundamental questions within developmental sociolinguistics, and language acquisition research more broadly, has to do with children’s reaction to variability in their input or primary linguistic data (e.g. Labov 1989, Yang 2002, Hudson Kam and Newport 2005, Smith et al. 2009, Cournane and Pérez-Leroux 2020). As has been extensively documented, children overgeneralize and regularize both consistent (Marcus et al. 1992) and inconsistent (Hudson Kam and Newport 2005) input. Despite this tendency to go beyond the input, we do expect children to learn their caregivers’ dialect, and they have in fact been known to match the rates of variation found in their environment (Labov 1989, Johnson and White 2019). The literature therefore shows both regularization and matching, but under different circumstances. In this paper, we argue for a third scenario and present a case where children neither regularize nor match their caregiver. Instead, they replicate the systematic contrasts they encounter and regularize within matched conditions. This is what happens in the acquisition of Icelandic Dative Substitution (DS), a stigmatized but widespread instance of grammatically conditioned morphosyntactic variation. We investigated DS in 99 children aged 3–13 and their caregivers (80 dyads) by using forced-choice tasks and grammaticality judgments across multiple items as a proxy for case use. The results show that caregivers’ general DS rate did not predict the rate at which their children selected DS, regardless of age. On the other hand, when analyzing the data within conditioning factors, we found that children replicate the contrasts present in their caregivers’ speech, both at the group and individual level, and that this was in part dependent on age.

    Creating a Dual-Purpose Treebank

    The Meaning of Case: Morphosyntactic Bootstrapping and Icelandic Datives

    Publisher's version (útgefin grein). Do children use the same resources to learn verb meaning across languages? One approach to language acquisition in which universality has been extensively debated is the syntactic bootstrapping hypothesis, which proposes that children use the argument structure of a verb as a cue to its meaning (Landau & Gleitman 1985, Gleitman 1990, Naigles et al. 1993). In recent years, the extent to which verbal morphology and morphosyntax can be informative of verb semantics has been the subject of cross-linguistic research, with one of the primary questions being whether possibly (syntactic) universal cues have an advantage over language-specific (morphological) ones (e.g. Lidz et al. 2003, Göksun et al. 2008, Matsuo et al. 2012, Trueswell et al. 2012 and Leischner et al. 2016). Using corpora and experimental acquisition data from Icelandic, a language with almost no argument-drop and rich case morphology, we provide qualified support for a morphosyntactic bootstrapping account that does not exclusively rely on universal cues, since a learning model detects the available systematic mappings of form and meaning (Yang 2016). In specific contexts, we argue that morphology can be as salient as the number of arguments. Additionally, we argue that experimental comprehension results show the necessary basis for the well-documented productivity of the Icelandic non-default dative (Maling 2002, Svenonius 2002, Jónsson and Eythórsson 2005, Ingason 2010 and Barðdal 2011 i.a.). Specifically, we show that non-default subject case marking rules can be accounted for with Yang’s (2016) Tolerance Principle (TP). Lidz et al. (2003), based on ideas of universal syntax-semantics mapping, argued that children initially rely on argument number and ignore morphological form to bootstrap verb meaning, even when the morphology provides stronger cues.
    This has been challenged from various perspectives, one of them being typological evidence against the universality of argument structure cues (Brown & Bowerman 2008). Still, even work on argument-drop languages such as Japanese and Turkish reveals that children use syntactic frames as cues, in addition to e.g. case morphology (Göksun et al. 2008 and Matsuo et al. 2012). Furthermore, research on German (Leischner et al. 2016) shows that children rely less on the number of arguments and more on case when word order is highly flexible. But what about languages that do not drop arguments and have a relatively rigid word order (like English) but still have a rich morphological case system (like Turkish)? Icelandic is such a language, with robust semantically driven dative productivity in subject and object case, and also well-documented links between case and lexical semantics (e.g. Jónsson 1997–1998, Maling 2002, Svenonius 2002 and Barðdal 2008). The Icelandic Research Fund (RANNÍS 162991). Peer reviewed.