32 research outputs found

    Mechanisms Of Phonological Change

    The traditional Philadelphia allophonic /æ/ system (henceforth PHL, shown in (1) below) is characterized by a set of complicated conditioning factors and a dramatic acoustic distinction between the two allophones. In recent years, some Philadelphians have begun to exhibit a new allophonic system (NAS, shown in (2) below). Like PHL, NAS is characterized by a dramatic acoustic distinction between tense and lax allophones. NAS is quickly overtaking PHL in the Philadelphia community, as demonstrated by Labov et al. (2016).

    (1) PHL: æ → æh / _ [+anterior] ∩ ([+nasal] ∪ [−voice, +fricative]) ]σ
    (2) NAS: æ → æh / _ [+nasal]

    This situation offers an exciting opportunity to observe phonological change in individual speakers. Most phonological changes involve the collapse of an existing phonological category or the creation of a new one; because of the large degree of acoustic overlap in these situations, it is difficult or impossible to identify individual tokens as having been produced by the old or the new phonology. In the current change in Philadelphia /æ/, however, both the old and the new system involve distinct acoustic targets, making it possible to identify which underlying system was used to produce a given word. It is therefore possible to test several distinct theories about phonological change: whether change occurs through gradual phonetic incrementation (e.g. Ohala 1981), through individual speakers producing only the old or the new system (e.g. Janda and Joseph 2003), or via individual speakers probabilistically producing both the old and the new system in a process of individual grammar competition (e.g. Fruehwald et al. 2013).

    In my dissertation, I examine natural speech production from 46 speakers who acquired language during the period of allophonic change, using a combination of topic-directed conversations and targeted natural language experiments. Using a GLM classifier, I identify tokens of /æ/ as having been produced by either PHL or NAS. In concert with an analysis of speakers’ social histories, I use these results to argue that the change from PHL to NAS in Philadelphia is driven by the mechanism of competing grammars, suggesting that syntactic change and phonological change proceed in the same manner. My research provides one of the first pieces of direct empirical support for a unified theory of language change in which structural changes in syntax and phonology are implemented through the same mechanism of grammar competition (Kroch, 1989; Fruehwald et al., 2013).

    In addition to this theoretical contribution to phonological change, my dissertation traces the social patterns of the allophonic change, highlighting the effect of network structure and access to elite education on the adoption of the incoming allophonic system. I also employ experimental methods to demonstrate that the abstract allophonic rules of /æ/ are the target of social evaluation and contribute to social meaning. I find speakers producing surprisingly systematic evaluations of PHL and NAS, a result which only emerges when analyzing the evaluation of changing abstract parameters.

    Finally, to test whether the change from PHL to NAS was the inevitable result of phonological simplification, I developed a computational simulation built on a principle of language acquisition (Yang, 2016) to demonstrate that the allophonic restructuring in /æ/ was not the result of children simplifying their input data, but rather must have been the result of dialect contact with in-moving speakers of the new system.
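    The token-classification step lends itself to a short illustration. Below is a minimal sketch, assuming scikit-learn and hypothetical normalized F1/F2 midpoint values, of how a GLM (logistic regression) classifier trained on unambiguous tense and lax tokens could assign a diagnostic token (e.g. pre-fricative /æ/, tense under PHL but lax under NAS) to one system or the other; the feature set and numbers are placeholders, not the dissertation's actual acoustic measurements.

```python
# Minimal sketch: classify a diagnostic /æ/ token as PHL-like (tense) or
# NAS-like (lax) with a logistic-regression (GLM) classifier trained on
# hypothetical normalized F1/F2 midpoint measurements.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Columns: normalized F1, normalized F2 at the vowel midpoint (placeholder values).
train_X = np.array([
    [0.55, 2.10],  # clearly tense token (raised, fronted)
    [0.60, 2.05],
    [0.95, 1.70],  # clearly lax token (lower, backer)
    [0.90, 1.65],
])
train_y = np.array([1, 1, 0, 0])  # 1 = tense (æh), 0 = lax (æ)

clf = LogisticRegression().fit(train_X, train_y)

# A token of "pass" (pre-fricative /æ/): a tense realization is consistent
# with PHL, a lax realization with NAS.
test_token = np.array([[0.58, 2.08]])
p_tense = clf.predict_proba(test_token)[0, 1]
print("P(tense) = %.2f -> %s-like production" % (p_tense, "PHL" if p_tense > 0.5 else "NAS"))
```

    For the acquisition simulation, Yang’s (2016) Tolerance Principle provides a natural threshold (a rule over N items tolerates at most N/ln N exceptions), though the abstract does not specify the simulation’s exact architecture.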

    Linguistic Productivity and Recurrent Neural Networks


    Predictability effects in language acquisition

    Human language has two fundamental requirements: it must allow competent speakers to exchange messages efficiently, and it must be readily learned by children. Recent work has examined effects of predictability on language production, with many researchers arguing that these so-called “predictability effects” serve the efficiency requirement. Specifically, recent work has found that talkers tend to reduce more probable linguistic forms more heavily. This dissertation proposes the “Predictability Bootstrapping Hypothesis”: that predictability effects also make language more learnable. There is a great deal of evidence that adult grammars have substantial statistical components. Since predictability effects result in heavier reduction for more probable words and hidden structure, they provide infants with direct cues to the statistical components of the grammars they are trying to learn. The corpus studies and computational modeling experiments in this dissertation show that predictability effects could be a substantial source of information for language-learning infants, focusing on the potential utility of phonetic reduction, in terms of word duration, for syntax acquisition.

    First, corpora of spontaneous adult-directed and child-directed speech (ADS and CDS, respectively) are compared to verify that predictability effects actually exist in CDS. While revealing some differences, mixed-effects regressions on those corpora indicate that predictability effects in CDS are largely similar in kind and magnitude to predictability effects in ADS. This result indicates that predictability effects are available to infants, however useful they may be.

    Second, this dissertation builds probabilistic, unsupervised, and lexicalized models for learning about syntax from words and durational cues. One series of models is based on Hidden Markov Models and learns shallow constituency structure, while the other is based on the Dependency Model with Valence and learns dependency structure. These models are then used to measure how useful durational cues are for syntax acquisition, and to what extent their utility can be attributed to effects of syntactic predictability on word duration. As part of this investigation, the models are also used to explore the venerable “Prosodic Bootstrapping Hypothesis” that prosodic structure, which is cued in part by word duration, may be useful for syntax acquisition. The empirical evaluations provide evidence that effects of syntactic predictability on word duration are easier to discover and exploit than effects of prosodic structure, and that even gold-standard annotations of prosodic structure provide at most a relatively small improvement in parsing performance over raw word duration.

    Taken together, this work indicates that predictability effects provide useful information about syntax to infants, showing that the Predictability Bootstrapping Hypothesis for syntax acquisition is computationally plausible and motivating future behavioural investigation. Additionally, since talkers consider the probability of many different aspects of linguistic structure when reducing according to predictability effects, this result also motivates investigation of Predictability Bootstrapping of other aspects of linguistic knowledge.
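    To make the notion of a predictability effect concrete, here is a minimal sketch, with a toy corpus and hypothetical word durations, of the relationship the corpus studies quantify: words with higher bigram surprisal (lower predictability) should have longer durations. The dissertation’s analyses use mixed-effects regressions over CDS and ADS corpora rather than this toy setup.

```python
# Minimal sketch: estimate each word's bigram surprisal and check whether
# less predictable words have longer (hypothetical) durations.
import math
from collections import Counter

import numpy as np

corpus = "the dog saw the ball and the dog got the ball".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus[:-1], corpus[1:]))

def surprisal(prev, word, alpha=1.0):
    """-log2 P(word | prev) with add-alpha smoothing over the toy vocabulary."""
    vocab = len(unigrams)
    p = (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * vocab)
    return -math.log2(p)

# Hypothetical word durations (seconds), aligned with corpus[1:].
durations = np.array([0.31, 0.22, 0.12, 0.25, 0.18, 0.13, 0.28, 0.24, 0.11, 0.27])
surprisals = np.array([surprisal(p, w) for p, w in zip(corpus[:-1], corpus[1:])])

# A positive correlation means more probable words are shorter (more reduced),
# the pattern the Predictability Bootstrapping Hypothesis proposes infants exploit.
print("r =", np.corrcoef(surprisals, durations)[0, 1])
```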

    Representation and learning schemes for argument stance mining.

    Argumentation is a key part of human interaction. Used introspectively, it searches for the truth by laying down arguments for and against positions. As a mediation tool, it can be used to search for compromise between multiple human agents. For this purpose, theories of argumentation have been in development since the Ancient Greeks, in order to formalise the process and therefore remove human imprecision from it. From this practice the process of argument mining has emerged. As human interaction has moved from the small scale of one-to-one (or few-to-few) debates to large-scale discussions where tens of thousands of participants can express their opinion in real time, the importance of argument mining has grown while its feasibility in a manual annotation setting has diminished, leaving the process to rely mainly on human-defined heuristics. This underlines the importance of a new generation of computational tools that can automate the process on a larger scale.

    In this thesis we study argument stance detection, one of the steps in the argument mining workflow. We demonstrate how data of varying reliability can be used to mine argument stance in social media. We investigate a spectrum of techniques: completely unsupervised classification of stance using a sentiment lexicon, automated computation of a regularised stance lexicon, automated computation of a lexicon with modifiers, and the use of a lexicon with modifiers as a temporal feature model for more complex classification algorithms.

    We find that the addition of contextual information enhances unsupervised stance classification, within reason, and that multi-strategy algorithms that combine multiple heuristics by ordering them from the precise to the general tend to outperform other approaches by a large margin. Focusing then on building a stance lexicon, we find that optimising such lexicons within an empirical risk minimisation framework allows us to regularise them to a higher degree than competing probabilistic techniques, which helps us learn better lexicons from noisy data. We also conclude that adding local context (neighbouring words) during the learning phase of the lexicons tends to produce more accurate results at the cost of robustness, since part of the weight is redistributed from words with a class valence to the contextual words. Finally, when investigating the use of lexicons to build feature models for traditional machine learning techniques, simple lexicons (without context) seem to perform overall as well as more complex ones, and better than purely semantic representations. We also find that word-level feature models tend to outperform sentence- and instance-level representations, but that they do not benefit as much from being augmented by lexicon knowledge.

    This research programme was carried out in collaboration with the University of Glasgow, Department of Computer Science.
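    As an illustration of the unsupervised end of this spectrum, the following is a minimal sketch, assuming a tiny hand-made sentiment lexicon and negation list (hypothetical stand-ins for the resources the thesis actually uses), of lexicon-based stance scoring with a simple local-context modifier for negation.

```python
# Minimal sketch: unsupervised stance scoring with a sentiment lexicon and
# a negation modifier drawn from the two preceding tokens.
SENTIMENT = {"good": 1.0, "great": 1.0, "benefit": 0.5,
             "bad": -1.0, "harmful": -1.0, "waste": -0.5}
NEGATORS = {"not", "no", "never", "hardly"}

def stance_score(tokens):
    """Return a signed stance score: > 0 pro, < 0 against, 0 neutral/unknown."""
    score = 0.0
    for i, tok in enumerate(tokens):
        if tok in SENTIMENT:
            valence = SENTIMENT[tok]
            # Local context: flip the valence if a negator appears just before the word.
            if any(t in NEGATORS for t in tokens[max(0, i - 2):i]):
                valence = -valence
            score += valence
    return score

print(stance_score("this policy is not good and is a waste".split()))  # negative -> against
print(stance_score("the plan brings a great benefit".split()))         # positive -> pro
```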

    Automatic correction of grammatical errors in non-native English text

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. Cataloged from PDF version of thesis. Includes bibliographical references (p. 99-107).

    Learning a foreign language requires much practice outside of the classroom. Computer-assisted language learning systems can help fill this need, and one desirable capability of such systems is the automatic correction of grammatical errors in texts written by non-native speakers. This dissertation concerns the correction of non-native grammatical errors in English text, and the closely related task of generating test items for language learning, using a combination of statistical and linguistic methods. We show that syntactic analysis enables extraction of more salient features. We address issues concerning robustness in feature extraction from non-native texts, and also design a framework for simultaneous correction of multiple error types. Our proposed methods are applied to some of the most common usage errors, including prepositions, verb forms, and articles. The methods are evaluated on sentences with synthetic and real errors, and in both restricted and open domains.

    A secondary theme of this dissertation is user customization. We perform a detailed analysis of a non-native corpus, illustrating the utility of an error model based on the mother tongue. We study the benefits of adjusting the correction models based on the quality of the input text, and also present novel methods to generate high-quality multiple-choice items that are tailored to the interests of the user.

    By John Sie Yuen Lee, Ph.D.
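    As a concrete illustration of one of the error types above, here is a minimal sketch of preposition correction by scoring candidate prepositions against pattern counts from well-formed text; the counts, candidate set, and helper function are hypothetical, and the dissertation's actual models rely on richer features obtained through syntactic analysis.

```python
# Minimal sketch: suggest a preposition correction by comparing hypothetical
# (verb, preposition, noun) pattern counts from a reference corpus.
from collections import Counter

PATTERNS = Counter({
    ("depend", "on", "weather"): 120,
    ("depend", "of", "weather"): 1,
    ("arrive", "at", "station"): 80,
    ("arrive", "to", "station"): 3,
})
CANDIDATES = ["on", "of", "at", "to", "in", "for"]

def suggest_preposition(verb, noun, observed):
    """Return the best-scoring candidate only if it clearly beats the writer's choice."""
    scores = {p: PATTERNS[(verb, p, noun)] for p in CANDIDATES}
    best = max(scores, key=scores.get)
    # Flag an error only when the best candidate is substantially more frequent.
    return best if scores[best] > 2 * scores.get(observed, 0) else observed

# "The plan depends of the weather." -> suggests "on"
print(suggest_preposition("depend", "weather", "of"))
```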

    Online assessment of protein interaction information extraction systems

    Unpublished doctoral thesis. Universidad Autónoma de Madrid, Faculty of Sciences, Department of Molecular Biology. Date of defence: 01-03-201