
    What Syntactic Structures block Dependencies in RNN Language Models?

    Recurrent Neural Networks (RNNs) trained on a language modeling task have been shown to acquire a number of non-local grammatical dependencies with some success. Here, we provide new evidence that RNN language models are sensitive to hierarchical syntactic structure by investigating the filler--gap dependency and constraints on it, known as syntactic islands. Previous work is inconclusive about whether RNNs learn to attenuate their expectations for gaps in island constructions in particular or in any sufficiently complex syntactic environment. This paper gives new evidence for the former by providing control studies that have been lacking so far. We demonstrate that two state-of-the-art RNN models are able to maintain the filler--gap dependency through unbounded sentential embeddings and are also sensitive to the hierarchical relationship between the filler and the gap. Next, we demonstrate that the models are able to maintain possessive pronoun gender expectations through island constructions---this control case rules out the possibility that island constructions block all information flow in these networks. We also evaluate three previously untested island constraints: coordination islands, left branch islands, and sentential subject islands. Models are able to learn left branch islands and learn coordination islands gradiently, but fail to learn sentential subject islands. Through these controls and new tests, we provide evidence that model behavior is due to finer-grained expectations than gross syntactic complexity, but also that the models are conspicuously un-humanlike in some of their performance characteristics. Comment: To appear at the 41st Annual Meeting of the Cognitive Science Society, Montreal, Canada, July 2019.
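
    The abstract does not spell out how "expectations for gaps" are measured, but a common way to quantify them is to compare language-model surprisal across minimally different sentences in a 2x2 (filler x gap) design. The sketch below is only a generic illustration of that idea, not the authors' pipeline: it uses an off-the-shelf GPT-2 from the transformers library as a stand-in scorer, and the example sentences and the interaction metric are illustrative assumptions.

        # Hedged sketch: probing filler--gap expectations via LM surprisal.
        # GPT-2 stands in here for the RNN LMs evaluated in the paper.
        import torch
        from transformers import GPT2LMHeadModel, GPT2TokenizerFast

        tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
        model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

        def total_surprisal(sentence: str) -> float:
            """Summed negative log-probability (in nats) of a sentence."""
            ids = tokenizer(sentence, return_tensors="pt").input_ids
            with torch.no_grad():
                out = model(ids, labels=ids)
            # out.loss is the mean per-token NLL; rescale it to a sum.
            return out.loss.item() * (ids.size(1) - 1)

        # Invented 2x2 items: does a filler ("what") make a gap less surprising?
        items = {
            ("+filler", "+gap"): "I know what the guest devoured at the party.",
            ("+filler", "-gap"): "I know what the guest devoured the cake at the party.",
            ("-filler", "+gap"): "I know that the guest devoured at the party.",
            ("-filler", "-gap"): "I know that the guest devoured the cake at the party.",
        }
        s = {cond: total_surprisal(sent) for cond, sent in items.items()}

        # Licensing interaction: positive if the filler selectively lowers the
        # cost of the gap; island constructions should attenuate this value.
        interaction = (
            (s[("-filler", "+gap")] - s[("-filler", "-gap")])
            - (s[("+filler", "+gap")] - s[("+filler", "-gap")])
        )
        print(f"filler--gap licensing interaction: {interaction:.2f} nats")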

    Which Presuppositions are Subject to Contextual Felicity Constraints?

    Some sentences with presupposition triggers can be felicitously uttered when their presuppositions are not entailed by the context, whereas others are infelicitous in such environments, a phenomenon known as Missing Accommodation / Informative Presupposition or varying Contextual Felicity Constraints (CFCs). Despite an abundance of recent quantitative work on presuppositions, this aspect of their behavior has received comparatively little experimental attention. Here, we present the results of a semantic rating study testing the relative CFC strength of thirteen presupposition triggers, making this the largest cross-trigger comparison reported in the literature to date. The results support a three-way categorical analysis of presupposition triggers, based on whether they impose strong, weak, or no CFCs. We observe that the strong-CFC triggers are all focus-associating, suggesting that at least some of the variation in behavior arises from naturally occurring semantic classes. We compare our results to three previous proposals for CFC variation and argue that none yet accounts for the full empirical picture.

    Investigating Novel Verb Learning in BERT: Selectional Preference Classes and Alternation-Based Syntactic Generalization

    Previous studies investigating the syntactic abilities of deep learning models have not targeted the relationship between the strength of the grammatical generalization and the amount of evidence to which the model is exposed during training. We address this issue by deploying a novel word-learning paradigm to test BERT's few-shot learning capabilities for two aspects of English verbs: alternations and classes of selectional preferences. For the former, we fine-tune BERT on a single frame in a verbal-alternation pair and ask whether the model expects the novel verb to occur in its sister frame. For the latter, we fine-tune BERT on an incomplete selectional network of verbal objects and ask whether it expects unattested but plausible verb/object pairs. We find that BERT makes robust grammatical generalizations after just one or two instances of a novel word in fine-tuning. For the verbal alternation tests, we find that the model displays behavior that is consistent with a transitivity bias: verbs seen few times are expected to take direct objects, but verbs seen with direct objects are not expected to occur intransitively. Comment: Accepted to BlackboxNLP 2020.
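
    A minimal sketch of how such a few-shot paradigm can be set up with BERT's masked-LM head follows. Everything concrete in it is an assumption for illustration: the made-up verb "daxed", the dative alternation as the frame pair, and the toy training loop; the paper's actual materials and fine-tuning details are not given in the abstract.

        # Hedged sketch: expose BERT to a novel verb in one frame, then ask
        # whether it expects that verb in the sister frame. Illustrative only.
        import torch
        from transformers import BertForMaskedLM, BertTokenizerFast

        tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
        model = BertForMaskedLM.from_pretrained("bert-base-uncased")

        # Add the made-up verb to the vocabulary and grow the (tied) embeddings.
        tokenizer.add_tokens(["daxed"])
        model.resize_token_embeddings(len(tokenizer))
        dax_id = tokenizer.convert_tokens_to_ids("daxed")

        # One exposure: "daxed" in a double-object frame; train only on the verb slot.
        enc = tokenizer("The teacher daxed the student the book.", return_tensors="pt")
        labels = enc.input_ids.clone()
        labels[enc.input_ids != dax_id] = -100      # ignore every other position
        enc["input_ids"] = enc.input_ids.masked_fill(
            enc.input_ids == dax_id, tokenizer.mask_token_id
        )

        optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
        model.train()
        for _ in range(2):                          # one or two few-shot updates
            loss = model(**enc, labels=labels).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()

        # Probe: probability of "daxed" in the sister (prepositional-dative) frame.
        model.eval()
        probe = tokenizer(
            f"The teacher {tokenizer.mask_token} the book to the student.",
            return_tensors="pt",
        )
        mask_pos = (probe.input_ids[0] == tokenizer.mask_token_id).nonzero().item()
        with torch.no_grad():
            probs = model(**probe).logits[0, mask_pos].softmax(-1)
        print("P(daxed | prepositional-dative frame) =", probs[dax_id].item())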

    Reciprocal Inhibition of Adiponectin and Innate Lung Immune Responses to Chitin and Aspergillus fumigatus

    Chitin is a structural biopolymer found in numerous organisms, including pathogenic fungi, and is recognized as an immune-stimulating pathogen-associated molecular pattern by pattern recognition molecules of the host immune system. However, the programming and regulation of lung innate immunity to inhaled chitin in the context of inhaled fungal pathogens such as Aspergillus fumigatus are complex, and our understanding of them is incomplete. Here we report that the systemic metabolism-regulating cytokine adiponectin is decreased in the lungs and serum of mice after chitin inhalation, with a concomitant decrease in surface expression of the adiponectin receptor AdipoR1 on lung leukocytes. Constitutive lung expression of acidic mammalian chitinase resulted in decreased inflammatory cytokine gene expression and neutrophil recruitment, but did not significantly affect lung adiponectin transcription. Exogenous recombinant adiponectin specifically dampened airway chitin-mediated eosinophil recruitment, while adiponectin deficiency resulted in increased airway eosinophils. The presence of adiponectin also resulted in decreased CCL11-mediated migration of bone marrow-derived eosinophils. In contrast to purified chitin, aspiration of viable conidia from the high chitin-expressing A. fumigatus isolate Af5517 resulted in increased neutrophil recruitment and inflammatory cytokine gene expression in adiponectin-deficient mice, while no significant changes were observed in response to the isolate Af293. Our results identify a novel role for the adiponectin pathway in the inhibition of lung inflammatory responses to chitin and A. fumigatus inhalation.

    Revisiting the Optimality of Word Lengths

    Zipf (1935) posited that wordforms are optimized to minimize utterances' communicative costs. Under the assumption that cost is given by an utterance's length, he supported this claim by showing that words' lengths are inversely correlated with their frequencies. Communicative cost, however, can be operationalized in different ways. Piantadosi et al. (2011) claim that cost should be measured as the distance between an utterance's information rate and channel capacity, which we dub the channel capacity hypothesis (CCH) here. Following this logic, they then proposed that a word's length should be proportional to the expected value of its surprisal (negative log-probability in context). In this work, we show that Piantadosi et al.'s derivation does not minimize CCH's cost, but rather a lower bound, which we term CCH-lower. We propose a novel derivation, suggesting an improved way to minimize CCH's cost. Under this method, we find that a language's word lengths should instead be proportional to the surprisal's expectation plus its variance-to-mean ratio. Experimentally, we compare these three communicative cost functions: Zipf's, CCH-lower, and CCH. Across 13 languages and several experimental settings, we find that length is better predicted by frequency than by either of the other hypotheses' predictors. In fact, when surprisal's expectation, or expectation plus variance-to-mean ratio, is estimated using better language models, it leads to worse word length predictions. We take these results as evidence that Zipf's longstanding hypothesis holds. Comment: Published at EMNLP 2023.
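
    In the abstract's terms, and writing s(w, c) = -log p(w | c) for the surprisal of word w in context c, the three length predictors being compared can be summarized as below; the notation is a paraphrase of the abstract, not the paper's exact derivation.

        % surprisal of word w in context c
        s(w, c) = -\log p(w \mid c)

        % Zipf: length tracks (negative log) frequency
        \ell_{\mathrm{Zipf}}(w) \;\propto\; -\log p(w)

        % CCH-lower (Piantadosi et al., 2011): length tracks expected surprisal
        \ell_{\mathrm{CCH\text{-}lower}}(w) \;\propto\; \mathbb{E}\left[ s(w, C) \right]

        % CCH (this paper's derivation): expectation plus variance-to-mean ratio
        \ell_{\mathrm{CCH}}(w) \;\propto\; \mathbb{E}\left[ s(w, C) \right]
            + \frac{\operatorname{Var}\left[ s(w, C) \right]}{\mathbb{E}\left[ s(w, C) \right]}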

    On the Effect of Anticipation on Reading Times

    Over the past two decades, numerous studies have demonstrated how less predictable (i.e., higher-surprisal) words take more time to read. In general, these studies have implicitly assumed the reading process is purely responsive: readers observe a new word and allocate time to process it as required. We argue that prior results are also compatible with a reading process that is at least partially anticipatory: readers could make predictions about a future word and allocate time to process it based on their expectations. In this work, we operationalize this anticipation as a word's contextual entropy. We assess the effect of anticipation on reading by comparing how well surprisal and contextual entropy predict reading times on four naturalistic reading datasets: two self-paced and two eye-tracking. Experimentally, across datasets and analyses, we find substantial evidence for effects of contextual entropy over surprisal on a word's reading time (RT); in fact, entropy is sometimes better than surprisal at predicting a word's RT. Spillover effects, however, are generally not captured by entropy, but only by surprisal. Further, we hypothesize four cognitive mechanisms through which contextual entropy could impact RTs, and we are able to design experiments to analyze three of them. Overall, our results support a view of reading that is not just responsive, but also anticipatory. Comment: This is a pre-MIT Press publication version of the paper. Code is available at https://github.com/rycolab/anticipation-on-reading-time
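
    Concretely, the two predictors differ in what they condition on: surprisal depends on the word actually observed at position t, while contextual entropy is computable from the preceding context alone, which is what makes it a natural operationalization of anticipation. In standard notation (a paraphrase of the abstract, not the paper's exact formulas):

        % surprisal of the observed word (responsive measure)
        s(w_t) = -\log p\left( w_t \mid \mathbf{w}_{<t} \right)

        % contextual entropy over the upcoming word (anticipatory measure)
        H\left( W_t \mid \mathbf{w}_{<t} \right)
            = -\sum_{w \in \mathcal{V}} p\left( w \mid \mathbf{w}_{<t} \right)
              \log p\left( w \mid \mathbf{w}_{<t} \right)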

    On the Predictive Power of Neural Language Models for Human Real-Time Comprehension Behavior

    Human reading behavior is tuned to the statistics of natural language: the time it takes human subjects to read a word can be predicted from estimates of the word's probability in context. However, it remains an open question what computational architecture best characterizes the expectations deployed in real time by humans that determine the behavioral signatures of reading. Here we test over two dozen models, independently manipulating computational architecture and training dataset size, on how well their next-word expectations predict human reading time behavior on naturalistic text corpora. We find that, across model architectures and training dataset sizes, the relationship between word log-probability and reading time is (near-)linear. We next evaluate how features of these models determine their psychometric predictive power, or ability to predict human reading behavior. In general, the better a model's next-word expectations, the better its psychometric predictive power. However, we find nontrivial differences across model architectures. For any given perplexity, deep Transformer models and n-gram models generally show superior psychometric predictive power over LSTM or structurally supervised neural models, especially for eye movement data. Finally, we compare models' psychometric predictive power to the depth of their syntactic knowledge, as measured by a battery of syntactic generalization tests developed using methods from controlled psycholinguistic experiments. Once perplexity is controlled for, we find no significant relationship between syntactic knowledge and predictive power. These results suggest that different approaches may be required to best model human real-time language comprehension behavior in naturalistic reading versus behavior for controlled linguistic materials designed for targeted probing of syntactic knowledge. Comment: To appear at CogSci 2020.
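
    One common way to operationalize "psychometric predictive power" is the held-out improvement a model's word-level surprisal gives to a regression of reading times over a baseline. The simplified sketch below illustrates that idea with an ordinary linear model; the CSV schema and predictors are assumptions for illustration, and the paper's exact regression protocol is not reproduced here.

        # Hedged sketch: held-out gain in reading-time prediction from adding an
        # LM's surprisal (-log p(word | context)) to a baseline regression.
        # The file name and column names are assumptions for illustration.
        import pandas as pd
        from sklearn.linear_model import LinearRegression
        from sklearn.model_selection import cross_val_score

        df = pd.read_csv("reading_times.csv")   # hypothetical: one row per word token
        y = df["reading_time_ms"]
        baseline = df[["word_length", "log_frequency"]]
        with_surprisal = df[["word_length", "log_frequency", "surprisal"]]

        def heldout_r2(X):
            """Mean 10-fold cross-validated R^2 of a linear fit."""
            return cross_val_score(LinearRegression(), X, y, cv=10, scoring="r2").mean()

        # A (near-)linear surprisal effect shows up as a gain over the baseline.
        gain = heldout_r2(with_surprisal) - heldout_r2(baseline)
        print(f"predictive gain from surprisal: {gain:.4f} held-out R^2")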