Mandarin 'even', 'all' and the Trigger of Focus Movement
This article proposes a syntax for Mandarin even/all constructions. We show that "focus movement" under 'even' is not deeply connected to semantic focus or stress, since the same movement occurs in the absence of focus or prosodic triggers. Rather, these movements are mediated by a feature shared across 'even' and 'all' constructions, which we propose is the maximality feature on a potentially covert operator. This result, when placed alongside findings by Horvath (2007) and Cable (2007), supports the hypothesis that A-bar "focus movement" is always operator-driven. The syntactic similarities between 'even' and 'all' in Mandarin suggest a semantics where 'even' is built compositionally from a non-focus-sensitive 'all' (dou) plus a scalar focus operator (lian). We present a preliminary semantics of this kind, and discuss some challenges it faces. Finally, we address "partial focus movement" data that are initially unexpected on our account, and show how they can be incorporated under a framework that allows copy movement and PF deletion.
Character-Level Language Modeling with Deeper Self-Attention
LSTMs and other RNN variants have shown strong performance on character-level
language modeling. These models are typically trained using truncated
backpropagation through time, and it is common to assume that their success
stems from their ability to remember long-term contexts. In this paper, we show
that a deep (64-layer) transformer model with fixed context outperforms RNN
variants by a large margin, achieving state of the art on two popular
benchmarks: 1.13 bits per character on text8 and 1.06 on enwik8. To get good
results at this depth, we show that it is important to add auxiliary losses,
both at intermediate network layers and intermediate sequence positions.
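
The auxiliary-loss idea can be made concrete with a short sketch. The following is a minimal illustration in PyTorch, not the authors' implementation: every layer (not just the last) predicts the next character, and the loss is computed at every sequence position (not just the final one). All names and hyperparameters (CharTransformer, d_model, the 0.5 auxiliary weight) are illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's code) of auxiliary losses at
# intermediate layers and intermediate sequence positions for a deep
# character-level transformer language model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CharTransformer(nn.Module):
    def __init__(self, vocab_size=256, d_model=512, n_heads=8, n_layers=64, context=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Parameter(torch.zeros(context, d_model))  # learned positional embeddings
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
             for _ in range(n_layers)])
        self.head = nn.Linear(d_model, vocab_size)  # prediction head, shared across layers

    def forward(self, x, targets):
        # x, targets: (batch, seq_len) character ids; a causal mask keeps the model autoregressive
        seq_len = x.size(1)
        causal = torch.triu(torch.ones(seq_len, seq_len, device=x.device), diagonal=1).bool()
        h = self.embed(x) + self.pos[:seq_len]
        loss = 0.0
        for i, layer in enumerate(self.layers):
            h = layer(h, src_mask=causal)
            if i < len(self.layers) - 1:
                # Auxiliary loss: intermediate layers also predict the next character
                # at every position, which helps very deep stacks train.
                aux = self.head(h)
                loss = loss + 0.5 * F.cross_entropy(
                    aux.reshape(-1, aux.size(-1)), targets.reshape(-1))
        # Main loss from the final layer, again taken over all positions, not only the last one.
        logits = self.head(h)
        loss = loss + F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
        return loss
```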
Contrastive Topic: Meanings and Realizations
This dissertation develops a theory of contrastive topics (CTs): what they mean, and how they are realized. I give a compositional semantics for CT constructions, built on the idea that CT marks anaphora to a complex question in the discourse. The account allows us to maintain an inclusive view of what counts as a contrastive topic, making reasonable predictions about sentences with CT phrases of different types, in various combinations, and across various speech acts. Empirically, the dissertation focuses on contrastive topic marking in English and Mandarin Chinese. In English, CT phrases are typically realized with a "rising" prosody. I offer an explicit model that predicts the intonational features of English sentences containing contrastive topics. In Mandarin, sentences with CTs often exhibit the discourse particle -ne. I provide a detailed description of the particle's distribution, and offer the first sustained argument that -ne is a CT marker.
Witnessable quantifiers license type-e meaning: Evidence from contrastive topic, equatives and supplements
This paper presents three novel ways of testing which plural quantificational phrases can denote individuals (type e). Specifically, it is argued that only type-e expressions can (i) be marked as a contrastive topic in a discourse contrasting individuals, (ii) be equated with another type-e expression in an equative frame, and (iii) anchor supplementing material. The main empirical finding is that the class of quantifiers allowing type-e nominal denotations is larger than assumed on classic accounts like Reinhart 1997. Furthermore, this class is characterizable in semantic terms. The quantifiers that give rise to type-e meanings are "witnessable" in the sense of entailing the existence of an individual satisfying both their restrictor and their nuclear scope.
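
As a rough formalization of the "witnessable" condition stated above (my notation, not necessarily the paper's), a quantifier Q is witnessable just in case a true quantified claim guarantees a witness in the intersection of restrictor and nuclear scope:

```latex
% Assumed notation: Q(A)(B) is the quantified claim with restrictor A and nuclear scope B.
Q \text{ is witnessable} \iff \forall A, B \; \big( Q(A)(B) \rightarrow \exists x \, [A(x) \wedge B(x)] \big)
```

For instance, 'at least three students danced' entails that some student danced, so 'at least three' counts as witnessable on this definition, whereas 'no student danced' carries no such entailment.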
UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining
Pretrained multilingual large language models have typically used heuristic
temperature-based sampling to balance between different languages. However,
previous work has not systematically evaluated the efficacy of different
pretraining language distributions across model scales. In this paper, we
propose a new sampling method, UniMax, that delivers more uniform coverage of
head languages while mitigating overfitting on tail languages by explicitly
capping the number of repeats over each language's corpus. We perform an
extensive series of ablations testing a range of sampling strategies on a suite
of multilingual benchmarks, while varying model scale. We find that UniMax
outperforms standard temperature-based sampling, and the benefits persist as
scale increases. As part of our contribution, we release: (i) an improved and
refreshed mC4 multilingual corpus consisting of 29 trillion characters across
107 languages, and (ii) a suite of pretrained umT5 model checkpoints trained
with UniMax sampling.
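
The abstract's description of UniMax (spread the budget uniformly over languages, but cap how often any one corpus is repeated) can be sketched as a simple allocation routine. This is a reconstruction from the description above, not the released implementation; the function name, the per-language character-budget framing, and the redistribution order are assumptions.

```python
# Hedged sketch of a UniMax-style allocation: give each language an equal share
# of the pretraining character budget, but cap every language at max_epochs
# passes over its corpus and redistribute the surplus to larger languages.
def unimax_allocation(corpus_chars, total_budget, max_epochs=4):
    """corpus_chars: {language: corpus size in characters}.
    Returns {language: characters to sample during pretraining}."""
    # Visit the smallest corpora first: when a tail language hits its repeat cap,
    # its unused share is folded back into the budget for the remaining languages.
    alloc = {}
    remaining_budget = float(total_budget)
    langs = sorted(corpus_chars, key=corpus_chars.get)
    for n_left, lang in zip(range(len(langs), 0, -1), langs):
        uniform_share = remaining_budget / n_left
        cap = max_epochs * corpus_chars[lang]  # no more than max_epochs repeats of this corpus
        alloc[lang] = min(uniform_share, cap)
        remaining_budget -= alloc[lang]
    return alloc

# Toy example: the tiny "sw" corpus is capped at 4 epochs, while the head
# languages "en" and "de" receive (near-)uniform shares of the budget.
sizes = {"en": 1e12, "de": 4e11, "sw": 1e9}
print(unimax_allocation(sizes, total_budget=3e11))
```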