Mandarin 'even', 'all' and the Trigger of Focus Movement
This article proposes a syntax for Mandarin even/all constructions. We show that "focus movement" under 'even' is not deeply connected to semantic focus or stress, since the same movement occurs in the absence of focus or prosodic triggers. Rather, these movements are mediated by a feature shared across 'even' and 'all' constructions, which we propose is the maximality feature on a potentially covert operator. This result, when placed alongside findings by Horvath (2007) and Cable (2007), supports the hypothesis that A-bar "focus movement" is always operator-driven. The syntactic similarities between 'even' and 'all' in Mandarin suggest a semantics where 'even' is built compositionally from a non-focus-sensitive 'all' (dou) plus a scalar focus operator (lian). We present a preliminary semantics of this kind, and discuss some challenges it faces. Finally, we address "partial focus movement" data that are initially unexpected on our account, and show how they can be incorporated under a framework that allows copy movement and PF deletion.
Character-Level Language Modeling with Deeper Self-Attention
LSTMs and other RNN variants have shown strong performance on character-level
language modeling. These models are typically trained using truncated
backpropagation through time, and it is common to assume that their success
stems from their ability to remember long-term contexts. In this paper, we show
that a deep (64-layer) transformer model with fixed context outperforms RNN
variants by a large margin, achieving state of the art on two popular
benchmarks: 1.13 bits per character on text8 and 1.06 on enwik8. To get good
results at this depth, we show that it is important to add auxiliary losses,
both at intermediate network layers and intermediate sequence positions.
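
The auxiliary-loss idea can be made concrete with a short sketch. The following is a minimal illustration in PyTorch, not the authors' implementation: every layer (not just the last) predicts the next character, and the loss is computed at every sequence position (not just the final one). All names and hyperparameters (CharTransformer, d_model, the 0.5 auxiliary weight) are illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's code) of auxiliary losses at
# intermediate layers and intermediate sequence positions for a deep
# character-level transformer language model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CharTransformer(nn.Module):
    def __init__(self, vocab_size=256, d_model=512, n_heads=8, n_layers=64, context=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Parameter(torch.zeros(context, d_model))  # learned positional embeddings
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
             for _ in range(n_layers)])
        self.head = nn.Linear(d_model, vocab_size)  # prediction head, shared across layers

    def forward(self, x, targets):
        # x, targets: (batch, seq_len) character ids; a causal mask keeps the model autoregressive
        seq_len = x.size(1)
        causal = torch.triu(torch.ones(seq_len, seq_len, device=x.device), diagonal=1).bool()
        h = self.embed(x) + self.pos[:seq_len]
        loss = 0.0
        for i, layer in enumerate(self.layers):
            h = layer(h, src_mask=causal)
            if i < len(self.layers) - 1:
                # Auxiliary loss: intermediate layers also predict the next character
                # at every position, which helps very deep stacks train.
                aux = self.head(h)
                loss = loss + 0.5 * F.cross_entropy(
                    aux.reshape(-1, aux.size(-1)), targets.reshape(-1))
        # Main loss from the final layer, again taken over all positions, not only the last one.
        logits = self.head(h)
        loss = loss + F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
        return loss
```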
Contrastive Topic: Meanings and Realizations
This dissertation develops a theory of contrastive topics (CTs): what they mean, and how they are realized. I give a compositional semantics for CT constructions, built on the idea that CT marks anaphora to a complex question in the discourse. The account allows us to maintain an inclusive view of what counts as a contrastive topic, making reasonable predictions about sentences with CT phrases of different types, in various combinations, and across various speech acts. Empirically, the dissertation focuses on contrastive topic marking in English and Mandarin Chinese. In English, CT phrases are typically realized with a "rising" prosody. I offer an explicit model that predicts the intonational features of English sentences containing contrastive topics. In Mandarin, sentences with CTs often exhibit the discourse particle -ne. I provide a detailed description of the particle's distribution, and offer the first sustained argument that -ne is a CT marker.
Witnessable quantifiers license type-e meaning: Evidence from contrastive topic, equatives and supplements
This paper presents three novel ways of testing which plural quantificational phrases can denote individuals (type e). Specifically, it is argued that only type-e expressions can (i) be marked as a contrastive topic in a discourse contrasting individuals, (ii) be equated with another type-e expression in an equative frame, and (iii) anchor supplementing material. The main empirical finding is that the class of quantifiers allowing type-e nominal denotations is larger than assumed on classic accounts like Reinhart 1997. Furthermore, this class is characterizable in semantic terms. The quantifiers that give rise to type-e meanings are "witnessable" in the sense of entailing the existence of an individual satisfying both their restrictor and their nuclear scope.
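
As a rough formalization of the "witnessable" condition stated above (my notation, not necessarily the paper's), a quantifier Q is witnessable just in case a true quantified claim guarantees a witness in the intersection of restrictor and nuclear scope:

```latex
% Assumed notation: Q(A)(B) is the quantified claim with restrictor A and nuclear scope B.
Q \text{ is witnessable} \iff \forall A, B \; \big( Q(A)(B) \rightarrow \exists x \, [A(x) \wedge B(x)] \big)
```

For instance, 'at least three students danced' entails that some student danced, so 'at least three' counts as witnessable on this definition, whereas 'no student danced' carries no such entailment.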
UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining
Pretrained multilingual large language models have typically used heuristic
temperature-based sampling to balance between different languages. However,
previous work has not systematically evaluated the efficacy of different
pretraining language distributions across model scales. In this paper, we
propose a new sampling method, UniMax, that delivers more uniform coverage of
head languages while mitigating overfitting on tail languages by explicitly
capping the number of repeats over each language's corpus. We perform an
extensive series of ablations testing a range of sampling strategies on a suite
of multilingual benchmarks, while varying model scale. We find that UniMax
outperforms standard temperature-based sampling, and the benefits persist as
scale increases. As part of our contribution, we release: (i) an improved and
refreshed mC4 multilingual corpus consisting of 29 trillion characters across
107 languages, and (ii) a suite of pretrained umT5 model checkpoints trained
with UniMax sampling.
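
The abstract's description of UniMax (spread the budget uniformly over languages, but cap how often any one corpus is repeated) can be sketched as a simple allocation routine. This is a reconstruction from the description above, not the released implementation; the function name, the per-language character-budget framing, and the redistribution order are assumptions.

```python
# Hedged sketch of a UniMax-style allocation: give each language an equal share
# of the pretraining character budget, but cap every language at max_epochs
# passes over its corpus and redistribute the surplus to larger languages.
def unimax_allocation(corpus_chars, total_budget, max_epochs=4):
    """corpus_chars: {language: corpus size in characters}.
    Returns {language: characters to sample during pretraining}."""
    # Visit the smallest corpora first: when a tail language hits its repeat cap,
    # its unused share is folded back into the budget for the remaining languages.
    alloc = {}
    remaining_budget = float(total_budget)
    langs = sorted(corpus_chars, key=corpus_chars.get)
    for n_left, lang in zip(range(len(langs), 0, -1), langs):
        uniform_share = remaining_budget / n_left
        cap = max_epochs * corpus_chars[lang]  # no more than max_epochs repeats of this corpus
        alloc[lang] = min(uniform_share, cap)
        remaining_budget -= alloc[lang]
    return alloc

# Toy example: the tiny "sw" corpus is capped at 4 epochs, while the head
# languages "en" and "de" receive (near-)uniform shares of the budget.
sizes = {"en": 1e12, "de": 4e11, "sw": 1e9}
print(unimax_allocation(sizes, total_budget=3e11))
```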