25 research outputs found

    Mandarin 'even', 'all' and the Trigger of Focus Movement

    This article proposes a syntax for Mandarin even/all constructions. We show that “focus movement” under ‘even’ is not deeply connected to semantic focus or stress, since the same movement occurs in the absence of focus or prosodic triggers. Rather, these movements are mediated by a feature shared across ‘even’ and ‘all’ constructions, which we propose is the maximality feature on a potentially covert operator. This result, when placed alongside findings by Horvath (2007) and Cable (2007), supports the hypothesis that A-bar “focus movement” is always operator-driven. The syntactic similarities between ‘even’ and ‘all’ in Mandarin suggest a semantics where ‘even’ is built compositionally from a non-focus-sensitive ‘all’ (dou) plus a scalar focus operator (lian). We present a preliminary semantics of this kind and discuss some challenges it faces. Finally, we address “partial focus movement” data that are initially unexpected on our account, and show how they can be incorporated under a framework that allows copy movement and PF deletion.
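
    A rough rendering of the decomposition described above, where dou contributes maximality/universal force and lian contributes a scalar presupposition over focus alternatives. The formulation below is a sketch along the lines of a standard scalar semantics, not the authors' own proposal; the symbols D (domain), C_p (focus alternatives), and the likelihood ordering are assumptions for illustration.

```latex
% Sketch only: 'even' decomposed into a non-focus-sensitive maximality operator
% (dou) plus a scalar focus operator (lian). Not the authors' formulation.
% dou applies its predicate to the maximal element of a domain D:
\[
  [\![\text{dou}]\!](D)(P) = 1 \ \text{iff}\ P(\max(D)) = 1
\]
% lian presupposes that the prejacent p is the least likely of its focus
% alternatives C_p, and otherwise simply asserts p:
\[
  [\![\text{lian}\ p]\!]^{w} \ \text{is defined only if}\
  \forall q \in C_{p}\,[\,q \neq p \rightarrow p <_{\text{likely}} q\,];
  \ \text{if defined, it equals}\ p(w)
\]
```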

    Character-Level Language Modeling with Deeper Self-Attention

    LSTMs and other RNN variants have shown strong performance on character-level language modeling. These models are typically trained using truncated backpropagation through time, and it is common to assume that their success stems from their ability to remember long-term contexts. In this paper, we show that a deep (64-layer) transformer model with fixed context outperforms RNN variants by a large margin, achieving state of the art on two popular benchmarks: 1.13 bits per character on text8 and 1.06 on enwik8. To get good results at this depth, we show that it is important to add auxiliary losses, both at intermediate network layers and intermediate sequence positions.
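
    As a rough illustration of the auxiliary-loss idea (not the authors' implementation), the sketch below attaches a down-weighted next-character prediction loss to every intermediate layer of a causal transformer; the layer count, the 0.5 weight, the shared prediction head, and the choice to supervise every layer are assumptions for demonstration.

```python
# Illustrative sketch of intermediate-layer auxiliary losses for a
# character-level transformer LM. Hyperparameters are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CharTransformerLM(nn.Module):
    def __init__(self, vocab_size=256, d_model=512, n_heads=8, n_layers=12, ctx=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Parameter(torch.zeros(ctx, d_model))  # learned positions
        self.layers = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model, batch_first=True)
            for _ in range(n_layers)
        ])
        self.head = nn.Linear(d_model, vocab_size)  # prediction head shared by all layers

    def forward(self, chars, targets):
        # chars, targets: LongTensors of shape (batch, T); targets are next characters.
        T = chars.size(1)
        causal = torch.triu(
            torch.full((T, T), float("-inf"), device=chars.device), diagonal=1
        )
        h = self.embed(chars) + self.pos[:T]
        main_loss, aux_loss = None, 0.0
        for i, layer in enumerate(self.layers):
            h = layer(h, src_mask=causal)
            logits = self.head(h)  # every layer also predicts the next character
            loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
            if i == len(self.layers) - 1:
                main_loss = loss                  # final-layer language-modeling loss
            else:
                aux_loss = aux_loss + 0.5 * loss  # down-weighted intermediate-layer loss
        return main_loss + aux_loss
```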

    Witnessable quantifiers license type-e meaning: Evidence from contrastive topic, equatives and supplements

    This paper presents three novel ways of testing which plural quantificational phrases can denote individuals (type e). Specifically, it is argued that only type-e expressions can (i) be marked as a contrastive topic in a discourse contrasting individuals, (ii) be equated with another type-e expression in an equative frame, and (iii) anchor supplementing material. The main empirical finding is that the class of quantifiers allowing type-e nominal denotations is larger than assumed on classic accounts like Reinhart 1997. Furthermore, this class is characterizable in semantic terms. The quantifiers that give rise to type-e meanings are "witnessable" in the sense of entailing the existence of an individual satisfying both their restrictor and their nuclear scope.
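
    Stated as an entailment schema, the "witnessable" property from the abstract can be written as below; this rendering is a paraphrase of the abstract's definition, and the example determiners are illustrative rather than the paper's own list.

```latex
% A determiner Q is witnessable iff the truth of Q(A)(B) guarantees a witness
% that is in both the restrictor A and the nuclear scope B:
\[
  Q \ \text{is witnessable} \iff Q(A)(B) \models \exists x\,[A(x) \wedge B(x)]
\]
% Illustrative examples (not necessarily the paper's): "most" and "at least
% three" satisfy the schema, while "no" and "fewer than three" do not.
```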

    UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining

    Pretrained multilingual large language models have typically used heuristic temperature-based sampling to balance between different languages. However, previous work has not systematically evaluated the efficacy of different pretraining language distributions across model scales. In this paper, we propose a new sampling method, UniMax, that delivers more uniform coverage of head languages while mitigating overfitting on tail languages by explicitly capping the number of repeats over each language's corpus. We perform an extensive series of ablations testing a range of sampling strategies on a suite of multilingual benchmarks, while varying model scale. We find that UniMax outperforms standard temperature-based sampling, and the benefits persist as scale increases. As part of our contribution, we release: (i) an improved and refreshed mC4 multilingual corpus consisting of 29 trillion characters across 107 languages, and (ii) a suite of pretrained umT5 model checkpoints trained with UniMax sampling.
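
    A minimal sketch of a repeat-capped, near-uniform budget allocation in the spirit of the method described above; this is an illustrative reconstruction rather than the released implementation, and the function name, character-budget units, and default epoch cap are assumptions.

```python
# Illustrative sketch of a UniMax-style allocation (not the released code).
# `corpus_chars` maps language -> corpus size in characters; `budget` is the
# total number of training characters; `max_epochs` caps repeats per language.
def unimax_allocation(corpus_chars, budget, max_epochs=4):
    remaining = budget
    langs = sorted(corpus_chars, key=corpus_chars.get)  # smallest corpora first
    alloc = {}
    for i, lang in enumerate(langs):
        uniform_share = remaining / (len(langs) - i)  # equal split of what is left
        cap = max_epochs * corpus_chars[lang]         # at most `max_epochs` passes
        alloc[lang] = min(uniform_share, cap)
        remaining -= alloc[lang]
    total = sum(alloc.values())
    return {lang: chars / total for lang, chars in alloc.items()}  # sampling proportions

# Example: the tail language hits its repeat cap, while the head languages
# split the remaining budget uniformly.
probs = unimax_allocation({"en": 1e12, "de": 2e11, "sw": 1e9}, budget=5e11)
```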