12 research outputs found

    Character-Level Language Modeling with Deeper Self-Attention

    LSTMs and other RNN variants have shown strong performance on character-level language modeling. These models are typically trained using truncated backpropagation through time, and it is common to assume that their success stems from their ability to remember long-term contexts. In this paper, we show that a deep (64-layer) transformer model with fixed context outperforms RNN variants by a large margin, achieving state of the art on two popular benchmarks: 1.13 bits per character on text8 and 1.06 on enwik8. To get good results at this depth, we show that it is important to add auxiliary losses, both at intermediate network layers and intermediate sequence positions. Comment: 8 pages, 7 figures
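    The auxiliary-loss idea is concrete enough to sketch. Below is a minimal PyTorch sketch (not the paper's code) of a character-level transformer in which each intermediate layer shares the output head and contributes an extra cross-entropy term; the depth, the 0.5 loss weight, and the omission of positional encodings and of the intermediate-position losses are simplifying assumptions for illustration.

```python
# Minimal sketch of intermediate-layer auxiliary losses for a character-level
# transformer. Depth, loss weighting, and the missing positional encoding are
# illustrative assumptions, not the paper's recipe.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepCharTransformer(nn.Module):
    def __init__(self, vocab_size=256, d_model=512, n_layers=8, n_heads=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.layers = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_layers)
        ])
        self.to_logits = nn.Linear(d_model, vocab_size)  # output head shared by all layers

    def forward(self, chars, targets):
        # chars, targets: (batch, seq_len) integer character ids
        seq_len = chars.size(1)
        # additive causal mask: -inf above the diagonal, 0 elsewhere
        causal = torch.full((seq_len, seq_len), float("-inf")).triu(1)
        h = self.embed(chars)  # positional encodings omitted for brevity
        main_loss, aux_losses = 0.0, []
        for depth, layer in enumerate(self.layers):
            h = layer(h, src_mask=causal)
            logits = self.to_logits(h)
            loss = F.cross_entropy(logits.transpose(1, 2), targets)
            if depth < len(self.layers) - 1:
                aux_losses.append(loss)   # auxiliary loss at an intermediate layer
            else:
                main_loss = loss          # main loss at the final layer
        # assumed weighting: auxiliary terms down-weighted relative to the main loss
        return main_loss + 0.5 * sum(aux_losses)
```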

    Agweddau ar waith T.H. Parry-Williams (Aspects of the work of T.H. Parry-Williams)


    Helpful Neighbors: Leveraging Neighbors in Geographic Feature Pronunciation

    If one sees the place name Houston Mercer Dog Run in New York, how does one know how to pronounce it? Assuming one knows that Houston in New York is pronounced "how-ston" and not like the Texas city, then one can probably guess that "how-ston" is also used in the name of the dog park. We present a novel architecture that learns to use the pronunciations of neighboring names in order to guess the pronunciation of a given target feature. Applied to Japanese place names, we demonstrate the utility of the model in finding and proposing corrections for errors in Google Maps. To demonstrate the utility of this approach to structurally similar problems, we also report on an application to a totally different task: cognate reflex prediction in comparative historical linguistics. A version of the code has been open-sourced (https://github.com/google-research/google-research/tree/master/cognate_inpaint_neighbors). Comment: 16 pages, to appear in Transactions of the Association for Computational Linguistics
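    As a toy illustration of the intuition in this abstract (not the paper's learned neural architecture), the sketch below reuses how a shared token such as "Houston" is pronounced by nearby features when guessing a new name; the distance cutoff, the neighbor data, and the fallback dictionary are invented for the example.

```python
# Toy nearest-neighbor sketch: pronounce each token of a target name the way a
# geographically close feature pronounces it, falling back to a generic
# dictionary otherwise. All data and thresholds below are illustrative.
import math

def distance_km(a, b):
    # equirectangular approximation; adequate at neighborhood scale
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    x = (lon2 - lon1) * math.cos((lat1 + lat2) / 2)
    return 6371 * math.hypot(x, lat2 - lat1)

def guess_pronunciation(name, location, neighbors, fallback, radius_km=2.0):
    """neighbors: list of (name, (lat, lon), {token: pronunciation}) entries."""
    guess = []
    for token in name.split():
        votes = [
            prons[token]
            for _n_name, n_loc, prons in neighbors
            if token in prons and distance_km(location, n_loc) <= radius_km
        ]
        # prefer the locally attested pronunciation, else the generic dictionary
        guess.append(votes[0] if votes else fallback.get(token, token.lower()))
    return " ".join(guess)

neighbors = [("Houston Street", (40.7255, -73.9983), {"Houston": "HOW-stun"})]
fallback = {"Houston": "HYOO-stun", "Mercer": "MUR-ser", "Dog": "dawg", "Run": "ruhn"}
print(guess_pronunciation("Houston Mercer Dog Run", (40.7247, -73.9970),
                          neighbors, fallback))
# -> "HOW-stun MUR-ser dawg ruhn": the local "Houston" wins over the Texas reading
```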

    Attention Is All You Need

    The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data. Comment: 15 pages, 5 figures
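    The core operation the Transformer stacks is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. The NumPy sketch below computes just that single operation on toy inputs, leaving out the multi-head projections, residual connections, layer normalization, and feed-forward sublayers of the full architecture; shapes and inputs are illustrative.

```python
# Minimal NumPy sketch of scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v)
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V                                 # weighted sum of value vectors

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(4, 8)), rng.normal(size=(6, 8)), rng.normal(size=(6, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8): one attended value vector per query
```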

    Implicit detection of poetic harmony by the naïve brain

    The power of poetry is universally acknowledged, but it is debatable whether its appreciation is reserved for experts. Here we show that readers with no particular knowledge of a traditional form of Welsh poetry unconsciously distinguish phrases conforming to its complex poetic construction rules from those that violate them. We studied the brain response of native speakers of Welsh as they read meaningful sentences ending in a word that either complied with strict poetic construction rules, violated rules of consonantal repetition, violated the stress pattern, or violated both these constraints. Upon reading the last word of each sentence, participants indicated sentence acceptability. As expected, our inexperienced participants did not explicitly distinguish sentences that conformed to the poetic rules from those that violated them. However, in the case of orthodox sentences, the critical word elicited a distinctive brain response characteristic of target detection (the P3b) as compared to the other conditions, showing that speakers of Welsh with no expertise in this particular form of poetry implicitly detect poetic harmony. These results show for the first time that before we even consider literal meaning, the musical properties of poetry speak to the human mind in ways that escape consciousness.