Character-Level Language Modeling with Deeper Self-Attention
LSTMs and other RNN variants have shown strong performance on character-level
language modeling. These models are typically trained using truncated
backpropagation through time, and it is common to assume that their success
stems from their ability to remember long-term contexts. In this paper, we show
that a deep (64-layer) transformer model with fixed context outperforms RNN
variants by a large margin, achieving state of the art on two popular
benchmarks: 1.13 bits per character on text8 and 1.06 on enwik8. To get good
results at this depth, we show that it is important to add auxiliary losses,
both at intermediate network layers and intermediate sequence positions.
Comment: 8 pages, 7 figures
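The auxiliary-loss idea described above can be sketched as a weighted sum of intermediate-layer and intermediate-position prediction losses folded into the final training objective. The function and weights below are illustrative assumptions, not the paper's actual loss schedule:

```python
def combined_loss(final_loss, layer_losses, position_losses,
                  layer_weight=0.5, position_weight=0.5):
    """Fold auxiliary losses from intermediate network layers and
    intermediate sequence positions into the training objective.
    Weights are illustrative; the paper anneals them over training."""
    aux_layers = layer_weight * sum(layer_losses)
    aux_positions = position_weight * sum(position_losses)
    return final_loss + aux_layers + aux_positions

# Example: final loss 1.0, two intermediate-layer losses, one positional loss.
print(combined_loss(1.0, [0.5, 0.5], [1.0]))  # → 2.0
```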
Helpful Neighbors: Leveraging Neighbors in Geographic Feature Pronunciation
If one sees the place name Houston Mercer Dog Run in New York, how does one
know how to pronounce it? Assuming one knows that Houston in New York is
pronounced "how-ston" and not like the Texas city, then one can probably guess
that "how-ston" is also used in the name of the dog park. We present a novel
architecture that learns to use the pronunciations of neighboring names in
order to guess the pronunciation of a given target feature. Applied to Japanese
place names, we demonstrate the utility of the model for finding and proposing
corrections for errors in Google Maps.
To demonstrate the utility of this approach to structurally similar problems,
we also report on an application to a totally different task: Cognate reflex
prediction in comparative historical linguistics. A version of the code has
been open-sourced
(https://github.com/google-research/google-research/tree/master/cognate_inpaint_neighbors).
Comment: 16 pages, to appear in Transactions of the Association for Computational
Linguistics
Attention Is All You Need
The dominant sequence transduction models are based on complex recurrent or
convolutional neural networks in an encoder-decoder configuration. The best
performing models also connect the encoder and decoder through an attention
mechanism. We propose a new simple network architecture, the Transformer, based
solely on attention mechanisms, dispensing with recurrence and convolutions
entirely. Experiments on two machine translation tasks show these models to be
superior in quality while being more parallelizable and requiring significantly
less time to train. Our model achieves 28.4 BLEU on the WMT 2014
English-to-German translation task, improving over the existing best results,
including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French
translation task, our model establishes a new single-model state-of-the-art
BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction
of the training costs of the best models from the literature. We show that the
Transformer generalizes well to other tasks by applying it successfully to
English constituency parsing both with large and limited training data.
Comment: 15 pages, 5 figures
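The attention mechanism the abstract refers to can be sketched in a few lines of NumPy as scaled dot-product attention, the standard formulation; the toy shapes in the example are chosen purely for illustration:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights

# Toy example: 3 query positions, 4 key/value positions, d_k = d_v = 8.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape, w.shape)  # (3, 8) (3, 4)
```

Each output row is a convex combination of the value rows, weighted by how strongly the corresponding query matches each key.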
Implicit detection of poetic harmony by the naïve brain
The power of poetry is universally acknowledged, but it is debatable whether its appreciation is reserved for experts. Here we show that readers with no particular knowledge of a traditional form of Welsh poetry unconsciously distinguish phrases conforming to its complex poetic construction rules from those that violate them. We studied the brain response of native speakers of Welsh as they read meaningful sentences ending in a word that either complied with strict poetic construction rules, violated rules of consonantal repetition, violated stress pattern, or violated both of these constraints. Upon reading the last word of each sentence, participants indicated sentence acceptability. As expected, our inexperienced participants did not explicitly distinguish sentences that conformed to the poetic rules from those that violated them. However, in the case of orthodox sentences, the critical word elicited a distinctive brain response characteristic of target detection, the P3b, as compared to the other conditions, showing that speakers of Welsh with no expertise in this particular form of poetry implicitly detect poetic harmony. These results show for the first time that, before we even consider literal meaning, the musical properties of poetry speak to the human mind in ways that escape consciousness.