9,812 research outputs found

Attention Is All You Need

Author: Gomez Aidan N.
Jones Llion
Kaiser Lukasz
Parmar Niki
Polosukhin Illia
Shazeer Noam
Uszkoreit Jakob
Vaswani Ashish
Publication date: 05/12/2017

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data. Comment: 15 pages, 5 figures

arXiv.org e-Print Archive

Image-based Text Classification using 2D Convolutional Neural Networks

Author: Chen Liming
Geist Matthieu
Giakoumis Dimitrios
Hamzaoui Raouf
Hanke Sten
Kalatzis Dimitrios
Kropf Johannes
Merdivan Erinç
Tzovaras Dimitrios
Vafeiadis Anastasios
Votis Konstantinos
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 29/05/2019

We propose a new approach to text classification in which we consider the input text as an image and apply 2D Convolutional Neural Networks to learn the local and global semantics of the sentences from the variations of the visual patterns of words. Our approach demonstrates that it is possible to get semantically meaningful features from images with text without using optical character recognition and sequential processing pipelines, techniques that traditional natural language processing algorithms require. To validate our approach, we present results for two applications: text classification and dialog modeling. Using a 2D Convolutional Neural Network, we were able to outperform the state-of-the-art accuracy results for a Chinese text classification task and achieved promising results for seven English text classification tasks. Furthermore, our approach outperformed the memory networks without match types when using out-of-vocabulary entities from Task 4 of the bAbI dialog dataset.

Crossref

INRIA a CCSD electronic archive server

HAL-INSU

De Montfort University Open Research Archive

SARDSRN: A Neural Network Shift-Reduce Parser

Author: Mayberry Marshall R.
Miikkulainen Risto
Publication date: 01/01/1999

Simple Recurrent Networks (SRNs) have been widely used in natural language tasks. SARDSRN extends the SRN by explicitly representing the input sequence in a SARDNET self-organizing map. The distributed SRN component leads to good generalization and robust cognitive properties, whereas the SARDNET map provides exact representations of the sentence constituents. This combination allows SARDSRN to learn to parse sentences with more complicated structure than can the SRN alone, and suggests that the approach could scale up to realistic natural language

CiteSeerX

CogPrints Cognitive Sciences Eprint Archive