9,812 research outputs found
Attention Is All You Need
The dominant sequence transduction models are based on complex recurrent or
convolutional neural networks in an encoder-decoder configuration. The best
performing models also connect the encoder and decoder through an attention
mechanism. We propose a new simple network architecture, the Transformer, based
solely on attention mechanisms, dispensing with recurrence and convolutions
entirely. Experiments on two machine translation tasks show these models to be
superior in quality while being more parallelizable and requiring significantly
less time to train. Our model achieves 28.4 BLEU on the WMT 2014
English-to-German translation task, improving over the existing best results,
including ensembles by over 2 BLEU. On the WMT 2014 English-to-French
translation task, our model establishes a new single-model state-of-the-art
BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction
of the training costs of the best models from the literature. We show that the
Transformer generalizes well to other tasks by applying it successfully to
English constituency parsing both with large and limited training data.Comment: 15 pages, 5 figure
Image-based Text Classification using 2D Convolutional Neural Networks
We propose a new approach to text classification
in which we consider the input text as an image and apply
2D Convolutional Neural Networks to learn the local and
global semantics of the sentences from the variations of the
visual patterns of words. Our approach demonstrates that
it is possible to get semantically meaningful features from
images with text without using optical character recognition
and sequential processing pipelines, techniques that traditional
natural language processing algorithms require. To validate
our approach, we present results for two applications: text
classification and dialog modeling. Using a 2D Convolutional
Neural Network, we were able to outperform the state-ofart
accuracy results for a Chinese text classification task and
achieved promising results for seven English text classification
tasks. Furthermore, our approach outperformed the memory
networks without match types when using out of vocabulary
entities from Task 4 of the bAbI dialog dataset
SARDSRN: A NEURAL NETWORK SHIFT-REDUCE PARSER
Simple Recurrent Networks (SRNs) have been widely used in natural language tasks. SARDSRN extends the SRN by
explicitly representing the input sequence in a SARDNET self-organizing map. The distributed SRN component leads to good generalization and robust cognitive properties, whereas the SARDNET map provides exact representations of the sentence constituents. This combination allows SARDSRN to learn to parse sentences with more complicated structure than can the SRN alone, and suggests that the approach could scale up to realistic natural language
- …