14 research outputs found

    A Neural Network Approach for Mixing Language Models

    The performance of Neural Network (NN)-based language models is steadily improving due to the emergence of new architectures that can learn different natural language characteristics. This paper presents a novel framework showing that a significant improvement can be achieved by combining different existing heterogeneous models in a single architecture. This is done through 1) a feature layer, which separately learns different NN-based models, and 2) a mixture layer, which merges the resulting model features. In doing so, the architecture benefits from the learning capabilities of each model with no noticeable increase in the number of model parameters or the training time. Extensive experiments conducted on the Penn Treebank (PTB) and the Large Text Compression Benchmark (LTCB) corpus showed a significant reduction in perplexity compared to state-of-the-art feedforward as well as recurrent neural network architectures. Comment: Published at IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2017. arXiv admin note: text overlap with arXiv:1703.0806

    Character-Aware Neural Language Models

    We describe a simple neural language model that relies only on character-level inputs. Predictions are still made at the word level. Our model employs a convolutional neural network (CNN) and a highway network over characters, whose output is given to a long short-term memory (LSTM) recurrent neural network language model (RNN-LM). On the English Penn Treebank the model is on par with the existing state of the art despite having 60% fewer parameters. On languages with rich morphology (Arabic, Czech, French, German, Spanish, Russian), the model outperforms word-level/morpheme-level LSTM baselines, again with fewer parameters. The results suggest that on many languages, character inputs are sufficient for language modeling. Analysis of word representations obtained from the character composition part of the model reveals that the model is able to encode, from characters only, both semantic and orthographic information. Comment: AAAI 201

    Fixed Size Ordinally-Forgetting Encoding and its Applications

    In this thesis, we propose the new Fixed-size Ordinally-Forgetting Encoding (FOFE) method, which can almost uniquely encode any variable-length sequence of words into a fixed-size representation. FOFE can model the word order in a sequence using a simple ordinally-forgetting mechanism according to the positions of words. We address two fundamental problems in natural language processing, namely Language Modeling (LM) and Named Entity Recognition (NER). We have applied FOFE to FeedForward Neural Network Language Models (FFNN-LMs). Experimental results have shown that, without using any recurrent feedback, FOFE-FFNN-LMs significantly outperform not only the standard fixed-input FFNN-LMs but also some popular Recurrent Neural Network Language Models (RNN-LMs). Instead of treating NER as a sequence labeling problem, we propose a new local detection approach, which relies on FOFE to fully encode each sentence fragment and its left/right contexts into a fixed-size representation. This local detection approach has shown many advantages over traditional sequence labeling methods. Our method has yielded strong performance in all tasks we have examined.
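
The ordinally-forgetting mechanism mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the recursion commonly given for FOFE, z_t = alpha * z_(t-1) + e_t, where e_t is the one-hot vector of the t-th word and alpha is a forgetting factor between 0 and 1. The function name `fofe_encode`, the toy vocabulary, and the value alpha = 0.7 are illustrative choices, not taken from the thesis.

```python
def fofe_encode(word_ids, vocab_size, alpha=0.7):
    """Encode a variable-length sequence of word ids into one fixed-size vector.

    Assumed recursion: z_t = alpha * z_(t-1) + e_t, with e_t the one-hot
    vector of the t-th word. Older words are scaled down by alpha at each step.
    """
    z = [0.0] * vocab_size
    for w in word_ids:
        z = [alpha * v for v in z]  # decay everything seen so far
        z[w] += 1.0                 # add the one-hot vector of the current word
    return z


# Hypothetical three-word vocabulary, used only for illustration.
vocab = {"A": 0, "B": 1, "C": 2}

code_abc = fofe_encode([vocab[w] for w in "ABC"], len(vocab))
code_abcbc = fofe_encode([vocab[w] for w in "ABCBC"], len(vocab))

print(code_abc)    # same length as the vocabulary
print(code_abcbc)  # same length, although the input is longer
```

Both sequences map to vectors of the same length as the vocabulary, and each entry is a sum of powers of alpha determined by where the word occurred, which is how word order is retained in a fixed-size code.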