Word Embeddings: A Survey
This work lists and describes the main recent strategies for building
fixed-length, dense and distributed representations for words, based on the
distributional hypothesis. These representations are now commonly called word
embeddings and, in addition to encoding surprisingly good syntactic and
semantic information, have been proven useful as extra features in many
downstream NLP tasks. Comment: 10 pages, 2 tables, 1 image
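As a generic illustration of the count-based family of strategies such a survey typically covers (not code from the survey itself), the sketch below builds dense, fixed-length word vectors by factorizing a word-context co-occurrence matrix with a truncated SVD; the toy corpus, window size and dimensionality are arbitrary choices.

# Minimal sketch (illustrative, not the survey's own method): dense embeddings
# from a word-context co-occurrence matrix factorized with a truncated SVD.
import numpy as np
from collections import Counter

corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
    "a cat and a dog played".split(),
]

vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Symmetric co-occurrence counts within a fixed context window.
window = 2
counts = Counter()
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if i != j:
                counts[(idx[w], idx[sent[j]])] += 1

M = np.zeros((len(vocab), len(vocab)))
for (i, j), c in counts.items():
    M[i, j] = c

# Truncated SVD of the (log-damped) count matrix yields fixed-length, dense vectors.
dim = 4
U, S, _ = np.linalg.svd(np.log1p(M), full_matrices=False)
embeddings = U[:, :dim] * S[:dim]
print(embeddings[idx["cat"]])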
A Batch Noise Contrastive Estimation Approach for Training Large Vocabulary Language Models
Training large vocabulary Neural Network Language Models (NNLMs) is a
difficult task due to the explicit requirement of the output layer
normalization, which typically involves the evaluation of the full softmax
function over the complete vocabulary. This paper proposes a Batch Noise
Contrastive Estimation (B-NCE) approach to alleviate this problem. This is
achieved by reducing the vocabulary, at each time step, to the target words in
the batch and then replacing the softmax by the noise contrastive estimation
approach, where these words play the role of targets and noise samples at the
same time. In doing so, the proposed approach can be fully formulated and
implemented using optimal dense matrix operations. Applying B-NCE to train
different NNLMs on the Large Text Compression Benchmark (LTCB) and the One
Billion Word Benchmark (OBWB) shows a significant reduction of the training
time with no noticeable degradation of the models' performance. This paper also
presents a new baseline comparative study of different standard NNLMs on the
large OBWB on a single Titan-X GPU. Comment: Accepted for publication at INTERSPEECH'17
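A minimal sketch of the batch-vocabulary reduction the abstract describes, assuming a PyTorch-style output layer: the vocabulary is restricted to the unique target words of the current batch, so scoring reduces to a single dense matrix product. The loss below is a plain softmax over that reduced set rather than the paper's exact NCE formulation, and all tensor shapes are illustrative.

# Hedged sketch of the batch-restricted output layer behind B-NCE: score only
# the words that occur as targets in the current batch, so they serve as both
# targets and noise samples for each other.
import torch
import torch.nn.functional as F

vocab_size, hidden, batch = 50_000, 256, 128
output_weight = torch.randn(vocab_size, hidden, requires_grad=True)
output_bias = torch.zeros(vocab_size, requires_grad=True)

hidden_states = torch.randn(batch, hidden)          # outputs of the recurrent layer
targets = torch.randint(0, vocab_size, (batch,))    # gold next words

# Reduce the vocabulary to the unique target words of this batch.
batch_vocab, local_targets = torch.unique(targets, return_inverse=True)
W = output_weight[batch_vocab]                       # (|batch_vocab|, hidden)
b = output_bias[batch_vocab]

# One dense matmul over the reduced vocabulary instead of a full softmax layer.
logits = hidden_states @ W.t() + b                   # (batch, |batch_vocab|)

# Surrogate loss over the reduced set (the paper uses an NCE-style objective).
loss = F.cross_entropy(logits, local_targets)
loss.backward()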
A Simple Language Model based on PMI Matrix Approximations
In this study, we introduce a new approach for learning language models by
training them to estimate word-context pointwise mutual information (PMI), and
then deriving the desired conditional probabilities from PMI at test time.
Specifically, we show that with minor modifications to word2vec's algorithm, we
get principled language models that are closely related to the well-established
Noise Contrastive Estimation (NCE) based language models. A compelling aspect
of our approach is that our models are trained with the same simple negative
sampling objective function that is commonly used in word2vec to learn word
embeddings. Comment: Accepted to EMNLP 2017
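A hedged sketch of the test-time step described above: given model scores assumed to approximate word-context PMI (here, dot products of random word and context vectors standing in for trained embeddings), conditional probabilities follow from PMI(w, c) = log p(w | c) / p(w), i.e. p(w | c) is proportional to p(w) * exp(PMI(w, c)) after normalization over the vocabulary. The vectors and unigram distribution are stand-ins, not the paper's trained model.

# Derive p(w | c) from PMI estimates at test time (illustrative values only).
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 1000, 50
word_vecs = rng.normal(scale=0.1, size=(vocab_size, dim))     # target embeddings
context_vecs = rng.normal(scale=0.1, size=(vocab_size, dim))  # context embeddings
unigram = rng.dirichlet(np.ones(vocab_size))                  # unigram p(w)

def conditional_probs(context_id):
    # p(w | c) for all words w, derived from PMI(w, c) = log p(w | c) / p(w).
    pmi_estimates = word_vecs @ context_vecs[context_id]      # scores that approximate PMI(w, c)
    unnormalized = unigram * np.exp(pmi_estimates)
    return unnormalized / unnormalized.sum()

probs = conditional_probs(context_id=42)
print(probs.argmax(), probs.max())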
Importance Sampling for Objective Function Estimations in Neural Detector Training Driven by Genetic Algorithms
To train Neural Networks (NNs) in a supervised way, estimates of an objective function must be computed. The value of this function decreases as training progresses, so the number of test observations needed for an accurate estimate has to grow accordingly. Consequently, the computational cost of training becomes unaffordable when very low objective function values must be estimated, and the use of Importance Sampling (IS) techniques becomes convenient. Three objective functions are studied, and IS-based estimators are proposed for each: the Mean-Square error, the Cross-Entropy error and the Misclassification error criteria. The values of these functions are estimated by IS techniques, and the resulting estimates are used to train NNs with Genetic Algorithms. Results for binary detection in Gaussian noise are provided, showing the evolution of the parameters during training and the performance of the proposed detectors in terms of error probability and Receiver Operating Characteristic curves. The results obtained justify the convenience of using IS in the training.
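The sketch below illustrates the core idea for the Misclassification error criterion only: estimating a very small error probability by drawing noise from a biased (shifted) Gaussian and reweighting by the likelihood ratio, so far fewer samples are needed than with plain Monte Carlo. The detector, threshold and biasing density are illustrative assumptions, not the paper's exact setup, and the Genetic Algorithm training loop is omitted.

# Hedged sketch: IS estimate of a rare misclassification probability for a
# simple threshold detector in Gaussian noise.
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(1)

signal = 1.0          # transmitted amplitude under hypothesis H1 (assumed value)
noise_std = 0.25
threshold = 0.0       # decide H1 when the observation exceeds the threshold

def misclassification_is(n_samples, shift=-1.0):
    # IS estimate of P(error | H1) = P(signal + noise <= threshold).
    # Noise is drawn from a shifted Gaussian that makes errors frequent.
    noise = rng.normal(loc=shift, scale=noise_std, size=n_samples)
    errors = (signal + noise) <= threshold
    # Log likelihood ratio of the true N(0, sigma^2) density w.r.t. the biasing density.
    log_w = ((noise - shift) ** 2 - noise ** 2) / (2 * noise_std ** 2)
    return float(np.mean(errors * np.exp(log_w)))

estimate = misclassification_is(10_000)
exact = 0.5 * (1 + erf((threshold - signal) / (noise_std * sqrt(2))))
print(f"IS estimate: {estimate:.3e}   exact: {exact:.3e}")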