51,926 research outputs found

Hierarchical Distributed Representations for Statistical Language Modeling

Author: Blitzer John
Pereira Fernando C.N.
Saul Lawrence K
Weinberger Kilian Q
Publication venue: ScholarlyCommons
Publication date: 13/12/2004

Statistical language models estimate the probability of a word occurring in a given context. The most common language models rely on a discrete enumeration of predictive contexts (e.g., n-grams) and consequently fail to capture and exploit statistical regularities across these contexts. In this paper, we show how to learn hierarchical, distributed representations of word contexts that maximize the predictive value of a statistical language model. The representations are initialized by unsupervised algorithms for linear and nonlinear dimensionality reduction [14], then fed as input into a hierarchical mixture of experts, where each expert is a multinomial distribution over predicted words [12]. While the distributed representations in our model are inspired by the neural probabilistic language model of Bengio et al. [2, 3], our particular architecture enables us to work with significantly larger vocabularies and training corpora. For example, on a large-scale bigram modeling task involving a sixty-thousand-word vocabulary and a training corpus of three million sentences, we demonstrate consistent improvement over class-based bigram models [10, 13]. We also discuss extensions of our approach to longer multiword contexts.
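
The mixture-of-experts idea in this abstract can be illustrated with a minimal sketch. It assumes a flat, single-level mixture with a softmax gate, which is a simplification and not the hierarchical architecture of the paper. The names `mixture_next_word_probs`, `gate_weights`, and `experts`, and all numbers, are hypothetical.

```python
import math


def softmax(scores):
    """Convert arbitrary real scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_next_word_probs(context_vec, gate_weights, experts):
    """Blend per-expert multinomials using gates computed from the context vector.

    Assumption: a flat mixture with a linear-softmax gate (illustrative only).
    """
    # Gate score for expert k is the dot product of its weight row with the context.
    gate_scores = [
        sum(w * x for w, x in zip(row, context_vec)) for row in gate_weights
    ]
    gates = softmax(gate_scores)
    vocab_size = len(experts[0])
    # p(word j | context) = sum_k gate_k * expert_k[j]
    return [
        sum(g * expert[j] for g, expert in zip(gates, experts))
        for j in range(vocab_size)
    ]


# Toy setup: a 2-dimensional context representation, 2 experts, 3-word vocabulary.
context_vec = [0.5, -1.0]
gate_weights = [[1.0, 0.0], [0.0, 1.0]]
experts = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
probs = mixture_next_word_probs(context_vec, gate_weights, experts)
```

Because each expert is itself a probability distribution and the gates sum to 1, the blended output is also a valid distribution over the vocabulary.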

DeepWalk: Online Learning of Social Representations

Author: Al-Rfou R.
Bottou L.
Dean J.
Hinton G. E.
Kondor R. I.
Krizhevsky A.
Macskassy S. A.
Mikolov T.
Morin F.
Neville J.
Recht B.
Vishwanathan S.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 27/06/2014

We present DeepWalk, a novel approach for learning latent representations of vertices in a network. These latent representations encode social relations in a continuous vector space, which is easily exploited by statistical models. DeepWalk generalizes recent advancements in language modeling and unsupervised feature learning (or deep learning) from sequences of words to graphs. DeepWalk uses local information obtained from truncated random walks to learn latent representations by treating walks as the equivalent of sentences. We demonstrate DeepWalk's latent representations on several multi-label network classification tasks for social networks such as BlogCatalog, Flickr, and YouTube. Our results show that DeepWalk outperforms challenging baselines which are allowed a global view of the network, especially in the presence of missing information. DeepWalk's representations can provide F1 scores up to 10% higher than competing methods when labeled data is sparse. In some experiments, DeepWalk's representations are able to outperform all baseline methods while using 60% less training data. DeepWalk is also scalable. It is an online learning algorithm which builds useful incremental results, and is trivially parallelizable. These qualities make it suitable for a broad class of real world applications such as network classification, and anomaly detection.

Comment: 10 pages, 5 figures, 4 tables

arXiv.org e-Print Archive
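
The DeepWalk abstract describes treating truncated random walks as sentences. A minimal sketch of that corpus-building step follows, assuming an unweighted graph stored as an adjacency dictionary and uniform neighbor sampling. The function names, parameters, and toy graph are hypothetical, and the later step of training a language model on the walks is omitted.

```python
import random


def truncated_random_walk(graph, start, walk_length, rng):
    """Walk from `start`, choosing a uniformly random neighbor at each step."""
    walk = [start]
    while len(walk) < walk_length:
        neighbors = graph[walk[-1]]
        if not neighbors:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))
    return walk


def build_walk_corpus(graph, walks_per_vertex, walk_length, seed=0):
    """Generate several walks per vertex; each walk plays the role of a sentence."""
    rng = random.Random(seed)
    vertices = list(graph)
    corpus = []
    for _ in range(walks_per_vertex):
        rng.shuffle(vertices)  # visit start vertices in a fresh random order per pass
        for vertex in vertices:
            corpus.append(truncated_random_walk(graph, vertex, walk_length, rng))
    return corpus


# Toy undirected graph given as adjacency lists (hypothetical data).
graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
corpus = build_walk_corpus(graph, walks_per_vertex=2, walk_length=5)
```

Each list in `corpus` is a sequence of vertex identifiers that could be passed to a word-embedding trainer in place of a tokenized sentence.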

Distributed Representations of Words and Phrases and their Compositionality

Author: Google Inc
Greg Corrado
Ilya Sutskever
Jeffrey Dean
Kai Chen
Tomas Mikolov
Publication date: 16/10/2013

The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.

arXiv.org e-Print Archive

CiteSeerX
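
The abstract above names two techniques, subsampling of frequent words and negative sampling, without giving formulas. The sketch below uses the forms stated in the full paper rather than in this abstract: a discard probability of 1 - sqrt(t / f(w)) for a word with relative frequency f(w), and a noise distribution proportional to the unigram count raised to the power 3/4. The function names, the toy corpus, and the large threshold of 0.05 (chosen only so the toy numbers are visible) are assumptions.

```python
import math
from collections import Counter


def discard_probability(freq, threshold=1e-5):
    """Probability of dropping a word whose relative corpus frequency is `freq`."""
    # Words rarer than the threshold are never discarded (clamped at 0).
    return max(0.0, 1.0 - math.sqrt(threshold / freq))


def negative_sampling_distribution(counts, power=0.75):
    """Noise distribution for drawing negatives: unigram counts raised to `power`."""
    weights = {word: count ** power for word, count in counts.items()}
    total = sum(weights.values())
    return {word: weight / total for word, weight in weights.items()}


# Toy corpus (hypothetical): one very frequent word and two rare ones.
tokens = ["the"] * 90 + ["air"] * 6 + ["canada"] * 4
counts = Counter(tokens)
total = sum(counts.values())

# A threshold of 0.05 is far larger than typical values; it is used here
# only because the toy corpus is tiny.
discard = {
    word: discard_probability(count / total, threshold=0.05)
    for word, count in counts.items()
}
noise = negative_sampling_distribution(counts)
```

In this toy corpus the frequent word is discarded most often, and raising counts to the 3/4 power shifts some sampling mass from the frequent word toward the rare ones.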

Efficient Estimation of Word Representations in Vector Space

Author: Chen Kai
Corrado Greg
Dean Jeffrey
Mikolov Tomas
Publication date: 06/09/2013

We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion words data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.

arXiv.org e-Print Archive

CiteSeerX