
    Weighted Contrastive Divergence

    Learning algorithms for energy-based Boltzmann architectures that rely on gradient descent are in general computationally prohibitive, typically because of the exponential number of terms involved in computing the partition function. One therefore has to resort to approximation schemes to evaluate the gradient. This is the case for Restricted Boltzmann Machines (RBM) and their learning algorithm, Contrastive Divergence (CD). It is well known that CD has a number of shortcomings, and that its approximation to the gradient has several drawbacks. Overcoming these defects has been the basis of much research, and new algorithms such as persistent CD have been devised. In this manuscript we propose a new algorithm that we call Weighted CD (WCD), built from small modifications of the negative phase in standard CD. However small these modifications may be, the experimental work reported in this paper suggests that WCD provides a significant improvement over standard CD and persistent CD at a small additional computational cost.
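
    Since the abstract only sketches the idea, here is a minimal numpy sketch of what a weighted negative phase could look like: standard CD-k block Gibbs sampling for a binary RBM, but with the negative-phase statistics averaged using weights proportional to each sample's unnormalized model probability within the mini-batch. The weighting scheme and all names here are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def free_energy(v, W, b, c):
    # F(v) = -v.b - sum_j log(1 + exp(c_j + (vW)_j)) for a binary RBM
    return -v @ b - np.logaddexp(0.0, v @ W + c).sum(axis=1)

def wcd_gradient(v_data, W, b, c, rng, k=1):
    # Positive phase: expected hidden activations under the data.
    ph = sigmoid(v_data @ W + c)
    # Negative phase: k steps of block Gibbs sampling, as in CD-k.
    v = v_data
    for _ in range(k):
        h = (rng.random(ph.shape) < sigmoid(v @ W + c)).astype(float)
        v = (rng.random(v_data.shape) < sigmoid(h @ W.T + b)).astype(float)
    # Assumed weighting: softmax of -F(v), i.e. each negative sample's
    # unnormalized model probability relative to the mini-batch.
    e = -free_energy(v, W, b, c)
    w = np.exp(e - e.max())
    w /= w.sum()
    nh = sigmoid(v @ W + c)
    grad_W = v_data.T @ ph / len(v_data) - (v * w[:, None]).T @ nh
    grad_b = v_data.mean(axis=0) - w @ v
    grad_c = ph.mean(axis=0) - w @ nh
    return grad_W, grad_b, grad_c
```

    Setting the weights to the uniform value 1/n recovers plain CD-k, which is why the abstract can describe WCD as a small modification of the negative phase.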

    Efficient Learning for Undirected Topic Models

    The Replicated Softmax model, a well-known undirected topic model, is powerful at extracting semantic representations of documents. Traditional learning strategies such as Contrastive Divergence are very inefficient. This paper provides a novel estimator to speed up learning based on Noise Contrastive Estimation, extended to documents of varying lengths and weighted inputs. Experiments on two benchmarks show that the new estimator achieves great learning efficiency and high accuracy on document retrieval and classification. Comment: Accepted by ACL-IJCNLP 2015 as a short paper; 6 pages.
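
    For context, the core of Noise-Contrastive Estimation is a logistic classifier that separates data samples from noise samples using unnormalized model log-probabilities, so the partition function never has to be computed exactly. A generic numpy sketch of that objective follows; it omits the paper's extensions to variable document lengths and weighted inputs, and the function names are illustrative.

```python
import numpy as np

def nce_loss(log_pm_data, log_pm_noise, log_pn_data, log_pn_noise, nu):
    """Generic NCE objective (to be minimized), a sketch.

    log_pm_* are unnormalized model log-probabilities and log_pn_* are
    noise-distribution log-probabilities, evaluated on data samples and
    on nu noise samples per data point.
    """
    # Log-odds G(x) = log p_model(x) - log p_noise(x) for each sample.
    g_data = log_pm_data - log_pn_data
    g_noise = log_pm_noise - log_pn_noise
    # -log sigmoid(G - log nu) on data, -log(1 - sigmoid(.)) on noise,
    # both written stably via logaddexp.
    loss_data = np.logaddexp(0.0, -(g_data - np.log(nu)))
    loss_noise = np.logaddexp(0.0, g_noise - np.log(nu))
    return loss_data.mean() + nu * loss_noise.mean()
```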

    Short-Term Load Forecasting of Natural Gas with Deep Neural Network Regression

    Deep neural networks are proposed for short-term natural gas load forecasting. Deep learning has proven to be a powerful tool for many classification problems, seeing significant use in machine learning fields such as image recognition and speech processing. We provide an overview of natural gas forecasting. Next, contrastive divergence, the deep learning training method used here, is explained. We compare our proposed deep neural network to a linear regression model and a traditional artificial neural network on 62 operating areas, each of which has at least 10 years of data. The proposed deep network outperforms traditional artificial neural networks by 9.83% in weighted mean absolute percent error (WMAPE).
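
    WMAPE is the headline metric here, so a short sketch may help. The paper's exact weighting is not spelled out in the abstract; one common definition is assumed below, namely total absolute error normalized by total actual load, so high-load periods carry more weight than in a plain MAPE.

```python
import numpy as np

def wmape(actual, forecast):
    """Weighted mean absolute percent error (one common definition)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return 100.0 * np.abs(actual - forecast).sum() / np.abs(actual).sum()

# Example: off by 5 units on a 100-unit day and 1 unit on a 20-unit day.
print(wmape([100, 20], [105, 21]))  # 5.0 (percent)
```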

    Representation learning for very short texts using weighted word embedding aggregation

    Short text messages such as tweets are very noisy and sparse in their use of vocabulary. Traditional textual representations, such as tf-idf, have difficulty grasping the semantic meaning of such texts, which is important in applications such as event detection, opinion mining, and news recommendation. We constructed a method based on semantic word embeddings and frequency information to arrive at low-dimensional representations for short texts designed to capture semantic similarity. For this purpose we designed a weight-based model and a learning procedure based on a novel median-based loss function. This paper discusses the details of our model and the optimization methods, together with experimental results on both Wikipedia and Twitter data. We find that our method outperforms the baseline approaches in the experiments, and that it generalizes well to different word embeddings without retraining. Our method is therefore capable of retaining most of the semantic information in the text and is applicable out of the box. Comment: 8 pages, 3 figures, 2 tables; appears in Pattern Recognition Letters.
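
    As a rough illustration of weighted embedding aggregation (the paper's exact weight model and median-based loss are not given in the abstract), one plausible reading is: sort a text's words by an informativeness score such as idf and take a per-rank weighted mean of their embeddings, with the rank weights learned. Everything in this sketch is an assumption for illustration, not the authors' formulation.

```python
import numpy as np

def text_embedding(word_vecs, idf, rank_weights):
    """Aggregate word embeddings into one short-text vector (a sketch).

    Words are ordered by descending idf; the word at rank r gets the
    learned importance weight rank_weights[r], and the text vector is
    the weighted mean of the embeddings.
    """
    order = np.argsort(-np.asarray(idf))          # most informative first
    sorted_vecs = np.asarray(word_vecs)[order]
    w = np.asarray(rank_weights[: len(sorted_vecs)])
    return (w[:, None] * sorted_vecs).sum(axis=0) / w.sum()
```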

    A comparative study of different gradient approximations for Restricted Boltzmann Machines

    This project consists of a theoretical study of Restricted Boltzmann Machines (RBMs) and focuses on gradient approximations for RBMs. RBMs cannot be learned accurately with the exact gradient, since computing it is intractable. Based on Contrastive Divergence (CD) and Markov Chain Monte Carlo (MCMC), CD-k, an efficient algorithm for approximating the gradients, was proposed and has become the mainstream way to train RBMs. To improve efficiency and mitigate the bias in the approximation, many CD-related algorithms have since emerged, such as Persistent Contrastive Divergence (PCD) and Weighted Contrastive Divergence (WCD). In this project a comprehensive comparison of the gradient approximation algorithms is presented, mainly covering CD, PCD, and WCD. The experimental results indicate that among all the algorithms tested, WCD has the fastest and best convergence for parameter learning. Increasing the number of Gibbs sampling steps and adding a persistent chain to CD-related algorithms can enhance performance and alleviate the bias in the approximation, and taking advantage of Parallel Tempering can further improve the results. Moreover, the cosine similarity between approximate and exact gradients is studied, and the results show that the CD series and WCD series algorithms are heterogeneous. The general conclusions of this project can serve as a reference when training RBMs.
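
    The cosine-similarity comparison mentioned at the end is straightforward to state; a minimal sketch follows, assuming both gradient estimates are flattened into vectors before comparison.

```python
import numpy as np

def gradient_cosine(approx_grad, exact_grad):
    """Cosine similarity between an approximate gradient (e.g. from
    CD-k, PCD, or WCD) and the exact gradient, measuring how well the
    approximation tracks the true update direction."""
    a = np.asarray(approx_grad).ravel()
    b = np.asarray(exact_grad).ravel()
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
```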