387 research outputs found

A Deep Embedding Model for Co-occurrence Learning

Author: Chen Jianshu
Deng Li
Gao Jianfeng
He Xiaodong
Jin Ruoming
Shen Yelong
Publication date: 04/06/2015

Co-occurrence data is a common and important information source in many areas, such as word co-occurrence in sentences, friend co-occurrence in social networks, and product co-occurrence in commercial transaction data, and it contains rich correlation and clustering information about the items. In this paper, we study co-occurrence data using a general energy-based probabilistic model, and we analyze three different categories of energy-based model, namely the $L_1$, $L_2$ and $L_k$ models, which are able to capture different levels of dependency in the co-occurrence data. We also discuss how several typical existing models are related to these three types of energy models, including the Fully Visible Boltzmann Machine (FVBM) ($L_2$), Matrix Factorization ($L_2$), Log-BiLinear (LBL) models ($L_2$), and the Restricted Boltzmann Machine (RBM) model ($L_k$). Then, we propose a Deep Embedding Model (DEM) (an $L_k$ model) derived from the energy model in a principled manner. Furthermore, motivated by the observation that the partition function in the energy model is intractable and the fact that the major objective of modeling the co-occurrence data is to predict using the conditional probability, we apply the maximum pseudo-likelihood method to learn DEM. As a consequence, the developed model and its learning method naturally avoid the above difficulties and can be easily used to compute the conditional probability in prediction. Interestingly, our method is equivalent to learning a special structured deep neural network using back-propagation and a special sampling strategy, which makes it scalable on large-scale datasets. Finally, in the experiments, we show that the DEM can achieve comparable or better results than state-of-the-art methods on datasets across several application domains.
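The abstract's point that maximum pseudo-likelihood sidesteps the partition function can be illustrated with a minimal sketch. It is not the DEM itself: it assumes a plain pairwise binary energy model (in the spirit of a fully visible Boltzmann machine) with energy E(x) = -sum_i b_i x_i - sum_{i<j} W_ij x_i x_j, where W is symmetric with a zero diagonal. The function names and parameter layout are illustrative assumptions, not taken from the paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def conditional_prob_on(x, i, W, b):
    """p(x_i = 1 | all other x_j) for the assumed pairwise binary energy model.

    Only the terms involving x_i survive in the ratio, so the global
    partition function cancels and never has to be computed.
    """
    activation = b[i] + sum(W[i][j] * x[j] for j in range(len(x)) if j != i)
    return sigmoid(activation)

def pseudo_log_likelihood(x, W, b):
    """Sum over i of log p(x_i | x_{-i}); each term is cheap to evaluate."""
    total = 0.0
    for i in range(len(x)):
        p_on = conditional_prob_on(x, i, W, b)
        total += math.log(p_on if x[i] == 1 else 1.0 - p_on)
    return total
```

Training would maximise this quantity over W and b (for example by gradient ascent); the same conditionals are what one would use at prediction time.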

arXiv.org e-Print Archive

Crossref

The Neural Autoregressive Distribution Estimator

Author: Larochelle Hugo
Murray Iain
Publication date: 01/01/2011

We describe a new approach for modeling the distribution of high-dimensional vectors of discrete variables. This model is inspired by the restricted Boltzmann machine (RBM), which has been shown to be a powerful model of such distributions. However, an RBM typically does not provide a tractable distribution estimator, since evaluating the probability it assigns to some given observation requires the computation of the so-called partition function, which itself is intractable for RBMs of even moderate size. Our model circumvents this difficulty by decomposing the joint distribution of observations into tractable conditional distributions and modeling each conditional using a non-linear function similar to a conditional of an RBM. Our model can also be interpreted as an autoencoder wired such that its output can be used to assign valid probabilities to observations. We show that this new model outperforms other multivariate binary distribution estimators on several datasets and performs similarly to a large (but intractable) RBM.
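The decomposition described in this abstract can be sketched as follows. This is a minimal illustration of the autoregressive factorisation p(x) = prod_d p(x_d | x_1..x_{d-1}) with one shared hidden layer, assuming sigmoid units and a fixed left-to-right ordering; the parameter names `W`, `V`, `b`, `c` and their shapes are assumptions made for this example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def nade_log_prob(x, W, V, b, c):
    """Log-probability of a binary vector x under an autoregressive model.

    Assumed shapes: W is H x D (input-to-hidden), V is D x H
    (hidden-to-output), b has length D, c has length H.
    """
    n_hidden = len(c)
    pre_activation = list(c)  # depends only on the inputs seen so far
    log_p = 0.0
    for d, x_d in enumerate(x):
        hidden = [sigmoid(a) for a in pre_activation]
        p_on = sigmoid(b[d] + sum(V[d][k] * hidden[k] for k in range(n_hidden)))
        log_p += math.log(p_on if x_d == 1 else 1.0 - p_on)
        # Fold x_d into the running pre-activation for the next conditional.
        for k in range(n_hidden):
            pre_activation[k] += W[k][d] * x_d
    return log_p
```

Because every factor is a properly normalised Bernoulli conditional, the probabilities of all 2^D binary vectors sum to one without computing any partition function, which is the tractability property the abstract contrasts with the RBM.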

CiteSeerX

Edinburgh Research Explorer

Cardinality Restricted Boltzmann Machines

Author: Adams Ryan Prescott
Salakhutdinov Ruslan
Sutskever Ilya
Swersky Kevin
Tarlow Daniel
Zemel Richard
Publication venue: Massachusetts Institute of Technology Press
Publication date: 13/11/2013

The Restricted Boltzmann Machine (RBM) is a popular density model that is also good for extracting features. A main source of tractability in RBM models is that, given an input, the posterior distribution over hidden variables is factorizable and can be easily computed and sampled from. Sparsity and competition in the hidden representation are beneficial, and while an RBM with competition among its hidden units would acquire some of the attractive properties of sparse coding, such constraints are typically not added, as the resulting posterior over the hidden units seemingly becomes intractable. In this paper we show that a dynamic programming algorithm can be used to implement exact sparsity in the RBM’s hidden units. We also show how to pass derivatives through the resulting posterior marginals, which makes it possible to fine-tune a pre-trained neural network with sparse hidden layers.

Engineering and Applied Science
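To give a feel for why dynamic programming makes a cardinality constraint tractable, here is a minimal sketch. It is not the authors' algorithm: it assumes a hard cap of at most `max_active` hidden units being on, with each unit contributing an independent weight exp(activation) given the input. Summing over all subsets of each size is then an elementary-symmetric-polynomial recurrence that costs O(n * max_active) rather than enumerating 2^n configurations. All names are hypothetical.

```python
import math

def cardinality_partition(weights, max_active):
    """Return z where z[c] is the sum, over all subsets of exactly c units,
    of the product of their weights (c = 0 .. max_active)."""
    z = [1.0] + [0.0] * max_active
    for w in weights:
        # Go downwards so each unit is counted at most once per subset.
        for c in range(max_active, 0, -1):
            z[c] += w * z[c - 1]
    return z

def count_distribution(activations, max_active):
    """Posterior over the number of active hidden units, assuming
    p(h | v) is proportional to exp(sum_j a_j h_j) when at most
    max_active units are on, and zero otherwise."""
    weights = [math.exp(a) for a in activations]
    z = cardinality_partition(weights, max_active)
    total = sum(z)
    return [z_c / total for z_c in z]
```

The same table of partial sums is the kind of quantity from which per-unit posterior marginals can be derived, which is what makes exact inference possible despite the coupling between hidden units.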

CiteSeerX

Harvard University - DASH

On the Challenges of Physical Implementations of RBMs

Author: Bengio Yoshua
Courville Aaron
Dumoulin Vincent
Goodfellow Ian J.
Publication date: 21/06/2014

Restricted Boltzmann machines (RBMs) are powerful machine learning models, but learning and some kinds of inference in the model require sampling-based approximations, which, in classical digital computers, are implemented using expensive MCMC. Physical computation offers the opportunity to reduce the cost of sampling by building physical systems whose natural dynamics correspond to drawing samples from the desired RBM distribution. Such a system avoids the burn-in and mixing cost of a Markov chain. However, hardware implementations of this variety usually entail limitations such as low precision and limited range of the parameters, and restrictions on the size and topology of the RBM. We conduct software simulations to determine how harmful each of these restrictions is. Our simulations are designed to reproduce aspects of the D-Wave quantum computer, but the issues we investigate arise in most forms of physical computation.

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Reconstruction Error and Principal Component Based Anomaly Detection in Hyperspectral imagery

Author: Jablonski James A.
Publication venue: AFIT Scholar
Publication date: 14/03/2014

The rapid expansion of remote sensing and information collection capabilities demands methods to highlight interesting or anomalous patterns within an overabundance of data. This research addresses this issue for hyperspectral imagery (HSI). Two new reconstruction-based HSI anomaly detectors are outlined: one using principal component analysis (PCA), and the other a form of non-linear PCA called logistic principal component analysis. Two very effective, yet relatively simple, modifications to the autonomous global anomaly detector are also presented, improving algorithm performance and enabling receiver operating characteristic analysis. A novel technique for HSI anomaly detection dubbed multiple PCA is introduced and found to perform as well as or better than existing detectors on HYDICE data while using only linear deterministic methods. Finally, a response-surface-based optimization is performed on algorithm parameters so as to effect consistent, desired algorithm performance.
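The idea behind a reconstruction-based PCA detector can be sketched in a few lines. This is a generic illustration and not the detectors developed in the thesis: it assumes each pixel is a short list of band values, keeps only the first principal component (found here by power iteration), and scores each pixel by the norm of what that component fails to reconstruct. Function names and the toy data in the test are assumptions for illustration.

```python
import math

def mean_center(rows):
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - mean[j] for j in range(d)] for r in rows]

def first_principal_component(centered, iterations=200):
    """Leading eigenvector of the covariance matrix via power iteration."""
    n, d = len(centered), len(centered[0])
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)]
           for i in range(d)]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # degenerate data with no variance
            return v
        v = [x / norm for x in w]
    return v

def reconstruction_errors(rows):
    """Residual norm of each row after projecting onto the first component."""
    centered = mean_center(rows)
    v = first_principal_component(centered)
    errors = []
    for r in centered:
        score = sum(a * b for a, b in zip(r, v))
        residual = [a - score * b for a, b in zip(r, v)]
        errors.append(math.sqrt(sum(x * x for x in residual)))
    return errors
```

Pixels that follow the dominant spectral pattern reconstruct well and receive small scores, while pixels that deviate from it receive large ones; thresholding the score yields a simple anomaly map.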

AFIT Scholar (Air Force Institute of Technology)

Some Bayesian and multivariate analysis methods in statistical machine learning and applications

Author: Zhou Wen
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2014

In this dissertation, we consider some Bayesian and multivariate analysis methods in statistical machine learning, as well as some applications of Bayesian methodology with differential equation models to study dynamics during co-infections by Leishmania major and Leishmania amazonensis based on longitudinal data. First, we developed a new MCMC algorithm that integrates the curvature information of a target distribution in order to sample it accurately and efficiently. We then introduced a Bayesian Hierarchical Topographic Clustering method (BHTC), motivated by the well-known self-organizing map (SOM), using stationary isotropic Gaussian processes and principal component approximations. We constructed a computationally tractable MCMC algorithm to sample posterior distributions of the covariance matrices, as well as the posterior distributions of the remaining BHTC parameters. To summarize the posterior distributions of BHTC parameters in a coherent fashion for the purpose of data clustering, we adopted a posterior risk framework that accounts for both data partitioning and topographic preservation. We also proposed a classification method based on the weighted bootstrap and an ensemble mechanism to deal with covariate shifts in classification, the Active Set Selections based Classification (ASSC). This procedure is flexible enough to be combined with classification methods including the support vector machine (SVM), classification trees, and Fisher's discriminant classifier (LDA) to improve their performance. We adopted Bayesian methodologies to study longitudinal data from co-infections by Leishmania major and Leishmania amazonensis. In the proposed Bayesian analysis, we modeled the immunobiological dynamics and data variations by Lotka-Volterra equations and the linear mixed model, respectively. Using the posterior distributions of differential equation parameters and the concept of asymptotic stable equilibrium of differential equations, we successfully quantified the immune efficiency.

Digital Repository @ Iowa State University (ISU)