373 research outputs found

    Conditional Random Field Autoencoders for Unsupervised Structured Prediction

    Full text link
    We introduce a framework for unsupervised learning of structured predictors with overlapping, global features. Each input's latent representation is predicted conditional on the observable data using a feature-rich conditional random field. Then a reconstruction of the input is (re)generated, conditional on the latent structure, using models for which maximum likelihood estimation has a closed-form. Our autoencoder formulation enables efficient learning without making unrealistic independence assumptions or restricting the kinds of features that can be used. We illustrate insightful connections to traditional autoencoders, posterior regularization and multi-view learning. We show competitive results with instantiations of the model for two canonical NLP tasks: part-of-speech induction and bitext word alignment, and show that training our model can be substantially more efficient than comparable feature-rich baselines

    Supervised Attentions for Neural Machine Translation

    Full text link
    In this paper, we improve the attention or alignment accuracy of neural machine translation by utilizing the alignments of training sentence pairs. We simply compute the distance between the machine attentions and the "true" alignments, and minimize this cost in the training procedure. Our experiments on large-scale Chinese-to-English task show that our model improves both translation and alignment qualities significantly over the large-vocabulary neural machine translation system, and even beats a state-of-the-art traditional syntax-based system.Comment: 6 pages. In Proceedings of EMNLP 2016. arXiv admin note: text overlap with arXiv:1605.0314

    A Computationally Efficient Projection-Based Approach for Spatial Generalized Linear Mixed Models

    Full text link
    Inference for spatial generalized linear mixed models (SGLMMs) for high-dimensional non-Gaussian spatial data is computationally intensive. The computational challenge is due to the high-dimensional random effects and because Markov chain Monte Carlo (MCMC) algorithms for these models tend to be slow mixing. Moreover, spatial confounding inflates the variance of fixed effect (regression coefficient) estimates. Our approach addresses both the computational and confounding issues by replacing the high-dimensional spatial random effects with a reduced-dimensional representation based on random projections. Standard MCMC algorithms mix well and the reduced-dimensional setting speeds up computations per iteration. We show, via simulated examples, that Bayesian inference for this reduced-dimensional approach works well both in terms of inference as well as prediction, our methods also compare favorably to existing "reduced-rank" approaches. We also apply our methods to two real world data examples, one on bird count data and the other classifying rock types

    Variational Deep Semantic Hashing for Text Documents

    Full text link
    As the amount of textual data has been rapidly increasing over the past decade, efficient similarity search methods have become a crucial component of large-scale information retrieval systems. A popular strategy is to represent original data samples by compact binary codes through hashing. A spectrum of machine learning methods have been utilized, but they often lack expressiveness and flexibility in modeling to learn effective representations. The recent advances of deep learning in a wide range of applications has demonstrated its capability to learn robust and powerful feature representations for complex data. Especially, deep generative models naturally combine the expressiveness of probabilistic generative models with the high capacity of deep neural networks, which is very suitable for text modeling. However, little work has leveraged the recent progress in deep learning for text hashing. In this paper, we propose a series of novel deep document generative models for text hashing. The first proposed model is unsupervised while the second one is supervised by utilizing document labels/tags for hashing. The third model further considers document-specific factors that affect the generation of words. The probabilistic generative formulation of the proposed models provides a principled framework for model extension, uncertainty estimation, simulation, and interpretability. Based on variational inference and reparameterization, the proposed models can be interpreted as encoder-decoder deep neural networks and thus they are capable of learning complex nonlinear distributed representations of the original documents. We conduct a comprehensive set of experiments on four public testbeds. The experimental results have demonstrated the effectiveness of the proposed supervised learning models for text hashing.Comment: 11 pages, 4 figure
    • …
    corecore