
    Large Margin Low Rank Tensor Analysis

    Unlike vector representations, the direct objects of human cognition are generally high-order tensors, such as 2D images and 3D textures. From this fact, two interesting questions naturally arise: how does the human brain represent these tensor perceptions in a "manifold" way, and how can they be recognized on the "manifold"? In this paper, we present a supervised model to learn the intrinsic structure of tensors embedded in a high-dimensional Euclidean space. Using fixed point continuation procedures, our model automatically and jointly discovers the optimal dimensionality and the representations of the low-dimensional embeddings, making it an effective simulation of the cognitive process of the human brain. Furthermore, the generalization of our model, based on similarity between the learned low-dimensional embeddings, can be viewed as a counterpart of recognition in the human brain. Experiments on object recognition and face recognition demonstrate the superiority of our proposed model over state-of-the-art approaches. Comment: 30 pages
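
    A minimal sketch of the fixed point continuation primitive the abstract leans on: alternating a gradient step with singular value shrinkage, so that the rank, i.e. the optimal dimensionality, emerges from the data instead of being fixed in advance. The toy objective and all names here are illustrative, not the paper's full supervised large-margin model.

        import numpy as np

        def svd_shrink(X, tau):
            # Singular value thresholding: subtract tau from every singular
            # value and clip at zero, so small ones vanish and the rank of
            # the result is selected automatically.
            U, s, Vt = np.linalg.svd(X, full_matrices=False)
            s = np.maximum(s - tau, 0.0)
            return (U * s) @ Vt, int(np.sum(s > 0))

        def fixed_point_continuation(Y, tau=2.0, step=0.5, iters=200):
            # Generic FPC loop for min tau*||W||_* + 0.5*||W - Y||_F^2:
            # gradient step on the smooth term, then the shrinkage step.
            W = np.zeros_like(Y)
            rank = 0
            for _ in range(iters):
                W, rank = svd_shrink(W - step * (W - Y), step * tau)
            return W, rank

        # toy demo: recover a low-rank matrix from noisy observations
        rng = np.random.default_rng(0)
        M = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 40))  # rank-3 target
        W, rank = fixed_point_continuation(M + 0.1 * rng.normal(size=M.shape))
        print("recovered rank:", rank)  # the dimensionality emerges from the data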

    Breaking the Softmax Bottleneck via Learnable Monotonic Pointwise Non-linearities

    The Softmax function on top of a final linear layer is the de facto method for outputting probability distributions in neural networks. In many applications, such as language models or text generation, this model has to produce distributions over large output vocabularies. Recently, this has been shown to have limited representational capacity due to its connection with the rank bottleneck in matrix factorization. However, little is known about the limitations of Linear-Softmax for quantities of practical interest such as cross entropy or mode estimation, a direction that we explore here. As an efficient and effective solution to alleviate this issue, we propose to learn parametric monotonic functions on top of the logits. We theoretically investigate the rank-increasing capabilities of such monotonic functions. Empirically, our method improves on two different quality metrics over the traditional Linear-Softmax layer in synthetic and real language model experiments, adding little time or memory overhead, while being comparable to the more computationally expensive mixture of Softmaxes.
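
    A minimal sketch of one way to parameterize a learnable monotonic pointwise non-linearity on the logits, here a positive combination of tanh units; this specific parameterization is our assumption, since the abstract only requires the functions to be parametric and monotone.

        import numpy as np

        def softplus(x):
            return np.log1p(np.exp(x))

        def monotonic(z, a_raw, b_raw, c):
            # f(z) = sum_k a_k * tanh(b_k * z + c_k), applied elementwise.
            # Forcing a_k, b_k > 0 via softplus makes f strictly increasing,
            # so the ranking of the logits (and greedy decoding) is unchanged
            # while the resulting distribution can escape the rank bottleneck.
            a, b = softplus(a_raw), softplus(b_raw)
            return np.sum(a * np.tanh(b * z[..., None] + c), axis=-1)

        def softmax(z):
            e = np.exp(z - z.max(axis=-1, keepdims=True))
            return e / e.sum(axis=-1, keepdims=True)

        rng = np.random.default_rng(0)
        logits = rng.normal(size=(2, 5))     # batch of 2, vocabulary of 5
        a_raw, b_raw, c = (rng.normal(size=3) for _ in range(3))
        p_plain = softmax(logits)
        p_mono = softmax(monotonic(logits, a_raw, b_raw, c))
        # token ordering is preserved, the shape of the distribution is not
        assert (np.argsort(p_plain) == np.argsort(p_mono)).all()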

    Quantum-assisted associative adversarial network: Applying quantum annealing in deep learning

    We present an algorithm for learning a latent variable generative model via generative adversarial learning where the canonical uniform noise input is replaced by samples from a graphical model. This graphical model is learned by a Boltzmann machine which learns a low-dimensional feature representation of data extracted by the discriminator. A quantum annealer, the D-Wave 2000Q, is used to sample from this model. This algorithm joins a growing family of algorithms that use a quantum annealing subroutine in deep learning, and provides a framework to test the advantages of quantum-assisted learning in GANs. Fully connected, symmetric bipartite, and Chimera graph topologies are compared on a reduced stochastically binarized MNIST dataset, for both classical and quantum annealing sampling methods. The quantum-assisted associative adversarial network successfully learns a generative model of the MNIST dataset for all topologies, and is also applied to the bedrooms class of the LSUN dataset for the Chimera topology. Evaluated using the Fréchet inception distance and inception score, the quantum and classical versions of the algorithm are found to have equivalent performance for learning an implicit generative model of the MNIST dataset.
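
    A rough sketch of the noise-source swap the abstract describes, with classical block Gibbs sampling from a tiny restricted Boltzmann machine standing in for the quantum annealer; training the RBM on discriminator features, and the GAN itself, are omitted.

        import numpy as np

        rng = np.random.default_rng(0)

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        class RBM:
            # Tiny restricted Boltzmann machine used only as a sampler.
            # In the paper the graphical model is sampled with a D-Wave 2000Q;
            # block Gibbs sampling here is a classical stand-in for that step.
            def __init__(self, n_visible, n_hidden):
                self.W = 0.01 * rng.normal(size=(n_visible, n_hidden))
                self.bv = np.zeros(n_visible)
                self.bh = np.zeros(n_hidden)

            def sample(self, n, burn_in=50):
                v = (rng.random((n, len(self.bv))) < 0.5).astype(float)
                for _ in range(burn_in):
                    h = (rng.random((n, len(self.bh)))
                         < sigmoid(v @ self.W + self.bh)).astype(float)
                    v = (rng.random((n, len(self.bv)))
                         < sigmoid(h @ self.W.T + self.bv)).astype(float)
                return v

        rbm = RBM(n_visible=16, n_hidden=16)
        z = rbm.sample(n=8)   # replaces the canonical uniform GAN noise input
        print(z.shape)        # (8, 16): latent codes fed to the generator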

    Consensus Attention-based Neural Networks for Chinese Reading Comprehension

    Reading comprehension has seen a boom in recent NLP research. Several institutes have released Cloze-style reading comprehension datasets, which have greatly accelerated research on machine comprehension. In this work, we first present Chinese reading comprehension datasets consisting of a People Daily news dataset and a Children's Fairy Tale (CFT) dataset. We also propose a consensus attention-based neural network architecture to tackle the Cloze-style reading comprehension problem, which aims to induce a consensus attention over every word in the query. Experimental results show that the proposed neural network significantly outperforms state-of-the-art baselines on several public datasets. Furthermore, we set up a baseline for the Chinese reading comprehension task, which we hope will speed up future research. Comment: 9+1 pages, published at COLING 2016
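
    A minimal sketch of the consensus-attention idea: every query word attends over the document, and the per-word attention maps are merged into one distribution. The average/sum/max merging heuristics and all shapes here are illustrative.

        import numpy as np

        def softmax(x, axis=-1):
            e = np.exp(x - x.max(axis=axis, keepdims=True))
            return e / e.sum(axis=axis, keepdims=True)

        def consensus_attention(doc, query, mode="avg"):
            # doc: (d_len, h) document token states; query: (q_len, h).
            # Every query word attends over the document; the per-word maps
            # are then merged into a single consensus distribution.
            alphas = softmax(query @ doc.T, axis=-1)   # (q_len, d_len)
            if mode == "avg":
                return alphas.mean(axis=0)
            if mode == "sum":
                return softmax(alphas.sum(axis=0))
            return softmax(alphas.max(axis=0))         # "max" consensus

        rng = np.random.default_rng(0)
        att = consensus_attention(rng.normal(size=(30, 8)), rng.normal(size=(5, 8)))
        print(att.shape, round(att.sum(), 6))          # (30,) 1.0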

    Deep Learning for Sequential Recommendation: Algorithms, Influential Factors, and Evaluations

    In the field of sequential recommendation, deep learning (DL)-based methods have received a lot of attention in the past few years and have surpassed traditional models such as Markov chain-based and factorization-based ones. However, there is little systematic study of DL-based methods, especially regarding how to design an effective DL model for sequential recommendation. With this in mind, this survey focuses on DL-based sequential recommender systems by taking the aforementioned issues into consideration. Specifically, we illustrate the concept of sequential recommendation, propose a categorization of existing algorithms in terms of three types of behavioral sequence, summarize the key factors affecting the performance of DL-based models, and conduct corresponding evaluations to demonstrate the effects of these factors. We conclude this survey by systematically outlining future directions and challenges in this field. Comment: 36 pages, 17 figures, 6 tables, 104 references

    Investigation of learning abilities on linguistic features in sequence-to-sequence text-to-speech synthesis

    Neural sequence-to-sequence text-to-speech synthesis (TTS) can produce high-quality speech directly from text or simple linguistic features such as phonemes. Unlike traditional pipeline TTS, neural sequence-to-sequence TTS does not require manually annotated and complicated linguistic features such as part-of-speech tags and syntactic structures for system training. However, it must be carefully designed and well optimized so that it can implicitly extract useful linguistic features from the input features. In this paper we investigate under what conditions neural sequence-to-sequence TTS can work well in Japanese and English, comparing it with deep neural network (DNN) based pipeline TTS systems. Unlike past comparative studies, the pipeline systems also use autoregressive probabilistic modeling and a neural vocoder. We investigated systems from three aspects: a) model architecture, b) model parameter size, and c) language. For the model architecture aspect, we adopt modified Tacotron systems that we previously proposed and their variants using an encoder from Tacotron or Tacotron2. For the model parameter size aspect, we investigate two model parameter sizes. For the language aspect, we conduct listening tests in both Japanese and English to see whether our findings generalize across languages. Our experiments suggest that a) a neural sequence-to-sequence TTS system should have a sufficient number of model parameters to produce high-quality speech, b) it should also use a powerful encoder when it takes characters as input, and c) the encoder still has room for improvement and needs an improved architecture to learn supra-segmental features more appropriately.

    DeepPath: A Reinforcement Learning Method for Knowledge Graph Reasoning

    We study the problem of learning to reason in large scale knowledge graphs (KGs). More specifically, we describe a novel reinforcement learning framework for learning multi-hop relational paths: we use a policy-based agent with continuous states based on knowledge graph embeddings, which reasons in a KG vector space by sampling the most promising relation to extend its path. In contrast to prior work, our approach includes a reward function that takes accuracy, diversity, and efficiency into consideration. Experimentally, we show that our proposed method outperforms a path-ranking based algorithm and knowledge graph embedding methods on Freebase and Never-Ending Language Learning datasets. Comment: EMNLP 2017
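
    A toy REINFORCE sketch of the path-finding idea: a softmax policy over relations, states built from entity embeddings, and a simplified reward mixing accuracy and efficiency (the diversity term is omitted). Everything here, including the random toy graph where each relation maps an entity to one successor, is illustrative rather than the paper's setup.

        import numpy as np

        rng = np.random.default_rng(0)

        n_entities, n_relations, dim = 20, 5, 8
        ent_emb = rng.normal(size=(n_entities, dim))
        edges = {e: {r: int(rng.integers(n_entities)) for r in range(n_relations)}
                 for e in range(n_entities)}

        theta = 0.01 * rng.normal(size=(2 * dim, n_relations))  # policy weights

        def policy(state):
            logits = state @ theta
            p = np.exp(logits - logits.max())
            return p / p.sum()

        def rollout(src, dst, max_hops=3):
            # state = (current entity embedding, offset still to cover)
            cur, path = src, []
            for _ in range(max_hops):
                state = np.concatenate([ent_emb[cur], ent_emb[dst] - ent_emb[cur]])
                r = rng.choice(n_relations, p=policy(state))
                path.append((state, r))
                cur = edges[cur][r]
                if cur == dst:
                    return path, 1.0               # accuracy reward: target reached
            return path, -0.05 * len(path)         # efficiency penalty otherwise

        # one REINFORCE update: scale the log-probability gradient by the reward
        path, reward = rollout(src=0, dst=7)
        for state, r in path:
            grad = -np.outer(state, policy(state)) # d log pi / d theta, part 1
            grad[:, r] += state                    # part 2: the chosen relation
            theta += 0.1 * reward * grad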

    Auditory Separation of a Conversation from Background via Attentional Gating

    We present a model for separating a set of voices out of a sound mixture containing an unknown number of sources. Our Attentional Gating Network (AGN) uses a variable attentional context to specify which speakers in the mixture are of interest. The attentional context is specified by an embedding vector which modifies the processing of a neural network through an additive bias. Individual speaker embeddings are learned to separate a single speaker, while superpositions of the individual speaker embeddings are used to separate sets of speakers. We first evaluate AGN on a traditional single speaker separation task and show a 9% improvement over comparable models. Then, we introduce a new task to separate an arbitrary subset of voices from a mixture of an unknown-sized set of voices, inspired by the human ability to separate a conversation of interest from background chatter at a cafeteria. We show that AGN is the only model capable of solving this task, performing only 7% worse than on the single speaker separation task.
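
    A minimal sketch of the additive-bias gating mechanism, assuming a toy spectrogram-mask network: the attentional context enters only as a bias on the hidden layer, and summing two speaker embeddings requests both voices at once. The weights are untrained stand-ins and all shapes are our assumptions.

        import numpy as np

        rng = np.random.default_rng(0)
        n_freq, hidden, emb_dim, n_speakers = 64, 32, 16, 10

        W1 = 0.1 * rng.normal(size=(n_freq, hidden))
        B = 0.1 * rng.normal(size=(emb_dim, hidden))   # maps context to a bias
        W2 = 0.1 * rng.normal(size=(hidden, n_freq))
        speaker_emb = rng.normal(size=(n_speakers, emb_dim))  # one per speaker

        def separate(mixture, context):
            # mixture: (T, n_freq) magnitude spectrogram of the sound mixture;
            # context: (emb_dim,) attentional context. The context enters the
            # network only as an additive bias on the hidden layer.
            h = np.tanh(mixture @ W1 + context @ B)
            mask = 1.0 / (1.0 + np.exp(-(h @ W2)))     # per-bin soft mask
            return mask * mixture

        mix = np.abs(rng.normal(size=(100, n_freq)))
        solo = separate(mix, speaker_emb[3])                    # one speaker
        duo = separate(mix, speaker_emb[3] + speaker_emb[7])    # a "conversation":
        # the superposition of two embeddings requests both voices at once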

    Joint Sub-bands Learning with Clique Structures for Wavelet Domain Super-Resolution

    Convolutional neural networks (CNNs) have recently achieved great success in single-image super-resolution (SISR). However, these methods tend to produce over-smoothed outputs and miss some textural details. To solve these problems, we propose the Super-Resolution CliqueNet (SRCliqueNet) to reconstruct the high resolution (HR) image with better textural details in the wavelet domain. The proposed SRCliqueNet first extracts a set of feature maps from the low resolution (LR) image with the clique blocks group. Then we send the set of feature maps to the clique up-sampling module to reconstruct the HR image. The clique up-sampling module consists of four sub-nets which predict the high resolution wavelet coefficients of the four sub-bands. Since we consider the edge feature properties of the four sub-bands, each of the four sub-nets is connected to the others so that they can learn the coefficients of the four sub-bands jointly. Finally, we apply the inverse discrete wavelet transform (IDWT) to the output of the four sub-nets at the end of the clique up-sampling module to increase the resolution and reconstruct the HR image. Extensive quantitative and qualitative experiments on benchmark datasets show that our method achieves superior performance over the state-of-the-art methods. Comment: Accepted in NIPS 2018
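
    A minimal sketch of the final reconstruction step, assuming PyWavelets for the inverse DWT: four stand-in "sub-nets" each predict one sub-band's coefficients, and a single idwt2 call doubles the spatial resolution. The subnet stand-ins are hypothetical, not the paper's clique up-sampling module.

        import numpy as np
        import pywt  # PyWavelets

        rng = np.random.default_rng(0)
        lr = rng.normal(size=(32, 32))    # stand-in for LR feature maps

        def subnet(x):
            # stand-in for one trained sub-net predicting a sub-band's coefficients
            return x + 0.01 * rng.normal(size=x.shape)

        LL, LH, HL, HH = subnet(lr), subnet(lr), subnet(lr), subnet(lr)
        # approximation, horizontal, vertical and diagonal wavelet sub-bands

        # one inverse DWT doubles the spatial resolution: (32, 32) -> (64, 64)
        hr = pywt.idwt2((LL, (LH, HL, HH)), wavelet='haar')
        print(hr.shape)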

    Phonetic-enriched Text Representation for Chinese Sentiment Analysis with Reinforcement Learning

    The Chinese pronunciation system offers two characteristics that distinguish it from other languages: deep phonemic orthography and intonation variations. We are the first to argue that these two important properties can play a major role in Chinese sentiment analysis. In particular, we propose two effective features to encode phonetic information. Next, we develop a Disambiguate Intonation for Sentiment Analysis (DISA) network using reinforcement learning. It disambiguates the intonation of each Chinese character (pinyin), so that a precise phonetic representation of Chinese is learned. Furthermore, we also fuse phonetic features with textual and visual features in order to mimic the way humans read and understand Chinese text. Experimental results on five different Chinese sentiment analysis datasets show that the inclusion of phonetic features significantly and consistently improves the performance of textual and visual representations and outperforms state-of-the-art Chinese character-level representations.
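
    A toy sketch of the intonation-disambiguation idea as we read it: a small policy picks one of five Mandarin intonations (four tones plus neutral) for a character, and the chosen tone embedding is fused with the textual embedding. Concatenation as the fusion step, and all shapes, are our assumptions.

        import numpy as np

        rng = np.random.default_rng(0)
        emb_dim = 8
        char_emb = rng.normal(size=emb_dim)        # textual embedding of one character
        tone_emb = rng.normal(size=(5, emb_dim))   # four Mandarin tones plus neutral
        theta = 0.01 * rng.normal(size=(emb_dim, 5))  # tone-selection policy weights

        def tone_policy(state):
            logits = state @ theta
            p = np.exp(logits - logits.max())
            return p / p.sum()

        tone = rng.choice(5, p=tone_policy(char_emb))       # pick an intonation
        fused = np.concatenate([char_emb, tone_emb[tone]])  # textual + phonetic
        print(fused.shape)  # (16,); a downstream sentiment loss would be the reward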