13,036 research outputs found
Large Margin Low Rank Tensor Analysis
Rather than vectors, the direct objects of human cognition are
generally high-order tensors, such as 2D images and 3D textures. From this
fact, two interesting questions naturally arise: How does the human brain
represent these tensor perceptions in a "manifold" way, and how can they be
recognized on the "manifold"? In this paper, we present a supervised model to
learn the intrinsic structure of the tensors embedded in a high dimensional
Euclidean space. With the fixed point continuation procedures, our model
automatically and jointly discovers the optimal dimensionality and the
representations of the low dimensional embeddings. This makes it an effective
simulation of the cognitive process of the human brain. Furthermore, the
generalization of our model, based on similarity between the learned low
dimensional embeddings, can be viewed as a counterpart of recognition in the
human brain. Experiments on object recognition and face recognition
demonstrate the superiority of our proposed model over state-of-the-art
approaches.
Comment: 30 pages
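The fixed point continuation procedure the abstract mentions is built around singular value soft-thresholding, which is what lets the rank (the intrinsic dimensionality) emerge rather than being preset. A minimal illustrative sketch of that core step, not the authors' full tensor model:

```python
import numpy as np

def svt(M, tau):
    # Soft-threshold the singular values: values below tau vanish,
    # so the dimensionality is discovered rather than fixed in advance.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s = np.maximum(s - tau, 0.0)
    return (U * s) @ Vt, int(np.count_nonzero(s))

rng = np.random.default_rng(0)
# A noisy rank-2 matrix; thresholding should recover a rank near 2.
M = rng.normal(size=(20, 2)) @ rng.normal(size=(2, 20)) \
    + 0.01 * rng.normal(size=(20, 20))
low_rank, rank = svt(M, tau=1.0)
```

Iterating this step inside a gradient scheme is the essence of fixed point continuation for nuclear-norm problems.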
Breaking the Softmax Bottleneck via Learnable Monotonic Pointwise Non-linearities
The Softmax function on top of a final linear layer is the de facto method to
output probability distributions in neural networks. In many applications such
as language models or text generation, this model has to produce distributions
over large output vocabularies. Recently, this has been shown to have limited
representational capacity due to its connection with the rank bottleneck in
matrix factorization. However, little is known about the limitations of
Linear-Softmax for quantities of practical interest such as cross entropy or
mode estimation, a direction that we explore here. As an efficient and
effective solution to alleviate this issue, we propose to learn parametric
monotonic functions on top of the logits. We theoretically investigate the rank
increasing capabilities of such monotonic functions. Empirically, our method
improves on two different quality metrics over the traditional Linear-Softmax
layer in synthetic and real language-model experiments, adding little time or
memory overhead, and is comparable to the more computationally expensive
mixture of Softmaxes.
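One simple way to build a learnable, strictly increasing pointwise function on the logits is a positive linear term plus a positive combination of shifted tanh steps; a sketch under that assumption (the parameters `a`, `w`, `b` stand in for learned values and are not the paper's exact parameterization):

```python
import numpy as np

def monotonic(z, a, w, b):
    # Strictly increasing in z: exp() keeps the linear slope and the
    # tanh-step weights positive for any real-valued parameters.
    pos_a, pos_w = np.exp(a), np.exp(w)
    return pos_a * z + (pos_w * np.tanh(z[..., None] - b)).sum(-1)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.5, -1.0])
# Hypothetical parameter values; in training these would be learned.
a, w, b = 0.0, np.zeros(3), np.array([-1.0, 0.0, 1.0])
probs = softmax(monotonic(logits, a, w, b))
```

Because the transform is pointwise and monotonic, it preserves the ordering of the logits while bending the resulting log-probability matrix away from the low-rank Linear-Softmax family.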
Quantum-assisted associative adversarial network: Applying quantum annealing in deep learning
We present an algorithm for learning a latent variable generative model via
generative adversarial learning where the canonical uniform noise input is
replaced by samples from a graphical model. This graphical model is learned by
a Boltzmann machine, which learns a low-dimensional feature representation of
the data extracted by the discriminator. A quantum annealer, the D-Wave 2000Q, is used
to sample from this model. This algorithm joins a growing family of algorithms
that use a quantum annealing subroutine in deep learning, and provides a
framework to test the advantages of quantum-assisted learning in GANs. Fully
connected, symmetric bipartite and Chimera graph topologies are compared on a
reduced stochastically binarized MNIST dataset, for both classical and quantum
annealing sampling methods. The quantum-assisted associative adversarial
network successfully learns a generative model of the MNIST dataset for all
topologies, and is also applied to the LSUN dataset bedrooms class for the
Chimera topology. Evaluated using the Fr\'{e}chet inception distance and
inception score, the quantum and classical versions of the algorithm are found
to have equivalent performance for learning an implicit generative model of the
MNIST dataset.
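As a classical stand-in for the annealer, the latent prior can be sketched as an RBM sampled with Gibbs steps; the generator would then consume `z` in place of the canonical uniform noise (the model sizes here are illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rbm_sample(W, bv, bh, n_steps=50):
    # Draw one binary sample from an RBM by block Gibbs sampling --
    # a classical substitute for the quantum annealing subroutine.
    v = (rng.random(bv.size) < 0.5).astype(float)
    for _ in range(n_steps):
        h = (rng.random(bh.size) < sigmoid(v @ W + bh)).astype(float)
        v = (rng.random(bv.size) < sigmoid(W @ h + bv)).astype(float)
    return v

# Hypothetical small model: 16 visible units feed the generator.
W = rng.normal(scale=0.1, size=(16, 8))
z = rbm_sample(W, np.zeros(16), np.zeros(8))
```

Restricting `W`'s sparsity pattern is how the bipartite and Chimera topologies compared in the paper would be imposed.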
Consensus Attention-based Neural Networks for Chinese Reading Comprehension
Reading comprehension has seen a boom in recent NLP research. Several
institutes have released Cloze-style reading comprehension datasets, which
have greatly accelerated research on machine comprehension. In this work,
we first present Chinese reading comprehension datasets, consisting of a
People Daily news dataset and a Children's Fairy Tale (CFT) dataset. Also, we
propose a consensus attention-based neural network architecture to tackle the
Cloze-style reading comprehension problem, which aims to induce a consensus
attention over every word in the query. Experimental results show that the
proposed neural network significantly outperforms the state-of-the-art
baselines in several public datasets. Furthermore, we set up a baseline for
the Chinese reading comprehension task, which we hope will speed up future
research.
Comment: 9+1 pages, published at COLING 201
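The consensus idea, letting every query word attend over the document and then merging the per-word attention maps into one distribution, can be sketched as follows (dot-product scoring and mean merging are assumptions for illustration, not necessarily the paper's exact choices):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def consensus_attention(doc, query):
    scores = query @ doc.T            # (n_query, n_doc) similarity scores
    att = softmax(scores, axis=-1)    # one attention map per query word
    merged = att.mean(axis=0)         # merge into a single consensus map
    return merged / merged.sum()      # renormalize to a distribution

rng = np.random.default_rng(0)
doc = rng.normal(size=(10, 4))        # 10 document word vectors
query = rng.normal(size=(3, 4))       # 3 query word vectors
consensus = consensus_attention(doc, query)
```

The consensus distribution over document positions is what would score candidate answers for the Cloze blank.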
Deep Learning for Sequential Recommendation: Algorithms, Influential Factors, and Evaluations
In the field of sequential recommendation, deep learning (DL)-based methods
have received a lot of attention in the past few years and surpassed
traditional models such as Markov chain-based and factorization-based ones.
However, there has been little systematic study of DL-based methods, especially
regarding how to design an effective DL model for sequential recommendation.
With this in mind, this survey focuses on DL-based sequential recommender
systems. Specifically, we illustrate
the concept of sequential recommendation, propose a categorization of existing
algorithms in terms of three types of behavioral sequence, summarize the key
factors affecting the performance of DL-based models, and conduct corresponding
evaluations to demonstrate the effects of these factors. We conclude this
survey by systematically outlining future directions and challenges in this
field.
Comment: 36 pages, 17 figures, 6 tables, 104 references
Investigation of learning abilities on linguistic features in sequence-to-sequence text-to-speech synthesis
Neural sequence-to-sequence text-to-speech synthesis (TTS) can produce
high-quality speech directly from text or simple linguistic features such as
phonemes. Unlike traditional pipeline TTS, the neural sequence-to-sequence TTS
does not require manually annotated and complicated linguistic features such as
part-of-speech tags and syntactic structures for system training. However, it
must be carefully designed and well optimized so that it can implicitly extract
useful linguistic features from the input features. In this paper we
investigate under what conditions the neural sequence-to-sequence TTS can work
well in Japanese and English along with comparisons with deep neural network
(DNN) based pipeline TTS systems. Unlike past comparative studies, the pipeline
systems also use autoregressive probabilistic modeling and a neural vocoder. We
investigated systems from three aspects: a) model architecture, b) model
parameter size, and c) language. For the model architecture aspect, we adopt
modified Tacotron systems that we previously proposed and their variants using
an encoder from Tacotron or Tacotron2. For the model parameter size aspect, we
investigate two model parameter sizes. For the language aspect, we conduct
listening tests in both Japanese and English to see if our findings can be
generalized across languages. Our experiments suggest that a) a neural
sequence-to-sequence TTS system should have a sufficient number of model
parameters to produce high quality speech, b) it should also use a powerful
encoder when it takes characters as inputs, and c) the encoder still has room
for improvement and needs an improved architecture to learn supra-segmental
features more appropriately.
DeepPath: A Reinforcement Learning Method for Knowledge Graph Reasoning
We study the problem of learning to reason in large scale knowledge graphs
(KGs). More specifically, we describe a novel reinforcement learning framework
for learning multi-hop relational paths: we use a policy-based agent with
continuous states based on knowledge graph embeddings, which reasons in a KG
vector space by sampling the most promising relation to extend its path. In
contrast to prior work, our approach includes a reward function that takes
accuracy, diversity, and efficiency into consideration. Experimentally, we show
that our proposed method outperforms a path-ranking based algorithm and
knowledge graph embedding methods on Freebase and Never-Ending Language
Learning datasets.
Comment: EMNLP 1
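The composite reward can be sketched as a weighted sum of an accuracy term (did the walk reach the target entity), an efficiency term (shorter paths score higher), and a diversity term (penalizing paths similar to earlier ones). The 0.1 weights and toy embeddings below are illustrative, not the paper's values:

```python
import numpy as np

def path_reward(reached_target, path, path_embeddings):
    r_accuracy = 1.0 if reached_target else -1.0
    r_efficiency = 1.0 / len(path)          # favor shorter relation paths
    if path_embeddings:
        cur = path_embeddings[-1]
        # Cosine similarity to previously found paths penalizes redundancy.
        sims = [cur @ p / (np.linalg.norm(cur) * np.linalg.norm(p))
                for p in path_embeddings[:-1]]
        r_diversity = -float(np.mean(sims)) if sims else 0.0
    else:
        r_diversity = 0.0
    return r_accuracy + 0.1 * r_efficiency + 0.1 * r_diversity

# Toy example: a successful 2-hop path identical to an earlier one.
emb = [np.array([1.0, 0.0]), np.array([1.0, 0.0])]
r = path_reward(True, ["bornIn", "capitalOf"], emb)
```

The policy gradient would then push the agent toward relation sequences that score well on all three terms at once.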
Auditory Separation of a Conversation from Background via Attentional Gating
We present a model for separating a set of voices out of a sound mixture
containing an unknown number of sources. Our Attentional Gating Network (AGN)
uses a variable attentional context to specify which speakers in the mixture
are of interest. The attentional context is specified by an embedding vector
which modifies the processing of a neural network through an additive bias.
Individual speaker embeddings are learned to separate a single speaker while
superpositions of the individual speaker embeddings are used to separate sets
of speakers. We first evaluate AGN on a traditional single speaker separation
task and show an improvement of 9% with respect to comparable models. Then, we
introduce a new task to separate an arbitrary subset of voices from a mixture
of an unknown-sized set of voices, inspired by the human ability to separate a
conversation of interest from background chatter at a cafeteria. We show that
AGN is the only model capable of solving this task, performing only 7% worse
than on the single speaker separation task.
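The additive-bias conditioning is simple to sketch: the embedding shifts a layer's pre-activation, and summing embeddings selects a set of speakers. All names and sizes below are hypothetical, not AGN's actual architecture:

```python
import numpy as np

def gated_layer(x, W, b, context_embedding):
    # The attentional context enters only as an additive bias on the
    # pre-activation, steering which sources the network passes through.
    return np.tanh(x @ W + b + context_embedding)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))          # 5 time frames of features
W, b = rng.normal(size=(8, 8)), np.zeros(8)
e_alice, e_bob = rng.normal(size=8), rng.normal(size=8)

# Separate one speaker vs. a two-person conversation:
h_single = gated_layer(x, W, b, e_alice)
h_pair = gated_layer(x, W, b, e_alice + e_bob)
```

Superposition of embeddings is what lets one trained network extract an arbitrary subset of voices without retraining.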
Joint Sub-bands Learning with Clique Structures for Wavelet Domain Super-Resolution
Convolutional neural networks (CNNs) have recently achieved great success in
single-image super-resolution (SISR). However, these methods tend to produce
over-smoothed outputs and miss some textural details. To solve these problems,
we propose the Super-Resolution CliqueNet (SRCliqueNet) to reconstruct the high
resolution (HR) image with better textural details in the wavelet domain. The
proposed SRCliqueNet first extracts a set of feature maps from the low
resolution (LR) image by the clique blocks group. Then we send the set of
feature maps to the clique up-sampling module to reconstruct the HR image. The
clique up-sampling module consists of four sub-nets which predict the high
resolution wavelet coefficients of four sub-bands. Since we consider the edge
feature properties of the four sub-bands, the sub-nets are connected to one
another so that they can learn the coefficients of the four sub-bands jointly.
Finally we apply inverse discrete wavelet transform (IDWT) to the output of
four sub-nets at the end of the clique up-sampling module to increase the
resolution and reconstruct the HR image. Extensive quantitative and qualitative
experiments on benchmark datasets show that our method achieves superior
performance over state-of-the-art methods.
Comment: Accepted in NIPS 201
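The final IDWT step, combining the four predicted sub-bands back into a double-resolution image, can be illustrated with the Haar wavelet (the paper's choice of wavelet may differ):

```python
import numpy as np

def haar_dwt2(x):
    # Split x into the four Haar sub-bands (LL, LH, HL, HH) by
    # combining each 2x2 block of pixels.
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    return ((a + b + c + d) / 2, (a + b - c - d) / 2,
            (a - b + c - d) / 2, (a - b - c + d) / 2)

def haar_idwt2(LL, LH, HL, HH):
    # Inverse transform: four half-size sub-bands become one image
    # at twice the resolution -- the role of IDWT in SRCliqueNet.
    h, w = LL.shape
    x = np.empty((2 * h, 2 * w))
    x[0::2, 0::2] = (LL + LH + HL + HH) / 2
    x[0::2, 1::2] = (LL + LH - HL - HH) / 2
    x[1::2, 0::2] = (LL - LH + HL - HH) / 2
    x[1::2, 1::2] = (LL - LH - HL + HH) / 2
    return x

rng = np.random.default_rng(0)
img = rng.normal(size=(8, 8))
LL, LH, HL, HH = haar_dwt2(img)
recon = haar_idwt2(LL, LH, HL, HH)
```

Because the high-frequency sub-bands (LH, HL, HH) carry the edges, predicting them explicitly is what recovers the textural details that direct pixel-space upsampling tends to smooth away.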
Phonetic-enriched Text Representation for Chinese Sentiment Analysis with Reinforcement Learning
The Chinese pronunciation system offers two characteristics that distinguish
it from other languages: deep phonemic orthography and intonation variations.
We are the first to argue that these two important properties can play a major
role in Chinese sentiment analysis. Particularly, we propose two effective
features to encode phonetic information. Next, we develop a Disambiguate
Intonation for Sentiment Analysis (DISA) network using reinforcement
learning. It disambiguates the intonation of each Chinese character (pinyin),
so that a precise phonetic representation of Chinese is learned. Furthermore, we
also fuse phonetic features with textual and visual features in order to mimic
the way humans read and understand Chinese text. Experimental results on five
different Chinese sentiment analysis datasets show that the inclusion of
phonetic features significantly and consistently improves the performance of
textual and visual representations, and outperforms state-of-the-art Chinese
character-level representations.