6,715 research outputs found
An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling
For most deep learning practitioners, sequence modeling is synonymous with
recurrent networks. Yet recent results indicate that convolutional
architectures can outperform recurrent networks on tasks such as audio
synthesis and machine translation. Given a new sequence modeling task or
dataset, which architecture should one use? We conduct a systematic evaluation
of generic convolutional and recurrent architectures for sequence modeling. The
models are evaluated across a broad range of standard tasks that are commonly
used to benchmark recurrent networks. Our results indicate that a simple
convolutional architecture outperforms canonical recurrent networks such as
LSTMs across a diverse range of tasks and datasets, while demonstrating longer
effective memory. We conclude that the common association between sequence
modeling and recurrent networks should be reconsidered, and convolutional
networks should be regarded as a natural starting point for sequence modeling
tasks. To assist related work, we have made code available at
http://github.com/locuslab/TCN
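The core building block of such a convolutional sequence model can be sketched in a few lines. The following is a minimal, framework-free illustration of a causal dilated convolution; the kernel weights and inputs are arbitrary examples, not trained values:

```python
# Minimal sketch of the causal, dilated 1-D convolution at the heart of a
# temporal convolutional network (TCN). Pure Python, no framework.

def causal_dilated_conv(x, kernel, dilation=1):
    """Convolve sequence x with `kernel`, looking only at past inputs.

    Output at step t depends on x[t], x[t-d], x[t-2d], ... so no future
    information leaks in -- the property that makes the convolution causal.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            j = t - i * dilation          # tap i*dilation steps into the past
            if j >= 0:                    # implicit zero-padding before start
                acc += w * x[j]
        out.append(acc)
    return out

x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(causal_dilated_conv(x, kernel=[0.5, 0.5], dilation=1))
# -> [0.5, 1.5, 2.5, 3.5, 4.5]: each output mixes the current and previous step
```

Stacking such layers with exponentially increasing dilation is what gives a TCN its long effective memory: with dilation doubling per layer, the receptive field grows exponentially with depth.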
Visual Attention Model for Cross-sectional Stock Return Prediction and End-to-End Multimodal Market Representation Learning
Technical and fundamental analysis are traditional tools used to analyze
individual stocks; however, the finance literature has shown that the price
movement of each individual stock correlates heavily with other stocks,
especially those within the same sector. In this paper we propose a general
purpose market representation that incorporates fundamental and technical
indicators and relationships between individual stocks. We treat the daily
stock market as a "market image" where rows (grouped by market sector)
represent individual stocks and columns represent indicators. We apply a
convolutional neural network over this market image to build market features in
a hierarchical way. We use a recurrent neural network, with an attention
mechanism over the market feature maps, to model temporal dynamics in the
market. We show that our proposed model outperforms strong baselines in both
short-term and long-term stock return prediction tasks. We also show another
use for our market image: to construct concise and dense market embeddings
suitable for downstream prediction tasks. Comment: Accepted as a full paper at the 32nd International FLAIRS Conference.
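The "market image" layout can be illustrated with a toy example. The tickers, sectors, and indicator values below are fabricated for illustration; the point is only the row-grouping by sector that lets a convolution pick up sector-local structure:

```python
# Hypothetical sketch of the "market image": rows are stocks grouped by
# sector, columns are per-stock indicators. All names and numbers are
# made up for illustration.

indicators = ["return_1d", "volume_z", "pe_ratio"]   # columns
stocks = [  # (ticker, sector, indicator values) -- fabricated data
    ("AAA", "tech",   [0.012, 1.3, 24.0]),
    ("BBB", "tech",   [0.007, 0.2, 31.5]),
    ("CCC", "energy", [-0.004, -0.8, 11.2]),
]

# Group rows by sector so a convolution sees sector-local neighborhoods.
stocks.sort(key=lambda s: s[1])
market_image = [values for _, _, values in stocks]

for row in market_image:
    print(row)
```

A 2-D convolution applied over this matrix then aggregates neighboring stocks within a sector and neighboring indicators, building market features hierarchically.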
Learning distant cause and effect using only local and immediate credit assignment
We present a recurrent neural network memory that uses sparse coding to
create a combinatoric encoding of sequential inputs. Using several examples, we
show that the network can associate distant causes and effects in a discrete
stochastic process, predict partially-observable higher-order sequences, and
enable a DQN agent to navigate a maze by giving it memory. The network uses
only biologically-plausible, local and immediate credit assignment. Memory
requirements are typically one order of magnitude less than existing LSTM, GRU
and autoregressive feed-forward sequence learning models. The most significant
limitation of the memory is generalization to unseen input sequences. We
explore this limitation by measuring next-word prediction perplexity on the
Penn Treebank dataset. Comment: 11 pages, 5 figures, 2 tables.
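The sparse-coding step underlying such a combinatoric encoding can be sketched as a k-winners-take-all operation; the activation vector and k below are arbitrary illustrative choices, not the paper's configuration:

```python
# Tiny sketch of sparse coding via k-winners-take-all: keep only the k
# largest activations and zero the rest, yielding a sparse code whose
# active-unit combinations can represent many distinct inputs.

def k_sparse(activations, k):
    """Zero all but the k largest entries (k-winners-take-all)."""
    if k >= len(activations):
        return list(activations)
    threshold = sorted(activations, reverse=True)[k - 1]
    kept = 0
    out = []
    for a in activations:
        if a >= threshold and kept < k:
            out.append(a)
            kept += 1
        else:
            out.append(0.0)
    return out

print(k_sparse([0.1, 0.9, 0.3, 0.7, 0.2], k=2))
# -> [0.0, 0.9, 0.0, 0.7, 0.0]
```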
Time Perception Machine: Temporal Point Processes for the When, Where and What of Activity Prediction
Numerous powerful point process models have been developed to understand
temporal patterns in sequential data from fields such as health-care,
electronic commerce, social networks, and natural disaster forecasting. In this
paper, we develop novel models for learning the temporal distribution of human
activities in streaming data (e.g., videos and person trajectories). We propose
an integrated framework of neural networks and temporal point processes for
predicting when the next activity will happen. Because point processes are
limited to taking event frames as input, we propose a simple yet effective
mechanism to extract features at frames of interest while also preserving the
rich information in the remaining frames. We evaluate our model on two
challenging datasets. The results show that our model outperforms traditional
statistical point process approaches significantly, demonstrating its
effectiveness in capturing the underlying temporal dynamics as well as the
correlation within sequential activities. Furthermore, we also extend our model
to a joint estimation framework for predicting the timing, spatial location,
and category of the activity simultaneously, to answer the when, where, and
what of activity prediction.
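For intuition on the "when" question, the simplest point-process case is a constant conditional intensity, under which the gap to the next event is exponentially distributed. A sketch follows; the rate value is an arbitrary example, and the paper's models learn a time-varying intensity with neural networks rather than assuming it constant:

```python
# For a conditional intensity lambda(t), the next-gap density is
# f(t) = lambda(t) * exp(-integral_0^t lambda(s) ds). With constant
# intensity this is an exponential distribution with mean 1 / lambda.

import math

def expected_gap_constant(rate):
    """Mean waiting time to the next event under constant intensity."""
    return 1.0 / rate

def gap_survival(rate, t):
    """P(no event within time t) = exp(-rate * t) for constant intensity."""
    return math.exp(-rate * t)

print(expected_gap_constant(2.0))          # 0.5 time units on average
print(round(gap_survival(2.0, 1.0), 4))    # probability the gap exceeds 1.0
```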
Audio-Linguistic Embeddings for Spoken Sentences
We propose spoken sentence embeddings which capture both acoustic and
linguistic content. While existing works operate at the character, phoneme, or
word level, our method learns long-term dependencies by modeling speech at the
sentence level. Formulated as an audio-linguistic multitask learning problem,
our encoder-decoder model simultaneously reconstructs acoustic and natural
language features from audio. Our results show that spoken sentence embeddings
outperform phoneme and word-level baselines on speech recognition and emotion
recognition tasks. Ablation studies show that our embeddings can better model
high-level acoustic concepts while retaining linguistic content. Overall, our
work illustrates the viability of generic, multi-modal sentence embeddings for
spoken language understanding. Comment: International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 201
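The multitask objective the abstract implies can be sketched as one shared sentence embedding decoded into both acoustic and linguistic targets, with the two reconstruction errors combined. The equal weighting and the MSE loss below are assumptions for illustration, not the paper's configuration:

```python
# Sketch of an audio-linguistic multitask loss: a weighted sum of the
# acoustic and linguistic reconstruction errors from a shared embedding.

def mse(pred, target):
    """Mean squared reconstruction error."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def multitask_loss(ac_pred, ac_true, lg_pred, lg_true, weight=0.5):
    """Weighted sum of acoustic and linguistic reconstruction losses."""
    return weight * mse(ac_pred, ac_true) + (1.0 - weight) * mse(lg_pred, lg_true)

loss = multitask_loss([0.2, 0.4], [0.0, 0.4], [1.0], [0.0])
print(loss)
```

Training against both targets at once is what pushes the embedding to retain linguistic content alongside high-level acoustic structure.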
Multi-Cast Attention Networks for Retrieval-based Question Answering and Response Prediction
Attention is typically used to select informative sub-phrases that are used
for prediction. This paper investigates the novel use of attention as a form of
feature augmentation, i.e., casted attention. We propose Multi-Cast Attention
Networks (MCAN), a new attention mechanism and general model architecture for a
potpourri of ranking tasks in the conversational modeling and question
answering domains. Our approach performs a series of soft attention operations,
each time casting a scalar feature upon the inner word embeddings. The key idea
is to provide a real-valued hint (feature) to a subsequent encoder layer and is
targeted at improving the representation learning process. There are several
advantages to this design, e.g., it allows an arbitrary number of attention
mechanisms to be casted, allowing for multiple attention types (e.g.,
co-attention, intra-attention) and attention variants (e.g., alignment-pooling,
max-pooling, mean-pooling) to be executed simultaneously. This not only
eliminates the costly need to tune the nature of the co-attention layer, but
also provides greater extents of explainability to practitioners. Via extensive
experiments on four well-known benchmark datasets, we show that MCAN achieves
state-of-the-art performance. On the Ubuntu Dialogue Corpus, MCAN outperforms
existing state-of-the-art models. MCAN also achieves the best
score to date on the well-studied TrecQA dataset. Comment: Accepted to KDD 2018
(paper titled only "Multi-Cast Attention Networks" in the KDD version).
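The "casting" idea, attaching an attention-derived scalar as an extra feature dimension on each word embedding, can be sketched as follows. The embeddings and scores are toy values, and the simple weighted-score pooling here stands in for the paper's learned compression functions:

```python
# Sketch of casted attention: run a soft attention pooling over word
# scores, then append the pooled scalar to every word embedding as a
# real-valued "hint" for the next encoder layer.

import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cast_attention(embeddings, scores):
    """Append one attention-derived scalar feature to every embedding."""
    weights = softmax(scores)
    pooled = sum(w * s for w, s in zip(weights, scores))  # scalar summary
    return [emb + [pooled] for emb in embeddings]

embs = [[0.1, 0.2], [0.3, 0.4]]
augmented = cast_attention(embs, scores=[1.0, 2.0])
print(augmented)  # each embedding gains the same scalar hint
```

Because each cast adds only a scalar per word, several attention types and pooling variants can be cast side by side without blowing up the representation size.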
GLoMo: Unsupervisedly Learned Relational Graphs as Transferable Representations
Modern deep transfer learning approaches have mainly focused on learning
generic feature vectors from one task that are transferable to other tasks,
such as word embeddings in language and pretrained convolutional features in
vision. However, these approaches usually transfer unary features and largely
ignore more structured graphical representations. This work explores the
possibility of learning generic latent relational graphs that capture
dependencies between pairs of data units (e.g., words or pixels) from
large-scale unlabeled data and transferring the graphs to downstream tasks. Our
proposed transfer learning framework improves performance on various tasks
including question answering, natural language inference, sentiment analysis,
and image classification. We also show that the learned graphs are generic
enough to be transferred to different embeddings on which the graphs have not
been trained (including GloVe embeddings, ELMo embeddings, and task-specific
RNN hidden units), or to embedding-free units such as image pixels.
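A minimal sketch of such a latent relational graph is a row-normalized pairwise affinity matrix over data units (words here). The dot-product affinity below is a simplifying assumption for illustration; GLoMo learns the scoring function from unlabeled data:

```python
# Sketch of a relational graph: a row-stochastic matrix G with G[i][j]
# proportional to the affinity between units i and j. Such a graph can be
# reused to mix any feature set defined over the same units.

import math

def relational_graph(vectors):
    """Return a row-stochastic affinity matrix over the input vectors."""
    n = len(vectors)
    graph = []
    for i in range(n):
        row = []
        for j in range(n):
            score = sum(a * b for a, b in zip(vectors[i], vectors[j]))
            row.append(math.exp(score))       # softmax numerator
        total = sum(row)
        graph.append([r / total for r in row])
    return graph

g = relational_graph([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
for row in g:
    print([round(v, 3) for v in row])  # each row sums to 1
```

Transfer then amounts to multiplying this graph against downstream embeddings (e.g., GloVe or ELMo vectors) that the graph was never trained on.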
Inter-Patient ECG Classification with Convolutional and Recurrent Neural Networks
The recent advances in ECG sensor devices provide opportunities for user
self-managed auto-diagnosis and monitoring services over the internet. This
imposes the requirements for generic ECG classification methods that are
inter-patient and device independent. In this paper, we present our work on
using the densely connected convolutional neural network (DenseNet) and gated
recurrent unit network (GRU) for addressing the inter-patient ECG
classification problem. A deep learning model architecture is proposed and
evaluated using the MIT-BIH Arrhythmia and Supraventricular Databases. The
results obtained show that without applying any complicated data pre-processing
or feature engineering methods, both of our models have considerably
outperformed the state-of-the-art performance for supraventricular (SVEB) and
ventricular (VEB) arrhythmia classifications on the unseen testing dataset
(with the F1 score improved from 51.08 to 61.25 for SVEB detection and from
88.59 to 89.75 for VEB detection respectively). As no patient-specific or
device-specific information is used at the training stage in this work, it can
be considered as a more generic approach for dealing with scenarios in which
varieties of ECG signals are collected from different patients using different
types of sensor devices. Comment: 10 pages, 8 figures.
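The dense connectivity pattern that DenseNet contributes here, where each layer consumes the concatenation of all earlier feature maps, can be sketched with stand-in layers. The toy "layer" below is an arbitrary function, not the trained network:

```python
# Sketch of a DenseNet-style dense block: every layer receives the
# concatenation of the input and all previous layers' outputs, so early
# ECG features stay directly accessible to deeper layers.

def dense_block(x, layers):
    """Apply layers; each sees the concatenation of all prior features."""
    features = [x]
    for layer in layers:
        concatenated = [v for f in features for v in f]
        features.append(layer(concatenated))
    return [v for f in features for v in f]

half_sum = lambda xs: [sum(xs) / 2.0]          # toy one-output "layer"
out = dense_block([1.0, 2.0], [half_sum, half_sum])
print(out)  # -> [1.0, 2.0, 1.5, 2.25]
```

In the ECG model, the output of such convolutional blocks is then fed to a GRU to capture temporal dependencies across heartbeats.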
Generalization Studies of Neural Network Models for Cardiac Disease Detection Using Limited Channel ECG
Acceleration of machine learning research in healthcare is challenged by lack
of large annotated and balanced datasets. Furthermore, dealing with measurement
inaccuracies and exploiting unsupervised data are considered to be central to
improving existing solutions. In particular, a primary objective in predictive
modeling is to generalize well to both unseen variations within the observed
classes, and unseen classes. In this work, we consider such a challenging
problem in machine-learning-driven diagnosis: detecting a gamut of
cardiovascular conditions (e.g., infarction, dysrhythmia) from limited
channel ECG measurements. Though deep neural networks have achieved
unprecedented success in predictive modeling, they rely solely on
discriminative models that can generalize poorly to unseen classes. We argue
that unsupervised learning can be utilized to construct effective latent spaces
that facilitate better generalization. This work extensively compares the
generalization of our proposed approach against a state-of-the-art deep
learning solution. Our results show significant improvements in F1-scores.Comment: IEEE Computing in Cardiology (CinC) 201
Encoding Source Language with Convolutional Neural Network for Machine Translation
The recently proposed neural network joint model (NNJM) (Devlin et al., 2014)
augments the n-gram target language model with a heuristically chosen source
context window, achieving state-of-the-art performance in SMT. In this paper,
we give a more systematic treatment by summarizing the relevant source
information through a convolutional architecture guided by the target
information. With different guiding signals during decoding, our specifically
designed convolution+gating architectures can pinpoint the parts of a source
sentence that are relevant to predicting a target word, and fuse them with the
context of entire source sentence to form a unified representation. This
representation, together with target language words, are fed to a deep neural
network (DNN) to form a stronger NNJM. Experiments on two NIST Chinese-English
translation tasks show that the proposed model can achieve significant
improvements over the previous NNJM by up to +1.08 BLEU points on average. Comment: Accepted as a full paper at ACL 201
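The convolution+gating idea can be illustrated element-wise: a sigmoid gate decides how much of each convolved source feature reaches the joint model. The weights below are illustrative, not trained NNJM parameters:

```python
# Sketch of gated source features: feature * sigmoid(gate), so a large
# positive gate logit passes the feature through and a large negative
# one suppresses it, letting the decoder's guiding signal pick out the
# relevant parts of the source sentence.

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_features(conv_out, gate_logits):
    """Element-wise gate applied to convolved source features."""
    return [f * sigmoid(g) for f, g in zip(conv_out, gate_logits)]

vals = gated_features([1.0, -2.0, 0.5], [10.0, -10.0, 0.0])
print([round(v, 3) for v in vals])
```

In the full model the gate logits would be computed from the target-side context, so different target words open different parts of the source representation.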