6,715 research outputs found

    An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling

    For most deep learning practitioners, sequence modeling is synonymous with recurrent networks. Yet recent results indicate that convolutional architectures can outperform recurrent networks on tasks such as audio synthesis and machine translation. Given a new sequence modeling task or dataset, which architecture should one use? We conduct a systematic evaluation of generic convolutional and recurrent architectures for sequence modeling. The models are evaluated across a broad range of standard tasks that are commonly used to benchmark recurrent networks. Our results indicate that a simple convolutional architecture outperforms canonical recurrent networks such as LSTMs across a diverse range of tasks and datasets, while demonstrating longer effective memory. We conclude that the common association between sequence modeling and recurrent networks should be reconsidered, and convolutional networks should be regarded as a natural starting point for sequence modeling tasks. To assist related work, we have made code available at http://github.com/locuslab/TCN

    Visual Attention Model for Cross-sectional Stock Return Prediction and End-to-End Multimodal Market Representation Learning

    Technical and fundamental analysis are traditional tools used to analyze individual stocks; however, the finance literature has shown that the price movement of each individual stock correlates heavily with other stocks, especially those within the same sector. In this paper we propose a general purpose market representation that incorporates fundamental and technical indicators and relationships between individual stocks. We treat the daily stock market as a "market image" where rows (grouped by market sector) represent individual stocks and columns represent indicators. We apply a convolutional neural network over this market image to build market features in a hierarchical way. We use a recurrent neural network, with an attention mechanism over the market feature maps, to model temporal dynamics in the market. We show that our proposed model outperforms strong baselines in both short-term and long-term stock return prediction tasks. We also show another use for our market image: to construct concise and dense market embeddings suitable for downstream prediction tasks. Comment: Accepted as full paper in the 32nd International FLAIRS Conference

    Learning distant cause and effect using only local and immediate credit assignment

    We present a recurrent neural network memory that uses sparse coding to create a combinatoric encoding of sequential inputs. Using several examples, we show that the network can associate distant causes and effects in a discrete stochastic process, predict partially-observable higher-order sequences, and enable a DQN agent to navigate a maze by giving it memory. The network uses only biologically-plausible, local and immediate credit assignment. Memory requirements are typically one order of magnitude less than existing LSTM, GRU and autoregressive feed-forward sequence learning models. The most significant limitation of the memory is generalization to unseen input sequences. We explore this limitation by measuring next-word prediction perplexity on the Penn Treebank dataset. Comment: 11 pages, 5 figures, 2 tables

    Time Perception Machine: Temporal Point Processes for the When, Where and What of Activity Prediction

    Numerous powerful point process models have been developed to understand temporal patterns in sequential data from fields such as health-care, electronic commerce, social networks, and natural disaster forecasting. In this paper, we develop novel models for learning the temporal distribution of human activities in streaming data (e.g., videos and person trajectories). We propose an integrated framework of neural networks and temporal point processes for predicting when the next activity will happen. Because point processes are limited to taking event frames as input, we propose a simple yet effective mechanism to extract features at frames of interest while also preserving the rich information in the remaining frames. We evaluate our model on two challenging datasets. The results show that our model outperforms traditional statistical point process approaches significantly, demonstrating its effectiveness in capturing the underlying temporal dynamics as well as the correlation within sequential activities. Furthermore, we also extend our model to a joint estimation framework for predicting the timing, spatial location, and category of the activity simultaneously, to answer the when, where, and what of activity prediction.

    Audio-Linguistic Embeddings for Spoken Sentences

    We propose spoken sentence embeddings which capture both acoustic and linguistic content. While existing works operate at the character, phoneme, or word level, our method learns long-term dependencies by modeling speech at the sentence level. Formulated as an audio-linguistic multitask learning problem, our encoder-decoder model simultaneously reconstructs acoustic and natural language features from audio. Our results show that spoken sentence embeddings outperform phoneme and word-level baselines on speech recognition and emotion recognition tasks. Ablation studies show that our embeddings can better model high-level acoustic concepts while retaining linguistic content. Overall, our work illustrates the viability of generic, multi-modal sentence embeddings for spoken language understanding. Comment: International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 201

    Multi-Cast Attention Networks for Retrieval-based Question Answering and Response Prediction

    Attention is typically used to select informative sub-phrases that are used for prediction. This paper investigates the novel use of attention as a form of feature augmentation, i.e., casted attention. We propose Multi-Cast Attention Networks (MCAN), a new attention mechanism and general model architecture for a potpourri of ranking tasks in the conversational modeling and question answering domains. Our approach performs a series of soft attention operations, each time casting a scalar feature upon the inner word embeddings. The key idea is to provide a real-valued hint (feature) to a subsequent encoder layer and is targeted at improving the representation learning process. There are several advantages to this design, e.g., it allows an arbitrary number of attention mechanisms to be casted, allowing for multiple attention types (e.g., co-attention, intra-attention) and attention variants (e.g., alignment-pooling, max-pooling, mean-pooling) to be executed simultaneously. This not only eliminates the costly need to tune the nature of the co-attention layer, but also provides greater extents of explainability to practitioners. Via extensive experiments on four well-known benchmark datasets, we show that MCAN achieves state-of-the-art performance. On the Ubuntu Dialogue Corpus, MCAN outperforms existing state-of-the-art models by 9%. MCAN also achieves the best performing score to date on the well-studied TrecQA dataset. Comment: Accepted to KDD 2018 (Paper titled only "Multi-Cast Attention Networks" in KDD version)

    GLoMo: Unsupervisedly Learned Relational Graphs as Transferable Representations

    Modern deep transfer learning approaches have mainly focused on learning generic feature vectors from one task that are transferable to other tasks, such as word embeddings in language and pretrained convolutional features in vision. However, these approaches usually transfer unary features and largely ignore more structured graphical representations. This work explores the possibility of learning generic latent relational graphs that capture dependencies between pairs of data units (e.g., words or pixels) from large-scale unlabeled data and transferring the graphs to downstream tasks. Our proposed transfer learning framework improves performance on various tasks including question answering, natural language inference, sentiment analysis, and image classification. We also show that the learned graphs are generic enough to be transferred to different embeddings on which the graphs have not been trained (including GloVe embeddings, ELMo embeddings, and task-specific RNN hidden units), or embedding-free units such as image pixels.

    Inter-Patient ECG Classification with Convolutional and Recurrent Neural Networks

    Recent advances in ECG sensor devices provide opportunities for user self-managed auto-diagnosis and monitoring services over the internet. This imposes the requirement for generic ECG classification methods that are inter-patient and device independent. In this paper, we present our work on using the densely connected convolutional neural network (DenseNet) and gated recurrent unit network (GRU) for addressing the inter-patient ECG classification problem. A deep learning model architecture is proposed and is evaluated using the MIT-BIH Arrhythmia and Supraventricular Databases. The results obtained show that without applying any complicated data pre-processing or feature engineering methods, both of our models have considerably outperformed the state-of-the-art performance for supraventricular (SVEB) and ventricular (VEB) arrhythmia classifications on the unseen testing dataset (with the F1 score improved from 51.08 to 61.25 for SVEB detection and from 88.59 to 89.75 for VEB detection respectively). As no patient-specific or device-specific information is used at the training stage in this work, it can be considered as a more generic approach for dealing with scenarios in which varieties of ECG signals are collected from different patients using different types of sensor devices. Comment: 10 pages, 8 figures

    Generalization Studies of Neural Network Models for Cardiac Disease Detection Using Limited Channel ECG

    Acceleration of machine learning research in healthcare is challenged by the lack of large annotated and balanced datasets. Furthermore, dealing with measurement inaccuracies and exploiting unsupervised data are considered to be central to improving existing solutions. In particular, a primary objective in predictive modeling is to generalize well to both unseen variations within the observed classes, and unseen classes. In this work, we consider such a challenging problem in machine learning driven diagnosis: detecting a gamut of cardiovascular conditions (e.g., infarction, dysrhythmia, etc.) from limited channel ECG measurements. Though deep neural networks have achieved unprecedented success in predictive modeling, they rely solely on discriminative models that can generalize poorly to unseen classes. We argue that unsupervised learning can be utilized to construct effective latent spaces that facilitate better generalization. This work extensively compares the generalization of our proposed approach against a state-of-the-art deep learning solution. Our results show significant improvements in F1-scores. Comment: IEEE Computing in Cardiology (CinC) 201

    Encoding Source Language with Convolutional Neural Network for Machine Translation

    The recently proposed neural network joint model (NNJM) (Devlin et al., 2014) augments the n-gram target language model with a heuristically chosen source context window, achieving state-of-the-art performance in SMT. In this paper, we give a more systematic treatment by summarizing the relevant source information through a convolutional architecture guided by the target information. With different guiding signals during decoding, our specifically designed convolution+gating architectures can pinpoint the parts of a source sentence that are relevant to predicting a target word, and fuse them with the context of the entire source sentence to form a unified representation. This representation, together with target language words, is fed to a deep neural network (DNN) to form a stronger NNJM. Experiments on two NIST Chinese-English translation tasks show that the proposed model can achieve significant improvements over the previous NNJM by up to +1.08 BLEU points on average. Comment: Accepted as a full paper at ACL 201