Ionospheric activity prediction using convolutional recurrent neural networks
Ionospheric electromagnetic activity is a major factor in the quality of
satellite telecommunications, Global Navigation Satellite Systems (GNSS), and
other vital space applications. The ability to forecast the Total
Electron Content (TEC) globally would enable better anticipation of potential
performance degradations. A few studies have proposed models that predict
the TEC locally, but most do not cover the whole globe. Thanks to a large record
of past TEC maps publicly available, we propose a method based on Deep Neural
Networks (DNN) to forecast a sequence of global TEC maps consecutive to an
input sequence of TEC maps, without introducing any prior knowledge other than
Earth rotation periodicity. By combining several state-of-the-art
architectures, the proposed approach is competitive with previous works on TEC
forecasting while predicting the TEC globally. Comment: Under submission at IEEE Transactions on Big Data
TricorNet: A Hybrid Temporal Convolutional and Recurrent Network for Video Action Segmentation
Action segmentation, a milestone towards building automatic systems that
understand untrimmed videos, has received considerable attention in recent
years. It is typically modeled as a sequence labeling problem, but it differs
in essential ways from text parsing or speech
processing. In this paper, we introduce a novel hybrid temporal convolutional
and recurrent network (TricorNet), which has an encoder-decoder architecture:
the encoder consists of a hierarchy of temporal convolutional kernels that
capture the local motion changes of different actions; the decoder is a
hierarchy of recurrent neural networks that are able to learn and memorize
long-term action dependencies after the encoding stage. Our model is simple but
extremely effective in terms of video sequence labeling. The experimental
results on three public action segmentation datasets have shown that the
proposed model achieves superior performance over the state of the art.
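The encoder-decoder split described above, temporal convolutions for local motion followed by recurrent layers for long-term dependencies, can be sketched in plain numpy. This is a minimal illustration of the idea only; the layer sizes, the single-layer depth, and the plain-RNN decoder are simplifying assumptions, not TricorNet's actual configuration:

```python
import numpy as np

def temporal_conv(x, w):
    """Valid 1D convolution along time: x is (T, C_in), w is (k, C_in, C_out)."""
    k, _, c_out = w.shape
    T = x.shape[0] - k + 1
    out = np.zeros((T, c_out))
    for t in range(T):
        out[t] = np.tensordot(x[t:t + k], w, axes=([0, 1], [0, 1]))
    return np.maximum(out, 0.0)  # ReLU

def rnn_decode(h, w_h, w_x):
    """Plain RNN over encoded features h (T, C): s_t = tanh(s_{t-1} W_h + h_t W_x)."""
    s = np.zeros(w_h.shape[0])
    states = []
    for t in range(h.shape[0]):
        s = np.tanh(s @ w_h + h[t] @ w_x)
        states.append(s)
    return np.stack(states)

rng = np.random.default_rng(0)
frames = rng.normal(size=(20, 8))                                # 20 frames, 8 features each
enc = temporal_conv(frames, rng.normal(size=(3, 8, 16)) * 0.1)   # local motion changes
dec = rnn_decode(enc, rng.normal(size=(16, 16)) * 0.1,
                 rng.normal(size=(16, 16)) * 0.1)                # long-term dependencies
print(dec.shape)  # (18, 16): one label-ready state per valid time step
```

A per-frame action label would then be read off each decoder state with a final linear classifier.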
Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
We show that an end-to-end deep learning approach can be used to recognize
either English or Mandarin Chinese speech--two vastly different languages.
Because it replaces entire pipelines of hand-engineered components with neural
networks, end-to-end learning allows us to handle a wide variety of speech,
including noisy environments, accents, and different languages. Key to our
approach is our application of HPC techniques, resulting in a 7x speedup over
our previous system. Because of this efficiency, experiments that previously
took weeks now run in days. This enables us to iterate more quickly to identify
superior architectures and algorithms. As a result, in several cases, our
system is competitive with the transcription of human workers when benchmarked
on standard datasets. Finally, using a technique called Batch Dispatch with
GPUs in the data center, we show that our system can be inexpensively deployed
in an online setting, delivering low latency when serving users at scale.
Bike Flow Prediction with Multi-Graph Convolutional Networks
One fundamental issue in managing bike sharing systems is bike flow
prediction. Because predicting the flow for a single station is difficult,
recent research works often predict the bike flow at the cluster level. While such
studies achieve satisfactory prediction accuracy, they cannot directly guide
fine-grained, station-level management of bike sharing systems. In this
paper, we revisit the problem of the station-level bike flow prediction, aiming
to boost the prediction accuracy leveraging the breakthroughs of deep learning
techniques. We propose a new multi-graph convolutional neural network model to
predict the bike flow at station-level, where the key novelty is viewing the
bike sharing system from the graph perspective. More specifically, we construct
multiple inter-station graphs for a bike sharing system. In each graph, nodes
are stations, and edges are a certain type of relations between stations. Then,
multiple graphs are constructed to reflect heterogeneous relationships (e.g.,
distance, ride record correlation). Afterward, we fuse the multiple graphs and
then apply the convolutional layers on the fused graph to predict station-level
future bike flow. In addition to the estimated bike flow value, our model also
gives the prediction confidence interval so as to help the bike sharing system
managers make decisions. Using New York City and Chicago bike sharing data for
experiments, our model outperforms state-of-the-art station-level prediction
models, reducing prediction error by 25.1% and 17.0% in New York City and
Chicago, respectively.
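The core construction, several inter-station graphs fused into one and then convolved, can be sketched as follows. The row-normalization, the fixed fusion weights, and the single convolution layer are illustrative assumptions; the paper's actual fusion and training procedure may differ:

```python
import numpy as np

def normalize(adj):
    """Row-normalize an adjacency matrix into a transition-like matrix."""
    deg = adj.sum(axis=1, keepdims=True)
    return adj / np.maximum(deg, 1e-8)

def multi_graph_conv(graphs, weights, x, w):
    """Fuse several station graphs, then apply one graph-convolution layer."""
    fused = sum(a * normalize(g) for a, g in zip(weights, graphs))
    return np.maximum(fused @ x @ w, 0.0)  # propagate along edges, transform, ReLU

rng = np.random.default_rng(1)
n_stations = 5
dist_graph = rng.random((n_stations, n_stations))   # e.g. distance-based edges
ride_graph = rng.random((n_stations, n_stations))   # e.g. ride-record correlation
flows = rng.random((n_stations, 4))                 # 4 past flow features per station
out = multi_graph_conv([dist_graph, ride_graph], [0.6, 0.4],
                       flows, rng.normal(size=(4, 2)) * 0.5)
print(out.shape)  # (5, 2)
```

Each graph encodes one heterogeneous relation (distance, ride correlation), and fusing them lets one convolution see all of them at once.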
Multi Resolution LSTM For Long Term Prediction In Neural Activity Video
Epileptic seizures are caused by abnormal, overly synchronized electrical
activity in the brain. The abnormal electrical activity manifests as waves
propagating across the brain. Accurate prediction of the propagation velocity
and direction of these waves could enable real-time responsive brain
stimulation to suppress or prevent the seizures entirely. However, this problem
is very challenging because the algorithm must be able to predict the neural
signals over a sufficiently long time horizon to allow enough time for medical
intervention. We consider how to accomplish long-term prediction using an LSTM
network. To alleviate the vanishing gradient problem, we propose two
encoder-decoder-predictor structures, both using multi-resolution
representations. The novel LSTM structure with multi-resolution layers
significantly outperforms the single-resolution benchmark with a similar number
of parameters. To overcome the blurring effect associated with video prediction
in the pixel domain using the standard mean square error (MSE) loss, we use
energy-based adversarial training to improve the long-term prediction. We
demonstrate and analyze how a discriminative model with an encoder-decoder
structure using a 3D CNN improves long-term prediction.
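The multi-resolution representation underlying the approach can be illustrated with a simple temporal pyramid. Pairwise average pooling is an assumed downsampling choice here, used only to show how coarser time scales are obtained from the same sequence:

```python
import numpy as np

def temporal_pyramid(seq, levels):
    """Build coarser temporal resolutions by average-pooling pairs of frames."""
    pyramid = [seq]
    for _ in range(levels - 1):
        s = pyramid[-1]
        T = s.shape[0] - s.shape[0] % 2       # drop a trailing odd frame
        pyramid.append(s[:T].reshape(T // 2, 2, -1).mean(axis=1))
    return pyramid

rng = np.random.default_rng(2)
video = rng.normal(size=(16, 64))  # 16 frames, 64 pixel features each
pyr = temporal_pyramid(video, 3)
print([p.shape[0] for p in pyr])  # [16, 8, 4]
```

The coarse levels let a recurrent predictor carry gradients over fewer steps for the same wall-clock horizon, which is the motivation for multi-resolution layers in long-term prediction.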
Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting
Spatiotemporal forecasting has various applications in the neuroscience,
climate, and transportation domains. Traffic forecasting is one canonical
example of such a learning task. The task is challenging due to (1) complex spatial dependency on
road networks, (2) non-linear temporal dynamics with changing road conditions
and (3) inherent difficulty of long-term forecasting. To address these
challenges, we propose to model the traffic flow as a diffusion process on a
directed graph and introduce Diffusion Convolutional Recurrent Neural Network
(DCRNN), a deep learning framework for traffic forecasting that incorporates
both spatial and temporal dependency in the traffic flow. Specifically, DCRNN
captures the spatial dependency using bidirectional random walks on the graph,
and the temporal dependency using the encoder-decoder architecture with
scheduled sampling. We evaluate the framework on two real-world large scale
road network traffic datasets and observe consistent improvements of 12-15%
over state-of-the-art baselines. Comment: Published as a conference paper at ICLR 2018
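The bidirectional-random-walk diffusion convolution at the heart of DCRNN can be sketched in plain numpy. The truncation depth and the shared per-step coefficients below are illustrative assumptions, not the paper's exact parameterization:

```python
import numpy as np

def diffusion_conv(W, X, thetas_fwd, thetas_bwd):
    """K-step bidirectional-random-walk diffusion convolution on a directed graph.

    W: weighted adjacency (N, N); X: node signals (N, C).
    Forward walk uses the out-degree transition matrix D_O^{-1} W;
    backward walk uses the in-degree transition matrix D_I^{-1} W^T.
    """
    P_fwd = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-8)
    P_bwd = W.T / np.maximum(W.T.sum(axis=1, keepdims=True), 1e-8)
    out = np.zeros_like(X)
    Zf, Zb = X.copy(), X.copy()
    for tf, tb in zip(thetas_fwd, thetas_bwd):
        out += tf * Zf + tb * Zb     # accumulate the k-th diffusion step
        Zf, Zb = P_fwd @ Zf, P_bwd @ Zb
    return out

rng = np.random.default_rng(3)
W = rng.random((6, 6)) * (rng.random((6, 6)) > 0.5)  # sparse directed road graph
X = rng.normal(size=(6, 2))                          # e.g. speed readings per sensor
out = diffusion_conv(W, X, thetas_fwd=[1.0, 0.5, 0.25], thetas_bwd=[1.0, 0.5, 0.25])
print(out.shape)  # (6, 2)
```

In the full model this operation replaces the matrix multiplications inside a GRU cell, and an encoder-decoder with scheduled sampling handles the temporal dimension.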
Train, Diagnose and Fix: Interpretable Approach for Fine-grained Action Recognition
Despite the growing discriminative capabilities of modern deep learning
methods for recognition tasks, the inner workings of state-of-the-art models
still remain mostly black-boxes. In this paper, we propose a systematic
interpretation of model parameters and hidden representations of Residual
Temporal Convolutional Networks (Res-TCN) for action recognition in time-series
data. We also propose a Feature Map Decoder as part of the interpretation
analysis, which outputs a representation of the model's hidden variables in the
same domain as the input. Such analysis empowers us to expose the model's
characteristic learning patterns in an interpretable way. For example, through
the diagnosis analysis, we discovered that our model has learned to achieve
view-point invariance by implicitly learning to perform rotational
normalization of the input to a more discriminative view. Based on the findings
from the model interpretation analysis, we propose a targeted refinement
technique, which can generalize to various other recognition models. The
proposed work introduces a three-stage paradigm for model learning: training,
interpretable diagnosis and targeted refinement. We validate our approach on
the skeleton-based 3D human action recognition benchmark NTU RGB+D. We show that
the proposed workflow is an effective model learning strategy and the resulting
Multi-stream Residual Temporal Convolutional Network (MS-Res-TCN) achieves the
state-of-the-art performance on NTU RGB+D. Comment: 8 pages, 8 figures, CVPR18 submission
Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention
This paper describes a novel text-to-speech (TTS) technique based on deep
convolutional neural networks (CNN), without any recurrent units.
Recurrent neural networks (RNN) have recently become a standard technique for
modeling sequential data, and they have been used in some cutting-edge neural
TTS techniques. However, training RNN components often requires a very
powerful computer or a very long time, typically several days or weeks. Other
recent studies, on the other hand, have shown that CNN-based sequence synthesis
can be much faster than RNN-based techniques because of its high
parallelizability. The objective of this paper is to show that an alternative
neural TTS system based only on CNNs alleviates these economic costs of training. In our
experiment, the proposed Deep Convolutional TTS was sufficiently trained
overnight (15 hours), using an ordinary gaming PC equipped with two GPUs, while
the quality of the synthesized speech was almost acceptable. Comment: 5 pages, 3 figures, IEEE ICASSP 2018
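The guided attention mentioned in the title can be illustrated with the diagonal penalty it places on the text-to-audio attention matrix. The sketch below assumes the commonly cited weight W[n, t] = 1 - exp(-((n/N - t/T)^2) / (2 g^2)) and an illustrative value of g; treat it as a reading of the idea, not the paper's exact implementation:

```python
import numpy as np

def guided_attention_penalty(A, g=0.2):
    """Penalize attention far from the diagonal, encouraging monotonic alignment.

    A: attention matrix (N_text, T_audio). The weight W[n, t] is ~0 near the
    diagonal n/N == t/T and grows toward 1 away from it, so scattered,
    non-monotonic attention incurs a larger penalty.
    """
    N, T = A.shape
    n = np.arange(N)[:, None] / N
    t = np.arange(T)[None, :] / T
    W = 1.0 - np.exp(-((n - t) ** 2) / (2.0 * g ** 2))
    return float((A * W).mean())

# A nearly diagonal alignment is penalized less than a scattered one.
diag = np.eye(10)
flat = np.full((10, 10), 0.1)
print(guided_attention_penalty(diag) < guided_attention_penalty(flat))  # True
```

Adding this penalty to the synthesis loss is what lets a purely convolutional attention model learn a roughly monotonic text-audio alignment quickly.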
Learning to Detect Instantaneous Changes with Retrospective Convolution and Static Sample Synthesis
Change detection has been a challenging visual task due to the dynamic nature
of real-world scenes. Good performance of existing methods depends largely on
prior background images or long-term observation. These methods, however,
suffer severe degradation when applied to detecting instantaneously
occurring changes with only a few preceding frames provided. In this paper, we
exploit spatio-temporal convolutional networks to address this challenge, and
propose a novel retrospective convolution, which features efficient change
information extraction between the current frame and frames from historical
observation. To address the problem of foreground-specific over-fitting in
learning-based methods, we further propose a data augmentation method, named
static sample synthesis, to guide the network to focus on learning change-cued
information rather than specific spatial features of foreground. Trained
end-to-end with complex scenarios, our framework proves to be accurate in
detecting instantaneous changes and robust in combating diverse noises.
Extensive experiments demonstrate that our proposed method significantly
outperforms existing methods. Comment: 10 pages, 9 figures
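One way to read "retrospective convolution", extracting change cues by convolving differences between the current frame and each historical frame, can be sketched as below. The differencing, the 1D kernel, and the max aggregation are all assumptions made for illustration; the paper's operator is a learned spatio-temporal convolution:

```python
import numpy as np

def retrospective_features(frames, kernel):
    """Convolve current-vs-history frame differences, keep the strongest response."""
    current = frames[-1]
    diffs = [current - f for f in frames[:-1]]                    # change cues vs. history
    feats = [np.convolve(d.ravel(), kernel, mode="same") for d in diffs]
    return np.max(np.stack(feats), axis=0)                        # strongest change signal

rng = np.random.default_rng(4)
clip = rng.normal(size=(5, 8, 8))    # 4 historical frames + 1 current frame
out = retrospective_features(clip, kernel=np.array([0.25, 0.5, 0.25]))
print(out.shape)  # (64,)
```

The point of the construction is that only a few preceding frames are needed, in contrast to methods that require a background model or long-term observation.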
Efficient B-mode Ultrasound Image Reconstruction from Sub-sampled RF Data using Deep Learning
In portable, three dimensional, and ultra-fast ultrasound imaging systems,
there is an increasing demand for the reconstruction of high quality images
from a limited number of radio-frequency (RF) measurements due to receiver (Rx)
or transmit (Xmit) event sub-sampling. However, due to the presence of side
lobe artifacts from RF sub-sampling, the standard beamformer often produces
blurry images with less contrast, which are unsuitable for diagnostic purposes.
Existing compressed sensing approaches often require either hardware changes or
computationally expensive algorithms, but their quality improvements are
limited. To address this problem, here we propose a novel deep learning
approach that directly interpolates the missing RF data by utilizing redundancy
in the Rx-Xmit plane. Our extensive experimental results using sub-sampled RF
data from a multi-line acquisition B-mode system confirm that the proposed
method can effectively reduce the data rate without sacrificing image quality. Comment: The title has been changed. This version will appear in IEEE Trans. on Medical Imaging
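The task the network solves, filling in missing RF channels in the Rx-Xmit plane, can be made concrete with a simple interpolation baseline. This linear interpolation along the Rx axis is only a reference point for comparison; the paper replaces it with a learned deep network that exploits redundancy across both axes:

```python
import numpy as np

def interpolate_missing_rx(rf, mask):
    """Baseline: linearly interpolate missing Rx channels along the Rx axis.

    rf: (n_rx, n_xmit) RF plane; mask: boolean (n_rx,), True = channel kept.
    """
    rf = rf.copy()
    idx = np.arange(rf.shape[0])
    kept = idx[mask]
    for col in range(rf.shape[1]):
        rf[~mask, col] = np.interp(idx[~mask], kept, rf[kept, col])
    return rf

rng = np.random.default_rng(5)
full = rng.normal(size=(8, 6))                      # 8 Rx channels x 6 Xmit events
mask = np.zeros(8, dtype=bool)
mask[::2] = True                                    # keep every other Rx channel
recon = interpolate_missing_rx(full, mask)
print(np.allclose(recon[mask], full[mask]))  # True: kept channels are unchanged
```

A learned interpolator can beat this baseline precisely because neighboring Rx and Xmit channels are highly correlated, which is the redundancy the paper exploits.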