Deep learning with convolutional neural networks for EEG decoding and visualization
PLEASE READ AND CITE THE REVISED VERSION at Human Brain Mapping:
http://onlinelibrary.wiley.com/doi/10.1002/hbm.23730/full
Code available here: https://github.com/robintibor/braindecode
Comment: A revised manuscript (with the new title) has been accepted at Human
Brain Mapping, see http://onlinelibrary.wiley.com/doi/10.1002/hbm.23730/full
Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting
Spatiotemporal forecasting has various applications in the neuroscience,
climate, and transportation domains. Traffic forecasting is a canonical
example of such a learning task. The task is challenging due to (1) complex
spatial dependency on
road networks, (2) non-linear temporal dynamics with changing road conditions
and (3) inherent difficulty of long-term forecasting. To address these
challenges, we propose to model the traffic flow as a diffusion process on a
directed graph and introduce Diffusion Convolutional Recurrent Neural Network
(DCRNN), a deep learning framework for traffic forecasting that incorporates
both spatial and temporal dependency in the traffic flow. Specifically, DCRNN
captures the spatial dependency using bidirectional random walks on the graph,
and the temporal dependency using the encoder-decoder architecture with
scheduled sampling. We evaluate the framework on two real-world, large-scale
road network traffic datasets and observe a consistent 12%-15% improvement
over state-of-the-art baselines.
Comment: Published as a conference paper at ICLR 2018
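The diffusion-convolution idea above can be sketched in a few lines of numpy: spatial dependency is captured by truncated bidirectional random walks on the directed road graph. This is a toy illustration, not the paper's implementation; the 3-node graph, the truncation depth, and the filter weights `theta_f`/`theta_b` are made up for the example, and the recurrent encoder-decoder part is omitted.

```python
import numpy as np

def diffusion_conv(X, W, theta_f, theta_b):
    """One diffusion-convolution step over a directed graph.

    X: (N, F) node features; W: (N, N) weighted adjacency (nonzero rows);
    theta_f / theta_b: (K,) filter weights for forward / backward walks.
    """
    K = len(theta_f)
    # Random-walk transition matrices (out-degree / in-degree normalized).
    P_f = W / W.sum(axis=1, keepdims=True)      # forward walk:  D_O^{-1} W
    P_b = W.T / W.T.sum(axis=1, keepdims=True)  # backward walk: D_I^{-1} W^T
    out = np.zeros_like(X)
    Xf, Xb = X.copy(), X.copy()
    for k in range(K):
        out += theta_f[k] * Xf + theta_b[k] * Xb
        Xf, Xb = P_f @ Xf, P_b @ Xb             # take one more walk step
    return out

# Toy 3-node directed cycle; identity features so the filter is easy to read.
W = np.array([[0., 1., 0.], [0., 0., 1.], [1., 0., 0.]])
X = np.eye(3)
Y = diffusion_conv(X, W, theta_f=np.array([1.0, 0.5]), theta_b=np.array([0.0, 0.5]))
```

With these weights the output is simply `I + 0.5*W + 0.5*W.T`, i.e. each node mixes its own feature with its forward and backward neighbors.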
Deep Learning on Traffic Prediction: Methods, Analysis and Future Directions
Traffic prediction plays an essential role in intelligent transportation
systems. Accurate traffic prediction can assist route planning, guide vehicle
dispatching, and mitigate traffic congestion. This problem is challenging due
to the complicated and dynamic spatio-temporal dependencies between different
regions in the road network. Recently, a significant amount of research effort
has been devoted to this area, with deep learning methods in particular
greatly advancing traffic prediction capabilities. The purpose of this paper
is to provide
a comprehensive survey on deep learning-based approaches in traffic prediction
from multiple perspectives. Specifically, we first summarize the existing
traffic prediction methods, and give a taxonomy. Second, we list the
state-of-the-art approaches in different traffic prediction applications.
Third, we comprehensively collect and organize widely used public datasets in
the existing literature to facilitate other researchers. Furthermore, we give
an evaluation and analysis by conducting extensive experiments to compare the
performance of different methods on a real-world public dataset. Finally, we
discuss open challenges in this field.
Comment: To be published in IEEE Transactions on Intelligent Transportation
Systems
Multi Resolution LSTM For Long Term Prediction In Neural Activity Video
Epileptic seizures are caused by abnormal, overly synchronized electrical
activity in the brain. The abnormal electrical activity manifests as waves
propagating across the brain. Accurate prediction of the propagation velocity
and direction of these waves could enable real-time responsive brain
stimulation to suppress or prevent the seizures entirely. However, this problem
is very challenging because the algorithm must be able to predict the neural
signals over a sufficiently long time horizon to allow enough time for medical
intervention. We consider how to accomplish long-term prediction using an LSTM
network. To alleviate the vanishing gradient problem, we propose two
encoder-decoder-predictor structures, both using a multi-resolution
representation. The novel LSTM structure with multi-resolution layers
significantly outperforms the single-resolution benchmark with a similar
number of parameters. To overcome the blurring effect associated with video
prediction in the pixel domain using the standard mean square error (MSE)
loss, we use energy-based adversarial training to improve long-term
prediction. We demonstrate and analyze how a discriminative model with an
encoder-decoder structure using a 3D CNN improves long-term prediction.
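The multi-resolution representation mentioned above can be illustrated with simple factor-of-2 average pooling over time. This numpy sketch only shows how the coarser temporal scales would be derived from a signal; the LSTM layers themselves, and the choice of three levels, are assumptions for the example.

```python
import numpy as np

def multi_resolution_stack(signal, levels=3):
    """Build a coarse-to-fine stack of temporal resolutions of a 1-D signal
    by repeated factor-of-2 average pooling. Each level halves the length,
    so later levels summarize longer time spans per step."""
    reps = [np.asarray(signal, dtype=float)]
    for _ in range(levels - 1):
        f = reps[-1]
        f = f[: len(f) // 2 * 2]          # drop a trailing odd sample
        reps.append(f.reshape(-1, 2).mean(axis=1))
    return reps

sig = np.arange(8, dtype=float)           # toy "neural activity" trace
reps = multi_resolution_stack(sig)        # lengths 8, 4, 2
```

Each level could then feed its own recurrent layer, so the coarse levels carry long-horizon context while the fine level preserves detail.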
Searching for Efficient Multi-Scale Architectures for Dense Image Prediction
The design of neural network architectures is an important component for
achieving state-of-the-art performance with machine learning systems across a
broad array of tasks. Much work has endeavored to design and build
architectures automatically through clever construction of a search space
paired with simple learning algorithms. Recent progress has demonstrated that
such meta-learning methods may exceed scalable human-invented architectures on
image classification tasks. An open question is the degree to which such
methods may generalize to new domains. In this work we explore the construction
of meta-learning techniques for dense image prediction focused on the tasks of
scene parsing, person-part segmentation, and semantic image segmentation.
Constructing viable search spaces in this domain is challenging because of the
multi-scale representation of visual information and the necessity to operate
on high resolution imagery. Based on a survey of techniques in dense image
prediction, we construct a recursive search space and demonstrate that even
with efficient random search, we can identify architectures that outperform
human-invented architectures and achieve state-of-the-art performance on three
dense prediction tasks including 82.7\% on Cityscapes (street scene parsing),
71.3\% on PASCAL-Person-Part (person-part segmentation), and 87.9\% on PASCAL
VOC 2012 (semantic image segmentation). Additionally, the resulting
architecture is more computationally efficient, requiring half the parameters
and half the computational cost of previous state-of-the-art systems.
Comment: Accepted by NIPS 2018
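The "efficient random search" step above can be sketched as sampling candidate cells from a discrete space and keeping the best-scoring one. The operation list, branch count, and `proxy_score` below are invented stand-ins; in the paper each candidate would actually be trained and validated on the dense-prediction task.

```python
import random

# Hypothetical search space loosely inspired by a multi-scale prediction
# cell: each branch picks an operation and an (atrous) rate.
OPS = ["conv3x3", "sep_conv3x3", "avg_pool"]
RATES = [1, 3, 6, 12]

def sample_cell(num_branches=3, rng=random):
    """Sample one candidate architecture: a list of (op, rate) branch specs."""
    return [(rng.choice(OPS), rng.choice(RATES)) for _ in range(num_branches)]

def random_search(evaluate, trials=20, seed=0):
    """Evaluate `trials` random candidates and return the best one found."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(trials):
        cand = sample_cell(rng=rng)
        score = evaluate(cand)
        if score > best_score:
            best, best_score = cand, score
    return best, best_score

# Stand-in proxy score; real use would train and validate each candidate.
def proxy_score(cell):
    return sum(rate for _, rate in cell) - 2 * sum(op == "avg_pool" for op, _ in cell)

best_cell, score = random_search(proxy_score)
```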
A Tutorial on Deep Learning for Music Information Retrieval
Following their success in Computer Vision and other areas, deep learning
techniques have recently become widely adopted in Music Information Retrieval
(MIR) research. However, the majority of works aim to adopt and assess methods
that have been shown to be effective in other domains, while there is still a
great need for more original research focusing on music primarily and utilising
musical knowledge and insight. The goal of this paper is to boost the interest
of beginners by providing a comprehensive tutorial and reducing the barriers to
entry into deep learning for MIR. We lay out the basic principles and review
prominent works in this hard-to-navigate field. We then outline the network
structures that have been successful in MIR problems and facilitate the
selection of building blocks for the problems at hand. Finally, guidelines for
new tasks and some advanced topics in deep learning are discussed to stimulate
new research in this fascinating field.
Towards Visual Explanations for Convolutional Neural Networks via Input Resampling
The predictive power of neural networks often comes at the cost of model interpretability.
Several techniques have been developed for explaining model outputs in terms of
input features; however, it is difficult to translate such interpretations into
actionable insight. Here, we propose a framework to analyze predictions in
terms of the model's internal features by inspecting information flow through
the network. Given a trained network and a test image, we select neurons by two
metrics, both measured over a set of images created by perturbations to the
input image: (1) magnitude of the correlation between the neuron activation and
the network output and (2) precision of the neuron activation. We show that the
former metric selects neurons that exert large influence over the network
output while the latter metric selects neurons that activate on generalizable
features. By comparing the sets of neurons selected by these two metrics, our
framework suggests a way to investigate the internal attention mechanisms of
convolutional neural networks.
Comment: Presented at the ICML 2017 Workshop on Visualization for Deep Learning
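The first selection metric described above, the magnitude of the correlation between a neuron's activation and the network output over a set of perturbed inputs, can be sketched with plain numpy. The synthetic activations and `top_k=2` here are assumptions for the example; in practice the activations would come from a trained CNN evaluated on perturbations of a test image.

```python
import numpy as np

def select_by_correlation(acts, outputs, top_k=2):
    """Rank neurons by |Pearson correlation| between their activation and
    the network output, measured across perturbed inputs.

    acts:    (P, N) activations of N neurons on P perturbed images.
    outputs: (P,)   network output (e.g. class score) on the same images.
    Returns (indices of top_k neurons, per-neuron |correlation|).
    """
    a = acts - acts.mean(axis=0)
    o = outputs - outputs.mean()
    denom = np.sqrt((a ** 2).sum(axis=0) * (o ** 2).sum()) + 1e-12
    corr = np.abs(a.T @ o) / denom            # (N,) |correlation| per neuron
    return np.argsort(-corr)[:top_k], corr

# Synthetic example: neuron 0 tracks the output, neuron 1 is noise,
# neuron 2 is anti-correlated (still high |correlation|).
rng = np.random.default_rng(0)
outputs = rng.normal(size=100)
acts = np.stack([outputs * 2.0,
                 rng.normal(size=100),
                 -outputs + 0.1 * rng.normal(size=100)], axis=1)
idx, corr = select_by_correlation(acts, outputs)
```

Neurons 0 and 2 are selected; using the absolute value means strongly anti-correlated neurons also count as influential.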
DeepChrome: Deep-learning for predicting gene expression from histone modifications
Motivation: Histone modifications are among the most important factors that
control gene regulation. Computational methods that predict gene expression
from histone modification signals are highly desirable for understanding their
combinatorial effects in gene regulation. This knowledge can help in developing
'epigenetic drugs' for diseases like cancer. Previous studies for quantifying
the relationship between histone modifications and gene expression levels
either failed to capture combinatorial effects or relied on multiple methods
that separate predictions and combinatorial analysis. This paper develops a
unified discriminative framework using a deep convolutional neural network to
classify gene expression using histone modification data as input. Our system,
called DeepChrome, allows automatic extraction of complex interactions among
important features. To simultaneously visualize the combinatorial interactions
among histone modifications, we propose a novel optimization-based technique
that generates feature pattern maps from the learnt deep model. This provides
an intuitive description of underlying epigenetic mechanisms that regulate
genes. Results: We show that DeepChrome outperforms state-of-the-art models
like Support Vector Machines and Random Forests on the gene expression
classification task on 56 different cell types from the REMC database. The output
of our visualization technique not only validates the previous observations but
also allows novel insights about combinatorial interactions among histone
modification marks, some of which have recently been observed by experimental
studies.
Comment: This work will be originally published in the Bioinformatics journal
(ECCB 2016)
A Bimodal Learning Approach to Assist Multi-sensory Effects Synchronization
In mulsemedia applications, traditional media content (text, image, audio,
video, etc.) can be related to media objects that target other human senses
(e.g., smell, haptics, taste). Such applications aim at bridging the virtual
and real worlds through sensors and actuators. Actuators are responsible for
the execution of sensory effects (e.g., wind, heat, light), which produce
sensory stimulations on the users. In these applications, sensory stimulation
must happen in a timely manner relative to the other traditional media content
being presented. For example, at the moment in which an explosion is presented
in the audiovisual content, it may be adequate to activate actuators that
produce heat and light. It is common to use some declarative multimedia
authoring language to relate the timestamp in which each media object is to be
presented to the execution of some sensory effect. One problem in this setting
is that the synchronization of media objects and sensory effects is done
manually by the author(s) of the application, a process which is time-consuming
and error prone. In this paper, we present a bimodal neural network
architecture to assist the synchronization task in mulsemedia applications. Our
approach is based on the idea that audio and video signals can be used
simultaneously to identify the timestamps in which some sensory effect should
be executed. Our learning architecture combines audio and video signals for the
prediction of scene components. For evaluation purposes, we construct a dataset
based on Google's AudioSet. We provide experiments to validate our bimodal
architecture. Our results show that the bimodal approach outperforms several
variants of unimodal architectures.
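A minimal sketch of the late-fusion idea behind such a bimodal network: concatenate per-modality embeddings and classify the joint vector. The embedding sizes, the three hypothetical effect classes, and the single linear layer are assumptions for illustration; the paper's actual architecture learns the audio and video representations jointly.

```python
import numpy as np

def late_fusion_predict(audio_emb, video_emb, W, b):
    """Late fusion: concatenate the per-modality embeddings, then apply one
    linear layer followed by a softmax over scene-component classes."""
    z = np.concatenate([audio_emb, video_emb])   # joint representation
    logits = W @ z + b
    e = np.exp(logits - logits.max())            # numerically stable softmax
    return e / e.sum()                           # class probabilities

# Toy embeddings and a random classifier over 3 hypothetical effect
# classes (e.g. "explosion", "wind", "none").
rng = np.random.default_rng(0)
audio, video = rng.normal(size=4), rng.normal(size=4)
W, b = rng.normal(size=(3, 8)), np.zeros(3)
probs = late_fusion_predict(audio, video, W, b)
```

A timestamp whose predicted class is an effect-triggering scene component would then be a candidate point for activating the corresponding actuator.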
On the Use of Deep Learning for Blind Image Quality Assessment
In this work we investigate the use of deep learning for distortion-generic
blind image quality assessment. We report on different design choices, ranging
from the use of features extracted from pre-trained Convolutional Neural
Networks (CNNs) as a generic image description, to the use of features
extracted from a CNN fine-tuned for the image quality task. Our best proposal,
named DeepBIQ, estimates the image quality by average pooling the scores
predicted on multiple sub-regions of the original image. The score of each
sub-region is computed using a Support Vector Regression (SVR) machine taking
as input features extracted using a CNN fine-tuned for category-based image
quality assessment. Experimental results on the LIVE In the Wild Image Quality
Challenge Database and on the LIVE Image Quality Assessment Database show that
DeepBIQ outperforms the state-of-the-art methods compared, having a Linear
Correlation Coefficient (LCC) with human subjective scores of almost 0.91 and
0.98, respectively. Furthermore, in most cases, the quality score predictions
of DeepBIQ are closer to the average observer than those of a generic human
observer.
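The average-pooling step described for DeepBIQ can be sketched as scoring a grid of sub-regions and averaging the scores. Here `score_fn` is a stand-in for the CNN-feature-plus-SVR pipeline, and the crop size and stride are assumptions for the example.

```python
import numpy as np

def subregion_crops(img, size=8, stride=8):
    """Yield square sub-regions of a 2-D image on a regular grid."""
    H, W = img.shape[:2]
    for y in range(0, H - size + 1, stride):
        for x in range(0, W - size + 1, stride):
            yield img[y:y + size, x:x + size]

def predict_quality(img, score_fn, size=8, stride=8):
    """DeepBIQ-style pooling: score each sub-region, average the scores.
    `score_fn` stands in for the fine-tuned-CNN + SVR scorer."""
    scores = [score_fn(crop) for crop in subregion_crops(img, size, stride)]
    return float(np.mean(scores))

# Toy 16x16 "image" and a trivial stand-in scorer (mean intensity).
img = np.arange(16 * 16, dtype=float).reshape(16, 16)
q = predict_quality(img, score_fn=lambda c: c.mean())
```

With non-overlapping crops and a mean-intensity scorer, the pooled score equals the global mean, which makes the pooling step easy to sanity-check.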