Cross-Modal Message Passing for Two-stream Fusion
Processing and fusing information across multiple modalities is a useful
technique for achieving high performance in many computer vision problems. In
order to tackle multi-modal information more effectively, we introduce a novel
framework for multi-modal fusion: Cross-modal Message Passing (CMMP).
Specifically, we propose a cross-modal message passing mechanism to fuse a
two-stream network for action recognition, which is composed of an appearance
modal network (RGB image) and a motion modal (optical flow image) network. The
objectives of individual networks in this framework are two-fold: a standard
classification objective and a competing objective. The classification objective
ensures that each modal network predicts the true action category while the
competing objective encourages each modal network to outperform the other one.
We quantitatively show that the proposed CMMP fuses the traditional two-stream
network more effectively, and outperforms all existing two-stream fusion methods
on the UCF-101 and HMDB-51 datasets.
Comment: 2018 IEEE International Conference on Acoustics, Speech and Signal
Processin
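
The two-fold objective described in this abstract can be illustrated with a minimal sketch. The abstract does not give the form of the competing objective, so the hinge term below, the `weight` parameter, and all function names are assumptions made purely for illustration and should not be read as the authors' formulation.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of class scores.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    # Standard classification objective for one sample.
    return -math.log(softmax(logits)[label])

def two_fold_loss(own_logits, other_logits, label, weight=0.5):
    # ASSUMPTION: the competing term is a hinge on the loss gap, so a
    # stream is penalised only while it is doing worse than the other one.
    own_ce = cross_entropy(own_logits, label)
    other_ce = cross_entropy(other_logits, label)
    competing = max(0.0, own_ce - other_ce)
    return own_ce + weight * competing

# Hypothetical scores from the appearance (RGB) and motion (flow) streams.
rgb_logits = [2.0, 0.5, 0.1]
flow_logits = [0.2, 1.5, 0.3]
rgb_loss = two_fold_loss(rgb_logits, flow_logits, label=0)
flow_loss = two_fold_loss(flow_logits, rgb_logits, label=0)
```

In this toy case the RGB stream already beats the flow stream on the true class, so only the flow stream receives the extra competing penalty.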
STAR: A Concise Deep Learning Framework for Citywide Human Mobility Prediction
Human mobility forecasting in a city is of utmost importance to
transportation and public safety, but with the process of urbanization and the
generation of big data, intensive computing and determination of mobility
patterns have become challenging. This study focuses on how to improve the
accuracy and efficiency of predicting citywide human mobility via a simpler
solution. A spatio-temporal mobility event prediction framework based on a
single fully-convolutional residual network (STAR) is proposed. STAR is a
highly simple, general and effective method for learning a single tensor
representing the mobility event. Residual learning is utilized for training the
deep network to derive the detailed result for scenarios of citywide
prediction. Extensive benchmark evaluation results on real-world data
demonstrate that STAR outperforms state-of-the-art approaches in single- and
multi-step prediction while utilizing fewer parameters and achieving higher
efficiency.
Comment: Accepted by MDM 201
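
The residual learning mentioned in this abstract can be sketched on a single-channel city grid. This is a toy illustration only: the one-channel grid, the 3x3 kernel, and the names `conv2d_same` and `residual_block` are assumptions, and STAR's actual architecture is not described in the abstract.

```python
def conv2d_same(grid, kernel):
    # Zero-padded 2D cross-correlation that keeps the grid size unchanged,
    # which is what makes a network fully convolutional over a city map.
    h, w = len(grid), len(grid[0])
    k = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(-k, k + 1):
                for dj in range(-k, k + 1):
                    r, c = i + di, j + dj
                    if 0 <= r < h and 0 <= c < w:
                        acc += kernel[di + k][dj + k] * grid[r][c]
            out[i][j] = acc
    return out

def residual_block(grid, kernel):
    # Residual learning: the block learns an update that is added to its
    # input through a skip connection, y = x + conv(relu(x)).
    activated = [[max(0.0, v) for v in row] for row in grid]
    update = conv2d_same(activated, kernel)
    return [[x + u for x, u in zip(row_x, row_u)]
            for row_x, row_u in zip(grid, update)]

# Hypothetical 2x2 grid of mobility counts.
flows = [[1.0, 2.0], [3.0, 4.0]]
zero_kernel = [[0.0] * 3 for _ in range(3)]
center_kernel = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
unchanged = residual_block(flows, zero_kernel)
doubled = residual_block(flows, center_kernel)
```

With an all-zero kernel the block reduces to the identity mapping, which is the property that makes deep residual stacks easier to train.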
Second-order Temporal Pooling for Action Recognition
Deep learning models for video-based action recognition usually generate
features for short clips (consisting of a few frames); such clip-level features
are aggregated to video-level representations by computing statistics on these
features. Typically, zeroth-order (max) or first-order (average) statistics are
used. In this paper, we explore the benefits of using second-order statistics.
Specifically, we propose a novel end-to-end learnable feature aggregation
scheme, dubbed temporal correlation pooling, that generates an action descriptor
for a video sequence by capturing the similarities between the temporal
evolution of clip-level CNN features computed across the video. Such a
descriptor, while being computationally cheap, also naturally encodes the
co-activations of multiple CNN features, thereby providing a richer
characterization of actions than their first-order counterparts. We also
propose higher-order extensions of this scheme by computing correlations after
embedding the CNN features in a reproducing kernel Hilbert space. We provide
experiments on benchmark datasets such as HMDB-51 and UCF-101, fine-grained
datasets such as MPII Cooking activities and JHMDB, as well as the recent
Kinetics-600. Our results demonstrate the advantages of higher-order pooling
schemes, which, when combined with hand-crafted features (as is standard practice),
achieve state-of-the-art accuracy.
Comment: Accepted in the International Journal of Computer Vision (IJCV
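
A second-order descriptor of the kind this abstract describes can be sketched as a matrix of correlations between the temporal trajectories of clip-level features. The sketch below uses plain Pearson correlation as an assumption; the paper's exact normalisation, its end-to-end learnable form, and its kernelised extensions are not reproduced here, and the function name is hypothetical.

```python
import math

def temporal_correlation_pooling(clip_features):
    # clip_features: list of T clip-level vectors, each with d features.
    # Returns a d x d matrix whose (i, j) entry is the correlation between
    # the temporal evolution of feature i and that of feature j.
    t = len(clip_features)
    d = len(clip_features[0])
    # trajectories[k] holds feature k across all T clips.
    trajectories = [[clip[k] for clip in clip_features] for k in range(d)]
    centered = []
    for traj in trajectories:
        mean = sum(traj) / t
        centered.append([v - mean for v in traj])
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    eps = 1e-12  # guards against division by zero for constant features
    return [[sum(a * b for a, b in zip(centered[i], centered[j]))
             / (norms[i] * norms[j] + eps)
             for j in range(d)]
            for i in range(d)]

# Hypothetical features for three clips: feature 1 rises with feature 0,
# feature 2 falls as feature 0 rises.
clips = [[1.0, 2.0, 3.0], [2.0, 4.0, 2.0], [3.0, 6.0, 1.0]]
descriptor = temporal_correlation_pooling(clips)
```

Unlike max or average pooling, which yield one value per feature, the descriptor has one entry per feature pair, so it records which features co-activate over time.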
Nonlinear brain dynamics as macroscopic manifestation of underlying many-body field dynamics
Neural activity patterns related to behavior occur at many scales in time and
space from the atomic and molecular to the whole brain. Here we explore the
feasibility of interpreting neurophysiological data in the context of many-body
physics by using tools that physicists have devised to analyze comparable
hierarchies in other fields of science. We focus on a mesoscopic level that
offers a multi-step pathway between the microscopic functions of neurons and
the macroscopic functions of brain systems revealed by hemodynamic imaging. We
use electroencephalographic (EEG) records collected from high-density electrode
arrays fixed on the epidural surfaces of primary sensory and limbic areas in
rabbits and cats trained to discriminate conditioned stimuli (CS) in
various modalities. High temporal resolution of EEG signals with the Hilbert
transform gives evidence for diverse intermittent spatial patterns of amplitude
(AM) and phase modulations (PM) of carrier waves that repeatedly re-synchronize
in the beta and gamma ranges at near zero time lags over long distances. The
dominant mechanism for neural interactions by axodendritic synaptic
transmission should impose distance-dependent delays on the EEG oscillations
owing to finite propagation velocities. It does not. EEGs instead show evidence
for anomalous dispersion: the existence in neural populations of a low velocity
range of information and energy transfers, and a high velocity range of the
spread of phase transitions. This distinction labels the phenomenon but does
not explain it. In this report we explore the analysis of these phenomena using
concepts of energy dissipation, the maintenance by cortex of multiple ground
states corresponding to AM patterns, and the exclusive selection by spontaneous
breakdown of symmetry (SBS) of single states in sequences.
Comment: 31 page
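
The amplitude (AM) and phase (PM) modulations mentioned in this abstract are typically read off the analytic signal obtained with the Hilbert transform. The following is a minimal sketch of that standard construction using a naive discrete Fourier transform; the function names and the synthetic carrier are assumptions, and the authors' actual EEG processing pipeline is not described in the abstract.

```python
import cmath
import math

def dft(x, inverse=False):
    # Naive O(n^2) discrete Fourier transform, adequate for short records.
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out

def analytic_signal(samples):
    # Keep the DC term, double the positive frequencies, and zero the
    # negative ones; the imaginary part of the result is the Hilbert
    # transform of the input.
    n = len(samples)
    spectrum = dft(samples)
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        for k in range(1, n // 2):
            weights[k] = 2.0
    else:
        for k in range(1, (n + 1) // 2):
            weights[k] = 2.0
    return dft([s * w for s, w in zip(spectrum, weights)], inverse=True)

def amplitude_and_phase(samples):
    # Instantaneous amplitude (AM) and phase (PM) of a real-valued signal.
    z = analytic_signal(samples)
    return [abs(v) for v in z], [cmath.phase(v) for v in z]

# Synthetic carrier: 4 full cycles over 32 samples, unit amplitude.
n = 32
carrier = [math.cos(2 * math.pi * 4 * t / n) for t in range(n)]
amplitude, phase = amplitude_and_phase(carrier)
```

For a pure carrier the amplitude is constant and the phase advances linearly; in recorded data, departures from that pattern are what appear as amplitude and phase modulation.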