    Cross-Modal Message Passing for Two-stream Fusion

    Processing and fusing information across multiple modalities is a useful technique for achieving high performance in many computer vision problems. To handle multi-modal information more effectively, we introduce a novel framework for multi-modal fusion: Cross-modal Message Passing (CMMP). Specifically, we propose a cross-modal message passing mechanism to fuse a two-stream network for action recognition, which is composed of an appearance modal network (RGB image) and a motion modal network (optical flow image). The objectives of the individual networks in this framework are two-fold: a standard classification objective and a competing objective. The classification objective ensures that each modal network predicts the true action category, while the competing objective encourages each modal network to outperform the other. We quantitatively show that the proposed CMMP fuses the traditional two-stream network more effectively and outperforms all existing two-stream fusion methods on the UCF-101 and HMDB-51 datasets.
    Comment: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing

    STAR: A Concise Deep Learning Framework for Citywide Human Mobility Prediction

    Human mobility forecasting in a city is of utmost importance to transportation and public safety, but with ongoing urbanization and the generation of big data, the intensive computation and the determination of mobility patterns have become challenging. This study focuses on how to improve the accuracy and efficiency of predicting citywide human mobility via a simpler solution. A spatio-temporal mobility event prediction framework based on a single fully-convolutional residual network (STAR) is proposed. STAR is a simple, general and effective method for learning a single tensor representing the mobility event. Residual learning is used to train the deep network to derive detailed results for citywide prediction scenarios. Extensive benchmark evaluation results on real-world data demonstrate that STAR outperforms state-of-the-art approaches in single- and multi-step prediction while using fewer parameters and achieving higher efficiency.
    Comment: Accepted by MDM 201

    Second-order Temporal Pooling for Action Recognition

    Deep learning models for video-based action recognition usually generate features for short clips (consisting of a few frames); such clip-level features are aggregated into video-level representations by computing statistics on these features. Typically, zeroth-order (max) or first-order (average) statistics are used. In this paper, we explore the benefits of using second-order statistics. Specifically, we propose a novel end-to-end learnable feature aggregation scheme, dubbed temporal correlation pooling, that generates an action descriptor for a video sequence by capturing the similarities between the temporal evolution of clip-level CNN features computed across the video. Such a descriptor, while being computationally cheap, also naturally encodes the co-activations of multiple CNN features, thereby providing a richer characterization of actions than its first-order counterparts. We also propose higher-order extensions of this scheme by computing correlations after embedding the CNN features in a reproducing kernel Hilbert space. We provide experiments on benchmark datasets such as HMDB-51 and UCF-101, fine-grained datasets such as MPII Cooking activities and JHMDB, as well as the recent Kinetics-600. Our results demonstrate the advantages of higher-order pooling schemes, which, when combined with hand-crafted features (as is standard practice), achieve state-of-the-art accuracy.
    Comment: Accepted in the International Journal of Computer Vision (IJCV)
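The contrast between first-order and second-order aggregation of clip-level features can be illustrated with a minimal sketch. This is not the authors' temporal correlation pooling layer: the function names, the toy feature values, and the plain normalization by the number of clips are all assumptions made for illustration.

```python
def average_pool(clips):
    """First-order pooling: mean of each feature over all clips."""
    n = len(clips)
    return [sum(column) / n for column in zip(*clips)]


def second_order_pool(clips):
    """Second-order pooling (illustrative): a d x d matrix whose (i, j)
    entry is the inner product between the temporal trajectories of
    features i and j, divided by the number of clips."""
    n = len(clips)
    trajectories = list(zip(*clips))  # one length-n tuple per feature
    return [
        [sum(a * b for a, b in zip(ti, tj)) / n for tj in trajectories]
        for ti in trajectories
    ]


# Hypothetical toy input: 3 clips, 2 features per clip.
clips = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mean_descriptor = average_pool(clips)
corr_descriptor = second_order_pool(clips)
```

In this toy input both features have the same mean, so average pooling cannot tell them apart, while the off-diagonal entry of the second-order descriptor records how often they are active in the same clip.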

    Nonlinear brain dynamics as macroscopic manifestation of underlying many-body field dynamics

    Neural activity patterns related to behavior occur at many scales in time and space, from the atomic and molecular to the whole brain. Here we explore the feasibility of interpreting neurophysiological data in the context of many-body physics by using tools that physicists have devised to analyze comparable hierarchies in other fields of science. We focus on a mesoscopic level that offers a multi-step pathway between the microscopic functions of neurons and the macroscopic functions of brain systems revealed by hemodynamic imaging. We use electroencephalographic (EEG) records collected from high-density electrode arrays fixed on the epidural surfaces of primary sensory and limbic areas in rabbits and cats trained to discriminate conditioned stimuli (CS) in the various modalities. High temporal resolution of EEG signals with the Hilbert transform gives evidence for diverse intermittent spatial patterns of amplitude modulation (AM) and phase modulation (PM) of carrier waves that repeatedly re-synchronize in the beta and gamma ranges at near-zero time lags over long distances. The dominant mechanism for neural interactions by axodendritic synaptic transmission should impose distance-dependent delays on the EEG oscillations owing to finite propagation velocities. It does not. EEGs instead show evidence for anomalous dispersion: the existence in neural populations of a low velocity range of information and energy transfers, and a high velocity range of the spread of phase transitions. This distinction labels the phenomenon but does not explain it. In this report we explore the analysis of these phenomena using concepts of energy dissipation, the maintenance by cortex of multiple ground states corresponding to AM patterns, and the exclusive selection by spontaneous breakdown of symmetry (SBS) of single states in sequences.
    Comment: 31 pages
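How a Hilbert transform yields instantaneous amplitude and phase for a carrier wave can be sketched as follows. This is a generic frequency-domain construction of the analytic signal, not the paper's analysis pipeline: the helper names, the synthetic cosine input, and the naive O(n^2) DFT are assumptions chosen to keep the example dependency-free.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive discrete Fourier transform (O(n^2)), fine for short signals."""
    n = len(values)
    sign = 1 if inverse else -1
    result = []
    for k in range(n):
        total = sum(
            values[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
            for t in range(n)
        )
        result.append(total / n if inverse else total)
    return result


def analytic_signal(samples):
    """Analytic signal: keep DC (and Nyquist for even n), double the
    positive frequencies, zero the negative ones, then invert."""
    n = len(samples)
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        upper = n // 2
    else:
        upper = (n + 1) // 2
    for k in range(1, upper):
        weights[k] = 2.0
    spectrum = dft(samples)
    return dft([s * w for s, w in zip(spectrum, weights)], inverse=True)


# Hypothetical test signal: 4 cycles of a unit-amplitude cosine in 64 samples.
n_samples = 64
signal = [math.cos(2 * math.pi * 4 * t / n_samples) for t in range(n_samples)]
z = analytic_signal(signal)
amplitude = [abs(v) for v in z]      # instantaneous amplitude (AM)
phase = [cmath.phase(v) for v in z]  # instantaneous phase (PM), in radians
```

For this pure cosine the amplitude is constant and the phase advances linearly; in recorded EEG, departures from that behavior are what the amplitude and phase modulation patterns describe.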