Learning Spatio-Temporal Representation with Local and Global Diffusion

Author: Mei Tao
Ngo Chong-Wah
Qiu Zhaofan
Tian Xinmei
Yao Ting
Publication date: 01/06/2019

Convolutional Neural Networks (CNN) have been regarded as a powerful class of models for visual recognition problems. Nevertheless, the convolutional filters in these networks are local operations that ignore long-range dependencies. This drawback becomes even worse for video recognition, since video is an information-intensive medium with complex temporal variations. In this paper, we present a novel framework to boost spatio-temporal representation learning by Local and Global Diffusion (LGD). Specifically, we construct a novel neural network architecture that learns the local and global representations in parallel. The architecture is composed of LGD blocks, where each block updates local and global features by modeling the diffusions between these two representations. The diffusions let the two aspects of information, i.e., localized and holistic, interact, giving a more powerful way of representation learning. Furthermore, a kernelized classifier is introduced to combine the representations from the two aspects for video recognition. Our LGD networks achieve clear improvements on the large-scale Kinetics-400 and Kinetics-600 video classification datasets against the best competitors by 3.5% and 0.7%. We further examine the generalization of both the global and local representations produced by our pre-trained LGD networks on four different benchmarks for video action recognition and spatio-temporal action detection tasks. Superior performance over several state-of-the-art techniques on these benchmarks is reported. Code is available at: https://github.com/ZhaofanQiu/local-and-global-diffusion-networks.
Comment: CVPR 201

arXiv.org e-Print Archive

Crossref

Institutional Knowledge at Singapore Management University

Latent Semantic Learning with Structured Sparse Representation for Human Action Recognition

Author: Balasubramanian
Belkin
Blei
Cheng
Donoho
Hofmann
Jenatton
Lafon
Liu
Lu
Niebles
Olshausen
Parameswaran
Tibshirani
Turaga
Wang
Wright
Yan
Yuxin Peng
Zhiwu Lu
Publication venue: 'Elsevier BV'
Publication date: 22/09/2011

This paper proposes a novel latent semantic learning method for extracting high-level features (i.e. latent semantics) from a large vocabulary of abundant mid-level features (i.e. visual keywords) with structured sparse representation, which can help to bridge the semantic gap in the challenging task of human action recognition. To discover the manifold structure of mid-level features, we develop a spectral embedding approach to latent semantic learning based on L1-graph, without the need to tune any parameter for graph construction as a key step of manifold learning. More importantly, we construct the L1-graph with structured sparse representation, which can be obtained by structured sparse coding with its structured sparsity ensured by novel L1-norm hypergraph regularization over mid-level features. In the new embedding space, we learn latent semantics automatically from abundant mid-level features through spectral clustering. The learnt latent semantics can be readily used for human action recognition with SVM by defining a histogram intersection kernel. Unlike traditional latent semantic analysis based on topic models, our latent semantic learning method can explore the manifold structure of mid-level features in both L1-graph construction and spectral embedding, which results in compact but discriminative high-level features. The experimental results on the commonly used KTH action dataset and unconstrained YouTube action dataset show the superior performance of our method.
Comment: The short version of this paper appears in ICCV 201
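
The abstract above mentions an SVM with a histogram intersection kernel. As a rough illustration only, and not the authors' code, the kernel value for two histograms h and g is commonly defined as the sum over bins of min(h_i, g_i). The function names below are hypothetical, and the histograms are made-up example values.

```python
def histogram_intersection(h, g):
    """Histogram intersection kernel: sum of bin-wise minima."""
    if len(h) != len(g):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h, g))


def gram_matrix(histograms):
    """Pairwise kernel values, e.g. for an SVM that accepts a precomputed kernel."""
    return [[histogram_intersection(h, g) for g in histograms] for h in histograms]


# Made-up normalized histograms over three latent semantics.
video_a = [0.5, 0.3, 0.2]
video_b = [0.2, 0.3, 0.5]
K = gram_matrix([video_a, video_b])
```

For normalized histograms the kernel value lies between 0 and 1, and a histogram compared with itself gives its own total mass.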

arXiv.org e-Print Archive

Crossref

Log-Euclidean Bag of Words for Human Action Recognition

Author: Bhatia R.
Conrad Sanderson
Lazebnik S.
Masoud Faraki
Maziar Palhang
Wong Y.
Publication venue: 'Institution of Engineering and Technology (IET)'
Publication date: 01/01/2015

Representing videos by densely extracted local space-time features has recently become a popular approach for analysing actions. In this paper, we tackle the problem of categorising human actions by devising Bag of Words (BoW) models based on covariance matrices of spatio-temporal features, with the features formed from histograms of optical flow. Since covariance matrices form a special type of Riemannian manifold, the space of Symmetric Positive Definite (SPD) matrices, non-Euclidean geometry should be taken into account while discriminating between covariance matrices. To this end, we propose to embed SPD manifolds to Euclidean spaces via a diffeomorphism and extend the BoW approach to its Riemannian version. The proposed BoW approach takes into account the manifold geometry of SPD matrices during the generation of the codebook and histograms. Experiments on challenging human action datasets show that the proposed method obtains notable improvements in discrimination accuracy, in comparison to several state-of-the-art methods.
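
The abstract above describes embedding SPD matrices into a Euclidean space via a diffeomorphism. A standard map of this kind, and the one the title's "Log-Euclidean" suggests, is the matrix logarithm. The sketch below assumes that choice and is restricted to 2x2 matrices [[a, b], [b, c]] so that it needs only the standard library. The function names are hypothetical and this is not the authors' implementation.

```python
import math


def spd_log_2x2(a, b, c):
    """Matrix logarithm of the SPD matrix [[a, b], [b, c]], returned as (l11, l12, l22)."""
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    lam_max, lam_min = mean + radius, mean - radius  # eigenvalues
    if lam_min <= 0.0:
        raise ValueError("matrix is not symmetric positive definite")
    if radius == 0.0:
        # The matrix is a multiple of the identity.
        return (math.log(mean), 0.0, math.log(mean))
    log_max, log_min = math.log(lam_max), math.log(lam_min)
    # Sylvester's formula for two distinct eigenvalues: log(A) = p*A + q*I.
    p = (log_max - log_min) / (lam_max - lam_min)
    q = (lam_max * log_min - lam_min * log_max) / (lam_max - lam_min)
    return (p * a + q, p * b, p * c + q)


def log_euclidean_vector(a, b, c):
    """Euclidean vector for an SPD matrix; sqrt(2) on the off-diagonal keeps the Frobenius norm."""
    l11, l12, l22 = spd_log_2x2(a, b, c)
    return (l11, math.sqrt(2.0) * l12, l22)


def log_euclidean_distance(m1, m2):
    """Distance between two SPD matrices, each given as a tuple (a, b, c)."""
    v1 = log_euclidean_vector(*m1)
    v2 = log_euclidean_vector(*m2)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(v1, v2)))
```

Once every covariance matrix is mapped to such a vector, ordinary Euclidean tools such as k-means for codebook generation can be applied to the vectors. Whether that matches the paper's exact pipeline is an assumption here.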

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

Queensland University of Technology ePrints Archive

University of Queensland eSpace

Invariance of visual operations at the level of receptive fields

Author: Lindeberg Tony
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2012

Receptive field profiles registered by cell recordings have shown that mammalian vision has developed receptive fields tuned to different sizes and orientations in the image domain as well as to different image velocities in space-time. This article presents a theoretical model by which families of idealized receptive field profiles can be derived mathematically from a small set of basic assumptions that correspond to structural properties of the environment. The article also presents a theory for how basic invariance properties to variations in scale, viewing direction and relative motion can be obtained from the output of such receptive fields, using complementary selection mechanisms that operate over the output of families of receptive fields tuned to different parameters. Thereby, the theory shows how basic invariance properties of a visual system can be obtained already at the level of receptive fields, and we can explain the different shapes of receptive field profiles found in biological vision from a requirement that the visual system should be invariant to the natural types of image transformations that occur in its environment.
Comment: 40 pages, 17 figures

arXiv.org e-Print Archive

Publikationer från KTH

Public Library of Science (PLOS)

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Predicting Spatio-Temporal Time Series Using Dimension Reduced Local States

Author: Datseris George
Isensee Jonas
Parlitz Ulrich
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 12/04/2019

We present a method for both cross estimation and iterated time series prediction of spatio-temporal dynamics based on reconstructed local states, PCA dimension reduction, and local modelling using nearest neighbour methods. The effectiveness of this approach is shown for (noisy) data from a (cubic) Barkley model, the Bueno-Orovio-Cherry-Fenton model, and the Kuramoto-Sivashinsky model.
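
As a rough illustration of the nearest-neighbour part of the method described above, the sketch below builds local states from a one-dimensional periodic field (a point, its spatial neighbours, and the same neighbourhood at earlier time steps) and predicts the next time step by looking up the closest stored local state. The PCA reduction step is deliberately left out to keep the example short. All function names, the choice of neighbourhood, and the toy data are assumptions, not the authors' code.

```python
def local_state(field, t, x, radius=1, delays=1):
    """Values around position x at times t, t-1, ..., t-delays (periodic in space)."""
    n = len(field[t])
    state = []
    for d in range(delays + 1):
        row = field[t - d]
        for dx in range(-radius, radius + 1):
            state.append(row[(x + dx) % n])
    return state


def build_training_set(field, radius=1, delays=1):
    """Pairs of (local state at (t, x), value at (t + 1, x)) from the known history."""
    pairs = []
    for t in range(delays, len(field) - 1):
        for x in range(len(field[t])):
            pairs.append((local_state(field, t, x, radius, delays), field[t + 1][x]))
    return pairs


def predict_next(field, pairs, radius=1, delays=1):
    """Predict the next row using the single nearest stored local state per point."""
    t = len(field) - 1
    prediction = []
    for x in range(len(field[t])):
        query = local_state(field, t, x, radius, delays)
        _, target = min(
            pairs,
            key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], query)),
        )
        prediction.append(target)
    return prediction


# Toy data: a fixed pattern that shifts one cell to the right at every time step.
pattern = [0, 1, 4, 9, 2, 7, 3, 5]
history = [[pattern[(x - t) % 8] for x in range(8)] for t in range(6)]
next_row = predict_next(history, build_training_set(history))
```

Iterated prediction would append `next_row` to `history` and repeat. In a fuller version, the local states would first be projected onto a few principal components before the neighbour search.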

arXiv.org e-Print Archive

MPG.PuRe

Recommended from our members

Geometric principles of second messenger dynamics in dendritic spines.

Author: Bartol Thomas M
Cugno Andrea
Iyengar Ravi
Rangamani Padmini
Sejnowski Terrence J
Publication venue: eScholarship, University of California
Publication date: 01/08/2019

Dendritic spines are small, bulbous protrusions along dendrites in neurons and play a critical role in synaptic transmission. Dendritic spines come in a variety of shapes that depend on their developmental state. Additionally, roughly 14-19% of mature spines have a specialized endoplasmic reticulum called the spine apparatus. How does the shape of a postsynaptic spine and its internal organization affect the spatio-temporal dynamics of short timescale signaling? Answers to this question are central to our understanding of the initiation of synaptic transmission, learning, and memory formation. In this work, we investigated the effect of spine and spine apparatus size and shape on the spatio-temporal dynamics of second messengers through mathematical modeling with reaction-diffusion equations in idealized geometries (ellipsoids, spheres, and mushroom-shaped). Our analyses and simulations showed that in the short timescale, spine size and shape coupled with the spine apparatus geometries govern the spatio-temporal dynamics of second messengers. We show that the curvature of the geometries gives rise to pseudo-harmonic functions, which predict the locations of maximum and minimum concentrations along the spine head. Furthermore, we showed that the lifetime of the concentration gradient can be fine-tuned by localization of fluxes on the spine head and varying the relative curvatures and distances between the spine apparatus and the spine head. Thus, we have identified several key geometric determinants of how the spine head and spine apparatus may regulate the short timescale chemical dynamics of small molecules that control synaptic plasticity.

eScholarship - University of California