Learning Spatio-Temporal Representation with Local and Global Diffusion
Convolutional Neural Networks (CNNs) have been regarded as a powerful class of
models for visual recognition problems. Nevertheless, the convolutional filters
in these networks are local operations that ignore long-range dependencies.
This drawback becomes even more severe for video recognition, since video is an
information-intensive medium with complex temporal variations. In this paper,
we present a novel framework that boosts spatio-temporal representation
learning via Local and Global Diffusion (LGD).
Specifically, we construct a novel neural network architecture that learns the
local and global representations in parallel. The architecture is composed of
LGD blocks, where each block updates local and global features by modeling the
diffusions between these two representations. The diffusions allow the two
aspects of information, i.e., localized and holistic, to interact, yielding a
more powerful way of representation learning. Furthermore, a kernelized
classifier is introduced to combine the representations from the two aspects
for video recognition. Our LGD networks achieve clear improvements on the
large-scale Kinetics-400 and Kinetics-600 video classification datasets,
outperforming the best competitors by 3.5% and 0.7%, respectively.
and 0.7%. We further examine the generalization of both the global and local
representations produced by our pre-trained LGD networks on four different
benchmarks for video action recognition and spatio-temporal action detection
tasks. Superior performances over several state-of-the-art techniques on these
benchmarks are reported. Code is available at:
https://github.com/ZhaofanQiu/local-and-global-diffusion-networks
Comment: CVPR 201
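The LGD block's two-way update can be caricatured as a pair of coupled averaging steps. The sketch below is a minimal numpy illustration with fixed scalar weights `alpha` and `beta` (hypothetical names and values; the actual blocks use learnable transformations), not the paper's architecture:

```python
import numpy as np

def lgd_block(local, global_vec, alpha=0.5, beta=0.5):
    """One toy diffusion step between a local feature map and a global vector.

    local: (T, H, W, C) spatio-temporal feature map
    global_vec: (C,) holistic descriptor
    alpha, beta: illustrative diffusion weights (not from the paper)
    """
    # global -> local diffusion: broadcast the holistic descriptor over the map
    new_local = (1 - alpha) * local + alpha * global_vec
    # local -> global diffusion: pool the local map back into the descriptor
    new_global = (1 - beta) * global_vec + beta * local.mean(axis=(0, 1, 2))
    return new_local, new_global

local = np.random.rand(4, 7, 7, 16)   # e.g. 4 frames, 7x7 spatial, 16 channels
g = np.zeros(16)
local2, g2 = lgd_block(local, g)
```

Stacking such blocks lets localized and holistic information repeatedly exchange evidence, which is the intuition behind learning the two representations in parallel.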
Latent Semantic Learning with Structured Sparse Representation for Human Action Recognition
This paper proposes a novel latent semantic learning method for extracting
high-level features (i.e. latent semantics) from a large vocabulary of abundant
mid-level features (i.e. visual keywords) with structured sparse
representation, which can help to bridge the semantic gap in the challenging
task of human action recognition. To discover the manifold structure of
mid-level features, we develop a spectral embedding approach to latent semantic
learning based on L1-graph, without the need to tune any parameter for graph
construction as a key step of manifold learning. More importantly, we construct
the L1-graph with structured sparse representation, which can be obtained by
structured sparse coding with its structured sparsity ensured by novel L1-norm
hypergraph regularization over mid-level features. In the new embedding space,
we learn latent semantics automatically from abundant mid-level features
through spectral clustering. The learnt latent semantics can be readily used
for human action recognition with SVM by defining a histogram intersection
kernel. Different from the traditional latent semantic analysis based on topic
models, our latent semantic learning method can explore the manifold structure
of mid-level features in both L1-graph construction and spectral embedding,
which results in compact but discriminative high-level features. The
experimental results on the commonly used KTH action dataset and unconstrained
YouTube action dataset show the superior performance of our method.
Comment: The short version of this paper appears in ICCV 201
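The histogram intersection kernel mentioned in the abstract has a simple closed form, K(x, y) = Σᵢ min(xᵢ, yᵢ). A small sketch (the function name is ours), producing a Gram matrix usable with a precomputed-kernel SVM:

```python
import numpy as np

def hist_intersection_kernel(X, Y):
    """Histogram intersection kernel K(x, y) = sum_i min(x_i, y_i).

    X: (n, d) and Y: (m, d); rows are histograms over the learnt latent
    semantics. Returns the (n, m) Gram matrix.
    """
    return np.minimum(X[:, None, :], Y[None, :, :]).sum(axis=2)

hists = np.array([[0.2, 0.8],
                  [0.5, 0.5]])
K = hist_intersection_kernel(hists, hists)
```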
Log-Euclidean Bag of Words for Human Action Recognition
Representing videos by densely extracted local space-time features has
recently become a popular approach for analysing actions. In this paper, we
tackle the problem of categorising human actions by devising Bag of Words (BoW)
models based on covariance matrices of spatio-temporal features, with the
features formed from histograms of optical flow. Since covariance matrices form
a special type of Riemannian manifold, the space of Symmetric Positive Definite
(SPD) matrices, non-Euclidean geometry should be taken into account while
discriminating between covariance matrices. To this end, we propose to embed
SPD manifolds to Euclidean spaces via a diffeomorphism and extend the BoW
approach to its Riemannian version. The proposed BoW approach takes into
account the manifold geometry of SPD matrices during the generation of the
codebook and histograms. Experiments on challenging human action datasets show
that the proposed method obtains notable improvements in discrimination
accuracy, in comparison to several state-of-the-art methods.
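In the log-Euclidean framework, the diffeomorphism referred to above is the matrix logarithm, which flattens the SPD manifold into a vector space where ordinary BoW machinery applies. A minimal sketch (function name and the `eps` eigenvalue floor are our assumptions):

```python
import numpy as np

def log_euclidean_embed(C, eps=1e-6):
    """Map an SPD covariance matrix into Euclidean space via the matrix log.

    C: (d, d) symmetric positive definite matrix.
    Returns the vectorized upper triangle of log(C), with off-diagonal
    entries scaled by sqrt(2) so the Euclidean norm of the vector equals
    the Frobenius norm of log(C).
    """
    w, V = np.linalg.eigh(C)                  # eigendecomposition of SPD matrix
    L = V @ np.diag(np.log(np.maximum(w, eps))) @ V.T   # matrix logarithm
    iu = np.triu_indices_from(L)
    scale = np.where(iu[0] == iu[1], 1.0, np.sqrt(2.0))
    return L[iu] * scale

C = np.array([[2.0, 0.0],
              [0.0, 1.0]])
v = log_euclidean_embed(C)
```

Codebook learning and histogram assignment can then run on these vectors with standard Euclidean distances.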
Invariance of visual operations at the level of receptive fields
Receptive field profiles registered by cell recordings have shown that
mammalian vision has developed receptive fields tuned to different sizes and
orientations in the image domain as well as to different image velocities in
space-time. This article presents a theoretical model by which families of
idealized receptive field profiles can be derived mathematically from a small
set of basic assumptions that correspond to structural properties of the
environment. The article also presents a theory for how basic invariance
properties to variations in scale, viewing direction and relative motion can be
obtained from the output of such receptive fields, using complementary
selection mechanisms that operate over the output of families of receptive
fields tuned to different parameters. Thereby, the theory shows how basic
invariance properties of a visual system can be obtained already at the level
of receptive fields, and we can explain the different shapes of receptive field
profiles found in biological vision from a requirement that the visual system
should be invariant to the natural types of image transformations that occur in
its environment.
Comment: 40 pages, 17 figures
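Idealized receptive field profiles of the kind derived from such structural assumptions include Gaussian derivatives at different scales. A simplified 1-D illustration (not the article's full spatio-temporal family; names are ours):

```python
import numpy as np

def gaussian_derivative_profile(x, sigma, order=1):
    """Idealized 1-D receptive field profile: Gaussian derivative at scale sigma.

    order 0 gives the smoothing kernel; orders 1 and 2 give the first and
    second derivative profiles, tuned to edges and ridges respectively.
    """
    g = np.exp(-x**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
    if order == 0:
        return g
    if order == 1:
        return -x / sigma**2 * g
    if order == 2:
        return (x**2 - sigma**2) / sigma**4 * g
    raise ValueError("order must be 0, 1 or 2")

x = np.linspace(-5.0, 5.0, 101)
profile = gaussian_derivative_profile(x, sigma=1.0, order=1)
```

A family of such kernels over several values of `sigma` is what the scale-selection mechanisms in the article operate over.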
Predicting Spatio-Temporal Time Series Using Dimension Reduced Local States
We present a method for both cross estimation and iterated time series
prediction of spatio-temporal dynamics based on reconstructed local states, PCA
dimension reduction, and local modelling using nearest neighbour methods. The
effectiveness of this approach is shown for (noisy) data from a (cubic) Barkley
model, the Bueno-Orovio-Cherry-Fenton model, and the Kuramoto-Sivashinsky
model.
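The pipeline of reconstructed local states plus nearest-neighbour local modelling can be sketched as follows (the PCA dimension-reduction step is omitted for brevity; all names and parameter values are illustrative):

```python
import numpy as np

def local_state(field, i, radius, past, t):
    """Reconstruct the local state at site i and time t of a 1-D field:
    a spatial neighbourhood of width 2*radius+1 over `past` time steps."""
    idx = np.arange(i - radius, i + radius + 1) % field.shape[1]  # periodic
    return field[t - past + 1 : t + 1, idx].ravel()

def predict_next(field, i, radius=2, past=2, k=3):
    """One-step prediction at site i: find the k historical local states
    nearest to the current one and average their successor values."""
    t_max = field.shape[0] - 1
    query = local_state(field, i, radius, past, t_max)
    # library of (historical state, next value) pairs
    states = np.array([local_state(field, i, radius, past, t)
                       for t in range(past - 1, t_max - 1)])
    targets = field[past:t_max, i]
    dists = np.linalg.norm(states - query, axis=1)
    nearest = np.argsort(dists)[:k]
    return targets[nearest].mean()

t = np.arange(60)[:, None]
x = np.arange(16)[None, :]
field = np.sin(0.2 * t + 0.4 * x)   # simple traveling wave as a stand-in field
pred = predict_next(field, i=5)
```

Iterating `predict_next` over all sites, and feeding predictions back in, yields the iterated time-series prediction described in the abstract.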
Geometric principles of second messenger dynamics in dendritic spines.
Dendritic spines are small, bulbous protrusions along dendrites in neurons and play a critical role in synaptic transmission. Dendritic spines come in a variety of shapes that depend on their developmental state. Additionally, roughly 14-19% of mature spines have a specialized endoplasmic reticulum called the spine apparatus. How do the shape of a postsynaptic spine and its internal organization affect the spatio-temporal dynamics of short-timescale signaling? Answers to this question are central to our understanding of the initiation of synaptic transmission, learning, and memory formation. In this work, we investigated the effect of spine and spine apparatus size and shape on the spatio-temporal dynamics of second messengers using mathematical modeling with reaction-diffusion equations in idealized geometries (ellipsoids, spheres, and mushroom-shaped). Our analyses and simulations showed that on short timescales, spine size and shape coupled with the spine apparatus geometry govern the spatio-temporal dynamics of second messengers. We show that the curvature of the geometries gives rise to pseudo-harmonic functions, which predict the locations of maximum and minimum concentrations along the spine head. Furthermore, we showed that the lifetime of the concentration gradient can be fine-tuned by localizing fluxes on the spine head and varying the relative curvatures and distances between the spine apparatus and the spine head. Thus, we have identified several key geometric determinants of how the spine head and spine apparatus may regulate the short-timescale chemical dynamics of small molecules that control synaptic plasticity.
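A toy 1-D analogue of such reaction-diffusion dynamics, solved with an explicit finite-difference scheme, illustrates how diffusion and decay shape a transient concentration gradient (parameter values and the flat 1-D domain are our simplifications, not the paper's 3-D spine geometries):

```python
import numpy as np

def diffuse_decay(c0, D=1.0, k=0.5, dx=0.1, dt=0.001, steps=200):
    """Explicit finite differences for dc/dt = D * c_xx - k * c on a 1-D
    domain with no-flux (reflecting) ends. dt is chosen so that
    D * dt / dx**2 <= 0.5, keeping the scheme stable."""
    c = c0.copy()
    for _ in range(steps):
        lap = np.empty_like(c)
        lap[1:-1] = (c[2:] - 2 * c[1:-1] + c[:-2]) / dx**2
        lap[0] = 2 * (c[1] - c[0]) / dx**2       # reflecting boundary
        lap[-1] = 2 * (c[-2] - c[-1]) / dx**2
        c = c + dt * (D * lap - k * c)
    return c

c0 = np.zeros(50)
c0[25] = 1.0                       # pulse of second messenger at the "synapse"
c = diffuse_decay(c0)
```

In the paper's setting the analogous fluxes are localized on the spine head membrane, and the geometry of the spine and spine apparatus plays the role of the boundary conditions here.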