9,624 research outputs found
Dynamic Variational Autoencoders for Visual Process Modeling
This work studies the problem of modeling visual processes by leveraging deep
generative architectures for learning linear, Gaussian representations from
observed sequences. We propose a joint learning framework, combining a vector
autoregressive model and Variational Autoencoders. This results in an
architecture that allows Variational Autoencoders to simultaneously learn a
non-linear observation as well as a linear state model from sequences of
frames. We validate our approach on artificial sequences and dynamic textures
Statistical Analysis of Dynamic Actions
Real-world action recognition applications require the development of systems which are fast, can handle a large variety of actions without a priori knowledge of the type of actions, need a minimal number of parameters, and necessitate as short as possible learning stage. In this paper, we suggest such an approach. We regard dynamic activities as long-term temporal objects, which are characterized by spatio-temporal features at multiple temporal scales. Based on this, we design a simple statistical distance measure between video sequences which captures the similarities in their behavioral content. This measure is nonparametric and can thus handle a wide range of complex dynamic actions. Having a behavior-based distance measure between sequences, we use it for a variety of tasks, including: video indexing, temporal segmentation, and action-based video clustering. These tasks are performed without prior knowledge of the types of actions, their models, or their temporal extents
An Adaptive Dictionary Learning Approach for Modeling Dynamical Textures
Video representation is an important and challenging task in the computer
vision community. In this paper, we assume that image frames of a moving scene
can be modeled as a Linear Dynamical System. We propose a sparse coding
framework, named adaptive video dictionary learning (AVDL), to model a video
adaptively. The developed framework is able to capture the dynamics of a moving
scene by exploring both sparse properties and the temporal correlations of
consecutive video frames. The proposed method is compared with state of the art
video processing methods on several benchmark data sequences, which exhibit
appearance changes and heavy occlusions
Modeling Dynamic Swarms
This paper proposes the problem of modeling video sequences of dynamic swarms
(DS). We define DS as a large layout of stochastically repetitive spatial
configurations of dynamic objects (swarm elements) whose motions exhibit local
spatiotemporal interdependency and stationarity, i.e., the motions are similar
in any small spatiotemporal neighborhood. Examples of DS abound in nature,
e.g., herds of animals and flocks of birds. To capture the local spatiotemporal
properties of the DS, we present a probabilistic model that learns both the
spatial layout of swarm elements and their joint dynamics that are modeled as
linear transformations. To this end, a spatiotemporal neighborhood is
associated with each swarm element, in which local stationarity is enforced
both spatially and temporally. We assume that the prior on the swarm dynamics
is distributed according to an MRF in both space and time. Embedding this model
in a MAP framework, we iterate between learning the spatial layout of the swarm
and its dynamics. We learn the swarm transformations using ICM, which iterates
between estimating these transformations and updating their distribution in the
spatiotemporal neighborhoods. We demonstrate the validity of our method by
conducting experiments on real video sequences. Real sequences of birds, geese,
robot swarms, and pedestrians evaluate the applicability of our model to real
world data.Comment: 11 pages, 17 figures, conference paper, computer visio
Convolutional Neural Network on Three Orthogonal Planes for Dynamic Texture Classification
Dynamic Textures (DTs) are sequences of images of moving scenes that exhibit
certain stationarity properties in time such as smoke, vegetation and fire. The
analysis of DT is important for recognition, segmentation, synthesis or
retrieval for a range of applications including surveillance, medical imaging
and remote sensing. Deep learning methods have shown impressive results and are
now the new state of the art for a wide range of computer vision tasks
including image and video recognition and segmentation. In particular,
Convolutional Neural Networks (CNNs) have recently proven to be well suited for
texture analysis with a design similar to a filter bank approach. In this
paper, we develop a new approach to DT analysis based on a CNN method applied
on three orthogonal planes x y , xt and y t . We train CNNs on spatial frames
and temporal slices extracted from the DT sequences and combine their outputs
to obtain a competitive DT classifier. Our results on a wide range of commonly
used DT classification benchmark datasets prove the robustness of our approach.
Significant improvement of the state of the art is shown on the larger
datasets.Comment: 19 pages, 10 figure
Deep Markov Random Field for Image Modeling
Markov Random Fields (MRFs), a formulation widely used in generative image
modeling, have long been plagued by the lack of expressive power. This issue is
primarily due to the fact that conventional MRFs formulations tend to use
simplistic factors to capture local patterns. In this paper, we move beyond
such limitations, and propose a novel MRF model that uses fully-connected
neurons to express the complex interactions among pixels. Through theoretical
analysis, we reveal an inherent connection between this model and recurrent
neural networks, and thereon derive an approximated feed-forward network that
couples multiple RNNs along opposite directions. This formulation combines the
expressive power of deep neural networks and the cyclic dependency structure of
MRF in a unified model, bringing the modeling capability to a new level. The
feed-forward approximation also allows it to be efficiently learned from data.
Experimental results on a variety of low-level vision tasks show notable
improvement over state-of-the-arts.Comment: Accepted at ECCV 201
Motion Textures: Modeling, Classification, and Segmentation Using Mixed-State Markov Random Fields
published_or_final_versio
- …