1,009 research outputs found
Perception of Motion and Architectural Form: Computational Relationships between Optical Flow and Perspective
Perceptual geometry refers to the interdisciplinary research whose objectives
focuses on study of geometry from the perspective of visual perception, and in
turn, applies such geometric findings to the ecological study of vision.
Perceptual geometry attempts to answer fundamental questions in perception of
form and representation of space through synthesis of cognitive and biological
theories of visual perception with geometric theories of the physical world.
Perception of form, space and motion are among fundamental problems in vision
science. In cognitive and computational models of human perception, the
theories for modeling motion are treated separately from models for perception
of form.Comment: 10 pages, 13 figures, submitted and accepted in DoCEIS'2012
Conference: http://www.uninova.pt/doceis/doceis12/home/home.ph
A Bayesian Nonparametric Approach to Modeling Motion Patterns
The most difficult—and often most essential—
aspect of many interception and tracking tasks is constructing
motion models of the targets to be found. Experts can
often provide only partial information, and fitting parameters
for complex motion patterns can require large amounts
of training data. Specifying how to parameterize complex
motion patterns is in itself a difficult task.
In contrast, nonparametric models are very flexible and
generalize well with relatively little training data. We propose
modeling target motion patterns as a mixture of Gaussian
processes (GP) with a Dirichlet process (DP) prior over
mixture weights. The GP provides a flexible representation
for each individual motion pattern, while the DP assigns observed
trajectories to particular motion patterns. Both automatically
adjust the complexity of the motion model based
on the available data. Our approach outperforms several parametric
models on a helicopter-based car-tracking task on
data collected from the greater Boston area
Conditional Restricted Boltzmann Machines for Structured Output Prediction
Conditional Restricted Boltzmann Machines (CRBMs) are rich probabilistic
models that have recently been applied to a wide range of problems, including
collaborative filtering, classification, and modeling motion capture data.
While much progress has been made in training non-conditional RBMs, these
algorithms are not applicable to conditional models and there has been almost
no work on training and generating predictions from conditional RBMs for
structured output problems. We first argue that standard Contrastive
Divergence-based learning may not be suitable for training CRBMs. We then
identify two distinct types of structured output prediction problems and
propose an improved learning algorithm for each. The first problem type is one
where the output space has arbitrary structure but the set of likely output
configurations is relatively small, such as in multi-label classification. The
second problem is one where the output space is arbitrarily structured but
where the output space variability is much greater, such as in image denoising
or pixel labeling. We show that the new learning algorithms can work much
better than Contrastive Divergence on both types of problems
ContextVP: Fully Context-Aware Video Prediction
Video prediction models based on convolutional networks, recurrent networks,
and their combinations often result in blurry predictions. We identify an
important contributing factor for imprecise predictions that has not been
studied adequately in the literature: blind spots, i.e., lack of access to all
relevant past information for accurately predicting the future. To address this
issue, we introduce a fully context-aware architecture that captures the entire
available past context for each pixel using Parallel Multi-Dimensional LSTM
units and aggregates it using blending units. Our model outperforms a strong
baseline network of 20 recurrent convolutional layers and yields
state-of-the-art performance for next step prediction on three challenging
real-world video datasets: Human 3.6M, Caltech Pedestrian, and UCF-101.
Moreover, it does so with fewer parameters than several recently proposed
models, and does not rely on deep convolutional networks, multi-scale
architectures, separation of background and foreground modeling, motion flow
learning, or adversarial training. These results highlight that full awareness
of past context is of crucial importance for video prediction.Comment: 19 pages. ECCV 2018 oral presentation. Project webpage is at
https://wonmin-byeon.github.io/publication/2018-ecc
- …