Video Interpolation using Optical Flow and Laplacian Smoothness
Non-rigid video interpolation is a common computer vision task. In this paper
we present an optical flow approach which adopts a Laplacian Cotangent Mesh
constraint to enhance the local smoothness. Similar to Li et al., our approach
fits a mesh to the image with a resolution of up to one vertex per pixel and
uses angle constraints to ensure sensible local deformations between image
pairs. The Laplacian Mesh constraints are expressed wholly inside the optical
flow optimization, and can be applied in a straightforward manner to a wide
range of image tracking and registration problems. We evaluate our approach by
testing on several benchmark datasets, including the Middlebury and Garg et al.
datasets. In addition, we show an application of our method to constructing 3D
Morphable Facial Models from dynamic 3D data.
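The idea of a Laplacian smoothness term on a per-pixel flow field can be illustrated with a minimal sketch. It deliberately substitutes a uniform 4-neighbour grid Laplacian for the cotangent mesh weights described in the abstract, and the function name `laplacian_smoothness` and the `(H, W, 2)` flow layout are assumptions made for illustration, not the authors' formulation.

```python
import numpy as np

def laplacian_smoothness(flow):
    """Sum of squared uniform-weight Laplacians of a flow field.

    flow: array of shape (H, W, 2) holding one (u, v) vector per pixel.
    NOTE: uses a uniform 4-neighbour stencil, not cotangent weights
    (a simplification assumed for this sketch).
    """
    f = np.asarray(flow, dtype=float)
    # Each interior pixel is compared against the sum of its 4 neighbours.
    lap = (f[:-2, 1:-1] + f[2:, 1:-1] + f[1:-1, :-2] + f[1:-1, 2:]
           - 4.0 * f[1:-1, 1:-1])
    return float(np.sum(lap ** 2))
```

Under this penalty, a constant or affine flow costs nothing, while locally irregular deformations are penalised; in an optical flow energy such a term would typically be added to the data term with a weighting factor.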
Focus Is All You Need: Loss Functions For Event-based Vision
Event cameras are novel vision sensors that output pixel-level brightness
changes ("events") instead of traditional video frames. These asynchronous
sensors offer several advantages over traditional cameras, such as high
temporal resolution, very high dynamic range, and no motion blur. To unlock the
potential of such sensors, motion compensation methods have been recently
proposed. We present a collection and taxonomy of twenty-two objective
functions to analyze event alignment in motion compensation approaches (Fig.
1). We call them Focus Loss Functions since they have strong connections with
functions used in traditional shape-from-focus applications. The proposed loss
functions allow bringing mature computer vision tools to the realm of event
cameras. We compare the accuracy and runtime performance of all loss functions
on a publicly available dataset, and conclude that the variance, the gradient
and the Laplacian magnitudes are among the best loss functions. The
applicability of the loss functions is shown on multiple tasks: rotational
motion, depth and optical flow estimation. The proposed focus loss functions
help unlock the outstanding properties of event cameras.
Comment: 29 pages, 19 figures, 4 tables
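As a rough illustration of the three loss families singled out above (variance, gradient magnitude, Laplacian magnitude), the following sketch accumulates events into an image after warping them along a candidate velocity and then scores that image. The helper names, the constant-velocity warp, nearest-pixel voting and the unsigned event counts are all assumptions made for this example; the paper's exact definitions may differ.

```python
import numpy as np

def image_of_warped_events(xs, ys, ts, velocity, shape):
    """Warp events back to t=0 along a constant velocity and count them per pixel."""
    vx, vy = velocity
    wx = np.rint(xs - vx * ts).astype(int)
    wy = np.rint(ys - vy * ts).astype(int)
    inside = (wx >= 0) & (wx < shape[1]) & (wy >= 0) & (wy < shape[0])
    image = np.zeros(shape, dtype=float)
    # np.add.at accumulates correctly when several events hit the same pixel.
    np.add.at(image, (wy[inside], wx[inside]), 1.0)
    return image

def variance_loss(image):
    return float(np.var(image))

def gradient_magnitude_loss(image):
    gy, gx = np.gradient(image)
    return float(np.mean(gx ** 2 + gy ** 2))

def laplacian_magnitude_loss(image):
    # 5-point Laplacian stencil evaluated on interior pixels only.
    lap = (image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2]
           + image[1:-1, 2:] - 4.0 * image[1:-1, 1:-1])
    return float(np.mean(lap ** 2))

# Synthetic data: a vertical edge moving right at 10 px/s for one second.
ts = np.repeat(np.linspace(0.0, 1.0, 50), 20)
ys = np.tile(np.arange(20, dtype=float), 50)
xs = 5.0 + 10.0 * ts

sharp = image_of_warped_events(xs, ys, ts, (10.0, 0.0), (20, 20))   # correct motion
blurred = image_of_warped_events(xs, ys, ts, (0.0, 0.0), (20, 20))  # no compensation
```

With the correct velocity all events collapse onto a single column, so each of the three scores is larger for `sharp` than for `blurred`; maximising such a score over the velocity is the motion compensation idea the abstract refers to.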
Tensorized Self-Attention: Efficiently Modeling Pairwise and Global Dependencies Together
Neural networks equipped with self-attention have parallelizable computation,
light-weight structure, and the ability to capture both long-range and local
dependencies. Further, their expressive power and performance can be boosted by
using a vector to measure pairwise dependency, but this requires expanding the
alignment matrix to a tensor, which results in memory and computation
bottlenecks. In this paper, we propose a novel attention mechanism called
"Multi-mask Tensorized Self-Attention" (MTSA), which is as fast and as
memory-efficient as a CNN, but significantly outperforms previous
CNN-/RNN-/attention-based models. MTSA 1) captures both pairwise (token2token)
and global (source2token) dependencies by a novel compatibility function
composed of dot-product and additive attentions, 2) uses a tensor to represent
the feature-wise alignment scores for better expressive power but only requires
parallelizable matrix multiplications, and 3) combines multi-head with
multi-dimensional attentions, and applies a distinct positional mask to each
head (subspace), so the memory and computation can be distributed to multiple
heads, each with sequential information encoded independently. The experiments
show that a CNN/RNN-free model based on MTSA achieves state-of-the-art or
competitive performance on nine NLP benchmarks with compelling memory- and
time-efficiency.
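The notion of giving each attention head its own positional mask can be sketched with plain scaled dot-product attention. This is not MTSA itself: there are no learned projections, no additive attention and no feature-wise tensor scores, and the names `masked_head`, `multi_mask_attention`, `past_mask` and `future_mask` are hypothetical choices for this illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def masked_head(q, k, v, mask):
    """Scaled dot-product attention where mask[i, j] says token i may attend to j."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -1e9)  # disallowed pairs get ~zero weight
    return softmax(scores) @ v

def multi_mask_attention(x, n_heads=2):
    """Split features into heads and give alternating heads opposite positional masks."""
    n, d = x.shape
    dh = d // n_heads
    idx = np.arange(n)
    past_mask = idx[None, :] <= idx[:, None]    # attend to self and earlier tokens
    future_mask = idx[None, :] >= idx[:, None]  # attend to self and later tokens
    masks = [past_mask, future_mask]
    outputs = []
    for h in range(n_heads):
        sub = x[:, h * dh:(h + 1) * dh]  # this head's feature subspace
        outputs.append(masked_head(sub, sub, sub, masks[h % 2]))
    return np.concatenate(outputs, axis=-1)
```

Because each head works on its own subspace with its own mask, sequential order is encoded per head without recurrence, and every step remains a matrix multiplication that can run in parallel.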