15 research outputs found
Real-World Repetition Estimation by Div, Grad and Curl
We consider the problem of estimating repetition in video, such as performing
push-ups, cutting a melon or playing violin. Existing work shows good results
under the assumption of static and stationary periodicity. As realistic video
is rarely perfectly static and stationary, the often preferred Fourier-based
measurements is inapt. Instead, we adopt the wavelet transform to better handle
non-static and non-stationary video dynamics. From the flow field and its
differentials, we derive three fundamental motion types and three motion
continuities of intrinsic periodicity in 3D. On top of this, the 2D perception
of 3D periodicity considers two extreme viewpoints. What follows are 18
fundamental cases of recurrent perception in 2D. In practice, to deal with the
variety of repetitive appearance, our theory implies measuring time-varying
flow and its differentials (gradient, divergence and curl) over segmented
foreground motion. For experiments, we introduce the new QUVA Repetition
dataset, reflecting reality by including non-static and non-stationary videos.
On the task of counting repetitions in video, we obtain favorable results
compared to a deep learning alternative
Subitizing with Variational Autoencoders
Numerosity, the number of objects in a set, is a basic property of a given
visual scene. Many animals develop the perceptual ability to subitize: the
near-instantaneous identification of the numerosity in small sets of visual
items. In computer vision, it has been shown that numerosity emerges as a
statistical property in neural networks during unsupervised learning from
simple synthetic images. In this work, we focus on more complex natural images
using unsupervised hierarchical neural networks. Specifically, we show that
variational autoencoders are able to spontaneously perform subitizing after
training without supervision on a large amount images from the Salient Object
Subitizing dataset. While our method is unable to outperform supervised
convolutional networks for subitizing, we observe that the networks learn to
encode numerosity as basic visual property. Moreover, we find that the learned
representations are likely invariant to object area; an observation in
alignment with studies on biological neural networks in cognitive neuroscience
Context-aware and Scale-insensitive Temporal Repetition Counting
Temporal repetition counting aims to estimate the number of cycles of a given
repetitive action. Existing deep learning methods assume repetitive actions are
performed in a fixed time-scale, which is invalid for the complex repetitive
actions in real life. In this paper, we tailor a context-aware and
scale-insensitive framework, to tackle the challenges in repetition counting
caused by the unknown and diverse cycle-lengths. Our approach combines two key
insights: (1) Cycle lengths from different actions are unpredictable that
require large-scale searching, but, once a coarse cycle length is determined,
the variety between repetitions can be overcome by regression. (2) Determining
the cycle length cannot only rely on a short fragment of video but a contextual
understanding. The first point is implemented by a coarse-to-fine cycle
refinement method. It avoids the heavy computation of exhaustively searching
all the cycle lengths in the video, and, instead, it propagates the coarse
prediction for further refinement in a hierarchical manner. We secondly propose
a bidirectional cycle length estimation method for a context-aware prediction.
It is a regression network that takes two consecutive coarse cycles as input,
and predicts the locations of the previous and next repetitive cycles. To
benefit the training and evaluation of temporal repetition counting area, we
construct a new and largest benchmark, which contains 526 videos with diverse
repetitive actions. Extensive experiments show that the proposed network
trained on a single dataset outperforms state-of-the-art methods on several
benchmarks, indicating that the proposed framework is general enough to capture
repetition patterns across domains.Comment: Accepted by CVPR202