54,313 research outputs found
MeshfreeFlowNet: A Physics-Constrained Deep Continuous Space-Time Super-Resolution Framework
We propose MeshfreeFlowNet, a novel deep learning-based super-resolution
framework to generate continuous (grid-free) spatio-temporal solutions from the
low-resolution inputs. While being computationally efficient, MeshfreeFlowNet
accurately recovers the fine-scale quantities of interest. MeshfreeFlowNet
allows for: (i) the output to be sampled at all spatio-temporal resolutions,
(ii) a set of Partial Differential Equation (PDE) constraints to be imposed,
and (iii) training on fixed-size inputs on arbitrarily sized spatio-temporal
domains owing to its fully convolutional encoder. We empirically study the
performance of MeshfreeFlowNet on the task of super-resolution of turbulent
flows in the Rayleigh-Benard convection problem. Across a diverse set of
evaluation metrics, we show that MeshfreeFlowNet significantly outperforms
existing baselines. Furthermore, we provide a large scale implementation of
MeshfreeFlowNet and show that it efficiently scales across large clusters,
achieving 96.80% scaling efficiency on up to 128 GPUs and a training time of
less than 4 minutes.Comment: Supplementary Video: https://youtu.be/mjqwPch9gDo. Accepted to SC2
Extraction and Classification of Diving Clips from Continuous Video Footage
Due to recent advances in technology, the recording and analysis of video
data has become an increasingly common component of athlete training
programmes. Today it is incredibly easy and affordable to set up a fixed camera
and record athletes in a wide range of sports, such as diving, gymnastics,
golf, tennis, etc. However, the manual analysis of the obtained footage is a
time-consuming task which involves isolating actions of interest and
categorizing them using domain-specific knowledge. In order to automate this
kind of task, three challenging sub-problems are often encountered: 1)
temporally cropping events/actions of interest from continuous video; 2)
tracking the object of interest; and 3) classifying the events/actions of
interest.
Most previous work has focused on solving just one of the above sub-problems
in isolation. In contrast, this paper provides a complete solution to the
overall action monitoring task in the context of a challenging real-world
exemplar. Specifically, we address the problem of diving classification. This
is a challenging problem since the person (diver) of interest typically
occupies fewer than 1% of the pixels in each frame. The model is required to
learn the temporal boundaries of a dive, even though other divers and
bystanders may be in view. Finally, the model must be sensitive to subtle
changes in body pose over a large number of frames to determine the
classification code. We provide effective solutions to each of the sub-problems
which combine to provide a highly functional solution to the task as a whole.
The techniques proposed can be easily generalized to video footage recorded
from other sports.Comment: To appear at CVsports 201
Unsupervised Video Understanding by Reconciliation of Posture Similarities
Understanding human activity and being able to explain it in detail surpasses
mere action classification by far in both complexity and value. The challenge
is thus to describe an activity on the basis of its most fundamental
constituents, the individual postures and their distinctive transitions.
Supervised learning of such a fine-grained representation based on elementary
poses is very tedious and does not scale. Therefore, we propose a completely
unsupervised deep learning procedure based solely on video sequences, which
starts from scratch without requiring pre-trained networks, predefined body
models, or keypoints. A combinatorial sequence matching algorithm proposes
relations between frames from subsets of the training data, while a CNN is
reconciling the transitivity conflicts of the different subsets to learn a
single concerted pose embedding despite changes in appearance across sequences.
Without any manual annotation, the model learns a structured representation of
postures and their temporal development. The model not only enables retrieval
of similar postures but also temporal super-resolution. Additionally, based on
a recurrent formulation, next frames can be synthesized.Comment: Accepted by ICCV 201
- …