Pose Constraints for Consistent Self-supervised Monocular Depth and Ego-motion
Self-supervised monocular depth estimation approaches not only suffer from
scale ambiguity but also infer depth maps whose scale is temporally inconsistent.
While disambiguating scale during training is not possible without some kind of
ground truth supervision, having scale consistent depth predictions would make
it possible to calculate scale once during inference as a post-processing step
and use it over time. With this as a goal, a set of temporal consistency losses
that minimize pose inconsistencies over time are introduced. Evaluations show
that introducing these constraints not only reduces depth inconsistencies but
also improves the baseline performance of depth and ego-motion prediction.
Comment: Scandinavian Conference on Image Analysis (SCIA) 202
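
The abstract does not spell out the form of its temporal consistency losses. As an illustration only, one common way to measure pose inconsistency over time is to check that two consecutive relative poses compose to the directly estimated two-step pose. The sketch below assumes 4x4 homogeneous transforms and a convention in which `T_ab` maps coordinates from frame a to frame b; the helper names `make_pose` and `pose_consistency_error` are hypothetical and are not taken from the paper.

```python
import numpy as np


def make_pose(yaw, translation):
    """Build a 4x4 rigid transform from a yaw angle (radians) and a 3-vector.

    Hypothetical helper for this sketch; rotation is about the z-axis only.
    """
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = np.array([[c, -s, 0.0],
                          [s, c, 0.0],
                          [0.0, 0.0, 1.0]])
    T[:3, 3] = translation
    return T


def pose_consistency_error(T_01, T_12, T_02):
    """Gap between the composed pose (0 -> 1 -> 2) and the direct pose (0 -> 2).

    Returns the Frobenius norm of (T_02^-1 @ T_12 @ T_01 - I), which is zero
    when the three estimates agree, including in translation scale.
    """
    composed = T_12 @ T_01
    residual = np.linalg.inv(T_02) @ composed
    return float(np.linalg.norm(residual - np.eye(4)))
```

In this sketch, a pose network whose translation scale drifts between frame pairs would produce a non-zero error even if every individual pose is plausible on its own, which is the kind of inconsistency a temporal loss could penalize.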
Learn to cycle: Time-consistent feature discovery for action recognition
Generalizing over temporal variations is a prerequisite for effective action
recognition in videos. Despite significant advances in deep neural networks, it
remains a challenge to focus on short-term discriminative motions in relation
to the overall performance of an action. We address this challenge by allowing
some flexibility in discovering relevant spatio-temporal features. We introduce
Squeeze and Recursion Temporal Gates (SRTG), an approach that favors inputs
with similar activations with potential temporal variations. We implement this
idea with a novel CNN block that uses an LSTM to encapsulate feature dynamics,
in conjunction with a temporal gate that is responsible for evaluating the
consistency of the discovered dynamics and the modeled features. We show
consistent improvement when using SRTG blocks, with only a minimal increase in
the number of GFLOPs. On Kinetics-700, we perform on par with current
state-of-the-art models, and outperform these on HACS, Moments in Time, UCF-101
and HMDB-51.