Motion-blurred Video Interpolation and Extrapolation
Abrupt motion of the camera or of objects in a scene results in a blurry video,
and therefore recovering a high-quality video requires two types of enhancement:
visual enhancement and temporal upsampling. A broad range of research has
attempted to recover clean frames from blurred image sequences or to temporally
upsample frames by interpolation, yet very few studies handle both
problems jointly. In this work, we present a novel framework for deblurring,
interpolating and extrapolating sharp frames from a motion-blurred video in an
end-to-end manner. We design our framework by first learning the pixel-level
motion that caused the blur from the given inputs via optical flow estimation,
and then predicting multiple clean frames by warping the decoded features with the
estimated flows. To ensure temporal coherence across predicted frames and
address potential temporal ambiguity, we propose a simple, yet effective
flow-based rule. The effectiveness and favorability of our approach are
highlighted through extensive qualitative and quantitative evaluations on
motion-blurred datasets from high-speed videos. Comment: Accepted to AAAI 202
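The core warping step described above, sampling decoded features at positions displaced by the estimated optical flow, can be illustrated with plain backward warping. This is a minimal numpy sketch of the general technique, not the authors' implementation; the function name and bilinear sampling details are our assumptions.

```python
import numpy as np

def backward_warp(feat: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Warp a feature map (H, W, C) with a flow field (H, W, 2).

    Each output pixel p samples feat at p + flow[p] with bilinear
    interpolation, clamping sample coordinates at the image border.
    """
    H, W, _ = feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Sample locations: pixel coordinates displaced by the flow vectors.
    x = np.clip(xs + flow[..., 0], 0, W - 1)
    y = np.clip(ys + flow[..., 1], 0, H - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, W - 1), np.minimum(y0 + 1, H - 1)
    wx, wy = (x - x0)[..., None], (y - y0)[..., None]
    # Bilinear blend of the four neighbouring feature vectors.
    return (feat[y0, x0] * (1 - wx) * (1 - wy)
            + feat[y0, x1] * wx * (1 - wy)
            + feat[y1, x0] * (1 - wx) * wy
            + feat[y1, x1] * wx * wy)
```

Predicting a frame at an intermediate timestamp then amounts to scaling the estimated flow before warping.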
Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing
To address the challenging task of instance-aware human part parsing, a new
bottom-up regime is proposed to learn category-level human semantic
segmentation as well as multi-person pose estimation in a joint and end-to-end
manner. It is a compact, efficient and powerful framework that exploits
structural information over different human granularities and eases the
difficulty of person partitioning. Specifically, a dense-to-sparse projection
field, which allows explicitly associating dense human semantics with sparse
keypoints, is learnt and progressively improved over the network feature
pyramid for robustness. Then, the difficult pixel grouping problem is cast as
an easier, multi-person joint assembling task. By formulating joint association
as maximum-weight bipartite matching, a differentiable solution is developed to
exploit projected gradient descent and Dykstra's cyclic projection algorithm.
This makes our method end-to-end trainable and allows back-propagating the
grouping error to directly supervise multi-granularity human representation
learning. This distinguishes our method from current bottom-up human parsers
and pose estimators, which require sophisticated post-processing or heuristic
greedy algorithms. Experiments on three instance-aware human parsing datasets show
that our model outperforms other bottom-up alternatives with much more
efficient inference. Comment: CVPR 2021 (Oral). Code: https://github.com/tfzhou/MG-HumanParsin
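The differentiable matching step described above, projected gradient ascent on the joint-association score with Dykstra's cyclic projections enforcing the assignment constraints, can be sketched generically. This is a numpy illustration of projecting onto the doubly-stochastic (Birkhoff) relaxation of permutation matrices, not the paper's code; step sizes, iteration counts, and function names are our assumptions.

```python
import numpy as np

def project_simplex(v: np.ndarray) -> np.ndarray:
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, v.size + 1)
    rho = np.nonzero(u + (1.0 - css) / idx > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0.0)

def dykstra_birkhoff(X: np.ndarray, iters: int = 50) -> np.ndarray:
    """Dykstra's cyclic projections onto the doubly-stochastic polytope.

    Alternates projections onto the row-simplex and column-simplex
    constraint sets, carrying correction terms p and q between cycles.
    """
    P, p, q = X.copy(), np.zeros_like(X), np.zeros_like(X)
    for _ in range(iters):
        Y = np.apply_along_axis(project_simplex, 1, P + p)  # row sums = 1
        p = P + p - Y
        P = np.apply_along_axis(project_simplex, 0, Y + q)  # column sums = 1
        q = Y + q - P
    return P

def match(W: np.ndarray, lr: float = 0.1, steps: int = 30) -> np.ndarray:
    """Projected gradient ascent maximizing <W, P> over doubly-stochastic P."""
    P = np.full_like(W, 1.0 / W.shape[0])
    for _ in range(steps):
        P = dykstra_birkhoff(P + lr * W)
    return P
```

Because every step is built from differentiable (almost everywhere) operations, a grouping loss on the resulting soft assignment can back-propagate into the score matrix W, which is the property the paper exploits for end-to-end training.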
StereoFlowGAN: Co-training for Stereo and Flow with Unsupervised Domain Adaptation
We introduce a novel training strategy for stereo matching and optical flow
estimation that utilizes image-to-image translation between synthetic and real
image domains. Our approach enables the training of models that excel in real
image scenarios while relying solely on ground-truth information from synthetic
images. To facilitate task-agnostic domain adaptation and the training of
task-specific components, we introduce a bidirectional feature warping module
that handles both left-right and forward-backward directions. Experimental
results show competitive performance over previous domain-translation-based
methods, substantiating the efficacy of our proposed framework, which
effectively leverages the benefits of unsupervised domain adaptation, stereo
matching, and optical flow estimation. Comment: Accepted by BMVC 202
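A bidirectional warping module of the kind described, covering both the left-right (stereo disparity) and forward-backward (optical flow) directions, might share one sampling routine by promoting a disparity map to a purely horizontal flow. This is a hypothetical numpy sketch; the sign convention and nearest-neighbour sampling are our simplifying assumptions, not the paper's design.

```python
import numpy as np

def make_flow(field: np.ndarray) -> np.ndarray:
    """Promote a displacement field to a 2-channel flow.

    A (H, W) disparity map (left-right direction) becomes a purely
    horizontal flow; a (H, W, 2) optical flow (forward-backward
    direction) passes through unchanged.
    """
    if field.ndim == 2:  # stereo disparity: horizontal shift only
        return np.stack([-field, np.zeros_like(field)], axis=-1)
    return field

def warp(feat: np.ndarray, field: np.ndarray) -> np.ndarray:
    """Backward-warp features with either a disparity map or a flow field."""
    flow = make_flow(field)
    H, W = feat.shape[:2]
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Nearest-neighbour sampling, clamped at the border.
    x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, W - 1)
    y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, H - 1)
    return feat[y, x]
```

Sharing one routine across both tasks is what lets the domain-adaptation component stay task-agnostic while the task-specific heads differ only in the displacement field they predict.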