84 research outputs found
TTVFI: Learning Trajectory-Aware Transformer for Video Frame Interpolation
Video frame interpolation (VFI) aims to synthesize an intermediate frame
between two consecutive frames. State-of-the-art approaches usually adopt a
two-step solution, which includes 1) generating locally-warped pixels by
flow-based motion estimations, 2) blending the warped pixels to form a full
frame through deep neural synthesis networks. However, due to the inconsistent
warping from the two consecutive frames, the warped features for new frames are
usually not aligned, which leads to distorted and blurred frames, especially
when large and complex motions occur. To solve this issue, in this paper we
propose a novel Trajectory-aware Transformer for Video Frame Interpolation
(TTVFI). In particular, we formulate the warped features with inconsistent
motions as query tokens, and formulate relevant regions in a motion trajectory
from two original consecutive frames into keys and values. Self-attention is
learned on relevant tokens along the trajectory to blend the pristine features
into intermediate frames through end-to-end training. Experimental results
demonstrate that our method outperforms other state-of-the-art methods on four
widely used VFI benchmarks. Both code and pre-trained models will be released
soon.
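To make the query/key/value formulation above concrete, here is a minimal sketch of scaled dot-product attention for a single query token. It is not the TTVFI implementation: the function names (`softmax`, `attend`) and the toy vectors are hypothetical, and the sketch only illustrates how a query (standing in for a warped feature) could weight and blend values (standing in for features sampled along a trajectory).

```python
import math

def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for one query token (illustrative only)."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # Blend the value vectors using the attention weights.
    blended = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return blended, weights

# Hypothetical toy data: one query and three tokens along a trajectory.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0, 0.0], [0.0, 10.0], [-10.0, 0.0]]
blended, weights = attend(query, keys, values)
```

In this toy case the key most similar to the query receives the largest weight, so the blended output is pulled toward the first value vector.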
Learning Data-Driven Vector-Quantized Degradation Model for Animation Video Super-Resolution
Existing real-world video super-resolution (VSR) methods focus on designing a
general degradation pipeline for open-domain videos while ignoring intrinsic
data characteristics, which strongly limits their performance when applied
to specific domains (e.g., animation videos). In this paper, we thoroughly
explore the characteristics of animation videos and leverage the rich priors in
real-world animation data for a more practical animation VSR model. In
particular, we propose a multi-scale Vector-Quantized Degradation model for
animation video Super-Resolution (VQD-SR) to decompose the local details from
global structures and transfer the degradation priors in real-world animation
videos to a learned vector-quantized codebook for degradation modeling. A
rich-content Real Animation Low-quality (RAL) video dataset is collected for
extracting the priors. We further propose a data enhancement strategy for
high-resolution (HR) training videos based on our observation that existing HR
videos are mostly collected from the Web and contain conspicuous compression
artifacts. The proposed strategy can lift the upper bound of animation
VSR performance, regardless of the specific VSR model. Experimental results
demonstrate the superiority of the proposed VQD-SR over state-of-the-art
methods, through extensive quantitative and qualitative evaluations on the
latest animation video super-resolution benchmark. The code and pre-trained
models can be downloaded at https://github.com/researchmm/VQD-SR
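As a rough illustration of what a vector-quantized codebook does, the sketch below maps each feature vector to its nearest codebook entry. This is generic vector quantization, not the VQD-SR code from the repository above: the helper names (`squared_distance`, `quantize`), the codebook, and the feature values are all hypothetical, and a real codebook would be learned rather than hand-written.

```python
def squared_distance(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(vector, codebook):
    """Return (index, entry) of the codebook entry nearest to `vector`."""
    index = min(range(len(codebook)),
                key=lambda i: squared_distance(vector, codebook[i]))
    return index, codebook[index]

# Hypothetical hand-written codebook; in practice it would be learned from data.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
features = [[0.2, -0.1], [0.9, 1.3], [3.5, 0.4]]
indices = [quantize(f, codebook)[0] for f in features]
```

Each feature is replaced by a discrete index, which is what lets a finite set of codebook entries stand in for a continuous range of patterns.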
Urine interleukin-18 and cystatin-C as biomarkers of acute kidney injury in critically ill neonates
Gait Cycle-Inspired Learning Strategy for Continuous Prediction of Knee Joint Trajectory from sEMG
Predicting lower limb motion intent is vital for controlling exoskeleton
robots and prosthetic limbs. Surface electromyography (sEMG) has attracted
increasing attention in recent years as it enables ahead-of-time prediction of
motion intentions before actual movement. However, accurately estimating
human joint trajectories remains challenging due to inter- and
intra-subject variations. The former is related to physiological differences
(such as height and weight) and preferred walking patterns of individuals,
while the latter is mainly caused by irregular and gait-irrelevant muscle
activity. This paper proposes a model integrating two gait cycle-inspired
learning strategies to mitigate these challenges in predicting human knee joint
trajectory. The first strategy is to decouple knee joint angles into motion
patterns and amplitudes: the former exhibit low variability while the latter
show high variability among individuals. By learning through separate network
entities, the model manages to capture both the common and personalized gait
features. In the second strategy, muscle principal activation masks are
extracted from gait cycles in
a prolonged walk. These masks are used to filter out components unrelated to
walking from raw sEMG and provide auxiliary guidance to capture more
gait-related features. Experimental results indicate that our model can
predict knee angles 50 ms ahead of time with an average root mean square error
(RMSE) of 3.03 (0.49) degrees. To our knowledge, this is the best performance
reported in the relevant literature, reducing RMSE by at least 9.5%.
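The idea of separating a joint-angle trajectory into a shared pattern and a subject-specific amplitude can be sketched as follows. The abstract does not specify how the decoupling is done, so this example assumes simple min-max normalization over one gait cycle; the function names (`decouple`, `recombine`, `rmse`) and the angle values are hypothetical and are not results from the paper.

```python
import math

def decouple(angles):
    """Split a trajectory into (pattern, offset, amplitude), pattern in [0, 1].

    Assumes min-max normalization, which is an illustrative choice only.
    """
    low, high = min(angles), max(angles)
    amplitude = high - low
    if amplitude == 0:
        raise ValueError("constant trajectory has no amplitude")
    pattern = [(a - low) / amplitude for a in angles]
    return pattern, low, amplitude

def recombine(pattern, offset, amplitude):
    # Inverse of decouple: rescale the pattern back to joint angles.
    return [offset + amplitude * p for p in pattern]

def rmse(predicted, actual):
    # Root mean square error between two equal-length sequences.
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

# Hypothetical knee angles (degrees) for two subjects over one gait cycle.
subject_a = [5.0, 20.0, 60.0, 30.0, 5.0]
subject_b = [10.0, 22.0, 54.0, 30.0, 10.0]
pattern_a, offset_a, amplitude_a = decouple(subject_a)
pattern_b, offset_b, amplitude_b = decouple(subject_b)
error_a = rmse(recombine(pattern_a, offset_a, amplitude_a), subject_a)
```

The two toy subjects were chosen so that their normalized patterns coincide while their amplitudes differ, mirroring the low-variability versus high-variability split described above.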
Fuzzy-based indoor scene modeling with differentiated examples
Well-designed indoor scenes incorporate interior design knowledge, which has been an essential prior for most indoor scene modeling methods. However, the layout qualities of indoor scene datasets are often uneven, and most existing data-driven methods do not differentiate indoor scene examples in terms of quality. In this work, we aim to explore an approach that leverages datasets with differentiated indoor scene examples for indoor scene modeling. Our solution conducts subjective evaluations on lightweight datasets having various room configurations and furniture layouts, via pairwise comparisons based on fuzzy set theory. We also develop a system to use such examples to guide indoor scene modeling using user-specified objects. Specifically, we focus on object groups associated with certain human activities, and define room features to encode the relations between the position and direction of an object group and the room configuration. To perform indoor scene modeling, given an empty room, our system first assesses it in terms of the user-specified object groups, and then places associated objects in the room guided by the assessment results. A series of experimental results and comparisons to state-of-the-art indoor scene synthesis methods are presented to validate the usefulness and effectiveness of our approach.
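One common way to turn fuzzy pairwise comparisons into a quality ranking is to average each item's preference degrees over all other items. The sketch below assumes that scheme; the abstract does not say which aggregation the authors use, and the function name (`rank_from_preferences`), layout names, and preference values are hypothetical.

```python
def rank_from_preferences(names, preference):
    """Rank items from a fuzzy preference matrix.

    preference[i][j] in [0, 1] is the degree to which item i is preferred
    to item j (0.5 means indifference). Row-mean scoring is an assumption.
    """
    n = len(names)
    scores = {}
    for i, name in enumerate(names):
        others = [preference[i][j] for j in range(n) if j != i]
        scores[name] = sum(others) / len(others)
    ranking = sorted(names, key=lambda name: scores[name], reverse=True)
    return ranking, scores

# Hypothetical preference degrees for three example layouts.
names = ["layout_a", "layout_b", "layout_c"]
preference = [
    [0.5, 0.7, 0.9],
    [0.3, 0.5, 0.6],
    [0.1, 0.4, 0.5],
]
ranking, scores = rank_from_preferences(names, preference)
```

The resulting scores give a graded notion of quality rather than a binary good/bad label, which is the kind of differentiation among examples that the abstract describes.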
Component-aware generative autoencoder for structure hybrid and shape completion
Assembling components of man-made objects to create new structures or complete 3D shapes is a popular approach in 3D modeling techniques. Recently, leveraging deep neural networks for assembly-based 3D modeling has been widely studied. However, exploring new component combinations, even across different categories, is still challenging for most deep-learning-based 3D modeling methods. In this paper, we propose a novel generative autoencoder that tackles the component combinations for 3D modeling of man-made objects. We use the segmented input objects to create component volumes that have redundant components and random configurations. By using the input objects and the associated component volumes to train the autoencoder, we can obtain an object volume consisting of components with proper quality and structure as the network output. Such a generative autoencoder can be applied to either multiple object categories for structure hybrid or a single object category for shape completion. We conduct a series of evaluations and experiments to demonstrate the usability and practicability of our method.