
    TTVFI: Learning Trajectory-Aware Transformer for Video Frame Interpolation

    Full text link
    Video frame interpolation (VFI) aims to synthesize an intermediate frame between two consecutive frames. State-of-the-art approaches usually adopt a two-step solution: 1) generating locally warped pixels via flow-based motion estimation, and 2) blending the warped pixels to form a full frame through deep neural synthesis networks. However, due to the inconsistent warping from the two consecutive frames, the warped features for new frames are usually not aligned, which leads to distorted and blurred frames, especially when large and complex motions occur. To solve this issue, in this paper we propose a novel Trajectory-aware Transformer for Video Frame Interpolation (TTVFI). In particular, we formulate the warped features with inconsistent motions as query tokens, and formulate relevant regions along a motion trajectory from the two original consecutive frames as keys and values. Self-attention is learned over relevant tokens along the trajectory to blend the pristine features into intermediate frames through end-to-end training. Experimental results demonstrate that our method outperforms other state-of-the-art methods on four widely used VFI benchmarks. Both code and pre-trained models will be released soon.
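
    To make the trajectory-attention idea concrete, below is a minimal PyTorch-style sketch in which warped features act as queries while tokens sampled along the motion trajectory act as keys and values. Tensor shapes, names, and the token-sampling step are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of attention over trajectory-sampled tokens; tensor shapes
# and the token-sampling step are assumptions, not the authors' released code.
import torch

def trajectory_attention(warped_feat, traj_tokens):
    """Blend warped features using tokens gathered along a motion trajectory.

    warped_feat: (B, N, C)    inconsistently warped features -> queries
    traj_tokens: (B, N, K, C) K tokens per location, sampled along the motion
                              trajectory in the two source frames -> keys/values
    """
    c = warped_feat.shape[-1]
    q = warped_feat.unsqueeze(2)                 # (B, N, 1, C)
    k = v = traj_tokens                          # (B, N, K, C)
    attn = (q @ k.transpose(-2, -1)) / c ** 0.5  # (B, N, 1, K) scaled dot product
    attn = attn.softmax(dim=-1)                  # weights over trajectory tokens
    return (attn @ v).squeeze(2)                 # (B, N, C) blended features
```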

    Learning Data-Driven Vector-Quantized Degradation Model for Animation Video Super-Resolution

    Full text link
    Existing real-world video super-resolution (VSR) methods focus on designing a general degradation pipeline for open-domain videos while ignoring data-intrinsic characteristics, which strongly limits their performance when applied to specific domains (e.g., animation videos). In this paper, we thoroughly explore the characteristics of animation videos and leverage the rich priors in real-world animation data to build a more practical animation VSR model. In particular, we propose a multi-scale Vector-Quantized Degradation model for animation video Super-Resolution (VQD-SR) that decomposes local details from global structures and transfers the degradation priors in real-world animation videos to a learned vector-quantized codebook for degradation modeling. A rich-content Real Animation Low-quality (RAL) video dataset is collected for extracting these priors. We further propose a data-enhancement strategy for high-resolution (HR) training videos, based on our observation that existing HR videos are mostly collected from the Web and contain conspicuous compression artifacts. The proposed strategy effectively lifts the upper bound of animation VSR performance, regardless of the specific VSR model. Experimental results demonstrate the superiority of the proposed VQD-SR over state-of-the-art methods through extensive quantitative and qualitative evaluations on the latest animation video super-resolution benchmark. The code and pre-trained models can be downloaded at https://github.com/researchmm/VQD-SR.
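
    As a rough illustration of the vector-quantized codebook at the heart of this approach, the sketch below shows the standard nearest-neighbor lookup with a straight-through gradient; all shapes and names are assumptions rather than the VQD-SR code.

```python
# Standard nearest-neighbor vector-quantization step behind a learned
# degradation codebook; shapes and names are assumptions, not VQD-SR code.
import torch

def vector_quantize(z, codebook):
    """Map each feature vector to its nearest learned degradation prototype.

    z:        (N, C) encoder features of degraded animation patches
    codebook: (K, C) learned vector-quantized degradation codebook
    """
    # Squared Euclidean distance between every feature and every code.
    dist = ((z ** 2).sum(1, keepdim=True)
            - 2 * z @ codebook.t()
            + (codebook ** 2).sum(1))
    idx = dist.argmin(dim=1)            # (N,) index of nearest code
    z_q = codebook[idx]                 # (N, C) quantized features
    # Straight-through estimator so gradients still reach the encoder.
    return z + (z_q - z).detach(), idx
```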

    Gait Cycle-Inspired Learning Strategy for Continuous Prediction of Knee Joint Trajectory from sEMG

    Full text link
    Predicting lower-limb motion intent is vital for controlling exoskeleton robots and prosthetic limbs. Surface electromyography (sEMG) has attracted increasing attention in recent years because it enables prediction of motion intentions ahead of actual movement. However, estimating human joint trajectories remains challenging due to inter- and intra-subject variations. The former relate to physiological differences (such as height and weight) and individuals' preferred walking patterns, while the latter are mainly caused by irregular and gait-irrelevant muscle activity. This paper proposes a model integrating two gait cycle-inspired learning strategies to mitigate these challenges in predicting human knee joint trajectories. The first strategy decouples knee joint angles into motion patterns and amplitudes: the former exhibit low variability across individuals, while the latter show high variability. By learning them through separate network entities, the model captures both common and personalized gait features. In the second strategy, muscle principal activation masks are extracted from gait cycles in a prolonged walk. These masks filter out components unrelated to walking from the raw sEMG and provide auxiliary guidance for capturing more gait-related features. Experimental results indicate that our model predicts knee angles with an average root mean square error (RMSE) of 3.03 (0.49) degrees, 50 ms ahead of time. To our knowledge, this is the best performance reported in the relevant literature, reducing RMSE by at least 9.5%.
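
    A minimal sketch of the second strategy's masking step, assuming a cycle-normalized sEMG envelope and a precomputed binary activation mask (both hypothetical shapes), together with the RMSE metric the abstract reports:

```python
# Minimal sketch of masking gait-irrelevant sEMG activity and of the RMSE
# metric; array shapes and the mask itself are illustrative assumptions.
import numpy as np

def apply_activation_mask(semg_env, mask):
    """Suppress non-walking components in a cycle-normalized sEMG envelope.

    semg_env: (n_channels, n_samples) envelope resampled to one gait cycle
    mask:     (n_channels, n_samples) binary principal activation mask
              (1 where the muscle is expected to be active during gait)
    """
    return semg_env * mask

def rmse_deg(pred, true):
    """Root mean square error between predicted and measured knee angles."""
    return float(np.sqrt(np.mean((np.asarray(pred) - np.asarray(true)) ** 2)))
```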

    Fuzzy-based indoor scene modeling with differentiated examples

    No full text
    Well-designed indoor scenes incorporate interior design knowledge, which has been an essential prior for most indoor scene modeling methods. However, the layout quality of indoor scene datasets is often uneven, and most existing data-driven methods do not differentiate indoor scene examples by quality. In this work, we explore an approach that leverages datasets with differentiated indoor scene examples for indoor scene modeling. Our solution conducts subjective evaluations on lightweight datasets with various room configurations and furniture layouts, via pairwise comparisons based on fuzzy set theory. We also develop a system that uses such examples to guide indoor scene modeling with user-specified objects. Specifically, we focus on object groups associated with certain human activities, and define room features to encode the relations between the position and direction of an object group and the room configuration. To perform indoor scene modeling, given an empty room, our system first assesses it in terms of the user-specified object groups, and then places the associated objects in the room guided by the assessment results. A series of experimental results and comparisons to state-of-the-art indoor scene synthesis methods validates the usefulness and effectiveness of our approach.
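
    One common way to turn such pairwise comparisons into per-scene quality scores is a fuzzy preference relation; the sketch below follows that generic formulation and is not necessarily the paper's exact fuzzy-set construction.

```python
# Generic fuzzy preference relation over pairwise layout comparisons; the
# paper's exact fuzzy-set construction may differ from this sketch.
import numpy as np

def fuzzy_quality_scores(wins):
    """wins[i, j] = how often scene i was preferred over scene j."""
    wins = np.asarray(wins, dtype=float)
    total = wins + wins.T
    with np.errstate(divide="ignore", invalid="ignore"):
        r = np.where(total > 0, wins / total, 0.5)  # membership of "i beats j"
    np.fill_diagonal(r, 0.5)                        # a scene ties with itself
    return r.mean(axis=1)                           # higher = better-rated layout
```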

    Component-aware generative autoencoder for structure hybrid and shape completion

    No full text
    Assembling components of man-made objects to create new structures or complete 3D shapes is a popular approach in 3D modeling. Recently, leveraging deep neural networks for assembly-based 3D modeling has been widely studied. However, exploring new component combinations, even across different categories, remains challenging for most deep-learning-based 3D modeling methods. In this paper, we propose a novel generative autoencoder that tackles component combination for 3D modeling of man-made objects. We use segmented input objects to create component volumes that contain redundant components in random configurations. By training the autoencoder on the input objects and their associated component volumes, we obtain an object volume consisting of components with proper quality and structure as the network output. Such a generative autoencoder can be applied either to multiple object categories for structure hybridization or to a single object category for shape completion. A series of evaluations and experimental results demonstrates the usability and practicability of our method.
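
    A minimal sketch of how such a training pair might be assembled, assuming each part is a binary volume on a shared grid: the input gets jittered and redundant components, while the target is the clean union. Names and parameters are illustrative, not the authors' pipeline.

```python
# Illustrative assembly of one training pair for a component-aware
# autoencoder; grid sizes, jitter ranges, and names are assumptions.
import numpy as np

def make_training_pair(components, extra=2, rng=None):
    """components: list of (D, H, W) binary part volumes on a shared grid."""
    rng = rng or np.random.default_rng()
    target = np.zeros_like(components[0], dtype=np.float32)
    vol = np.zeros_like(target)
    for comp in components:
        target = np.maximum(target, comp)
        # Randomly jitter each part so the input has a random configuration.
        shift = rng.integers(-2, 3, size=3)
        vol = np.maximum(vol, np.roll(comp, shift, axis=(0, 1, 2)))
    # Add redundant duplicates drawn from the component pool.
    for i in rng.integers(0, len(components), size=extra):
        shift = rng.integers(-4, 5, size=3)
        vol = np.maximum(vol, np.roll(components[i], shift, axis=(0, 1, 2)))
    return vol, target  # network input, reconstruction target
```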

    A Cost-Constrained Video Quality Satisfaction Study on Mobile Devices

    No full text