6 research outputs found

    Common Diffusion Noise Schedules and Sample Steps are Flawed

    We discover that common diffusion noise schedules do not enforce the last timestep to have zero signal-to-noise ratio (SNR), and some implementations of diffusion samplers do not start from the last timestep. Such designs are flawed and do not reflect the fact that the model is given pure Gaussian noise at inference, creating a discrepancy between training and inference. We show that the flawed design causes real problems in existing implementations. In Stable Diffusion, it severely limits the model to only generate images with medium brightness and prevents it from generating very bright and dark samples. We propose a few simple fixes: (1) rescale the noise schedule to enforce zero terminal SNR; (2) train the model with v prediction; (3) change the sampler to always start from the last timestep; (4) rescale classifier-free guidance to prevent over-exposure. These simple changes ensure the diffusion process is congruent between training and inference and allow the model to generate samples more faithful to the original data distribution.
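
    As a concrete sketch of fix (1), the snippet below rescales a beta schedule so that the cumulative signal rate reaches exactly zero at the last timestep while leaving the first timestep unchanged. It is a minimal PyTorch illustration; the function name and surrounding usage are assumptions, not taken from the paper's released code.

```python
import torch

def rescale_betas_zero_terminal_snr(betas: torch.Tensor) -> torch.Tensor:
    """Rescale a beta schedule so the last timestep has zero SNR.

    Sketch of fix (1): shift and scale sqrt(alpha_bar) so the terminal
    value is exactly 0 (pure Gaussian noise) while the first value is
    left unchanged, then convert back to per-step betas.
    """
    alphas = 1.0 - betas
    alphas_bar_sqrt = torch.cumprod(alphas, dim=0).sqrt()

    a_first = alphas_bar_sqrt[0].clone()
    a_last = alphas_bar_sqrt[-1].clone()

    alphas_bar_sqrt -= a_last                        # terminal value becomes 0
    alphas_bar_sqrt *= a_first / (a_first - a_last)  # first value stays a_first

    alphas_bar = alphas_bar_sqrt ** 2
    alphas = alphas_bar[1:] / alphas_bar[:-1]
    alphas = torch.cat([alphas_bar[:1], alphas])
    return 1.0 - alphas
```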

    MagicProp: Diffusion-based Video Editing via Motion-aware Appearance Propagation

    This paper addresses the issue of modifying the visual appearance of videos while preserving their motion. A novel framework, named MagicProp, is proposed, which disentangles the video editing process into two stages: appearance editing and motion-aware appearance propagation. In the first stage, MagicProp selects a single frame from the input video and applies image-editing techniques to modify the content and/or style of the frame. The flexibility of these techniques enables the editing of arbitrary regions within the frame. In the second stage, MagicProp employs the edited frame as an appearance reference and generates the remaining frames using an autoregressive rendering approach. To achieve this, a diffusion-based conditional generation model, called PropDPM, is developed, which synthesizes the target frame by conditioning on the reference appearance, the target motion, and its previous appearance. The autoregressive editing approach ensures temporal consistency in the resulting videos. Overall, MagicProp combines the flexibility of image-editing techniques with the superior temporal consistency of autoregressive modeling, enabling flexible editing of object types and aesthetic styles in arbitrary regions of input videos while maintaining good temporal consistency across frames. Extensive experiments in various video editing scenarios demonstrate the effectiveness of MagicProp.
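
    The loop below sketches the two-stage pipeline described above under stated assumptions: edit_frame_fn stands in for any single-frame image-editing routine, and prop_model is a hypothetical stand-in for PropDPM whose call signature is not taken from the paper.

```python
def magicprop_edit(frames, motions, edit_frame_fn, prop_model, ref_index=0):
    """Two-stage sketch: edit one frame, then propagate its appearance.

    edit_frame_fn: any image-editing routine applied to a single frame.
    prop_model: hypothetical stand-in for PropDPM, assumed to be a
        callable prop_model(reference, motion, previous) -> edited frame.
    """
    # Stage 1: appearance editing on a single reference frame.
    reference = edit_frame_fn(frames[ref_index])

    # Stage 2: motion-aware autoregressive appearance propagation.
    edited = [reference]
    previous = reference
    for t in range(ref_index + 1, len(frames)):
        frame_t = prop_model(reference=reference, motion=motions[t],
                             previous=previous)
        edited.append(frame_t)
        previous = frame_t  # condition the next step on the frame just rendered
    return edited
```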

    Monitoring land cover change and disturbance of the Mount Wutai World Cultural Landscape Heritage Protected Area based on Remote Sensing time-series images from 1987 to 2018

    Contextual multi-source time-series remote sensing and a proposed Comprehensive Heritage Area Threats Index (CHATI) are used to analyze spatiotemporal land use/land cover (LULC) change and threats to the Mount Wutai World Heritage Area. The results show that disturbances in the research area, such as forest coverage, vegetation conditions, mining area, and built-up area, changed dramatically. According to the CHATI, although different disturbances have positive or negative influences on the environment, the area remained stable as an integrated system from 1987 to 2018. Finally, this research uses linear regression and the F-test to identify significant spatiotemporal variation. Consequently, the threats to Mount Wutai can be addressed at both the macro and micro levels. Although some drawbacks remain, the effectiveness of threat identification has been tested through field validation, and the results provide a reliable tool to raise public awareness of WHA protection and governance.
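
    As an illustration of the trend analysis mentioned above, the sketch below fits a linear regression to a yearly index series and tests the slope with an F-test. The function and variable names are illustrative assumptions; the CHATI computation itself is not reproduced here.

```python
import numpy as np
from scipy import stats

def trend_significance(years, index_values, alpha=0.05):
    """Fit a linear trend to a yearly index series and F-test the slope.

    Illustrative only: the series could be a disturbance indicator or a
    yearly aggregated CHATI value; names are not taken from the paper.
    """
    years = np.asarray(years, dtype=float)
    values = np.asarray(index_values, dtype=float)
    slope, intercept, r_value, p_value, std_err = stats.linregress(years, values)

    # For simple linear regression the slope t-test and the model F-test
    # are equivalent: F = r^2 * (n - 2) / (1 - r^2), with the same p-value.
    n = len(values)
    f_stat = (r_value ** 2) * (n - 2) / (1.0 - r_value ** 2)
    return slope, f_stat, p_value, p_value < alpha
```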

    A labor-free index-guided semantic segmentation approach for urban vegetation mapping from high-resolution true color imagery

    Accurate and timely information on urban vegetation (UV) can be used as an important indicator of the health of cities. Because of the low cost of RGB cameras, true color imagery (TCI) has been widely used for high-spatial-resolution UV mapping. However, current index-based and classifier-based UV mapping approaches suffer, respectively, from a poor ability to accurately distinguish UV and a heavy reliance on massive annotated samples. To address these issues, an index-guided semantic segmentation (IGSS) framework is proposed in this paper. First, a novel cross-scale vegetation index (CSVI) is calculated by combining TCI and Sentinel-2 images, and the index value is used to provide an initial UV map. Second, reliable UV and non-UV samples are automatically generated for training the semantic segmentation model, and the refined UV map is then produced. The experimental results show that the proposed CSVI outperformed five existing RGB vegetation indices in highlighting UV cover and suppressing complex backgrounds, and the proposed IGSS workflow achieved satisfactory results with an OA of 87.72% to 88.16% and an F1 score of 87.73% to 88.37%, which is comparable to the fully supervised method.
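
    The sketch below illustrates the sample-generation step described above under assumed thresholds: confidently vegetated and confidently non-vegetated pixels become training labels, while ambiguous pixels are ignored. The CSVI computation itself is not reproduced; the csvi array is assumed to be an already-computed per-pixel index map.

```python
import numpy as np

def generate_pseudo_labels(csvi, high_thresh=0.6, low_thresh=0.2, ignore_value=255):
    """Turn an index map into reliable UV / non-UV training samples.

    csvi: per-pixel cross-scale vegetation index map (assumed already
    computed from TCI and Sentinel-2 data); thresholds are illustrative.
    """
    labels = np.full(csvi.shape, ignore_value, dtype=np.uint8)
    labels[csvi >= high_thresh] = 1   # confident urban-vegetation samples
    labels[csvi <= low_thresh] = 0    # confident non-vegetation samples
    return labels

# The resulting label map can supervise a standard semantic segmentation
# network, with ignore_value pixels excluded from the loss.
```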