Common Diffusion Noise Schedules and Sample Steps are Flawed
We discover that common diffusion noise schedules do not enforce the last
timestep to have zero signal-to-noise ratio (SNR), and some implementations of
diffusion samplers do not start from the last timestep. Such designs are flawed
and do not reflect the fact that the model is given pure Gaussian noise at
inference, creating a discrepancy between training and inference. We show that
the flawed design causes real problems in existing implementations. In Stable
Diffusion, it severely limits the model to only generate images with medium
brightness and prevents it from generating very bright and dark samples. We
propose a few simple fixes: (1) rescale the noise schedule to enforce zero
terminal SNR; (2) train the model with v prediction; (3) change the sampler to
always start from the last timestep; (4) rescale classifier-free guidance to
prevent over-exposure. These simple changes ensure the diffusion process is
congruent between training and inference and allow the model to generate
samples more faithful to the original data distribution.
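Fix (1) can be read as a linear rescaling of the signal-scale curve sqrt(alpha_bar): shift it so the terminal value is exactly zero, then scale so the initial value is preserved, and convert back to per-step betas. A minimal PyTorch sketch of that rescaling, under the assumption that the schedule is represented as a 1-D tensor of betas (the function name is illustrative):

```python
import torch

def enforce_zero_terminal_snr(betas: torch.Tensor) -> torch.Tensor:
    """Rescale a beta schedule so that SNR(T) = 0 at the last timestep."""
    alphas = 1.0 - betas
    alphas_bar = alphas.cumprod(dim=0)
    sqrt_ab = alphas_bar.sqrt()

    sqrt_ab_first = sqrt_ab[0].clone()
    sqrt_ab_last = sqrt_ab[-1].clone()

    # Shift the curve so the terminal value is exactly zero, then rescale
    # so the first value is unchanged.
    sqrt_ab = sqrt_ab - sqrt_ab_last
    sqrt_ab = sqrt_ab * sqrt_ab_first / (sqrt_ab_first - sqrt_ab_last)

    # Convert the rescaled cumulative product back to per-step betas.
    alphas_bar = sqrt_ab ** 2
    alphas = alphas_bar[1:] / alphas_bar[:-1]
    alphas = torch.cat([alphas_bar[0:1], alphas])
    return 1.0 - alphas
```

Note that the resulting terminal beta equals 1, which is why fix (2) matters: with zero terminal SNR the model receives pure noise at the last timestep, and epsilon prediction becomes degenerate there, whereas v prediction remains well defined.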
MagicProp: Diffusion-based Video Editing via Motion-aware Appearance Propagation
This paper addresses the issue of modifying the visual appearance of videos
while preserving their motion. A novel framework, named MagicProp, is proposed,
which disentangles the video editing process into two stages: appearance
editing and motion-aware appearance propagation. In the first stage, MagicProp
selects a single frame from the input video and applies image-editing
techniques to modify the content and/or style of the frame. The flexibility of
these techniques enables the editing of arbitrary regions within the frame. In
the second stage, MagicProp employs the edited frame as an appearance reference
and generates the remaining frames using an autoregressive rendering approach.
To achieve this, a diffusion-based conditional generation model, called
PropDPM, is developed, which synthesizes the target frame by conditioning on
the reference appearance, the target motion, and its previous appearance. The
autoregressive editing approach ensures temporal consistency in the resulting
videos. Overall, MagicProp combines the flexibility of image-editing techniques
with the superior temporal consistency of autoregressive modeling, enabling
flexible editing of object types and aesthetic styles in arbitrary regions of
input videos while maintaining good temporal consistency across frames.
Extensive experiments in various video editing scenarios demonstrate the
effectiveness of MagicProp.
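The second stage is easiest to picture as a loop: the edited frame anchors the appearance, and each subsequent frame is sampled conditioned on that reference, a motion cue, and the previous output. A hypothetical sketch of that loop follows; `sample_fn` and `motion_fn` stand in for the PropDPM sampler and a motion estimator (e.g. optical flow) and are not the authors' actual API:

```python
from typing import Any, Callable, List

def propagate_appearance(frames: List[Any],
                         edited_ref: Any,
                         sample_fn: Callable[..., Any],
                         motion_fn: Callable[[Any, Any], Any]) -> List[Any]:
    """Autoregressively render an edited video from one edited frame.

    sample_fn stands in for the PropDPM sampler: it synthesizes the next
    frame conditioned on the reference appearance, the target motion, and
    the previously rendered frame. motion_fn extracts a motion cue
    between consecutive source frames.
    """
    rendered = [edited_ref]  # the first frame carries the user's edit
    for t in range(1, len(frames)):
        motion = motion_fn(frames[t - 1], frames[t])
        rendered.append(sample_fn(reference=edited_ref,
                                  motion=motion,
                                  previous=rendered[-1]))
    return rendered
```

Conditioning each step on the previous output, rather than generating frames independently, is what gives the method its temporal consistency.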
Monitoring land cover change and disturbance of the Mount Wutai World Cultural Landscape Heritage Protected Area based on Remote Sensing time-series image from 1987 to 2018
Contextual multi-source time-series remote sensing data and a proposed Comprehensive Heritage Area Threats Index (CHATI) are used to analyze spatiotemporal land use/land cover (LULC) change and threats to the Mount Wutai World Heritage Area. The results show that disturbance factors in the study area, such as forest coverage, vegetation condition, mining area, and built-up area, changed dramatically. According to the CHATI, although individual disturbances influenced the environment positively or negatively, the area as an integrated system remained stable from 1987 to 2018. Finally, this research uses linear regression and the F-test to mark significant spatio-temporal variation, so that the threats to Mount Wutai can be addressed at both the macro and micro levels. Although some drawbacks remain, the effectiveness of the threat identification has been tested through field validation, and the results provide a reliable tool for raising public awareness of World Heritage Area (WHA) protection and governance.
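The abstract does not give the CHATI formula, but the trend-marking step it describes (linear regression plus an F-test over the 1987-2018 series) can be sketched with SciPy. For simple linear regression the F-test on the fit is equivalent to the two-sided t-test on the slope, so the p-value reported by `linregress` can be used directly; the threshold and the synthetic series below are illustrative:

```python
import numpy as np
from scipy import stats

def trend_significance(years: np.ndarray, series: np.ndarray,
                       alpha: float = 0.05):
    """Fit a linear trend to one pixel's CHATI time series and test it.

    For simple linear regression, the F-test is equivalent to the
    two-sided t-test on the slope, so linregress's p-value suffices.
    """
    fit = stats.linregress(years, series)
    return fit.slope, fit.pvalue < alpha

# Example: a synthetic 32-year annual series (1987-2018) for one pixel.
years = np.arange(1987, 2019)
series = 0.002 * (years - 1987) \
    + np.random.default_rng(0).normal(0.0, 0.05, years.size)
slope, is_significant = trend_significance(years, series)
print(f"slope={slope:.4f}, significant={is_significant}")
```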
A labor-free index-guided semantic segmentation approach for urban vegetation mapping from high-resolution true color imagery
Accurate and timely information on urban vegetation (UV) can serve as an important indicator for estimating the health of cities. Due to the low cost of RGB cameras, true color imagery (TCI) has been widely used for high-spatial-resolution UV mapping. However, current index-based and classifier-based UV mapping approaches suffer from a poor ability to accurately distinguish UV and a heavy reliance on massive annotated samples, respectively. To address these issues, an index-guided semantic segmentation (IGSS) framework is proposed in this paper. Firstly, a novel cross-scale vegetation index (CSVI) is calculated by combining TCI and Sentinel-2 images, and the index value is used to produce an initial UV map. Secondly, reliable UV and non-UV samples are automatically generated for training a semantic segmentation model, which then produces the refined UV map. The experimental results show that the proposed CSVI outperformed five existing RGB vegetation indices in highlighting UV cover and suppressing complex backgrounds, and the proposed IGSS workflow achieved satisfactory results with an OA of 87.72% ∼ 88.16% and an F1 score of 87.73% ∼ 88.37%, comparable to the fully supervised method.
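The sample-generation step can be pictured as confidence thresholding on the index map: only pixels the CSVI is sure about become training labels, and ambiguous pixels are ignored, so no manual annotation is needed. A hypothetical sketch follows; the thresholds and the [0, 1] rescaling are assumptions for illustration, not values from the paper:

```python
import numpy as np

def index_to_pseudo_labels(index_map: np.ndarray,
                           uv_thresh: float = 0.6,
                           bg_thresh: float = 0.2) -> np.ndarray:
    """Convert an index map (e.g. CSVI rescaled to [0, 1]) into
    pseudo-labels: 1 = urban vegetation, 0 = background, 255 = ignored.

    Only pixels the index is confident about become training samples;
    ambiguous mid-range pixels are masked out, so the segmentation
    model can be trained without manual annotation.
    """
    labels = np.full(index_map.shape, 255, dtype=np.uint8)  # ignore by default
    labels[index_map >= uv_thresh] = 1   # confident vegetation
    labels[index_map <= bg_thresh] = 0   # confident background
    return labels

# Example: a random array standing in for a real CSVI raster.
demo = np.random.default_rng(1).random((4, 4))
print(index_to_pseudo_labels(demo))
```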