342 research outputs found

    Video Probabilistic Diffusion Models in Projected Latent Space

    Full text link
    Despite the remarkable progress in deep generative models, synthesizing high-resolution and temporally coherent videos remains a challenge due to their high dimensionality and complex temporal dynamics along with large spatial variations. Recent works on diffusion models have shown their potential to solve this challenge, yet they suffer from severe computation and memory inefficiency that limits their scalability. To handle this issue, we propose a novel generative model for videos, coined projected latent video diffusion models (PVDM), a probabilistic diffusion model that learns a video distribution in a low-dimensional latent space and thus can be efficiently trained with high-resolution videos under limited resources. Specifically, PVDM is composed of two components: (a) an autoencoder that projects a given video into 2D-shaped latent vectors that factorize the complex cubic structure of video pixels, and (b) a diffusion model architecture specialized for our new factorized latent space, together with training/sampling procedures to synthesize videos of arbitrary length with a single model. Experiments on popular video generation datasets demonstrate the superiority of PVDM compared with previous video synthesis methods; e.g., PVDM obtains an FVD score of 639.7 on the UCF-101 long video (128 frames) generation benchmark, improving on the prior state-of-the-art score of 1773.4. Comment: Project page: https://sihyun.me/PVD
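    The core factorization idea, representing a cubic video tensor with 2D-shaped latents, can be illustrated with a toy projection; the pooling-based mapping, shapes, and function names below are assumptions for illustration, not PVDM's actual learned autoencoder.

```python
# Illustrative sketch (not PVDM's actual autoencoder): factorize a video tensor
# of shape (C, T, H, W) into three 2D-shaped latent maps by pooling along each
# spatio-temporal axis, which conveys the general idea of projected latents.
import torch

def project_video_to_2d_latents(video: torch.Tensor):
    """video: (C, T, H, W) -> three 2D-shaped latents (C, H, W), (C, T, W), (C, T, H)."""
    z_hw = video.mean(dim=1)  # pool over time   -> spatial plane
    z_tw = video.mean(dim=2)  # pool over height -> time-width plane
    z_th = video.mean(dim=3)  # pool over width  -> time-height plane
    return z_hw, z_tw, z_th

video = torch.randn(3, 16, 64, 64)  # toy clip: 3 channels, 16 frames, 64x64
z_hw, z_tw, z_th = project_video_to_2d_latents(video)
print(z_hw.shape, z_tw.shape, z_th.shape)  # (3, 64, 64) (3, 16, 64) (3, 16, 64)
```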

    Ultrafast dynamics of fractional particles in α-RuCl₃

    Full text link
    In a Kitaev spin liquid, electron spins can break into fractional particles known as Majorana fermions and Z₂ fluxes. Recent experiments have indicated the existence of such fractional particles in a two-dimensional Kitaev material candidate, α-RuCl₃. These exotic particles can be used in topological quantum computations when braided within their lifetimes. However, the lifetimes of these particles, critical for applications in topological quantum computing, have not been reported. Here we study ultrafast dynamics of photoinduced excitations in single crystals of α-RuCl₃ using pump-probe transient grating spectroscopy. We observe intriguing photoexcited nonequilibrium states in the Kitaev paramagnetic regime between T_N ~ 7 K and T_H ~ 100 K, where T_N is the Néel temperature and T_H is set by the Kitaev interaction. Two distinct lifetimes are detected: a longer lifetime of ~50 ps, independent of temperature, and a shorter lifetime of 1-20 ps with a strong temperature dependence, T^(-1.40). We analyze the transient grating signals using coupled differential equations and propose that the long and short lifetimes are associated with the fractional particles in the Kitaev paramagnetic regime, Z₂ fluxes and Majorana fermions, respectively.
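    The coupled-differential-equation analysis mentioned above can be illustrated with a generic two-component relaxation model; the rate equations, coupling, and lifetime values below are assumed for illustration and are not the fitted model from the paper.

```python
# Generic two-component relaxation sketch (assumed form; the paper's coupled
# equations and parameters may differ): a fast channel partially feeding a slow one.
import numpy as np
from scipy.integrate import solve_ivp

tau_fast, tau_slow = 5.0, 50.0  # picoseconds (illustrative values)

def rates(t, n):
    n_fast, n_slow = n
    return [-n_fast / tau_fast,                     # fast population decays
            n_fast / tau_fast - n_slow / tau_slow]  # and feeds the slow population

t = np.linspace(0.0, 200.0, 400)  # pump-probe delay times in ps
sol = solve_ivp(rates, (t[0], t[-1]), y0=[1.0, 0.0], t_eval=t)
signal = sol.y[0] + sol.y[1]      # transient-grating-like bi-exponential signal
print(signal[:5])
```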

    Machine Learning Based PCB/Package Stack-up Optimization For Signal Integrity

    Get PDF
    PCB/package stack-up design optimization is time-consuming and requires a great deal of experience. Although some iterative optimization algorithms have been applied to automate stack-up design, evaluating the results of each iteration is still time-intensive. This paper proposes a combined Bayesian optimization-artificial neural network (BO-ANN) algorithm, utilizing a trained ANN-based surrogate model to replace a 2D cross-section analysis tool for fast PCB/package stack-up design optimization. With the acceleration provided by the ANN, the proposed BO-ANN algorithm can finish 100 iterations in 40 seconds while achieving the target characteristic impedance. To better generalize the BO-ANN algorithm, a strategy of effective dielectric calculation is applied to multiple-dielectric stack-up optimization. The BO-ANN algorithm can output optimized stack-up designs with dielectric layers chosen from a pre-defined library, and the obtained designs are verified by a 2D solver.
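    As a rough sketch of the BO-ANN workflow, the snippet below trains a small ANN surrogate on samples from a stand-in solver and then searches the stack-up parameters against it; the toy impedance formula and the simple surrogate-driven candidate search (used here in place of a full Bayesian optimizer) are illustrative assumptions.

```python
# Sketch of the BO-ANN idea: an ANN surrogate stands in for the slow 2D field
# solver when scoring candidate stack-ups. The toy impedance formula and the
# surrogate-driven random search are illustrative, not the paper's implementation.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def toy_solver(w, h):
    """Stand-in for the 2D cross-section solver: impedance vs. trace width w and dielectric height h."""
    return 60.0 - 3.0 * w + 8.0 * np.log1p(h)

# 1) Train the ANN surrogate on samples labeled by the (toy) solver.
X = rng.uniform([3.0, 2.0], [10.0, 12.0], size=(500, 2))
y = toy_solver(X[:, 0], X[:, 1])
surrogate = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0).fit(X, y)

# 2) Search the stack-up parameters against the fast surrogate (target: 50-ohm impedance).
candidates = rng.uniform([3.0, 2.0], [10.0, 12.0], size=(5000, 2))
errors = np.abs(surrogate.predict(candidates) - 50.0)
best = candidates[np.argmin(errors)]
print("best (width, height):", best, "surrogate Z:", surrogate.predict(best[None])[0])
```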

    ASAP: Accurate semantic segmentation for real time performance

    Full text link
    Feature fusion modules over encoder features and self-attention modules have been adopted in semantic segmentation. However, the computation of these modules is costly and has operational limitations in real-time environments. In addition, segmentation performance is limited in autonomous driving environments, which contain a lot of contextual information perpendicular to the road surface, such as people, buildings, and general objects. In this paper, we propose an efficient feature fusion method, Feature Fusion with Different Norms (FFDN), that utilizes the rich global context of multi-level scales, together with a vertical pooling module placed before self-attention that preserves most contextual information while reducing the complexity of global context encoding in the vertical direction. By doing this, we can handle the properties of representation in global space and reduce additional computational cost. In addition, we analyze the low performance in challenging cases, including small and vertically featured objects. We achieve a mean Intersection-over-Union (mIoU) of 73.1 at 191 Frames Per Second (FPS), which is comparable with state-of-the-art results on the Cityscapes test dataset. Comment: 5 pages, 4 figures
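    A minimal sketch of pooling along the vertical axis before self-attention is given below; the module structure, single pooling stage, and attention settings are assumptions for illustration, not the exact FFDN design.

```python
# Illustrative sketch (assumed design, not the paper's exact module): pool the
# feature map along the vertical (height) axis before self-attention, so that
# attention runs over W column tokens instead of H*W pixel tokens.
import torch
import torch.nn as nn

class VerticalPoolAttention(nn.Module):
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d((1, None))  # (B, C, H, W) -> (B, C, 1, W)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = self.pool(x).squeeze(2).transpose(1, 2)  # (B, W, C): one token per column
        ctx, _ = self.attn(tokens, tokens, tokens)        # global context along the width
        ctx = ctx.transpose(1, 2).unsqueeze(2)            # (B, C, 1, W)
        return x + ctx                                     # broadcast context back over height

feat = torch.randn(2, 64, 32, 64)          # toy encoder feature map
out = VerticalPoolAttention(64)(feat)
print(out.shape)                           # torch.Size([2, 64, 32, 64])
```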

    SS-IL: Separated Softmax for Incremental Learning

    Full text link
    We consider the class incremental learning (CIL) problem, in which a learning agent continuously learns new classes from incrementally arriving training data batches and aims to predict well on all the classes learned so far. The main challenge of the problem is catastrophic forgetting, and for exemplar-memory based CIL methods, it is generally known that the forgetting is commonly caused by a prediction score bias injected by the data imbalance between the new classes and the old classes (in the exemplar memory). While several methods have been proposed to correct such score bias by additional post-processing, e.g., score re-scaling or balanced fine-tuning, no systematic analysis of the root cause of the bias has been done. To that end, we show that computing the softmax probabilities by combining the output scores for all old and new classes can be the main source of the bias, and we propose a new CIL method, Separated Softmax for Incremental Learning (SS-IL). Our SS-IL consists of a separated softmax (SS) output layer and ratio-preserving (RP) mini-batches combined with task-wise knowledge distillation (TKD), and through extensive experimental results, we show that SS-IL achieves very strong state-of-the-art accuracy on several large-scale benchmarks. We also show that SS-IL makes much more balanced predictions, without any of the additional post-processing steps used in other baselines.
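    A minimal sketch of the separated-softmax idea follows: cross-entropy is computed within the old-class block and the new-class block separately, so new-class scores do not suppress old-class probabilities. The tensor shapes and loss form are illustrative assumptions and omit the RP mini-batches and task-wise distillation.

```python
# Sketch of a separated softmax (SS) loss: the softmax/cross-entropy is taken
# inside the old-class block and the new-class block separately (illustrative,
# not the authors' full SS-IL objective).
import torch
import torch.nn.functional as F

def separated_softmax_loss(logits: torch.Tensor, targets: torch.Tensor, num_old: int) -> torch.Tensor:
    """logits: (B, num_old + num_new); targets: (B,) with global class indices."""
    is_old = targets < num_old
    loss = logits.new_zeros(())
    if is_old.any():     # old-class samples: softmax over old-class logits only
        loss = loss + F.cross_entropy(logits[is_old, :num_old], targets[is_old], reduction="sum")
    if (~is_old).any():  # new-class samples: softmax over new-class logits only
        loss = loss + F.cross_entropy(logits[~is_old, num_old:], targets[~is_old] - num_old, reduction="sum")
    return loss / logits.shape[0]

logits = torch.randn(8, 15)  # 10 old classes + 5 new classes
targets = torch.tensor([0, 3, 9, 12, 14, 1, 11, 7])
print(separated_softmax_loss(logits, targets, num_old=10))
```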

    Collaborative Score Distillation for Consistent Visual Synthesis

    Full text link
    Generative priors of large-scale text-to-image diffusion models enable a wide range of new generation and editing applications on diverse visual modalities. However, when adapting these priors to complex visual modalities, often represented as multiple images (e.g., video), achieving consistency across a set of images is challenging. In this paper, we address this challenge with a novel method, Collaborative Score Distillation (CSD). CSD is based on Stein Variational Gradient Descent (SVGD). Specifically, we propose to consider multiple samples as "particles" in the SVGD update and combine their score functions to distill generative priors over a set of images synchronously. Thus, CSD facilitates seamless integration of information across 2D images, leading to consistent visual synthesis across multiple samples. We show the effectiveness of CSD in a variety of tasks, encompassing the visual editing of panorama images, videos, and 3D scenes. Our results underline the competency of CSD as a versatile method for enhancing inter-sample consistency, thereby broadening the applicability of text-to-image diffusion models. Comment: Project page with visuals: https://subin-kim-cv.github.io/CSD
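    The SVGD update underlying CSD can be illustrated on a toy target distribution; the Gaussian score function and RBF kernel below stand in for the text-to-image diffusion score and the paper's actual setup.

```python
# Toy SVGD update (illustrative; CSD replaces the score below with a diffusion
# model's score and treats the images being edited as the particles). Particles
# share score information through the kernel term of the update.
import numpy as np

def rbf_kernel(x, h=1.0):
    """Pairwise RBF kernel and its gradient w.r.t. the first argument."""
    diff = x[:, None, :] - x[None, :, :]                  # (n, n, d)
    k = np.exp(-np.sum(diff**2, axis=-1) / (2 * h**2))    # (n, n)
    grad_k = -diff / h**2 * k[:, :, None]                 # d k(x_j, x_i) / d x_j
    return k, grad_k

def svgd_step(x, score, step=0.1):
    n = x.shape[0]
    k, grad_k = rbf_kernel(x)
    # phi(x_i) = (1/n) * sum_j [ k(x_j, x_i) * score(x_j) + grad_{x_j} k(x_j, x_i) ]
    phi = (k @ score(x) + grad_k.sum(axis=0)) / n
    return x + step * phi

score = lambda x: -(x - 3.0)          # score of N(3, I), a stand-in target
particles = np.random.randn(64, 2)
for _ in range(200):
    particles = svgd_step(particles, score)
print(particles.mean(axis=0))         # approaches [3, 3]
```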