Video Probabilistic Diffusion Models in Projected Latent Space
Despite the remarkable progress in deep generative models, synthesizing
high-resolution and temporally coherent videos still remains a challenge due to
their high-dimensionality and complex temporal dynamics along with large
spatial variations. Recent works on diffusion models have shown their potential
to solve this challenge, yet they suffer from severe computation- and
memory-inefficiency that limit the scalability. To handle this issue, we
propose a novel generative model for videos, coined projected latent video
diffusion models (PVDM), a probabilistic diffusion model which learns a video
distribution in a low-dimensional latent space and thus can be efficiently
trained with high-resolution videos under limited resources. Specifically, PVDM
is composed of two components: (a) an autoencoder that projects a given video
as 2D-shaped latent vectors that factorize the complex cubic structure of video
pixels and (b) a diffusion model architecture specialized for our new
factorized latent space and the training/sampling procedure to synthesize
videos of arbitrary length with a single model. Experiments on popular video
generation datasets demonstrate the superiority of PVDM compared with previous
video synthesis methods; e.g., PVDM obtains an FVD score of 639.7 on the
UCF-101 long video (128 frames) generation benchmark, improving on the prior
state-of-the-art score of 1773.4. Comment: Project page: https://sihyun.me/PVD
Ultrafast dynamics of fractional particles in α-RuCl₃
In a Kitaev spin liquid, electron spins can break into fractional particles
known as Majorana fermions and Z₂ fluxes. Recent experiments have indicated
the existence of such fractional particles in a two-dimensional Kitaev material
candidate, α-RuCl₃. These exotic particles can be used in topological
quantum computations when braided within their lifetimes. However, the
lifetimes of these particles, critical for applications in topological quantum
computing, have not been reported. Here we study ultrafast dynamics of
photoinduced excitations in single crystals of α-RuCl₃ using
pump-probe transient grating spectroscopy. We observe intriguing photoexcited
nonequilibrium states in the Kitaev paramagnetic regime between T_N ~7 K and
T_H ~100 K, where T_N is the Néel temperature and T_H is set by the
Kitaev interaction. Two distinct lifetimes are detected: a longer lifetime of
~50 ps, independent of temperature, and a shorter lifetime of 1-20 ps with a
strong temperature dependence. We analyze the transient grating
signals using coupled differential equations and propose that the long and
short lifetimes are associated with fractional particles in the Kitaev
paramagnetic regime, Z₂ fluxes and Majorana fermions, respectively.
Machine Learning Based PCB/Package Stack-up Optimization For Signal Integrity
PCB/package stack-up design optimization is time-consuming and requires a great deal of experience. Although some iterative optimization algorithms have been applied to automate stack-up design, evaluating the results of each iteration is still time-intensive. This paper proposes a combined Bayesian optimization-artificial neural network (BO-ANN) algorithm, which uses a trained ANN-based surrogate model in place of a 2D cross-section analysis tool for fast PCB/package stack-up design optimization. With the acceleration provided by the ANN, the proposed BO-ANN algorithm can finish 100 iterations in 40 seconds while achieving the target characteristic impedance. To better generalize the BO-ANN algorithm, a strategy of effective dielectric calculation is applied to multiple-dielectric stack-up optimization. The BO-ANN algorithm can then output optimized stack-up designs with dielectric layers chosen from a pre-defined library, and the obtained designs are verified by a 2D solver.
ASAP: Accurate semantic segmentation for real time performance
Feature fusion modules from the encoder and self-attention modules have been
adopted in semantic segmentation. However, the computation of these modules is
costly and has operational limitations in real-time environments. In addition,
segmentation performance is limited in autonomous driving environments with a
lot of contextual information perpendicular to the road surface, such as
people, buildings, and general objects. In this paper, we propose an efficient
feature fusion method, Feature Fusion with Different Norms (FFDN), that utilizes
the rich global context of multi-level scales and a vertical pooling module before
self-attention that preserves most contextual information while reducing the
complexity of global context encoding in the vertical direction. By doing this,
we can handle the properties of representation in global space and reduce the
additional computational cost. In addition, we analyze the low performance in
challenging cases, including small and vertically featured objects. We achieve
a mean intersection-over-union (mIoU) of 73.1 and 191 frames per second (FPS),
which are comparable to state-of-the-art results on the Cityscapes test
dataset. Comment: 5 pages, 4 figures
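For readers unfamiliar with the mIoU metric quoted above, the following is a minimal sketch of how it is commonly computed from predicted and ground-truth label maps. It is not the authors' evaluation code; the function name `mean_iou` and the choice to skip classes absent from both maps are assumptions made for illustration.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union for two integer label arrays of equal shape."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(target * num_classes + pred, minlength=num_classes ** 2)
    conf = conf.reshape(num_classes, num_classes)
    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection
    # Assumption: classes that appear in neither map are left out of the mean.
    valid = union > 0
    return float((intersection[valid] / union[valid]).mean())

# Toy 4-pixel example with two classes.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

Benchmarks usually accumulate the confusion matrix over the whole test set before taking the per-class ratio, rather than averaging per-image scores.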
SS-IL: Separated Softmax for Incremental Learning
We consider the class incremental learning (CIL) problem, in which a learning
agent continuously learns new classes from incrementally arriving training data
batches and aims to predict well on all the classes learned so far. The main
challenge of the problem is catastrophic forgetting, and for
exemplar-memory based CIL methods, it is generally known that the forgetting is
commonly caused by the prediction score bias that is injected due to the data
imbalance between the new classes and the old classes (in the exemplar-memory).
While several methods have been proposed to correct such score bias by some
additional post-processing, e.g., score re-scaling or balanced fine-tuning, no
systematic analysis of the root cause of such bias has been done. To that end,
our analysis suggests that computing the softmax probabilities by combining the
output scores for all old and new classes could be the main source of the bias,
and we propose a new CIL method, Separated Softmax for Incremental Learning (SS-IL).
Our SS-IL consists of a separated softmax (SS) output layer and ratio-preserving
(RP) mini-batches combined with task-wise knowledge distillation (TKD), and
through extensive experimental results, we show that SS-IL achieves strong,
state-of-the-art accuracy on several large-scale benchmarks. We also show that SS-IL
makes much more balanced predictions, without any additional post-processing
steps as is done in other baselines.
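The idea of normalizing old-class and new-class scores separately can be illustrated with a small sketch. The abstract does not give the loss or training details, so this only shows the normalization step; the helper names `softmax` and `separated_softmax` and the layout of logits (old classes first) are assumptions for illustration.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def separated_softmax(logits, num_old):
    """Normalize old-class and new-class logits independently.

    Assumption: the first `num_old` columns hold old-class scores and the
    remaining columns hold new-class scores.
    """
    old_probs = softmax(logits[..., :num_old])
    new_probs = softmax(logits[..., num_old:])
    return old_probs, new_probs

logits = np.array([[2.0, 1.0, 0.5, 3.0, 4.0]])  # 3 old classes, 2 new classes
old_probs, new_probs = separated_softmax(logits, num_old=3)

# Inflating the new-class scores (a stand-in for score bias) leaves the
# old-class distribution unchanged, whereas a joint softmax would shrink it.
biased = logits.copy()
biased[:, 3:] += 10.0
old_probs_biased, _ = separated_softmax(biased, num_old=3)
joint_old_mass = softmax(biased)[:, :3].sum()
```

In this toy case the jointly normalized old-class probability mass collapses toward zero once the new-class logits are inflated, while the separately normalized old-class probabilities are identical before and after.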
Collaborative Score Distillation for Consistent Visual Synthesis
Generative priors of large-scale text-to-image diffusion models enable a wide
range of new generation and editing applications on diverse visual modalities.
However, when adapting these priors to complex visual modalities, often
represented as multiple images (e.g., video), achieving consistency across a
set of images is challenging. In this paper, we address this challenge with a
novel method, Collaborative Score Distillation (CSD). CSD is based on the Stein
Variational Gradient Descent (SVGD). Specifically, we propose to consider
multiple samples as "particles" in the SVGD update and combine their score
functions to distill generative priors over a set of images synchronously.
Thus, CSD facilitates seamless integration of information across 2D images,
leading to a consistent visual synthesis across multiple samples. We show the
effectiveness of CSD in a variety of tasks, encompassing the visual editing of
panorama images, videos, and 3D scenes. Our results underline the competency of
CSD as a versatile method for enhancing inter-sample consistency, thereby
broadening the applicability of text-to-image diffusion models. Comment: Project page with visuals: https://subin-kim-cv.github.io/CSD
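As background for the SVGD update that CSD builds on, here is a minimal sketch of plain SVGD on a toy 2D Gaussian target. It is not the CSD method itself: the RBF kernel with a fixed bandwidth `h`, the step size, and the toy score function are all assumptions chosen for illustration, whereas CSD would obtain scores from a text-to-image diffusion model.

```python
import numpy as np

def rbf_kernel(x, h):
    """RBF kernel matrix and its gradient with respect to the first argument."""
    diff = x[:, None, :] - x[None, :, :]        # diff[j, i] = x_j - x_i
    k = np.exp(-(diff ** 2).sum(-1) / h)        # k[j, i] = k(x_j, x_i)
    grad_k = -2.0 / h * diff * k[..., None]     # d k(x_j, x_i) / d x_j
    return k, grad_k

def svgd_step(x, score_fn, step=0.1, h=1.0):
    """One SVGD update: kernel-weighted scores plus a repulsive term."""
    n = x.shape[0]
    k, grad_k = rbf_kernel(x, h)
    scores = score_fn(x)                        # gradient of log p at each particle
    phi = (k.T @ scores + grad_k.sum(axis=0)) / n
    return x + step * phi

# Toy target (assumption): unit-variance Gaussian centred at mu.
mu = np.array([2.0, -1.0])
rng = np.random.default_rng(0)
particles = rng.normal(size=(50, 2))
for _ in range(1000):
    particles = svgd_step(particles, lambda x: -(x - mu))
```

Each particle's update mixes the scores of all particles through the kernel, which is the sense in which samples are updated jointly rather than independently; the kernel-gradient term keeps the particles from collapsing onto a single mode.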