Generative Modeling through the Semi-dual Formulation of Unbalanced Optimal Transport
The Optimal Transport (OT) problem seeks a transport map that bridges two
distributions while minimizing a given cost function. In this regard, OT
between a tractable prior distribution and the data distribution has been
utilized for generative modeling tasks. However, OT-based methods are
susceptible to outliers and face
optimization challenges during training. In this paper, we propose a novel
generative model based on the semi-dual formulation of Unbalanced Optimal
Transport (UOT). Unlike OT, UOT relaxes the hard constraint on distribution
matching. This approach provides better robustness against outliers, stability
during training, and faster convergence. We validate these properties
empirically through experiments. Moreover, we study the theoretical upper-bound
of divergence between distributions in UOT. Our model outperforms existing
OT-based generative models, achieving FID scores of 2.97 on CIFAR-10 and 5.80
on CelebA-HQ-256.
Comment: 23 pages, 15 figures
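As a toy illustration of the marginal relaxation that distinguishes UOT from OT (this is not the paper's training procedure), the sketch below runs entropic unbalanced Sinkhorn scaling on two small discrete measures; the function name, `eps`, `tau`, and the toy measures are choices made for this sketch only.

```python
import numpy as np

def unbalanced_sinkhorn(a, b, C, eps=0.5, tau=1.0, iters=500):
    """Entropic unbalanced OT via Sinkhorn-like scaling iterations.

    The hard marginal constraints of balanced OT are replaced by KL
    penalties of strength tau, which softens each scaling update by the
    exponent tau / (tau + eps); tau -> infinity recovers balanced OT.
    """
    K = np.exp(-C / eps)                  # Gibbs kernel
    u, v = np.ones_like(a), np.ones_like(b)
    fi = tau / (tau + eps)                # softening exponent from the KL relaxation
    for _ in range(iters):
        u = (a / (K @ v)) ** fi
        v = (b / (K.T @ u)) ** fi
    return u[:, None] * K * v[None, :]    # plan; marginals only approximate a and b

# Toy example: the target carries spurious mass at y = 3 that UOT can discount.
x = np.array([0.0, 1.0])
y = np.array([0.0, 1.0, 3.0])
C = (x[:, None] - y[None, :]) ** 2        # squared-distance cost
a = np.array([0.5, 0.5])
b = np.array([0.4, 0.4, 0.2])             # 0.2 of "outlier" mass at y = 3
P = unbalanced_sinkhorn(a, b, C)
```

Because transporting to the far-away point is expensive, the plan simply leaves most of that 0.2 target mass unmatched, which is the robustness-to-outliers property the abstract refers to.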
Feature-aligned N-BEATS with Sinkhorn divergence
In this study, we propose Feature-aligned N-BEATS as a domain generalization
model for univariate time series forecasting problems. The proposed model is an
extension of the doubly residual stacking architecture of N-BEATS (Oreshkin et
al. [34]) into a representation learning framework. The model is a new
structure that involves marginal feature probability measures (i.e.,
pushforward measures of multiple source domains) induced by the intricate
composition of residual operators of N-BEATS in each stack and aligns them
stack-wise via an entropy-regularized Wasserstein distance referred to as the
Sinkhorn divergence (Genevay et al. [14]). The loss function consists of a
typical forecasting loss for multiple source domains and an alignment loss
calculated with the Sinkhorn divergence, which allows the model to learn
invariant features stack-wise across multiple source data sequences while
retaining N-BEATS's interpretable design. We conduct a comprehensive
experimental evaluation of the proposed approach, and the results demonstrate
the model's forecasting and generalization capabilities in comparison with
methods based on the original N-BEATS.
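The alignment loss is built on the Sinkhorn divergence, a debiased form of the entropic OT cost. Below is a minimal sketch assuming uniform weights on 1-D feature samples, using only the transport cost of the entropic plan (the full definition in Genevay et al. also includes an entropy term); all names and toy data are illustrative.

```python
import numpy as np

def sinkhorn_cost(x, y, eps=0.1, iters=300):
    """Entropic OT cost between uniform point clouds x, y of 1-D features."""
    C = (x[:, None] - y[None, :]) ** 2
    a = np.full(len(x), 1.0 / len(x))
    b = np.full(len(y), 1.0 / len(y))
    K = np.exp(-C / eps)
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(iters):                # standard Sinkhorn scaling
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]       # entropic transport plan
    return float((P * C).sum())

def sinkhorn_divergence(x, y, eps=0.1):
    """Debiased divergence: OT(x, y) - (OT(x, x) + OT(y, y)) / 2."""
    return (sinkhorn_cost(x, y, eps)
            - 0.5 * sinkhorn_cost(x, x, eps)
            - 0.5 * sinkhorn_cost(y, y, eps))

x = np.linspace(0.0, 1.0, 20)             # features from one "source domain"
y = x + 2.0                               # a shifted, misaligned domain
div_zero = sinkhorn_divergence(x, x)      # identical features -> zero
div_shift = sinkhorn_divergence(x, y)     # misaligned features -> large
```

The debiasing terms make the divergence vanish when the two feature measures coincide, which is what lets it act as an alignment penalty added to the forecasting loss.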
MARA-Net: Single Image Deraining Network with Multi-level connections and Adaptive Regional Attentions
Removing rain streaks from single images is an important problem in various
computer vision tasks because rain streaks can degrade outdoor images and
reduce their visibility. While recent convolutional neural network-based
deraining models have succeeded in capturing rain streaks effectively,
difficulties in recovering the details in rain-free images still remain. In
this paper, we present a multi-level connection and adaptive regional attention
network (MARA-Net) to properly restore the original background textures in
rainy images. The first main idea is a multi-level connection design that
repeatedly connects multi-level features of the encoder network to the decoder
network. Multi-level connections encourage the decoding process to use the
feature information of all levels. Channel attention is considered in
multi-level connections to learn which level of features is important in the
decoding process of the current level. The second main idea is a wide regional
non-local block (WRNL). As rain streaks primarily exhibit a vertical
distribution, we divide the grid of the image into horizontally-wide patches
and apply a non-local operation to each region to explore the rich rain-free
background information. Experimental results on both synthetic and real-world
rainy datasets demonstrate that the proposed model significantly outperforms
existing state-of-the-art models. Furthermore, the results of the joint
deraining and segmentation experiment prove that our model contributes
effectively to other vision tasks.
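The wide regional non-local idea can be sketched in a few lines of NumPy; this is only an illustration of the partitioning and aggregation pattern (the actual WRNL operates on learned feature maps inside the network, with learned projections), and `rows_per_region` is a hypothetical parameter.

```python
import numpy as np

def wide_regional_nonlocal(feat, rows_per_region=4):
    """Illustrative wide regional non-local op on an (H, W, C) feature map.

    The map is split into horizontally-wide strips (full width, a few rows
    tall) so that vertical rain streaks are cut across, and a softmax-weighted
    non-local aggregation runs inside each strip.
    """
    H, W, C = feat.shape
    out = np.empty_like(feat)
    for r0 in range(0, H, rows_per_region):
        strip = feat[r0:r0 + rows_per_region].reshape(-1, C)  # (n, C) positions
        sim = strip @ strip.T / np.sqrt(C)                    # pairwise affinities
        w = np.exp(sim - sim.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)                     # row-softmax weights
        agg = w @ strip                                       # non-local aggregation
        out[r0:r0 + rows_per_region] = agg.reshape(-1, W, C)
    return out

rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 6, 3))
out = wide_regional_nonlocal(feat, rows_per_region=4)
```

Since each output position is a convex combination of positions in its own strip, background statistics are shared across an entire wide region rather than a square local window.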
p-Poisson surface reconstruction in curl-free flow from point clouds
The aim of this paper is the reconstruction of a smooth surface from an
unorganized point cloud sampled from a closed surface, preserving geometric
shapes and using no information other than the point cloud itself.
Implicit neural representations (INRs) have recently emerged as a promising
approach to surface reconstruction. However, the reconstruction quality of
existing methods relies on ground truth implicit function values or surface
normal vectors. In this paper, we show that proper supervision of partial
differential equations and fundamental properties of differential vector fields
are sufficient to robustly reconstruct high-quality surfaces. We cast the
p-Poisson equation to learn a signed distance function (SDF), and the
reconstructed surface is implicitly represented by the zero-level set of the
SDF. For efficient training, we develop a variable splitting structure by
introducing a gradient of the SDF as an auxiliary variable and impose the
p-Poisson equation directly on the auxiliary variable as a hard constraint.
Based on the curl-free property of the gradient field, we impose a curl-free
constraint on the auxiliary variable, which leads to a more faithful
reconstruction. Experiments on standard benchmark datasets show that the
proposed INR provides a superior and robust reconstruction. The code is
available at \url{https://github.com/Yebbi/PINC}.
Comment: 21 pages, accepted for Advances in Neural Information Processing
Systems, 202
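The curl-free property that motivates the constraint (a gradient field has vanishing curl, i.e. equal mixed partials) can be checked numerically. In this sketch a smooth potential stands in for the learned SDF; the grid, fields, and tolerances are illustrative choices.

```python
import numpy as np

# A gradient field is curl-free: d(g_x)/dy - d(g_y)/dx = 0.  The paper's
# variable splitting introduces an auxiliary variable G ~ grad(SDF) and
# penalizes its curl; this finite-difference sketch illustrates the property
# with a smooth potential standing in for the SDF.
n = 64
xs = np.linspace(-1.0, 1.0, n)
X, Y = np.meshgrid(xs, xs, indexing="ij")

u = X**2 * Y + np.sin(X)                  # smooth scalar potential
gx, gy = np.gradient(u, xs, xs)           # G = grad(u), finite differences
curl_grad = np.gradient(gx, xs, axis=1) - np.gradient(gy, xs, axis=0)

rot = np.stack([-Y, X])                   # rotational field: NOT a gradient
curl_rot = np.gradient(rot[0], xs, axis=1) - np.gradient(rot[1], xs, axis=0)
```

The curl of the gradient field vanishes, while the rotational field has curl -2 everywhere; penalizing the curl of the auxiliary variable therefore rules out such non-gradient candidates.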
Finding the global semantic representation in GAN through Frechet Mean
An ideally disentangled latent space in a GAN admits a global representation
of the latent space with semantic attribute coordinates. In other words, since
this disentangled latent space is a vector space, there exists a global
semantic basis in which each basis component describes one attribute of the
generated images. In this paper, we propose an unsupervised method
for finding this global semantic basis in the intermediate latent space in
GANs. This semantic basis represents sample-independent meaningful
perturbations that change the same semantic attribute of an image on the entire
latent space. The proposed global basis, called Fr\'echet basis, is derived by
introducing Fr\'echet mean to the local semantic perturbations in a latent
space. Fr\'echet basis is discovered in two stages. First, the global semantic
subspace is discovered by the Fr\'echet mean in the Grassmannian manifold of
the local semantic subspaces. Second, Fr\'echet basis is found by optimizing a
basis of the semantic subspace via the Fr\'echet mean in the Special Orthogonal
Group. Experimental results demonstrate that Fr\'echet basis provides better
semantic factorization and robustness compared to the previous methods.
Moreover, we suggest the basis refinement scheme for the previous methods. The
quantitative experiments show that the refined basis achieves better semantic
factorization while constrained on the same semantic subspace given by the
previous method.
Comment: 25 pages, 21 figures
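The Fréchet mean on a matrix group can be illustrated in the simplest case, SO(2), where the Karcher-mean iteration reduces to averaging angles in the Lie algebra. The paper's optimization runs on higher-dimensional Grassmannian and Special Orthogonal manifolds, so this is only a toy sketch with illustrative names.

```python
import numpy as np

def rot2(theta):
    """2-D rotation matrix for angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def frechet_mean_so2(rotations, iters=50):
    """Fréchet (Karcher) mean on SO(2) by gradient steps in the Lie algebra.

    The mean M minimizes the sum of squared geodesic distances to the
    rotations; each step averages the logarithms log(M^T R_i), which for
    SO(2) are just the relative angles.
    """
    M = rotations[0]
    for _ in range(iters):
        # relative angle of M^T R_i, read off the rotation-matrix entries
        deltas = [np.arctan2((M.T @ R)[1, 0], (M.T @ R)[0, 0]) for R in rotations]
        step = float(np.mean(deltas))
        M = M @ rot2(step)                # move along the averaged direction
        if abs(step) < 1e-12:
            break
    return M

Rs = [rot2(np.deg2rad(t)) for t in (10.0, 20.0, 30.0)]
M = frechet_mean_so2(Rs)
mean_deg = np.rad2deg(np.arctan2(M[1, 0], M[0, 0]))
```

For rotations by 10, 20, and 30 degrees the iteration settles on the 20-degree rotation, the point minimizing the summed squared geodesic distances, which is the same notion of mean the paper applies to local semantic bases.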
Analyzing the Latent Space of GAN through Local Dimension Estimation
The impressive success of style-based GANs (StyleGANs) in high-fidelity image
synthesis has motivated research to understand the semantic properties of their
latent spaces. In this paper, we approach this problem through a geometric
analysis of latent spaces as a manifold. In particular, we propose a local
dimension estimation algorithm for arbitrary intermediate layers in a
pre-trained GAN model. The estimated local dimension is interpreted as the
number of possible semantic variations from this latent variable. Moreover,
this intrinsic dimension estimation enables unsupervised evaluation of
disentanglement for a latent space. Our proposed metric, called Distortion,
measures an inconsistency of intrinsic tangent space on the learned latent
space. Distortion is purely geometric and does not require any additional
attribute information. Nevertheless, Distortion shows a high correlation with
the global-basis-compatibility and supervised disentanglement score. Our work
is the first step towards selecting the most disentangled latent space among
various latent spaces in a GAN without attribute labels.
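The idea of reading a local dimension off a neighborhood can be illustrated with plain local PCA; the paper's estimator targets intermediate layers of a pre-trained GAN, whereas the synthetic manifold, `k`, and the variance threshold below are illustrative choices for the sketch.

```python
import numpy as np

def local_dimension(points, center, k=50, var_threshold=0.95):
    """Estimate local intrinsic dimension at `center` via local PCA.

    Take the k nearest neighbours of `center`, run PCA on them, and count
    how many principal components are needed to explain `var_threshold`
    of the local variance.
    """
    d2 = np.sum((points - center) ** 2, axis=1)
    nbrs = points[np.argsort(d2)[:k]]           # k nearest neighbours
    X = nbrs - nbrs.mean(axis=0)                # centre the neighbourhood
    s = np.linalg.svd(X, compute_uv=False)      # singular values
    ratios = np.cumsum(s**2) / np.sum(s**2)     # cumulative explained variance
    return int(np.searchsorted(ratios, var_threshold) + 1)

rng = np.random.default_rng(0)
t = rng.uniform(0.0, 1.0, size=(2000, 2))       # 2-D parameters
# embed a curved 2-D sheet in a 5-D ambient space
pts = np.stack([t[:, 0], t[:, 1], t[:, 0] * t[:, 1],
                np.sin(t[:, 0]), t[:, 0] ** 2], axis=1)
dim = local_dimension(pts, pts[0], k=100)
```

Even though the points live in five ambient coordinates, the local PCA recovers the two-dimensional structure of the sheet, mirroring the interpretation of local dimension as the number of possible semantic variations around a latent variable.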