Accelerated materials language processing enabled by GPT
Materials language processing (MLP) is one of the key facilitators of
materials science research, as it enables the extraction of structured
information from the massive body of materials science literature. Prior works
proposed high-performance MLP models for text classification, named entity
recognition (NER), and extractive question answering (QA), which require
complex model architectures, exhaustive fine-tuning, and large human-labelled
datasets. In this study, we develop generative pretrained transformer
(GPT)-enabled pipelines in which the complex architectures of prior MLP models
are replaced with strategic prompt-engineering designs. First, we develop a
GPT-enabled document classification method for screening relevant documents
that achieves accuracy and reliability comparable to prior models with only a
small dataset. Second, for the NER task, we design entity-centric prompts;
few-shot learning with these prompts improved performance on most entities in
three open datasets. Finally, we develop a GPT-enabled extractive QA model that
provides improved performance and shows the possibility of automatically
correcting annotations. While our findings confirm the potential of GPT-enabled
MLP models and their value in terms of reliability and practicability, our
scientific methods and systematic approach are applicable to any materials
science domain to accelerate information extraction from scientific literature.
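A minimal sketch of the entity-centric, few-shot prompt idea follows, assuming one prompt per entity type built from a short definition plus a handful of labelled demonstrations. The `Example` class, `build_ner_prompt`, the prompt wording, and the sample sentences are hypothetical illustrations, not the templates used in the study.

```python
from dataclasses import dataclass, field


@dataclass
class Example:
    """Hypothetical demonstration: a sentence and its gold entity spans."""
    text: str
    entities: list = field(default_factory=list)


def build_ner_prompt(entity_type, definition, examples, query):
    """Compose a single-entity-type, few-shot extraction prompt.

    Illustrative only: the wording and layout are assumptions.
    """
    lines = [f"Task: extract every {entity_type} mentioned in the text.",
             f"Definition: {definition}",
             ""]
    # Few-shot demonstrations: each pairs a text with its expected answer.
    for ex in examples:
        answer = ", ".join(ex.entities) if ex.entities else "None"
        lines += [f"Text: {ex.text}", f"{entity_type}: {answer}", ""]
    # The query ends with an empty answer slot for the model to complete.
    lines += [f"Text: {query}", f"{entity_type}:"]
    return "\n".join(lines)


prompt = build_ner_prompt(
    "MATERIAL",
    "a chemical compound or material name",
    [Example("LiFePO4 cathodes were annealed at 700 C.", ["LiFePO4"]),
     Example("The samples were characterized by XRD.", [])],
    "We synthesized TiO2 nanotubes by anodization.",
)
```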
Analyzing and Improving Optimal-Transport-based Adversarial Networks
The Optimal Transport (OT) problem aims to find a transport plan that bridges two
distributions while minimizing a given cost function. OT theory has been widely
utilized in generative modeling. Initially, the OT distance was used as a
measure of the distance between the data and generated distributions. More
recently, the OT map between the data and prior distributions has been
utilized as a generative model. These OT-based generative models share a
similar adversarial training objective. In this paper, we begin by unifying
these OT-based adversarial methods within a single framework. Then, we
elucidate the role of each component in training dynamics through a
comprehensive analysis of this unified framework. Moreover, we suggest a simple
but novel method that improves the previously best-performing OT-based model.
Intuitively, our approach conducts a gradual refinement of the generated
distribution, progressively aligning it with the data distribution. Our
approach achieves a FID score of 2.51 on CIFAR-10 and 5.99 on CelebA-HQ-256,
outperforming the unified OT-based adversarial approaches.
Comment: 27 pages, 17 figures
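To make the OT problem in the first sentence concrete, the sketch below computes a transport plan between two small discrete distributions. It swaps in entropy-regularized OT solved with Sinkhorn iterations, a standard discrete solver and not the adversarial, neural approach the paper analyzes; the regularization strength `eps`, the squared-distance cost, and the toy supports are assumptions chosen for illustration.

```python
import numpy as np


def sinkhorn(a, b, cost, eps=0.5, n_iters=500):
    """Entropy-regularized OT plan between histograms a and b.

    Assumption: eps and n_iters are illustrative defaults, not tuned values.
    """
    K = np.exp(-cost / eps)          # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)              # rescale to match row marginals
        v = b / (K.T @ u)            # rescale to match column marginals
    return u[:, None] * K * v[None, :]


x = np.array([0.0, 1.0, 2.0])        # support of the first distribution
y = np.array([0.5, 1.5])             # support of the second distribution
a = np.full(3, 1 / 3)
b = np.full(2, 1 / 2)
C = (x[:, None] - y[None, :]) ** 2   # squared-distance cost
P = sinkhorn(a, b, C)
ot_cost = float(np.sum(P * C))       # transport cost of the regularized plan
```

The returned `P` has the prescribed marginals, and `ot_cost` is the squared-distance cost of that regularized plan; a smaller `eps` brings `P` closer to an unregularized plan at the price of slower convergence.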
Generative Modeling through the Semi-dual Formulation of Unbalanced Optimal Transport
The Optimal Transport (OT) problem investigates a transport map that bridges two
distributions while minimizing a given cost function. In this regard, OT
between a tractable prior distribution and the data has been utilized for generative
modeling tasks. However, OT-based methods are susceptible to outliers and face
optimization challenges during training. In this paper, we propose a novel
generative model based on the semi-dual formulation of Unbalanced Optimal
Transport (UOT). Unlike OT, UOT relaxes the hard constraint on distribution
matching. This approach provides better robustness against outliers, stability
during training, and faster convergence. We validate these properties
empirically through experiments. Moreover, we study the theoretical upper bound
on the divergence between distributions in UOT. Our model outperforms existing
OT-based generative models, achieving FID scores of 2.97 on CIFAR-10 and 5.80
on CelebA-HQ-256.
Comment: 23 pages, 15 figures
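The relaxation of the hard marginal constraint can be illustrated on a discrete toy problem. The sketch below uses the generalized Sinkhorn scaling for entropic UOT, in which each marginal constraint is replaced by a KL penalty of weight `tau`. This is a swapped-in discrete solver, not the semi-dual, neural formulation proposed in the paper; `eps`, `tau`, and the data (including the outlier at 10.0) are assumptions.

```python
import numpy as np


def unbalanced_sinkhorn(a, b, cost, eps=0.5, tau=1.0, n_iters=1000):
    """Entropic OT whose marginal constraints are KL penalties of weight tau.

    Assumption: eps, tau, and n_iters are illustrative, not tuned values.
    """
    K = np.exp(-cost / eps)               # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    power = tau / (tau + eps)             # tau -> infinity recovers balanced Sinkhorn
    for _ in range(n_iters):
        u = (a / (K @ v)) ** power
        v = (b / (K.T @ u)) ** power
    return u[:, None] * K * v[None, :]


x = np.array([0.0, 1.0])                  # source support
y = np.array([0.0, 1.0, 10.0])            # target support; 10.0 is an outlier
a = np.array([0.5, 0.5])
b = np.array([0.45, 0.45, 0.10])
C = (x[:, None] - y[None, :]) ** 2        # squared-distance cost
P = unbalanced_sinkhorn(a, b, C)
outlier_mass = float(P[:, 2].sum())       # mass transported to the outlier
```

A balanced plan would be forced to deliver the full 0.10 of target mass to the outlier at 10.0; with the KL relaxation, `outlier_mass` is essentially zero, which mirrors the robustness-to-outliers argument in the abstract.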
Finding the global semantic representation in GAN through Frechet Mean
The ideally disentangled latent space in a GAN involves a global
representation of the latent space with semantic attribute coordinates. In
other words, considering that this disentangled latent space is a vector space,
there exists a global semantic basis where each basis component describes one
attribute of the generated images. In this paper, we propose an unsupervised
method for finding this global semantic basis in the intermediate latent space
of GANs. This semantic basis represents sample-independent, meaningful
perturbations that change the same semantic attribute of an image across the
entire latent space. The proposed global basis, called the Fréchet basis, is
derived by introducing the Fréchet mean to the local semantic perturbations in
a latent space. The Fréchet basis is discovered in two stages. First, the
global semantic subspace is discovered as the Fréchet mean of the local
semantic subspaces on the Grassmannian manifold. Second, the Fréchet basis is
found by optimizing a basis of the semantic subspace via the Fréchet mean in
the Special Orthogonal Group. Experimental results demonstrate that the Fréchet
basis provides better semantic factorization and robustness than previous
methods. Moreover, we suggest a basis refinement scheme for the previous
methods. Quantitative experiments show that the refined basis achieves better
semantic factorization while constrained to the same semantic subspace given by
the previous method.
Comment: 25 pages, 21 figures
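The idea of averaging subspaces on the Grassmannian can be sketched with a simpler stand-in. The code below computes the extrinsic (chordal) mean, which averages the orthogonal projectors of the subspaces and keeps the top eigenvectors. This is not the intrinsic Riemannian Fréchet mean used in the paper, and it does not cover the second stage on the Special Orthogonal Group; the toy lines in the plane are assumptions.

```python
import numpy as np


def chordal_mean_subspace(bases, k):
    """Extrinsic (chordal) mean of k-dim subspaces given orthonormal bases.

    Stand-in for the Riemannian Frechet mean; not the paper's algorithm.
    """
    # Represent each subspace by its orthogonal projector U U^T and average.
    P = sum(U @ U.T for U in bases) / len(bases)
    # eigh returns ascending eigenvalues; the top-k eigenvectors span the mean.
    _, eigvecs = np.linalg.eigh(P)
    return eigvecs[:, -k:]


def projector(U):
    """Orthogonal projector onto span(U), independent of the basis chosen."""
    return U @ U.T


theta = 0.2
U1 = np.array([[np.cos(theta)], [np.sin(theta)]])    # line tilted up
U2 = np.array([[np.cos(theta)], [-np.sin(theta)]])   # line tilted down
M = chordal_mean_subspace([U1, U2], k=1)              # spans the x-axis
```

Two lines tilted symmetrically about the x-axis average to the x-axis itself. Because a basis is defined only up to sign and rotation, comparisons are made through `projector` rather than the raw basis vectors.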
Analyzing the Latent Space of GAN through Local Dimension Estimation
The impressive success of style-based GANs (StyleGANs) in high-fidelity image
synthesis has motivated research to understand the semantic properties of their
latent spaces. In this paper, we approach this problem through a geometric
analysis of latent spaces as a manifold. In particular, we propose a local
dimension estimation algorithm for arbitrary intermediate layers in a
pre-trained GAN model. The estimated local dimension is interpreted as the
number of possible semantic variations from this latent variable. Moreover,
this intrinsic dimension estimation enables unsupervised evaluation of
disentanglement for a latent space. Our proposed metric, called Distortion,
measures the inconsistency of the intrinsic tangent space across the learned
latent space. Distortion is purely geometric and does not require any
additional attribute information. Nevertheless, Distortion shows a high
correlation with the global-basis-compatibility and supervised disentanglement
scores. Our work is the first step towards selecting the most disentangled
latent space among the various latent spaces in a GAN without attribute labels.
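One generic way to read "local dimension" is the numerical rank of the Jacobian of a map at a point. The sketch below counts the singular values of a finite-difference Jacobian that exceed a relative threshold. The threshold `rel_tol`, the step `h`, and the toy maps are assumptions, and the paper's estimator for intermediate GAN layers may differ.

```python
import numpy as np


def jacobian_fd(f, z, h=1e-5):
    """Central finite-difference Jacobian of f: R^n -> R^m at z."""
    z = np.asarray(z, dtype=float)
    cols = []
    for i in range(z.size):
        dz = np.zeros_like(z)
        dz[i] = h
        cols.append((f(z + dz) - f(z - dz)) / (2 * h))
    return np.stack(cols, axis=1)


def local_dimension(f, z, rel_tol=1e-3):
    """Number of Jacobian singular values above rel_tol times the largest.

    Assumption: rel_tol is an illustrative cut-off, not a calibrated value.
    """
    s = np.linalg.svd(jacobian_fd(f, z), compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))


A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [2.0, 0.0, 2.0],
              [0.0, 0.0, 0.0]])          # third column = first + second: rank 2
linear_dim = local_dimension(lambda z: A @ z, np.zeros(3))


def circle_times_line(z):
    """Embeds R^2 as a cylinder in R^3; locally 2-dimensional everywhere."""
    return np.array([np.cos(z[0]), np.sin(z[0]), z[1]])


curved_dim = local_dimension(circle_times_line, np.array([0.3, -1.0]))
```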
Understanding the Latent Space of Diffusion Models through the Lens of Riemannian Geometry
Despite the success of diffusion models (DMs), we still lack a thorough
understanding of their latent space. To understand the latent space, we
analyze it from a geometrical perspective. Specifically, we utilize the
pullback metric to find the local latent basis in the latent space and the
corresponding local tangent basis in the intermediate feature maps of DMs. The
discovered latent basis enables unsupervised image editing through latent space
traversal. We investigate the discovered structure from two perspectives.
First, we examine how the geometric structure evolves over diffusion timesteps.
Through this analysis, we show that 1) the model focuses on low-frequency
components early in the generative process and attunes to high-frequency
details later; 2) at early timesteps, different samples share similar tangent
spaces; and 3) the simpler the dataset a DM is trained on, the more consistent
the tangent space at each timestep. Second, we investigate how the geometric
structure changes with text conditioning in Stable Diffusion. The results show
that 1) similar prompts yield comparable tangent spaces; and 2) the model
depends less on text conditions at later timesteps. To the best of our
knowledge, this paper is the first to present image editing through latent
space traversal and to provide thorough analyses of the latent structure of
DMs.
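The relation between the pullback metric and the two bases can be sketched for a single Jacobian. Assuming `J` is the Jacobian of the map from the latent space to an intermediate feature map at one point, the pullback metric is G = J^T J; its top eigenvectors give latent directions, and mapping them through `J` gives the matching unit tangent directions in feature space (equivalently, the right and left singular vectors of `J`). The small matrix `J` below is hypothetical; obtaining it for a real diffusion model is not shown.

```python
import numpy as np


def pullback_basis(J, k):
    """Top-k local latent basis and tangent basis from the pullback metric.

    J is a (hypothetical) Jacobian of a feature map at one point;
    assumes k <= rank(J) so no tangent vector has zero norm.
    """
    G = J.T @ J                               # pullback of the Euclidean metric
    eigvals, eigvecs = np.linalg.eigh(G)      # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]     # indices of the k largest
    V = eigvecs[:, order]                     # directions in the latent space
    U = J @ V
    U = U / np.linalg.norm(U, axis=0)         # unit tangent directions in feature space
    return V, U, eigvals[order]


J = np.array([[3.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])                    # hypothetical 3x2 Jacobian
V, U, vals = pullback_basis(J, k=2)
```

The identity J V = U * sqrt(vals) is the singular value relation J v_i = sigma_i u_i with sigma_i = sqrt(lambda_i), which is why the eigenvectors of the pullback metric and the tangent directions come in matched pairs.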
Do Not Escape From the Manifold: Discovering the Local Coordinates on the Latent Space of GANs
The discovery of the disentanglement properties of the latent space in GANs
has motivated much research into finding semantically meaningful directions in
it. In this paper, we suggest that the disentanglement property is closely
related to the geometry of the latent space. In this regard, we propose an
unsupervised method for finding the semantic-factorizing directions on the
intermediate latent space of GANs based on the local geometry. Intuitively, our
proposed method, called Local Basis, finds the principal variation of the
latent space in the neighborhood of the base latent variable. Experimental
results show that the local principal variation corresponds to semantic
factorization and that traversing along it provides strong robustness in image
traversal. Moreover, we suggest an explanation for the limited success in
finding global traversal directions in the latent space, especially the W-space
of StyleGAN2. We show that the W-space is globally warped by comparing the
local geometry, discovered from Local Basis, through the metric on the
Grassmannian manifold. The global warpage implies that the latent space is not
well aligned globally, and therefore global traversal directions are bound to
show limited success on it.
Comment: 23 pages, 19 figures
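Comparing local geometry through a metric on the Grassmannian manifold can be illustrated with one common choice, the arc-length (geodesic) distance computed from the principal angles between two subspaces. The abstract does not say which Grassmannian metric is used, so treat this as an assumed example; the toy subspaces are hypothetical.

```python
import numpy as np


def grassmann_distance(A, B):
    """Arc-length distance between span(A) and span(B) of equal dimension k.

    Assumption: geodesic metric via principal angles (one common choice).
    """
    Qa, _ = np.linalg.qr(A)                   # orthonormal basis of span(A)
    Qb, _ = np.linalg.qr(B)
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    angles = np.arccos(np.clip(cosines, -1.0, 1.0))   # principal angles
    return float(np.linalg.norm(angles))


x_axis = np.array([[1.0], [0.0]])
y_axis = np.array([[0.0], [1.0]])
diagonal = np.array([[1.0], [1.0]])
plane_a = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
plane_b = np.array([[1.0, 1.0], [1.0, -1.0], [0.0, 0.0]])  # same plane, other basis
```

Because the distance depends only on the spanned subspaces, two different bases of the same plane are at distance zero, while a globally warped space would show large distances between local bases computed at different points.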
Environment-Detection-and-Mapping Algorithm for Autonomous Driving in Rural or Off-Road Environment
This paper presents an environment-detection-and-mapping algorithm for
autonomous driving that runs in real time in both rural and off-road
environments. The environment-detection-and-mapping algorithms consist of two
parts: 1) lane, pedestrian-crossing, and speed-bump detection algorithms using
cameras and 2) an obstacle detection algorithm using LIDARs. The lane detection
algorithm returns lane positions using one camera and the vision module
“VisLab Embedded Lane Detector (VELD),” and the pedestrian-crossing and
speed-bump detection algorithms return the positions of pedestrian crossings
and speed bumps. The obstacle detection algorithm organizes data from the
LIDARs and generates a local obstacle position map. The designed algorithms
have been implemented on a passenger car using six LIDARs, three cameras, and
real-time devices, including personal computers (PCs). Vehicle tests have been
conducted, and the results show that the vehicle can reach the desired goal
with the proposed algorithm.
Index Terms: Autonomous driving, lane detection, obstacle detection,
pedestrian-crossing detection, speed-bump detection.
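A local obstacle position map can be pictured as a simple occupancy grid. The sketch below assumes the LIDAR returns have already been fused into the vehicle frame and filtered to obstacle points, and it marks every grid cell containing at least one point. The function name `obstacle_grid`, the 0.5 m resolution, and the 10 m window are hypothetical, and this is not the algorithm implemented on the test vehicle.

```python
import math


def obstacle_grid(points, resolution=0.5, half_width=10.0):
    """Mark grid cells containing at least one obstacle point.

    Assumptions (hypothetical): points are (x, y) in metres in the vehicle
    frame, already fused from all LIDARs and filtered to obstacles; the map
    is a square window of +/- half_width metres around the vehicle.
    Returns a set of (row, col) indices of occupied cells.
    """
    occupied = set()
    for x, y in points:
        if abs(x) >= half_width or abs(y) >= half_width:
            continue                      # outside the local map window
        col = math.floor((x + half_width) / resolution)
        row = math.floor((y + half_width) / resolution)
        occupied.add((row, col))
    return occupied


cells = obstacle_grid([(1.2, 0.3), (1.3, 0.4), (15.0, 0.0)])
```

The two nearby points fall into the same 0.5 m cell and the point 15 m away lies outside the window, so `cells` holds a single occupied cell.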