80 research outputs found
Explanation on Pretraining Bias of Finetuned Vision Transformer
As fine-tuning of pretrained models becomes increasingly common, understanding
the bias of a pretrained model is essential. However, there are few tools for
analysing transformer architectures, and the interpretation of attention maps
remains challenging. To tackle this interpretability problem, we propose the
Input-Attribution and Attention Score Vector (IAV) which measures the
similarity between attention map and input-attribution and shows the general
trend of interpretable attention patterns. We empirically explain the
pretraining bias of supervised and unsupervised pretrained ViT models, and show
that each head in ViT has a specific range of agreement on the decision of the
classification. We show that generalization, robustness, and the entropy of
attention maps are not properties of the pretraining type; the IAV trend,
on the other hand, can separate the pretraining types.
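The abstract does not spell out how IAV is computed, but a natural reading is a per-head similarity between each head's attention over input tokens and an input-attribution vector. The sketch below is a minimal numpy illustration under that assumption; the shapes (12 heads, 196 patch tokens) and the use of cosine similarity are hypothetical choices, not the paper's exact definition.

```python
import numpy as np

def attention_attribution_similarity(attention, attribution):
    """Cosine similarity between each head's attention over input tokens
    and an input-attribution vector (a simplified reading of IAV)."""
    # attention:   (num_heads, num_tokens), e.g. each head's CLS attention row
    # attribution: (num_tokens,), per-token input-attribution scores
    a = attention / np.linalg.norm(attention, axis=1, keepdims=True)
    b = attribution / np.linalg.norm(attribution)
    return a @ b  # (num_heads,): one agreement score per head

rng = np.random.default_rng(0)
attn = rng.random((12, 196))   # 12 heads, 196 patch tokens (hypothetical)
attr = rng.random(196)
sims = attention_attribution_similarity(attn, attr)
```

A vector of such per-head scores could then be compared across supervised and self-supervised checkpoints to expose the trend the abstract describes.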
Binary Radiance Fields
In this paper, we propose binary radiance fields (BiRF), a storage-efficient
radiance field representation employing binary feature encoding, in which
local features are encoded with parameters restricted to one of two binary
values. This binarization strategy lets us represent the feature grid with highly
compact feature encoding and a dramatic reduction in storage size. Furthermore,
our 2D-3D hybrid feature grid design enhances the compactness of feature
encoding as the 3D grid includes main components while 2D grids capture
details. In our experiments, binary radiance field representation successfully
outperforms the reconstruction performance of state-of-the-art (SOTA) efficient
radiance field models with lower storage allocation. In particular, our model
achieves impressive results in static scene reconstruction, with a PSNR of
31.53 dB for Synthetic-NeRF scenes, 34.26 dB for Synthetic-NSVF scenes, 28.02
dB for Tanks and Temples scenes while only utilizing 0.7 MB, 0.8 MB, and 0.8 MB
of storage space, respectively. We hope the proposed binary radiance field
representation will make radiance fields more accessible without a storage
bottleneck.
Comment: 21 pages, 12 figures, and 11 tables
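The abstract does not describe how the binary encoding parameters are trained; a common way to optimize two-valued parameters is a sign-function forward pass with a straight-through gradient estimator, and the storage saving comes from packing each parameter into a single bit. The snippet below sketches both ideas as an assumption, not BiRF's actual procedure.

```python
import numpy as np

def binarize(params):
    """Forward pass: map real-valued encoding parameters to {-1, +1} via sign."""
    return np.where(params >= 0, 1.0, -1.0)

def straight_through_grad(upstream, params, clip=1.0):
    """Backward pass (straight-through estimator): pass gradients through the
    sign op, zeroing them where the parameter has saturated beyond the clip."""
    return upstream * (np.abs(params) <= clip)

params = np.array([-0.7, 0.2, 1.4, -1.2])   # latent real-valued parameters
binary = binarize(params)                    # two-valued features used at render time
grads = straight_through_grad(np.ones_like(params), params)

# Each binary feature needs only 1 bit of storage: 4 params pack into 1 byte.
packed = np.packbits(binary > 0)
```

Packing 32-bit floats down to 1 bit per parameter is what makes sub-megabyte storage figures like those quoted above plausible.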
StudioGAN: A Taxonomy and Benchmark of GANs for Image Synthesis
Generative Adversarial Network (GAN) is one of the state-of-the-art
generative models for realistic image synthesis. While training and evaluating
GAN becomes increasingly important, the current GAN research ecosystem does not
provide reliable benchmarks for which the evaluation is conducted consistently
and fairly. Furthermore, because there are few validated GAN implementations,
researchers devote considerable time to reproducing baselines. We study the
taxonomy of GAN approaches and present a new open-source library named
StudioGAN. StudioGAN supports 7 GAN architectures, 9 conditioning methods, 4
adversarial losses, 12 regularization modules, 3 differentiable augmentations,
7 evaluation metrics, and 5 evaluation backbones. With our training and
evaluation protocol, we present a large-scale benchmark using various datasets
(CIFAR10, ImageNet, AFHQv2, FFHQ, and Baby/Papa/Grandpa-ImageNet) and 3
different evaluation backbones (InceptionV3, SwAV, and Swin Transformer).
Unlike other benchmarks used in the GAN community, we train representative
GANs, including BigGAN and StyleGAN series in a unified training pipeline and
quantify generation performance with 7 evaluation metrics. The benchmark
evaluates other cutting-edge generative models (e.g., StyleGAN-XL, ADM,
MaskGIT, and RQ-Transformer). StudioGAN provides GAN implementations, training,
and evaluation scripts with the pre-trained weights. StudioGAN is available at
https://github.com/POSTECH-CVLab/PyTorch-StudioGAN.
Comment: 32 pages, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
PointMixer: MLP-Mixer for Point Cloud Understanding
MLP-Mixer has recently emerged as a new challenger to CNNs and transformers.
Despite its simplicity compared to the transformer, the concept of
channel-mixing MLPs and token-mixing MLPs achieves noticeable performance in
visual recognition tasks. Unlike images, point clouds are inherently sparse,
unordered and irregular, which limits the direct use of MLP-Mixer for point
cloud understanding. In this paper, we propose PointMixer, a universal point
set operator that facilitates information sharing among unstructured 3D points.
By simply replacing token-mixing MLPs with a softmax function, PointMixer can
"mix" features within/between point sets. By doing so, PointMixer can be
broadly used in the network as inter-set mixing, intra-set mixing, and pyramid
mixing. Extensive experiments show the competitive or superior performance of
PointMixer in semantic segmentation, classification, and point reconstruction
against transformer-based methods.
Comment: Accepted to ECCV 2022
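The key substitution described above, replacing token-mixing MLPs with a softmax, can be read as softmax-weighted aggregation over the points of a set. The numpy sketch below illustrates that reading for intra-set mixing; the shapes and the source of the per-point logits are hypothetical, and this is a simplification of the PointMixer operator, not its full definition.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def intra_set_mix(features, logits):
    """Mix features within one point set using softmax weights in place of
    a token-mixing MLP (a simplified reading of the abstract)."""
    # features: (k, c) features of k unordered points in a set
    # logits:   (k,)   per-point mixing scores, e.g. from a channel MLP
    w = softmax(logits)      # weights sum to 1 over the set
    return w @ features      # (c,) order-invariant mixed feature

rng = np.random.default_rng(0)
feats = rng.random((16, 32))   # 16 points, 32 channels (hypothetical)
logits = rng.random(16)
mixed = intra_set_mix(feats, logits)
```

Because the softmax weights depend only on per-point scores, the same operator applies to sets of varying size, which suits sparse, unordered point clouds.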
Instance-Aware Image Completion
Image completion is a task that aims to fill in the missing region of a
masked image with plausible contents. However, existing image completion
methods tend to fill in the missing region with the surrounding texture instead
of hallucinating a visual instance that is suitable in accordance with the
context of the scene. In this work, we propose a novel image completion model,
dubbed ImComplete, that hallucinates the missing instance that harmonizes well
with - and thus preserves - the original context. ImComplete first adopts a
transformer architecture that considers the visible instances and the location
of the missing region. Then, ImComplete completes the semantic segmentation
masks within the missing region, providing pixel-level semantic and structural
guidance. Finally, the image synthesis blocks generate photo-realistic content.
We perform a comprehensive evaluation of the results in terms of visual quality
(LPIPS and FID) and contextual preservation scores (CLIPscore and object
detection accuracy) with COCO-panoptic and Visual Genome datasets. Experimental
results show the superiority of ImComplete on various natural images.
Learning Debiased Classifier with Biased Committee
Neural networks are prone to be biased towards spurious correlations between
classes and latent attributes exhibited in a major portion of training data,
which ruins their generalization capability. We propose a new method for
training debiased classifiers with no spurious attribute label. The key idea is
to employ a committee of classifiers as an auxiliary module that identifies
bias-conflicting data, i.e., data without spurious correlation, and assigns
large weights to them when training the main classifier. The committee is
learned as a bootstrapped ensemble so that a majority of its classifiers are
biased as well as being diverse, and intentionally fail to predict classes of
bias-conflicting data accordingly. The consensus within the committee on
prediction difficulty thus provides a reliable cue for identifying and
weighting bias-conflicting data. Moreover, the committee is also trained with
knowledge transferred from the main classifier so that it gradually becomes
debiased along with the main classifier and emphasizes more difficult data as
training progresses. On five real-world datasets, our method outperforms prior
arts that, like ours, use no spurious attribute labels, and occasionally even
surpasses those relying on bias labels.
Comment: Conference on Neural Information Processing Systems (NeurIPS), New Orleans, 2022
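The weighting scheme described above can be sketched simply: samples that most of the biased committee gets wrong are likely bias-conflicting and receive larger training weights. The exact weighting function is not given in the abstract, so the proportional-to-error-rate rule below is an illustrative assumption.

```python
import numpy as np

def bias_conflict_weights(committee_preds, labels):
    """Weight each sample by the fraction of committee members that
    mispredict it (a simplified sketch of the committee consensus cue)."""
    # committee_preds: (num_classifiers, num_samples) predicted class labels
    # labels:          (num_samples,)                 ground-truth labels
    error_rate = (committee_preds != labels[None, :]).mean(axis=0)
    return error_rate / error_rate.mean()   # normalize to mean weight 1.0

# Toy example: 3 committee members, 3 samples.
preds = np.array([[0, 1, 1],
                  [0, 0, 1],
                  [0, 1, 0]])
y = np.array([0, 1, 1])
weights = bias_conflict_weights(preds, y)   # samples 2 and 3 get upweighted
```

Because the committee is trained to be biased, its consensus failures concentrate on bias-conflicting data, which is what makes this error-rate cue informative.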
Scaling up GANs for Text-to-Image Synthesis
The recent success of text-to-image synthesis has taken the world by storm
and captured the general public's imagination. From a technical standpoint, it
also marked a drastic change in the favored architecture to design generative
image models. GANs used to be the de facto choice, with techniques like
StyleGAN. With DALL-E 2, auto-regressive and diffusion models became the new
standard for large-scale generative models overnight. This rapid shift raises a
fundamental question: can we scale up GANs to benefit from large datasets like
LAION? We find that naïvely increasing the capacity of the StyleGAN
architecture quickly becomes unstable. We introduce GigaGAN, a new GAN
architecture that far exceeds this limit, demonstrating GANs as a viable option
for text-to-image synthesis. GigaGAN offers three major advantages. First, it
is orders of magnitude faster at inference time, taking only 0.13 seconds to
synthesize a 512px image. Second, it can synthesize high-resolution images, for
example, 16-megapixel images in 3.66 seconds. Finally, GigaGAN supports various
latent space editing applications such as latent interpolation, style mixing,
and vector arithmetic operations.Comment: CVPR 2023. Project webpage at https://mingukkang.github.io/GigaGAN
Ordered mesoporous porphyrinic carbons with very high electrocatalytic activity for the oxygen reduction reaction
The high cost of the platinum-based cathode catalysts for the oxygen reduction reaction (ORR) has impeded the widespread application of polymer electrolyte fuel cells. We report on a new family of non-precious metal catalysts based on ordered mesoporous porphyrinic carbons (M-OMPC; M = Fe, Co, or FeCo) with high surface areas and tunable pore structures, which were prepared by nanocasting mesoporous silica templates with metalloporphyrin precursors. The FeCo-OMPC catalyst exhibited excellent ORR activity in an acidic medium, higher than other non-precious metal catalysts. It showed a higher kinetic current at 0.9 V than Pt/C catalysts, as well as superior long-term durability and MeOH tolerance. Density functional theory calculations in combination with extended X-ray absorption fine structure analysis revealed a weakening of the interaction between the oxygen atom and FeCo-OMPC compared to Pt/C. This effect, together with the high surface area of FeCo-OMPC, appears responsible for its significantly high ORR activity.