Binary Radiance Fields
In this paper, we propose binary radiance fields (BiRF), a storage-efficient
radiance field representation employing binary feature encoding that encodes
local features using binary encoding parameters in a format of either +1 or
-1. This binarization strategy lets us represent the feature grid with highly
compact feature encoding and a dramatic reduction in storage size. Furthermore,
our 2D-3D hybrid feature grid design makes the feature encoding even more
compact: the 3D grid captures the main components while the 2D grids capture
details. In our experiments, the binary radiance field representation
outperforms state-of-the-art (SOTA) efficient radiance field models in
reconstruction quality while using less storage. In particular, our model
achieves impressive results in static scene reconstruction, with a PSNR of
31.53 dB on Synthetic-NeRF scenes, 34.26 dB on Synthetic-NSVF scenes, and 28.02
dB on Tanks and Temples scenes, while using only 0.7 MB, 0.8 MB, and 0.8 MB
of storage, respectively. We hope the proposed binary radiance field
representation will make radiance fields more accessible by removing the
storage bottleneck.
Comment: 21 pages, 12 figures, and 11 tables
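The storage savings come from keeping each grid parameter as one of two values. The sketch below, assuming a hypothetical 64^3 grid with 2 channels and sign-based binarization, shows how a {-1, +1} grid can be bit-packed with NumPy for a 32x reduction over float32. It covers only storage; training such a grid usually needs a gradient approximation such as a straight-through estimator, which is omitted here, and the helper names are illustrative rather than taken from BiRF.

```python
import numpy as np

def binarize(features: np.ndarray) -> np.ndarray:
    """Map real-valued features to {-1, +1} with the sign function (0 maps to +1)."""
    return np.where(features >= 0, 1.0, -1.0).astype(np.float32)

def pack_binary(binary: np.ndarray) -> np.ndarray:
    """Store each {-1, +1} value as a single bit (+1 -> 1, -1 -> 0)."""
    return np.packbits((binary > 0).astype(np.uint8).ravel())

def unpack_binary(packed: np.ndarray, shape) -> np.ndarray:
    """Recover the {-1, +1} grid from its bit-packed form."""
    count = int(np.prod(shape))
    bits = np.unpackbits(packed)[:count]
    return (bits.astype(np.float32) * 2.0 - 1.0).reshape(shape)

# Hypothetical 3D feature grid: 64^3 cells with 2 channels each.
rng = np.random.default_rng(0)
grid = rng.standard_normal((64, 64, 64, 2)).astype(np.float32)

binary_grid = binarize(grid)
packed = pack_binary(binary_grid)

print(grid.nbytes)    # 2097152 bytes as float32
print(packed.nbytes)  # 65536 bytes, a 32x reduction
assert np.array_equal(unpack_binary(packed, grid.shape), binary_grid)
```

At one bit per parameter, 0.7 MB corresponds to roughly 5.9 million binary parameters, which gives a sense of the grid sizes such a budget allows.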
Instance-Aware Image Completion
Image completion is a task that aims to fill in the missing region of a
masked image with plausible contents. However, existing image completion
methods tend to fill in the missing region with the surrounding texture instead
of hallucinating a visual instance that suits the context of the scene. In this
work, we propose a novel image completion model, dubbed ImComplete, that
hallucinates the missing instance so that it harmonizes well with, and thus
preserves, the original context. ImComplete first adopts a
transformer architecture that considers the visible instances and the location
of the missing region. Then, ImComplete completes the semantic segmentation
masks within the missing region, providing pixel-level semantic and structural
guidance. Finally, the image synthesis blocks generate photo-realistic content.
We comprehensively evaluate the results in terms of visual quality (LPIPS and
FID) and contextual preservation (CLIPscore and object detection accuracy) on
the COCO-panoptic and Visual Genome datasets. Experimental results show the
superiority of ImComplete on various natural images.
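The abstract does not say how ImComplete merges its synthesized content with the input. A common final step in image completion is to composite the output so that only pixels inside the hole come from the model and visible pixels are kept unchanged. The minimal sketch below assumes that convention; `composite` is a hypothetical helper, and the "generated" array is a stand-in for a real model's output.

```python
import numpy as np

def composite(masked_image, generated, mask):
    """Keep visible pixels and take generated content only inside the hole.

    masked_image, generated: float arrays of shape (H, W, C)
    mask: array of shape (H, W), 1 inside the missing region and 0 elsewhere
    """
    m = mask.astype(masked_image.dtype)[..., None]  # add a channel axis for broadcasting
    return m * generated + (1.0 - m) * masked_image

image = np.zeros((4, 4, 3), dtype=np.float32)      # visible content (all zeros here)
generated = np.ones((4, 4, 3), dtype=np.float32)   # stand-in for a model's output
mask = np.zeros((4, 4), dtype=np.float32)
mask[1:3, 1:3] = 1.0                               # a 2x2 hole in the center
out = composite(image, generated, mask)
# Only the central 2x2 block of `out` takes values from `generated`.
```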
Learning Debiased Classifier with Biased Committee
Neural networks are prone to being biased toward spurious correlations between
classes and latent attributes exhibited in a major portion of training data,
which harms their generalization capability. We propose a new method for
training debiased classifiers without spurious attribute labels. The key idea is
to employ a committee of classifiers as an auxiliary module that identifies
bias-conflicting data, i.e., data without spurious correlation, and assigns
large weights to them when training the main classifier. The committee is
learned as a bootstrapped ensemble so that most of its classifiers are biased
yet diverse, and thus intentionally fail to predict the classes of
bias-conflicting data. The consensus within the committee on
prediction difficulty thus provides a reliable cue for identifying and
weighting bias-conflicting data. Moreover, the committee is also trained with
knowledge transferred from the main classifier so that it gradually becomes
debiased along with the main classifier and emphasizes more difficult data as
training progresses. On five real-world datasets, our method outperforms prior
methods that, like ours, use no spurious attribute labels, and occasionally even
surpasses those relying on bias labels.
Comment: Conference on Neural Information Processing Systems (NeurIPS), New
Orleans, 202
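The committee idea can be sketched in two parts: bootstrap resampling gives each member its own view of the data, and the fraction of members that misclassify a sample serves as a difficulty cue for weighting it in the main classifier's loss. The weighting rule `1 + alpha * wrong_fraction` and the parameter `alpha` are assumptions for illustration, not the paper's exact formula, and the knowledge transfer from the main classifier is not shown.

```python
import numpy as np

def bootstrap_indices(n_samples, n_members, rng):
    """Draw one bootstrap resample (with replacement) of the training set per member."""
    return [rng.integers(0, n_samples, size=n_samples) for _ in range(n_members)]

def committee_weights(member_preds, labels, alpha=1.0):
    """Weight each sample by how many committee members misclassify it.

    member_preds: int array (n_members, n_samples) of predicted classes
    labels: int array (n_samples,) of ground-truth classes
    alpha: hypothetical scale for the extra weight on hard samples
    """
    wrong_fraction = (member_preds != labels[None, :]).mean(axis=0)
    return 1.0 + alpha * wrong_fraction

rng = np.random.default_rng(0)
idx = bootstrap_indices(n_samples=10, n_members=3, rng=rng)  # one resample per member

labels = np.array([0, 1, 1, 0])
member_preds = np.array([
    [0, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 1, 0, 0],
])
weights = committee_weights(member_preds, labels, alpha=4.0)
# Sample 2 is missed by every member, so it receives the largest weight (5.0).
print(weights)
```

Because most members are biased, a bias-conflicting sample tends to be missed by many of them at once, which is what pushes its weight up in this sketch.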
Scaling up GANs for Text-to-Image Synthesis
The recent success of text-to-image synthesis has taken the world by storm
and captured the general public's imagination. From a technical standpoint, it
also marked a drastic change in the favored architecture for designing generative
image models. GANs used to be the de facto choice, with techniques like
StyleGAN. With DALL-E 2, auto-regressive and diffusion models became the new
standard for large-scale generative models overnight. This rapid shift raises a
fundamental question: can we scale up GANs to benefit from large datasets like
LAION? We find that naively increasing the capacity of the StyleGAN
architecture quickly becomes unstable. We introduce GigaGAN, a new GAN
architecture that far exceeds this limit, demonstrating GANs as a viable option
for text-to-image synthesis. GigaGAN offers three major advantages. First, it
is orders of magnitude faster at inference time, taking only 0.13 seconds to
synthesize a 512px image. Second, it can synthesize high-resolution images, for
example, 16-megapixel images in 3.66 seconds. Finally, GigaGAN supports various
latent space editing applications such as latent interpolation, style mixing,
and vector arithmetic operations.
Comment: CVPR 2023. Project webpage at https://mingukkang.github.io/GigaGAN
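Latent interpolation and vector arithmetic both operate directly on latent codes before they are decoded. The sketch below shows linear and spherical interpolation plus an analogy-style edit on NumPy vectors. The 512-dimensional Gaussian latent is an assumption, and the generator that would decode these codes is not shown; GigaGAN's actual latent and conditioning spaces may differ.

```python
import numpy as np

def lerp(z0, z1, t):
    """Linear interpolation between two latent codes."""
    return (1.0 - t) * z0 + t * z1

def slerp(z0, z1, t, eps=1e-7):
    """Spherical linear interpolation along the arc between z0 and z1."""
    cos_omega = np.dot(z0, z1) / (np.linalg.norm(z0) * np.linalg.norm(z1))
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    if np.sin(omega) < eps:  # nearly (anti)parallel vectors: fall back to lerp
        return lerp(z0, z1, t)
    return (np.sin((1.0 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)

def latent_arithmetic(z_a, z_b, z_c):
    """Analogy-style edit: add the direction (z_a - z_b) to z_c."""
    return z_a - z_b + z_c

rng = np.random.default_rng(0)
z0, z1 = rng.standard_normal(512), rng.standard_normal(512)  # 512 is an assumed latent size
path = [slerp(z0, z1, t) for t in np.linspace(0.0, 1.0, 5)]
# Each code in `path` would then be decoded by a generator (not shown here).
```

Spherical interpolation is often preferred for Gaussian latents because intermediate codes keep a norm closer to that of typical samples, whereas linear interpolation shrinks toward the origin midway.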
Ordered mesoporous porphyrinic carbons with very high electrocatalytic activity for the oxygen reduction reaction
The high cost of platinum-based cathode catalysts for the oxygen reduction reaction (ORR) has impeded the widespread application of polymer electrolyte fuel cells. We report a new family of non-precious metal catalysts based on ordered mesoporous porphyrinic carbons (M-OMPC; M = Fe, Co, or FeCo) with high surface areas and tunable pore structures, which were prepared by nanocasting mesoporous silica templates with metalloporphyrin precursors. The FeCo-OMPC catalyst exhibited excellent ORR activity in an acidic medium, higher than that of other non-precious metal catalysts. It showed a higher kinetic current at 0.9 V than Pt/C catalysts, as well as superior long-term durability and MeOH tolerance. Density functional theory calculations combined with extended X-ray absorption fine structure analysis revealed a weaker interaction between oxygen atoms and FeCo-OMPC than with Pt/C. This effect and the high surface area of FeCo-OMPC appear responsible for its remarkably high ORR activity.
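The abstract compares kinetic currents at 0.9 V but does not say how they were extracted. One widely used approach for rotating-electrode measurements is the Koutecky-Levich mass-transport correction, 1/i = 1/i_k + 1/i_d, which rearranges to i_k = i * i_d / (i_d - i). The sketch below applies that relation; the function name and the numbers are illustrative only and are not the paper's procedure or data.

```python
def kinetic_current(measured, diffusion_limited):
    """Mass-transport-corrected kinetic current via the Koutecky-Levich relation.

    1/i = 1/i_k + 1/i_d  rearranges to  i_k = i * i_d / (i_d - i).
    Both arguments are current magnitudes in the same units (e.g. mA/cm^2).
    """
    if not 0 < measured < diffusion_limited:
        raise ValueError("expected 0 < measured < diffusion_limited")
    return measured * diffusion_limited / (diffusion_limited - measured)

# Illustrative numbers only, not values from the paper.
print(kinetic_current(3.0, 6.0))  # 6.0
```

Using magnitudes sidesteps sign conventions, since cathodic ORR currents are often reported as negative.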
Instance-wise Occlusion and Depth Orders in Natural Scenes
In this paper, we introduce a new dataset, named InstaOrder, that can be used to understand the geometric relationships of instances in an image. The dataset consists of 2.9M annotations of geometric orderings for class-labeled instances in 101K natural scenes. The scenes were annotated by 3,659 crowd-workers with respect to (1) occlusion order, which identifies the occluder and occludee, and (2) depth order, which describes ordinal relations based on relative distance from the camera. The dataset provides joint annotations of both kinds of ordering for the same instances, and we find that occlusion order and depth order are complementary. We also introduce a geometric order prediction network called InstaOrderNet, which outperforms state-of-the-art approaches. Moreover, we propose a dense depth prediction network called InstaDepthNet that uses an auxiliary geometric order loss to boost the accuracy of a state-of-the-art depth prediction approach, MiDaS [54].
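Pairwise depth orders form a directed graph over instances, so a consistent near-to-far ranking can be recovered with a topological sort when the annotations contain no contradictions. The sketch below uses Python's standard `graphlib` module and assumes a hypothetical annotation format of `(near, far)` pairs; it ignores "equal depth" relations and does not reflect InstaOrder's actual file format.

```python
from graphlib import CycleError, TopologicalSorter

def depth_ranking(instances, closer_pairs):
    """Order instances from nearest to farthest.

    instances: iterable of instance ids
    closer_pairs: (near, far) tuples meaning "near is closer to the camera than far"
    Raises graphlib.CycleError if the pairwise orders are contradictory.
    """
    sorter = TopologicalSorter({i: set() for i in instances})
    for near, far in closer_pairs:
        sorter.add(far, near)  # `near` is a predecessor, so it is emitted before `far`
    return list(sorter.static_order())

order = depth_ranking(["tree", "car", "person"], [("person", "car"), ("car", "tree")])
print(order)  # ['person', 'car', 'tree']
```

Occlusion order does not fit this scheme as directly, since two instances can occlude each other, which is one reason the two kinds of ordering carry complementary information.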
- …