68 research outputs found

    Binary Radiance Fields

    In this paper, we propose binary radiance fields (BiRF), a storage-efficient radiance field representation employing binary feature encoding that encodes local features using binary encoding parameters in a format of either +1 or -1. This binarization strategy lets us represent the feature grid with highly compact feature encoding and a dramatic reduction in storage size. Furthermore, our 2D-3D hybrid feature grid design enhances the compactness of feature encoding as the 3D grid includes main components while 2D grids capture details. In our experiments, binary radiance field representation successfully outperforms the reconstruction performance of state-of-the-art (SOTA) efficient radiance field models with lower storage allocation. In particular, our model achieves impressive results in static scene reconstruction, with a PSNR of 31.53 dB for Synthetic-NeRF scenes, 34.26 dB for Synthetic-NSVF scenes, 28.02 dB for Tanks and Temples scenes while only utilizing 0.7 MB, 0.8 MB, and 0.8 MB of storage space, respectively. We hope the proposed binary radiance field representation will make radiance fields more accessible without a storage bottleneck. Comment: 21 pages, 12 figures, and 11 tables.
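
    The storage saving from restricting parameters to +1 or -1 can be illustrated with a minimal sketch. It is not the BiRF implementation: the sign-based rule, the bit layout, and the names `binarize`, `pack_bits`, and `unpack_bits` are assumptions made here for illustration only.

```python
# Illustrative sketch only: sign binarization and 1-bit packing.
# The function names and bit layout are hypothetical, not taken from the paper.

def binarize(values):
    """Map each real-valued parameter to +1 or -1 by its sign (0 maps to +1)."""
    return [1 if v >= 0 else -1 for v in values]

def pack_bits(signs):
    """Pack +1/-1 values into bytes, one bit per parameter (+1 -> 1, -1 -> 0)."""
    out = bytearray((len(signs) + 7) // 8)
    for i, s in enumerate(signs):
        if s == 1:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)

def unpack_bits(packed, count):
    """Recover the +1/-1 values from the packed bytes."""
    return [1 if (packed[i // 8] >> (i % 8)) & 1 else -1 for i in range(count)]

params = [0.37, -1.2, 0.0, -0.05, 2.4, -3.1, 0.8, -0.6, 1.5]
signs = binarize(params)
packed = pack_bits(signs)
restored = unpack_bits(packed, len(signs))
```

    Under these assumptions, nine parameters fit in two bytes, compared with 36 bytes as 32-bit floats.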

    Instance-Aware Image Completion

    Image completion is a task that aims to fill in the missing region of a masked image with plausible contents. However, existing image completion methods tend to fill in the missing region with the surrounding texture instead of hallucinating a visual instance that is suitable in accordance with the context of the scene. In this work, we propose a novel image completion model, dubbed ImComplete, that hallucinates the missing instance that harmonizes well with - and thus preserves - the original context. ImComplete first adopts a transformer architecture that considers the visible instances and the location of the missing region. Then, ImComplete completes the semantic segmentation masks within the missing region, providing pixel-level semantic and structural guidance. Finally, the image synthesis blocks generate photo-realistic content. We perform a comprehensive evaluation of the results in terms of visual quality (LPIPS and FID) and contextual preservation scores (CLIPscore and object detection accuracy) with COCO-panoptic and Visual Genome datasets. Experimental results show the superiority of ImComplete on various natural images.

    Learning Debiased Classifier with Biased Committee

    Neural networks are prone to be biased towards spurious correlations between classes and latent attributes exhibited in a major portion of training data, which ruins their generalization capability. We propose a new method for training debiased classifiers with no spurious attribute label. The key idea is to employ a committee of classifiers as an auxiliary module that identifies bias-conflicting data, i.e., data without spurious correlation, and assigns large weights to them when training the main classifier. The committee is learned as a bootstrapped ensemble so that a majority of its classifiers are biased as well as diverse, and accordingly fail, by design, to predict the classes of bias-conflicting data. The consensus within the committee on prediction difficulty thus provides a reliable cue for identifying and weighting bias-conflicting data. Moreover, the committee is also trained with knowledge transferred from the main classifier so that it gradually becomes debiased along with the main classifier and emphasizes more difficult data as training progresses. On five real-world datasets, our method outperforms prior methods that, like ours, use no spurious attribute labels, and occasionally even surpasses those relying on bias labels. Comment: Conference on Neural Information Processing Systems (NeurIPS), New Orleans, 202
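
    The idea of turning committee consensus into sample weights can be sketched as follows. This is not the authors' method: the abstract gives no weighting formula, so the rule used here (1 plus the fraction of committee members that misclassify a sample, times a hypothetical `scale` parameter) and the name `committee_weights` are assumptions for illustration.

```python
# Illustrative sketch only: the weighting rule below is an assumption,
# not the formula used in the paper.

def committee_weights(committee_preds, labels, scale=1.0):
    """Give larger weights to samples that more committee members get wrong.

    committee_preds: one list of predicted labels per committee member.
    labels: ground-truth class labels, one per sample.
    """
    n_members = len(committee_preds)
    weights = []
    for i, y in enumerate(labels):
        wrong = sum(1 for preds in committee_preds if preds[i] != y)
        weights.append(1.0 + scale * wrong / n_members)
    return weights

labels = [0, 1, 1, 0]
committee_preds = [
    [0, 1, 0, 0],  # member 1
    [0, 1, 0, 1],  # member 2
    [0, 0, 0, 0],  # member 3
    [0, 1, 0, 0],  # member 4
]
weights = committee_weights(committee_preds, labels)
```

    In this toy example the third sample, which every member misclassifies, receives the largest weight, mirroring the abstract's description of emphasizing data the biased committee agrees is difficult.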

    Scaling up GANs for Text-to-Image Synthesis

    The recent success of text-to-image synthesis has taken the world by storm and captured the general public's imagination. From a technical standpoint, it also marked a drastic change in the favored architecture for designing generative image models. GANs used to be the de facto choice, with techniques like StyleGAN. With DALL-E 2, auto-regressive and diffusion models became the new standard for large-scale generative models overnight. This rapid shift raises a fundamental question: can we scale up GANs to benefit from large datasets like LAION? We find that naïvely increasing the capacity of the StyleGAN architecture quickly becomes unstable. We introduce GigaGAN, a new GAN architecture that far exceeds this limit, demonstrating GANs as a viable option for text-to-image synthesis. GigaGAN offers three major advantages. First, it is orders of magnitude faster at inference time, taking only 0.13 seconds to synthesize a 512px image. Second, it can synthesize high-resolution images, for example, 16-megapixel images in 3.66 seconds. Finally, GigaGAN supports various latent space editing applications such as latent interpolation, style mixing, and vector arithmetic operations. Comment: CVPR 2023. Project webpage at https://mingukkang.github.io/GigaGAN

    Ordered mesoporous porphyrinic carbons with very high electrocatalytic activity for the oxygen reduction reaction

    The high cost of the platinum-based cathode catalysts for the oxygen reduction reaction (ORR) has impeded the widespread application of polymer electrolyte fuel cells. We report on a new family of non-precious metal catalysts based on ordered mesoporous porphyrinic carbons (M-OMPC; M = Fe, Co, or FeCo) with high surface areas and tunable pore structures, which were prepared by nanocasting mesoporous silica templates with metalloporphyrin precursors. The FeCo-OMPC catalyst exhibited an excellent ORR activity in an acidic medium, higher than other non-precious metal catalysts. It showed higher kinetic current at 0.9 V than Pt/C catalysts, as well as superior long-term durability and MeOH-tolerance. Density functional theory calculations in combination with extended X-ray absorption fine structure analysis revealed a weakening of the interaction between oxygen atom and FeCo-OMPC compared to Pt/C. This effect and high surface area of FeCo-OMPC appear responsible for its significantly high ORR activity.

    STAD: Stable Video Depth Estimation

    Instance-wise Occlusion and Depth Orders in Natural Scenes

    In this paper, we introduce a new dataset, named InstaOrder, that can be used to understand the geometrical relationships of instances in an image. The dataset consists of 2.9M annotations of geometric orderings for class-labeled instances in 101K natural scenes. The scenes were annotated by 3,659 crowd-workers regarding (1) occlusion order that identifies occluder/occludee and (2) depth order that describes ordinal relations that consider relative distance from the camera. The dataset provides joint annotation of two kinds of orderings for the same instances, and we discover that the occlusion order and depth order are complementary. We also introduce a geometric order prediction network called InstaOrderNet, which is superior to state-of-the-art approaches. Moreover, we propose a dense depth prediction network called InstaDepthNet that uses auxiliary geometric order loss to boost the accuracy of the state-of-the-art depth prediction approach, MiDaS [54].

    ContraGAN: Contrastive Learning for Conditional Image Generation
