ReFu: Refine and Fuse the Unobserved View for Detail-Preserving Single-Image 3D Human Reconstruction
Single-image 3D human reconstruction aims to reconstruct the 3D textured
surface of the human body given a single image. While implicit function-based
methods have recently achieved reasonable reconstruction performance, they still
show degraded quality in both surface geometry and texture
from an unobserved view. In response, to generate a realistic textured surface,
we propose ReFu, a coarse-to-fine approach that refines the projected backside
view image and fuses the refined image to predict the final human body. To
suppress the diffused occupancy that causes noise in projection images and
reconstructed meshes, we propose to train the occupancy probability by
simultaneously utilizing 2D and 3D supervision with occupancy-based volume
rendering. We also introduce a refinement architecture that generates
detail-preserving backside-view images with front-to-back warping. Extensive
experiments demonstrate that our method achieves state-of-the-art performance
in 3D human reconstruction from a single image, showing enhanced geometry and
texture quality from an unobserved view.
Comment: Accepted at ACM MM 2022
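As a rough illustration of the occupancy-based volume rendering mentioned above, the sketch below composites per-sample colors along each camera ray using occupancy probabilities, which is what lets a 2D photometric loss supervise the 3D occupancy field. The tensor shapes, sampling scheme, and function names are assumptions, not the authors' implementation.

    import torch

    def occupancy_volume_render(occ_prob, colors):
        """Composite per-sample colors along each ray with occupancy probabilities.

        occ_prob: (num_rays, num_samples) occupancy in [0, 1], ordered near to far.
        colors:   (num_rays, num_samples, 3) per-sample RGB predictions.
        Returns:  (num_rays, 3) rendered pixel colors.
        """
        # Transmittance: probability that the ray is not blocked before sample i.
        free = torch.cumprod(1.0 - occ_prob + 1e-10, dim=-1)
        transmittance = torch.cat([torch.ones_like(free[:, :1]), free[:, :-1]], dim=-1)
        # Weight of a sample = prob. of reaching it * prob. of stopping there.
        weights = transmittance * occ_prob
        return (weights.unsqueeze(-1) * colors).sum(dim=-2)

    # Toy usage: 4 rays, 16 samples per ray; the output feeds a 2D image loss.
    pixels = occupancy_volume_render(torch.rand(4, 16), torch.rand(4, 16, 3))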
ST-RAP: A Spatio-Temporal Framework for Real Estate Appraisal
In this paper, we introduce ST-RAP, a novel Spatio-Temporal framework for
Real estate APpraisal. ST-RAP employs a hierarchical architecture with a
heterogeneous graph neural network to encapsulate temporal dynamics and spatial
relationships simultaneously. Through comprehensive experiments on a
large-scale real estate dataset, ST-RAP outperforms previous methods,
demonstrating the significant benefits of integrating spatial and temporal
aspects in real estate appraisal. Our code and dataset are available at
https://github.com/dojeon-ai/STRAP.
Comment: Accepted to CIKM'23
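The hierarchical spatio-temporal design described above can be sketched, very loosely, as a temporal encoder over each property's transaction history followed by one round of spatial neighbor aggregation. This is an illustrative stand-in only: it is homogeneous rather than heterogeneous, and none of the names or sizes come from the official ST-RAP code.

    import torch
    import torch.nn as nn

    class SpatioTemporalAppraiser(nn.Module):
        """Toy price regressor: a GRU summarizes transaction history (temporal),
        and mean aggregation over neighboring properties adds spatial context."""

        def __init__(self, feat_dim, hidden_dim=64):
            super().__init__()
            self.temporal = nn.GRU(feat_dim, hidden_dim, batch_first=True)
            self.neighbor_proj = nn.Linear(hidden_dim, hidden_dim)
            self.head = nn.Sequential(nn.Linear(2 * hidden_dim, hidden_dim),
                                      nn.ReLU(), nn.Linear(hidden_dim, 1))

        def forward(self, history, adj):
            # history: (N, seq_len, feat_dim) transaction features per property
            # adj:     (N, N) row-normalized spatial adjacency between properties
            _, h = self.temporal(history)           # (1, N, hidden)
            h = h.squeeze(0)                        # temporal summary per property
            neighbor = adj @ self.neighbor_proj(h)  # aggregate spatial neighbors
            return self.head(torch.cat([h, neighbor], dim=-1)).squeeze(-1)

    # Toy usage: 5 properties, 12 past transactions, 8 features each.
    model = SpatioTemporalAppraiser(feat_dim=8)
    prices = model(torch.randn(5, 12, 8), torch.full((5, 5), 0.2))  # (5,) predictions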
Reference-based Image Composition with Sketch via Structure-aware Diffusion Model
Recent remarkable improvements in large-scale text-to-image generative models
have shown promising results in generating high-fidelity images. To further
enhance editability and enable fine-grained generation, we introduce a
multi-input-conditioned image composition model that incorporates a sketch as a
novel modality, alongside a reference image. Thanks to the edge-level
controllability using sketches, our method enables a user to edit or complete
an image sub-part with a desired structure (i.e., sketch) and content (i.e.,
reference image). Our framework fine-tunes a pre-trained diffusion model to
complete missing regions using the reference image while maintaining sketch
guidance. Albeit simple, this design opens up wide opportunities to fulfill user
needs for obtaining desired images. Through extensive experiments, we
demonstrate that our proposed method offers unique use cases for image
manipulation, enabling user-driven modifications of arbitrary scenes.
Comment: 7 pages; Code URL: https://github.com/kangyeolk/Paint-by-Sketc
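The multi-input conditioning can be pictured, under several assumptions, as channel-wise concatenation of the noisy latent, the masked target, the binary mask, and the sketch in a latent-diffusion inpainting setup, with the reference image injected separately (typically via cross-attention on its image-encoder features). Everything below is schematic and not the authors' code.

    import torch
    import torch.nn as nn

    def build_inpainting_input(noisy_latent, masked_image, mask, sketch):
        # All tensors share the same spatial size; channel counts are assumptions.
        return torch.cat([noisy_latent, masked_image, mask, sketch], dim=1)

    class TinyDenoiser(nn.Module):
        """Stand-in for the diffusion UNet that would predict the noise."""
        def __init__(self, in_ch, out_ch=4):
            super().__init__()
            self.net = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)

        def forward(self, x):
            return self.net(x)

    # Toy usage with 4-channel latents and 1-channel mask and sketch maps.
    noisy = torch.randn(2, 4, 32, 32)
    masked = torch.randn(2, 4, 32, 32)
    mask = torch.ones(2, 1, 32, 32)
    sketch = torch.randn(2, 1, 32, 32)
    x = build_inpainting_input(noisy, masked, mask, sketch)  # (2, 10, 32, 32)
    pred_noise = TinyDenoiser(in_ch=10)(x)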
Do “Digital” Firms Live Longer as Fields Converge? A Survival Analysis of the S&P 500
We report on a study of firm longevity as we move from the industrial age to the digital age. Through a survival analysis of S&P 500 firms from 1965 to 2016, we find that firms survive for a decreasing duration over the time period of our sample, indicating that the pace of innovation is increasing as we transition to the digital age. Further, we find that this duration is longer for non-digital firms than for digital firms, indicating a generally fiercer competitive landscape for digital firms. Finally, we find that this difference between digital and non-digital firms largely disappears after the 1990s, when the digital age has firmly taken root. This study thus provides the first large-scale evidence for digital field convergence, a term we use to describe the blurring of industry distinctions in the digital age as all firms become digital firms.
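For readers unfamiliar with the method, a minimal survival-analysis sketch using the open-source lifelines package is shown below, with made-up numbers and a single digital-firm indicator; the paper's actual data, covariates, and model specification are not reproduced here.

    import pandas as pd
    from lifelines import CoxPHFitter

    # Hypothetical firm records: tenure in the index (years), an exit event flag,
    # and a digital-firm indicator. These values are illustrative only.
    df = pd.DataFrame({
        "years_in_index": [31, 12, 8, 22, 10, 17, 19, 26],
        "exited":         [1,  1,  1,  0,  1,  1,  1,  0],
        "digital":        [0,  1,  1,  0,  1,  0,  1,  0],
    })

    # Cox proportional hazards model: a positive coefficient on `digital` means a
    # higher hazard of exiting the index, i.e. a shorter expected tenure.
    cph = CoxPHFitter()
    cph.fit(df, duration_col="years_in_index", event_col="exited")
    cph.print_summary()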
PixelHuman: Animatable Neural Radiance Fields from Few Images
In this paper, we propose PixelHuman, a novel human rendering model that
generates animatable human scenes from a few images of a person with unseen
identity, views, and poses. Previous works have demonstrated reasonable
performance in novel view and pose synthesis, but they rely on a large number
of images to train and are trained per scene from videos, which requires
a significant amount of time to produce animatable scenes from unseen human
images. Our method differs from existing methods in that it can generalize to
any input image for animatable human synthesis. Given a random pose sequence,
our method synthesizes each target scene using a neural radiance field that is
conditioned on a canonical representation and pose-aware pixel-aligned
features, both of which can be obtained through deformation fields learned in a
data-driven manner. Our experiments show that our method achieves
state-of-the-art performance in multiview and novel pose synthesis from
few-shot images.
Comment: 8 pages
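A simplified sketch of the conditioning described above: a radiance field MLP that takes a point (already warped into canonical space), a pose-aware pixel-aligned feature, and a per-subject canonical code, and outputs density and color. The deformation fields are omitted and all names and dimensions are assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ConditionedRadianceField(nn.Module):
        def __init__(self, feat_dim=32, code_dim=16, hidden=128):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(3 + feat_dim + code_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 4),  # density + RGB
            )

        def forward(self, canonical_xyz, pixel_feat, canonical_code):
            out = self.mlp(torch.cat([canonical_xyz, pixel_feat, canonical_code], dim=-1))
            return F.relu(out[..., :1]), torch.sigmoid(out[..., 1:])

    # Toy usage: 1024 sample points sharing one subject-level canonical code.
    pts = torch.randn(1024, 3)               # points warped to canonical space
    feats = torch.randn(1024, 32)            # pixel-aligned features per point
    code = torch.randn(16).expand(1024, 16)  # canonical representation
    density, color = ConditionedRadianceField()(pts, feats, code)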
iColoriT: Towards Propagating Local Hint to the Right Region in Interactive Colorization by Leveraging Vision Transformer
Point-interactive image colorization aims to colorize grayscale images when a
user provides the colors for specific locations. It is essential for
point-interactive colorization methods to appropriately propagate user-provided
colors (i.e., user hints) in the entire image to obtain a reasonably colorized
image with minimal user effort. However, existing approaches often produce
partially colorized results due to the inefficient design of stacking
convolutional layers to propagate hints to distant relevant regions. To address
this problem, we present iColoriT, a novel point-interactive colorization
Vision Transformer capable of propagating user hints to relevant regions,
leveraging the global receptive field of Transformers. The self-attention
mechanism of Transformers enables iColoriT to selectively colorize relevant
regions with only a few local hints. Our approach colorizes images in real-time
by utilizing pixel shuffling, an efficient upsampling technique that replaces
the decoder architecture. Also, in order to mitigate the artifacts caused by
pixel shuffling with large upsampling ratios, we present the local stabilizing
layer. Extensive quantitative and qualitative results demonstrate that our
approach substantially outperforms existing methods for point-interactive
colorization, producing accurately colorized images with a user's minimal
effort. Official code is available at
https://pmh9960.github.io/research/iColoriT
Comment: Accepted to WACV 2023
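To make the decoder-free design concrete, the sketch below projects ViT patch tokens to color patches and upsamples them with PyTorch's built-in pixel shuffle; the token and patch sizes are assumptions, and the local stabilizing layer is not modeled.

    import torch
    import torch.nn as nn

    class PixelShuffleColorHead(nn.Module):
        """Projects each patch token to a p*p color patch, then rearranges the
        patches to full resolution with nn.PixelShuffle instead of a decoder."""

        def __init__(self, token_dim=256, patch_size=16, out_ch=2):
            super().__init__()
            self.proj = nn.Linear(token_dim, out_ch * patch_size * patch_size)
            self.shuffle = nn.PixelShuffle(patch_size)

        def forward(self, tokens, grid_hw):
            # tokens: (B, N, token_dim) with N = grid_h * grid_w patch tokens
            B, _, _ = tokens.shape
            h, w = grid_hw
            x = self.proj(tokens)                       # (B, N, out_ch * p * p)
            x = x.transpose(1, 2).reshape(B, -1, h, w)  # (B, out_ch * p * p, h, w)
            return self.shuffle(x)                      # (B, out_ch, h * p, w * p)

    # Toy usage: a 14x14 token grid decoded to a 224x224 two-channel color map.
    color = PixelShuffleColorHead()(torch.randn(1, 14 * 14, 256), (14, 14))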
On the Importance of Feature Decorrelation for Unsupervised Representation Learning in Reinforcement Learning
Recently, unsupervised representation learning (URL) has improved the sample
efficiency of Reinforcement Learning (RL) by pretraining a model from a large
unlabeled dataset. The underlying principle of these methods is to learn
temporally predictive representations by predicting future states in the latent
space. However, an important challenge of this approach is representational
collapse, where the latent representations collapse onto a low-dimensional
low-dimensional manifold. To address this issue, we propose a novel URL
framework that causally predicts future states while increasing the dimension
of the latent manifold by decorrelating the features in the latent space.
Through extensive empirical studies, we demonstrate that our framework
effectively learns predictive representations without collapse, which
significantly improves the sample efficiency of state-of-the-art URL methods on
the Atari 100k benchmark. The code is available at
https://github.com/dojeon-ai/SimTPR.
Comment: Accepted to ICML 2023
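A minimal example of a feature-decorrelation regularizer in the spirit described above, written as a Barlow-Twins/VICReg-style off-diagonal covariance penalty; the paper's exact objective, and how it is combined with the temporal-prediction loss, may differ.

    import torch

    def decorrelation_loss(z, eps=1e-5):
        """Penalize off-diagonal covariance so latent dimensions stay decorrelated.

        z: (batch, dim) latent representations.
        """
        z = z - z.mean(dim=0, keepdim=True)           # center each dimension
        z = z / (z.std(dim=0, keepdim=True) + eps)    # scale to unit variance
        cov = (z.T @ z) / (z.shape[0] - 1)            # (dim, dim) covariance matrix
        off_diag = cov - torch.diag(torch.diag(cov))  # zero out the diagonal
        return (off_diag ** 2).sum() / z.shape[1]

    # Toy usage: the penalty would be added to the predictive (future-state) loss.
    latents = torch.randn(256, 128, requires_grad=True)
    decorrelation_loss(latents).backward()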
RobustSwap: A Simple yet Robust Face Swapping Model against Attribute Leakage
Face swapping aims at injecting a source image's identity (i.e., facial
features) into a target image, while strictly preserving the target's
attributes, which are irrelevant to identity. However, we observed that
previous approaches still suffer from source attribute leakage, where the
source image's attributes interfere with those of the target. In this paper, we
analyze the latent space of StyleGAN and find an adequate combination of the
latents geared for the face swapping task. Based on the findings, we develop a
simple yet robust face swapping model, RobustSwap, which is resistant to the
potential source attribute leakage. Moreover, we exploit the coordination of
3DMM's implicit and explicit information as guidance to incorporate the
structure of the source image and the precise pose of the target image. Despite
our method solely utilizing an image dataset without identity labels for
training, our model has the capability to generate high-fidelity and temporally
consistent videos. Through extensive qualitative and quantitative evaluations,
we demonstrate that our method shows significant improvements compared with the
previous face swapping models in synthesizing both images and videos. Project
page is available at https://robustswap.github.io/
Comment: 21 pages
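As a rough illustration of latent-space mixing in StyleGAN's W+ space, the sketch below copies a chosen subset of layers from a source (identity) code into a target (attribute) code. Which layers actually carry identity versus attributes is exactly the analysis the paper performs, so the layer split here is a placeholder.

    import torch

    def mix_wplus_latents(source_w, target_w, identity_layers):
        """Copy selected StyleGAN layers from the source (identity) latent and
        keep the remaining layers from the target (attributes).

        source_w, target_w: (num_layers, 512) W+ latent codes.
        identity_layers:    iterable of layer indices taken from the source.
        """
        mixed = target_w.clone()
        for i in identity_layers:
            mixed[i] = source_w[i]
        return mixed

    # Toy usage with an 18-layer W+ code (e.g. StyleGAN2 at 1024x1024 resolution).
    swapped = mix_wplus_latents(torch.randn(18, 512), torch.randn(18, 512),
                                identity_layers=range(4, 9))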