Hierarchy Composition GAN for High-fidelity Image Synthesis
Despite the rapid progress of generative adversarial networks (GANs) in image
synthesis in recent years, existing image synthesis approaches work in either
the geometry domain or the appearance domain alone, which often introduces
various synthesis artifacts. This paper presents an innovative Hierarchical
Composition GAN (HIC-GAN) that incorporates image synthesis in the geometry and
appearance domains into an end-to-end trainable network and achieves superior
synthesis realism in both domains simultaneously. We design a hierarchical
composition mechanism that is capable of learning realistic composition
geometry and handling occlusions when multiple foreground objects are involved
in image composition. In addition, we introduce a novel attention mask
mechanism that guides the adaptation of the appearance of foreground objects,
which also helps to provide a better training reference for learning in the
geometry domain. Extensive experiments on scene text image synthesis, portrait
editing and indoor rendering tasks show that the proposed HIC-GAN achieves
superior synthesis performance both qualitatively and quantitatively.
Comment: 11 pages, 8 figures
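The abstract does not give HIC-GAN's composition equations. As a hedged illustration of what mask-guided composition with occlusion generally looks like, the sketch below assumes plain per-pixel alpha blending, out = m * fg + (1 - m) * bg with a soft mask m in [0, 1], applied back to front so that later (nearer) objects occlude earlier ones. The names `blend`, `composite_layers` and the list-of-rows image format are hypothetical; this is not the paper's actual mechanism.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: a grayscale image as a list of rows of floats.
Image = List[List[float]]


def blend(fg: Image, bg: Image, mask: Image) -> Image:
    """Per-pixel alpha blend: out = m * fg + (1 - m) * bg, with m in [0, 1]."""
    return [
        [m * f + (1.0 - m) * b for f, b, m in zip(f_row, b_row, m_row)]
        for f_row, b_row, m_row in zip(fg, bg, mask)
    ]


def composite_layers(background: Image, layers: Sequence[Tuple[Image, Image]]) -> Image:
    """Composite (foreground, mask) pairs back to front.

    Layers later in the sequence are blended on top of the running result,
    so they occlude earlier layers wherever their mask is close to 1.
    """
    out = background
    for fg, mask in layers:
        out = blend(fg, out, mask)
    return out


if __name__ == "__main__":
    bg = [[0.0, 0.0]]
    far = ([[1.0, 1.0]], [[1.0, 0.5]])   # farther object, composited first
    near = ([[0.2, 0.2]], [[0.0, 1.0]])  # nearer object, covers only pixel 1
    print(composite_layers(bg, [far, near]))  # [[1.0, 0.2]]
```

The ordering of `layers` is what encodes occlusion here; a learned system would instead have to predict masks (and possibly the ordering) from data.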
Pedestrian Attribute Recognition: A Survey
Recognizing pedestrian attributes is an important task in the computer vision
community because it plays an important role in video surveillance. Many
algorithms have been proposed to handle this task. The goal of this paper is to
review existing works based on either traditional methods or deep learning
networks. First, we introduce the background of pedestrian attribute
recognition (PAR for short), including the fundamental concepts of pedestrian
attributes and the corresponding challenges. Second, we introduce existing
benchmarks, including popular datasets and evaluation criteria. Third, we
analyse the concepts of multi-task learning and multi-label learning, and also
explain the relations between these two learning paradigms and pedestrian
attribute recognition. We also review some popular network architectures that
have been widely applied in the deep learning community. Fourth, we analyse
popular solutions for this task, such as attribute grouping, part-based
methods, etc. Fifth, we show some applications that take pedestrian
attributes into consideration and achieve better performance. Finally, we
summarize this paper and give several possible research directions for
pedestrian attribute recognition. The project page of this paper can be found
at the following website:
https://sites.google.com/view/ahu-pedestrianattributes/.
Comment: Check our project page for a high-resolution version of this survey:
https://sites.google.com/view/ahu-pedestrianattributes
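To make the multi-label framing of PAR concrete, here is a minimal sketch under a common assumption (not a method attributed to this survey): each attribute gets an independent sigmoid output, training uses binary cross-entropy averaged over attributes, and prediction thresholds each probability separately. The function names and the attribute list in the usage example are hypothetical.

```python
import math
from typing import List, Sequence


def bce_with_logits(z: float, y: float) -> float:
    """Numerically stable binary cross-entropy for one logit z and target y in {0, 1}.

    Equivalent to -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))].
    """
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


def multilabel_loss(logits: Sequence[float], labels: Sequence[int]) -> float:
    """Mean BCE over attributes: each attribute is an independent binary prediction."""
    if len(logits) != len(labels) or not logits:
        raise ValueError("logits and labels must be non-empty and of equal length")
    return sum(bce_with_logits(z, y) for z, y in zip(logits, labels)) / len(logits)


def sigmoid(z: float) -> float:
    # Split by sign so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_attributes(logits: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Threshold each attribute's probability independently (multi-label output)."""
    return [int(sigmoid(z) >= threshold) for z in logits]


if __name__ == "__main__":
    # Hypothetical attributes: ["male", "backpack", "hat"]
    logits = [2.0, -1.0, 0.5]
    labels = [1, 0, 0]
    print(multilabel_loss(logits, labels))
    print(predict_attributes(logits))  # [1, 0, 1]
```

Unlike a softmax classifier, nothing here forces the attributes to be mutually exclusive; multi-task variants would typically add separate heads or weights per attribute group.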
Neural Wireframe Renderer: Learning Wireframe to Image Translations
In architecture and computer-aided design, wireframes (i.e., line-based
models) are widely used as basic 3D models for design evaluation and fast
design iterations. However, unlike a full design file, a wireframe model lacks
critical information, such as detailed shape, texture, and materials, needed by
a conventional renderer to produce 2D renderings of the objects or scenes. In
this paper, we bridge the information gap by generating photo-realistic
renderings of indoor scenes from wireframe models in an image translation
framework. While existing image synthesis methods can generate visually
pleasing images for common objects such as faces and birds, these methods do
not explicitly model and preserve essential structural constraints in a
wireframe model, such as junctions, parallel lines, and planar surfaces. To
this end, we propose a novel model based on a structure-appearance joint
representation learned from both images and wireframes. In our model,
structural constraints are explicitly enforced by learning a joint
representation in a shared encoder network that must support the generation of
both images and wireframes. Experiments on a wireframe-scene dataset show that
our wireframe-to-image translation model significantly outperforms the
state-of-the-art methods in both visual quality and structural integrity of
generated images.
Comment: ECCV 202
Data-Driven Shape Analysis and Processing
Data-driven methods play an increasingly important role in discovering
geometric, structural, and semantic relationships between 3D shapes in
collections, and applying this analysis to support intelligent modeling,
editing, and visualization of geometric data. In contrast to traditional
approaches, a key feature of data-driven approaches is that they aggregate
information from a collection of shapes to improve the analysis and processing
of individual shapes. In addition, they are able to learn models that reason
about properties and relationships of shapes without relying on hard-coded
rules or explicitly programmed instructions. We provide an overview of the main
concepts and components of these techniques, and discuss their application to
shape classification, segmentation, matching, reconstruction, modeling and
exploration, as well as scene analysis and synthesis, through reviewing the
literature and relating the existing works with both qualitative and numerical
comparisons. We conclude our report with ideas that can inspire future research
in data-driven shape analysis and processing.
Comment: 10 pages, 19 figures
High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs
We present a new method for synthesizing high-resolution photo-realistic
images from semantic label maps using conditional generative adversarial
networks (conditional GANs). Conditional GANs have enabled a variety of
applications, but the results are often limited to low resolution and are still far
from realistic. In this work, we generate 2048x1024 visually appealing results
with a novel adversarial loss, as well as new multi-scale generator and
discriminator architectures. Furthermore, we extend our framework to
interactive visual manipulation with two additional features. First, we
incorporate object instance segmentation information, which enables object
manipulations such as removing/adding objects and changing the object category.
Second, we propose a method to generate diverse results given the same input,
allowing users to edit the object appearance interactively. Human opinion
studies demonstrate that our method significantly outperforms existing methods,
advancing both the quality and the resolution of deep image synthesis and
editing.
Comment: v2: CVPR camera ready, adding more results for edge-to-photo examples
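The abstract mentions multi-scale discriminators without detailing them. One common way to feed several discriminators at different scales, assumed here rather than taken from the paper, is to build an image pyramid by repeated 2x2 average pooling and give each discriminator one level. The helpers below (`avg_pool_2x2`, `image_pyramid`, `multiscale_score`) are hypothetical, and the discriminators are stand-in callables, not neural networks.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]  # grayscale image as rows of floats (hypothetical format)


def avg_pool_2x2(img: Image) -> Image:
    """Halve each spatial dimension by averaging non-overlapping 2x2 blocks.

    A trailing odd row or column is dropped.
    """
    h = len(img) // 2 * 2
    w = len(img[0]) // 2 * 2 if img else 0
    return [
        [
            (img[i][j] + img[i][j + 1] + img[i + 1][j] + img[i + 1][j + 1]) / 4.0
            for j in range(0, w, 2)
        ]
        for i in range(0, h, 2)
    ]


def image_pyramid(img: Image, num_scales: int = 3) -> List[Image]:
    """Return [full, 1/2, 1/4, ...] resolution copies, stopping early if the image gets too small."""
    levels = [img]
    while len(levels) < num_scales:
        prev = levels[-1]
        if len(prev) < 2 or len(prev[0]) < 2:
            break
        levels.append(avg_pool_2x2(prev))
    return levels


def multiscale_score(img: Image, discriminators: Sequence[Callable[[Image], float]]) -> float:
    """Average the scores of one (hypothetical) discriminator per pyramid level."""
    if not discriminators:
        raise ValueError("need at least one discriminator")
    levels = image_pyramid(img, len(discriminators))
    scores = [d(level) for d, level in zip(discriminators, levels)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    img = [[float(4 * i + j) for j in range(4)] for i in range(4)]
    for level in image_pyramid(img):
        print(level)
```

The coarser levels give a discriminator a larger effective receptive field over the same content, which is the usual motivation for judging high-resolution outputs at several scales.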
SC-NeRF: Self-Correcting Neural Radiance Field with Sparse Views
In recent studies, the generalization of neural radiance fields for the novel
view synthesis task has been widely explored. However, existing methods are
limited to objects and indoor scenes. In this work, we extend the
generalization task to outdoor scenes while training only on object-level
datasets. This approach presents two challenges. First, the significant
distributional shift between training and testing scenes leads to black
artifacts in the rendering results. Second, viewpoint changes in outdoor scenes
cause ghosting or missing regions in rendered images. To address these
challenges, we propose a geometric correction module and an appearance
correction module based on multi-head attention mechanisms. We normalize the
rendered depth and combine it with the light direction to form the query in the
attention mechanism. Our network effectively corrects varying scene structures
and geometric features in outdoor scenes, generalizing well from object-level
datasets to unseen outdoor scenes. Additionally, we use the appearance
correction module to correct appearance features, preventing rendering
artifacts such as blank borders and ghosting caused by viewpoint changes. By
combining these modules, our approach successfully tackles the challenges of
outdoor scene generalization and produces high-quality rendering results. When
evaluated on four datasets (Blender, DTU, LLFF, Spaces), our network
outperforms previous methods. Notably, compared to MVSNeRF, our network
improves the average PSNR from 19.369 to 25.989 and SSIM from 0.838 to 0.889,
and reduces LPIPS from 0.265 to 0.224 on the Spaces outdoor scenes.
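To put the reported PSNR figures in perspective, the sketch below uses the standard definition PSNR = 10 * log10(MAX^2 / MSE), assuming intensities in [0, 1] (MAX = 1) and flat pixel lists; the abstract does not state the paper's evaluation protocol, so those choices are assumptions. It also converts the reported 6.62 dB gain (25.989 - 19.369) into an MSE ratio. That conversion holds exactly for a single image only, because an average of per-image PSNR values is not the PSNR of the average MSE.

```python
import math
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equally sized flat pixel lists."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def psnr(pred: Sequence[float], target: Sequence[float], max_val: float = 1.0) -> float:
    """PSNR in dB: 10 * log10(max_val**2 / MSE); infinite for identical inputs."""
    err = mse(pred, target)
    if err == 0.0:
        return math.inf
    return 10.0 * math.log10(max_val ** 2 / err)


def mse_ratio_for_psnr_gain(gain_db: float) -> float:
    """Factor by which a single image's MSE shrinks for a given PSNR gain (same max_val)."""
    return 10.0 ** (gain_db / 10.0)


if __name__ == "__main__":
    print(psnr([0.0, 0.5, 1.0], [0.1, 0.5, 0.9]))
    # Reported average gain on Spaces: 25.989 - 19.369 = 6.62 dB.
    print(mse_ratio_for_psnr_gain(25.989 - 19.369))  # roughly 4.6
```

Read this way, the improvement over MVSNeRF corresponds, per image, to a mean squared error roughly 4.6 times smaller.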
Semantic Visual Localization
Robust visual localization under a wide range of viewing conditions is a
fundamental problem in computer vision. Handling the difficult cases of this
problem is not only very challenging but also of high practical relevance,
e.g., in the context of life-long localization for augmented reality or
autonomous robots. In this paper, we propose a novel approach based on a joint
3D geometric and semantic understanding of the world, enabling it to succeed
under conditions where previous approaches failed. Our method leverages a novel
generative model for descriptor learning, trained on semantic scene completion
as an auxiliary task. The resulting 3D descriptors are robust to missing
observations by encoding high-level 3D geometric and semantic information.
Experiments on several challenging large-scale localization datasets
demonstrate reliable localization under extreme viewpoint, illumination, and
geometry changes.
- …