2,974 research outputs found

    Hierarchy Composition GAN for High-fidelity Image Synthesis

    Full text link
    Despite the rapid progress of generative adversarial networks (GANs) in image synthesis in recent years, the existing image synthesis approaches work in either geometry domain or appearance domain alone which often introduces various synthesis artifacts. This paper presents an innovative Hierarchical Composition GAN (HIC-GAN) that incorporates image synthesis in geometry and appearance domains into an end-to-end trainable network and achieves superior synthesis realism in both domains simultaneously. We design an innovative hierarchical composition mechanism that is capable of learning realistic composition geometry and handling occlusions while multiple foreground objects are involved in image composition. In addition, we introduce a novel attention mask mechanism that guides to adapt the appearance of foreground objects which also helps to provide better training reference for learning in geometry domain. Extensive experiments on scene text image synthesis, portrait editing and indoor rendering tasks show that the proposed HIC-GAN achieves superior synthesis performance qualitatively and quantitatively.Comment: 11 pages, 8 figure

    Pedestrian Attribute Recognition: A Survey

    Full text link
    Recognizing pedestrian attributes is an important task in computer vision community due to it plays an important role in video surveillance. Many algorithms has been proposed to handle this task. The goal of this paper is to review existing works using traditional methods or based on deep learning networks. Firstly, we introduce the background of pedestrian attributes recognition (PAR, for short), including the fundamental concepts of pedestrian attributes and corresponding challenges. Secondly, we introduce existing benchmarks, including popular datasets and evaluation criterion. Thirdly, we analyse the concept of multi-task learning and multi-label learning, and also explain the relations between these two learning algorithms and pedestrian attribute recognition. We also review some popular network architectures which have widely applied in the deep learning community. Fourthly, we analyse popular solutions for this task, such as attributes group, part-based, \emph{etc}. Fifthly, we shown some applications which takes pedestrian attributes into consideration and achieve better performance. Finally, we summarized this paper and give several possible research directions for pedestrian attributes recognition. The project page of this paper can be found from the following website: \url{https://sites.google.com/view/ahu-pedestrianattributes/}.Comment: Check our project page for High Resolution version of this survey: https://sites.google.com/view/ahu-pedestrianattributes

    Neural Wireframe Renderer: Learning Wireframe to Image Translations

    Full text link
    In architecture and computer-aided design, wireframes (i.e., line-based models) are widely used as basic 3D models for design evaluation and fast design iterations. However, unlike a full design file, a wireframe model lacks critical information, such as detailed shape, texture, and materials, needed by a conventional renderer to produce 2D renderings of the objects or scenes. In this paper, we bridge the information gap by generating photo-realistic rendering of indoor scenes from wireframe models in an image translation framework. While existing image synthesis methods can generate visually pleasing images for common objects such as faces and birds, these methods do not explicitly model and preserve essential structural constraints in a wireframe model, such as junctions, parallel lines, and planar surfaces. To this end, we propose a novel model based on a structure-appearance joint representation learned from both images and wireframes. In our model, structural constraints are explicitly enforced by learning a joint representation in a shared encoder network that must support the generation of both images and wireframes. Experiments on a wireframe-scene dataset show that our wireframe-to-image translation model significantly outperforms the state-of-the-art methods in both visual quality and structural integrity of generated images.Comment: ECCV 202

    Data-Driven Shape Analysis and Processing

    Full text link
    Data-driven methods play an increasingly important role in discovering geometric, structural, and semantic relationships between 3D shapes in collections, and applying this analysis to support intelligent modeling, editing, and visualization of geometric data. In contrast to traditional approaches, a key feature of data-driven approaches is that they aggregate information from a collection of shapes to improve the analysis and processing of individual shapes. In addition, they are able to learn models that reason about properties and relationships of shapes without relying on hard-coded rules or explicitly programmed instructions. We provide an overview of the main concepts and components of these techniques, and discuss their application to shape classification, segmentation, matching, reconstruction, modeling and exploration, as well as scene analysis and synthesis, through reviewing the literature and relating the existing works with both qualitative and numerical comparisons. We conclude our report with ideas that can inspire future research in data-driven shape analysis and processing.Comment: 10 pages, 19 figure

    High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs

    Full text link
    We present a new method for synthesizing high-resolution photo-realistic images from semantic label maps using conditional generative adversarial networks (conditional GANs). Conditional GANs have enabled a variety of applications, but the results are often limited to low-resolution and still far from realistic. In this work, we generate 2048x1024 visually appealing results with a novel adversarial loss, as well as new multi-scale generator and discriminator architectures. Furthermore, we extend our framework to interactive visual manipulation with two additional features. First, we incorporate object instance segmentation information, which enables object manipulations such as removing/adding objects and changing the object category. Second, we propose a method to generate diverse results given the same input, allowing users to edit the object appearance interactively. Human opinion studies demonstrate that our method significantly outperforms existing methods, advancing both the quality and the resolution of deep image synthesis and editing.Comment: v2: CVPR camera ready, adding more results for edge-to-photo example

    SC-NeRF: Self-Correcting Neural Radiance Field with Sparse Views

    Full text link
    In recent studies, the generalization of neural radiance fields for novel view synthesis task has been widely explored. However, existing methods are limited to objects and indoor scenes. In this work, we extend the generalization task to outdoor scenes, trained only on object-level datasets. This approach presents two challenges. Firstly, the significant distributional shift between training and testing scenes leads to black artifacts in rendering results. Secondly, viewpoint changes in outdoor scenes cause ghosting or missing regions in rendered images. To address these challenges, we propose a geometric correction module and an appearance correction module based on multi-head attention mechanisms. We normalize rendered depth and combine it with light direction as query in the attention mechanism. Our network effectively corrects varying scene structures and geometric features in outdoor scenes, generalizing well from object-level to unseen outdoor scenes. Additionally, we use appearance correction module to correct appearance features, preventing rendering artifacts like blank borders and ghosting due to viewpoint changes. By combining these modules, our approach successfully tackles the challenges of outdoor scene generalization, producing high-quality rendering results. When evaluated on four datasets (Blender, DTU, LLFF, Spaces), our network outperforms previous methods. Notably, compared to MVSNeRF, our network improves average PSNR from 19.369 to 25.989, SSIM from 0.838 to 0.889, and reduces LPIPS from 0.265 to 0.224 on Spaces outdoor scenes

    Semantic Visual Localization

    Full text link
    Robust visual localization under a wide range of viewing conditions is a fundamental problem in computer vision. Handling the difficult cases of this problem is not only very challenging but also of high practical relevance, e.g., in the context of life-long localization for augmented reality or autonomous robots. In this paper, we propose a novel approach based on a joint 3D geometric and semantic understanding of the world, enabling it to succeed under conditions where previous approaches failed. Our method leverages a novel generative model for descriptor learning, trained on semantic scene completion as an auxiliary task. The resulting 3D descriptors are robust to missing observations by encoding high-level 3D geometric and semantic information. Experiments on several challenging large-scale localization datasets demonstrate reliable localization under extreme viewpoint, illumination, and geometry changes
    corecore