18 research outputs found

    From rule-based to learning-based image-conditional image generation

    Get PDF
    Visual contents, such as movies, animations, computer games, videos and photos, are massively produced and consumed nowadays. Most of these contents are the combination of materials captured from real-world and contents synthesized by computers. Particularly, computer-generated visual contents are increasingly indispensable in modern entertainment and production. The generation of visual contents by computers is typically conditioned on real-world materials, driven by the imagination of designers and artists, or a combination of both. However, creating visual contents manually are both challenging and labor intensive. Therefore, enabling computers to automatically or semi-automatically synthesize needed visual contents becomes essential. Among all these efforts, a stream of research is to generate novel images based on given image priors, e.g., photos and sketches. This research direction is known as image-conditional image generation, which covers a wide range of topics such as image stylization, image completion, image fusion, sketch-to-image generation, and extracting image label maps. In this thesis, a set of novel approaches for image-conditional image generation are presented. The thesis starts with an exemplar-based method for facial image stylization in Chapter 2. This method involves a unified framework for facial image stylization based on a single style exemplar. A two-phase procedure is employed, where the first phase searches a dense and semantic-aware correspondence between the input and the exemplar images, and the second phase conducts edge-preserving texture transfer. While this algorithm has the merit of requiring only a single exemplar, it is constrained to face photos. To perform generalized image-to-image translation, Chapter 3 presents a data-driven and learning-based method. Inspired by the dual learning paradigm designed for natural language translation [115], a novel dual Generative Adversarial Network (DualGAN) mechanism is developed, which enables image translators to be trained from two sets of unlabeled images from two domains. This is followed by another data-driven method in Chapter 4, which learns multiscale manifolds from a set of images and then enables synthesizing novel images that mimic the appearance of the target image dataset. The method is named as Branched Generative Adversarial Network (BranchGAN) and employs a novel training method that enables unconditioned generative adversarial networks (GANs) to learn image manifolds at multiple scales. As a result, we can directly manipulate and even combine latent manifold codes that are associated with specific feature scales. Finally, to provide users more control over image generation results, Chapter 5 discusses an upgraded version of iGAN [126] (iGANHD) that significantly improves the art of manipulating high-resolution images through utilizing the multi-scale manifold learned with BranchGAN

    Fusing Multimedia Data Into Dynamic Virtual Environments

    Get PDF
    In spite of the dramatic growth of virtual and augmented reality (VR and AR) technology, content creation for immersive and dynamic virtual environments remains a significant challenge. In this dissertation, we present our research in fusing multimedia data, including text, photos, panoramas, and multi-view videos, to create rich and compelling virtual environments. First, we present Social Street View, which renders geo-tagged social media in its natural geo-spatial context provided by 360° panoramas. Our system takes into account visual saliency and uses maximal Poisson-disc placement with spatiotemporal filters to render social multimedia in an immersive setting. We also present a novel GPU-driven pipeline for saliency computation in 360° panoramas using spherical harmonics (SH). Our spherical residual model can be applied to virtual cinematography in 360° videos. We further present Geollery, a mixed-reality platform to render an interactive mirrored world in real time with three-dimensional (3D) buildings, user-generated content, and geo-tagged social media. Our user study has identified several use cases for these systems, including immersive social storytelling, experiencing the culture, and crowd-sourced tourism. We next present Video Fields, a web-based interactive system to create, calibrate, and render dynamic videos overlaid on 3D scenes. Our system renders dynamic entities from multiple videos, using early and deferred texture sampling. Video Fields can be used for immersive surveillance in virtual environments. Furthermore, we present VRSurus and ARCrypt projects to explore the applications of gestures recognition, haptic feedback, and visual cryptography for virtual and augmented reality. Finally, we present our work on Montage4D, a real-time system for seamlessly fusing multi-view video textures with dynamic meshes. We use geodesics on meshes with view-dependent rendering to mitigate spatial occlusion seams while maintaining temporal consistency. Our experiments show significant enhancement in rendering quality, especially for salient regions such as faces. We believe that Social Street View, Geollery, Video Fields, and Montage4D will greatly facilitate several applications such as virtual tourism, immersive telepresence, and remote education

    Saliency-guided Enhancement for Volume Visualization

    Full text link

    ARDECO: Automatic Region DEtection and Conversion

    Get PDF
    We present Ardeco, a new algorithm for image abstraction and conversion from bitmap images into vector graphics. Given a bitmap image, our algorithm automatically computes the set of vector primitives and gradients that best approximates the image. In addition, more details can be generated in user-selected important regions, defined from eye-tracking data or from an importance map painted by the user. Our algorithm is based on a new two-level variational parametric segmentation algorithm, minimizing Mumford and Shah's energy and operating on an intermediate triangulation, well adapted to the features of the image

    Scalable visualization of spatial data in 3D terrain

    Get PDF
    Designing visualizations of spatial data in 3D terrain is challenging because various heterogeneous data aspects need to be considered, including the terrain itself, multiple data attributes, and data uncertainty. It is hardly possible to visualize these data at full detail in a single image. Therefore, this thesis devises a scalable visualization approach that focuses on relevant information to be emphasized, while less-relevant information can be attenuated. In this context, a noval concept of visualizing spatial data in 3D terrain and different soft- and hardware solutions are proposed.Die Erstellung von Visualisierungen für räumliche Daten im 3D-Gelände ist schwierig, da viele heterogene Datenaspekte wie das Gelände selbst, die verschiedenen Datenattribute sowie Unsicherheiten bei der Darstellung zu berücksichtigen sind. Im Allgemeinen ist es nicht möglich, diese Datenaspekte gleichzeitig in einer Visualisierung darzustellen. Daher werden in der Arbeit skalierbare Visualisierungsstrategien entwickelt, welche die wichtigen Informationen hervorheben und trotzdem gleichzeitig Kontextinformationen liefern. Hierfür werden neue Systematisierungen und Konzepte vorgestellt

    Saliency-guided Graphics and Visualization

    Get PDF
    In this dissertation, we show how we can use principles of saliency to enhance depiction, manage visual attention, and increase interactivity for 3D graphics and visualization. Current mesh saliency approaches are inspired by low-level human visual cues, but have not yet been validated. Our eye-tracking-based user study shows that the current computational model of mesh saliency can well approximate human eye movements. Artists, illustrators, photographers, and cinematographers have long used the principles of contrast and composition to guide visual attention. We present a visual-saliency-based operator to draw visual attention to selected regions of interest. We have observed that it is more successful at eliciting viewer attention than the traditional Gaussian enhancement operator for visualizing both volume datasets and 3D meshes. Mesh saliency can be measured in various ways. The previous model of saliency computes saliency by identifying the uniqueness of curvature. Another way to identify uniqueness is to look for non-repeating structure in the middle of repeating structure. We have developed a system to detect repeating patterns in 3D point datasets. We introduce the idea of creating vertex and transformation streams that represent large point datasets via their interaction. This dramatically improves arithmetic intensity and addresses the input geometry bandwidth bottleneck for interactive 3D graphics applications. Fast-previewing of time-varing datasets is important for the purpose of summarization and abstraction. We compute the salient frames in molecular dynamics simulations through the subspace analysis of the protein's residue orientations. We first compute an affinity matrix for each frame i of the simulation based on the similarity of the orientation of the protein's backbone residues. Eigenanalysis of the affinity matrix gives us the subspace that best represents the conformation of the current frame i. We use this subspace to represent the frames ahead and behind frame i. The more accurately we can use the subspace of frame i to represent its neighbors, the less salient it is. Taken together, the tools and techniques developed in this dissertation are likely to provide the building blocks for the next generation visual analysis, reasoning, and discovery environments

    Colour videos with depth : acquisition, processing and evaluation

    Get PDF
    The human visual system lets us perceive the world around us in three dimensions by integrating evidence from depth cues into a coherent visual model of the world. The equivalent in computer vision and computer graphics are geometric models, which provide a wealth of information about represented objects, such as depth and surface normals. Videos do not contain this information, but only provide per-pixel colour information. In this dissertation, I hence investigate a combination of videos and geometric models: videos with per-pixel depth (also known as RGBZ videos). I consider the full life cycle of these videos: from their acquisition, via filtering and processing, to stereoscopic display. I propose two approaches to capture videos with depth. The first is a spatiotemporal stereo matching approach based on the dual-cross-bilateral grid – a novel real-time technique derived by accelerating a reformulation of an existing stereo matching approach. This is the basis for an extension which incorporates temporal evidence in real time, resulting in increased temporal coherence of disparity maps – particularly in the presence of image noise. The second acquisition approach is a sensor fusion system which combines data from a noisy, low-resolution time-of-flight camera and a high-resolution colour video camera into a coherent, noise-free video with depth. The system consists of a three-step pipeline that aligns the video streams, efficiently removes and fills invalid and noisy geometry, and finally uses a spatiotemporal filter to increase the spatial resolution of the depth data and strongly reduce depth measurement noise. I show that these videos with depth empower a range of video processing effects that are not achievable using colour video alone. These effects critically rely on the geometric information, like a proposed video relighting technique which requires high-quality surface normals to produce plausible results. In addition, I demonstrate enhanced non-photorealistic rendering techniques and the ability to synthesise stereoscopic videos, which allows these effects to be applied stereoscopically. These stereoscopic renderings inspired me to study stereoscopic viewing discomfort. The result of this is a surprisingly simple computational model that predicts the visual comfort of stereoscopic images. I validated this model using a perceptual study, which showed that it correlates strongly with human comfort ratings. This makes it ideal for automatic comfort assessment, without the need for costly and lengthy perceptual studies

    Illustrative Informationsvisualisierung

    Get PDF
    Mit wachsender Größe von Daten wird es zunehmend schwerer, die in der Informationsvisualisierung erzeugte visuelle Repräsentation zu interpretieren und die relevanten Informationen adäquat darzustellen. Die Illustration beschäftigt sich seit längerem mit der Kommunikation wichtiger Bildinformationen. Das Ziel dieser Dissertation ist es deshalb, illustrative Verfahren in Techniken der Informationsvisualisierung zu integrieren - sowohl konzeptuell als auch in praktischer Anwendung. Als Ergebnis unterstützen die neu entwickelten Lösungsansätze die Kommunikation dargestellter Informationen
    corecore