22 research outputs found


    The Computational Visual Media (CVM) conference series is intended to provide a major international forum for exchanging novel research ideas and significant computational methods that either underpin or apply visual media. The primary goal is to promote cross-disciplinary research that brings together aspects of computer graphics, computer vision, machine learning, image and video processing, visualization, and geometric computing. The main topics of interest to CVM include classification, composition, retrieval, synthesis, cognition, and understanding of visual media (e.g., images, videos, 3D geometry). The Computational Visual Media Conference 2020 (CVM 2020), the 8th international conference in the series, will be held during September 3–5, 2020, at Macau University of Science and Technology. Following the success of previous CVM conferences, CVM 2020 attracted broad attention from researchers worldwide. A total of 118 technical papers were submitted and reviewed by an international program committee comprising 86 selected experts. A total of 30 papers were accepted for oral presentation.

    DAugNet: Unsupervised, Multi-source, Multi-target, and Life-long Domain Adaptation for Semantic Segmentation of Satellite Images

    Domain adaptation of satellite images has recently gained increasing attention as a way to overcome the limited generalization abilities of machine learning models when segmenting large-scale satellite images. Most existing approaches seek to adapt the model from one domain to another. However, such a single-source, single-target setting prevents these methods from being scalable solutions, since multiple source and target domains with different data distributions are now usually available. In addition, the continuous proliferation of satellite images requires classifiers to adapt to continuously increasing data. We propose a novel approach, coined DAugNet, for unsupervised, multi-source, multi-target, and life-long domain adaptation of satellite images. It consists of a classifier and a data augmentor. The data augmentor, which is a shallow network, is able to perform style transfer between multiple satellite images in an unsupervised manner, even when new data are added over time. In each training iteration, it provides the classifier with diversified data, which makes the classifier robust to large differences in data distribution between the domains. Our extensive experiments show that DAugNet generalizes significantly better to new geographic locations than existing approaches.

    Thirteenth Biennial Status Report: April 2015 - February 2017


    Influence of Directional Sound Cues on Users' Exploration across 360° Movie Cuts

    Virtual reality (VR) is a powerful medium for 360° storytelling, yet content creators are still in the process of developing cinematographic rules for effectively communicating stories in VR. Traditional cinematography has relied for over a century on well-established editing techniques, and one of the most recurrent of these is the cinematic cut, which allows content creators to seamlessly transition between scenes. One fundamental assumption of these techniques is that the content creator can control the camera; however, this assumption breaks in VR: users are free to explore the full 360° around them. Recent works have studied the effectiveness of different cuts in 360° content, but the effect of directional sound cues while experiencing these cuts has been less explored. In this work, we provide the first systematic analysis of the influence of directional sound cues on users' behavior across 360° movie cuts, providing insights that can have an impact on deriving conventions for VR storytelling.

    Neural Novel Actor: Learning a Generalized Animatable Neural Representation for Human Actors

    We propose a new method for learning a generalized animatable neural human representation from a sparse set of multi-view imagery of multiple persons. The learned representation can be used to synthesize novel view images of an arbitrary person from a sparse set of cameras, and further animate them with the user's pose control. While existing methods can either generalize to new persons or synthesize animations with user control, none of them can achieve both at the same time. Our method achieves both, which we attribute to the use of a 3D proxy for a shared multi-person human model and to the warping of the spaces of different poses to a shared canonical pose space, in which we learn a neural field and predict the person- and pose-dependent deformations, as well as appearance, with the features extracted from input images. To cope with the complexity of the large variations in body shapes, poses, and clothing deformations, we design our neural human model with disentangled geometry and appearance. Furthermore, we utilize the image features both at the spatial point and on the surface points of the 3D proxy for predicting person- and pose-dependent properties. Experiments show that our method significantly outperforms the state of the art on both tasks. The video and code are available at https://talegqz.github.io/neural_novel_actor

    Transformation-aware Perceptual Image Metric

    Predicting human visual perception has several applications such as compression, rendering, editing, and retargeting. Current approaches, however, ignore the fact that the human visual system compensates for geometric transformations, e.g., we see that an image and a rotated copy are identical. Instead, they will report a large, false-positive difference. At the same time, if the transformations become too strong or too spatially incoherent, comparing two images gets increasingly difficult. Between these two extremes, we propose a system to quantify the effect of transformations, not only on the perception of image differences but also on saliency and motion parallax. To this end, we first fit local homographies to a given optical flow field, and then convert this field into a field of elementary transformations, such as translation, rotation, scaling, and perspective. We conduct a perceptual experiment quantifying the increase of difficulty when compensating for elementary transformations. Transformation entropy is proposed as a measure of complexity in a flow field. This representation is then used for applications such as comparison of nonaligned images, where transformations cause threshold elevation, detection of salient transformations, and a model of perceived motion parallax. Further applications of our approach are a perceptual level-of-detail for real-time rendering and viewpoint selection based on perceived motion parallax.

    A Deeper Look into DeepCap

    Human performance capture is a highly important computer vision problem with many applications in movie production and virtual/augmented reality. Many previous performance capture approaches either required expensive multi-view setups or did not recover dense space-time coherent geometry with frame-to-frame correspondences. We propose a novel deep learning approach for monocular dense human performance capture. Our method is trained in a weakly supervised manner based on multi-view supervision, completely removing the need for training data with 3D ground truth annotations. The network architecture is based on two separate networks that disentangle the task into a pose estimation step and a non-rigid surface deformation step. Extensive qualitative and quantitative evaluations show that our approach outperforms the state of the art in terms of quality and robustness. This work is an extended version of DeepCap in which we provide more detailed explanations, comparisons, and results, as well as applications.

    Simulating liquids on dynamically warping grids

    We introduce dynamically warping grids for adaptive liquid simulation. Our primary contributions are a strategy for dynamically deforming regular grids over the course of a simulation and a method for efficiently utilizing these deforming grids for liquid simulation. Prior work has shown that unstructured grids are very effective for adaptive fluid simulations. However, unstructured grids often lead to complicated implementations and a poor cache hit rate due to inconsistent memory access. Regular grids, on the other hand, provide a fast, fixed memory access pattern and straightforward implementation. Our method combines the advantages of both: we leverage the simplicity of regular grids while still achieving practical and controllable spatial adaptivity. We demonstrate that our method enables adaptive simulations that are fast, flexible, and robust to null-space issues. At the same time, our method is simple to implement and takes advantage of existing highly tuned algorithms.