9,150 research outputs found

    Loss-resilient Coding of Texture and Depth for Free-viewpoint Video Conferencing

    Free-viewpoint video conferencing allows a participant to observe the remote 3D scene from any freely chosen viewpoint. An intermediate virtual viewpoint image is commonly synthesized using two pairs of transmitted texture and depth maps from two neighboring captured viewpoints via depth-image-based rendering (DIBR). To maintain high quality of synthesized images, it is imperative to contain the adverse effects of network packet losses that may arise during texture and depth video transmission. Towards this end, we develop an integrated approach that exploits the representation redundancy inherent in the multiple streamed videos: a voxel in the 3D scene visible to two captured views is sampled and coded twice, once in each view. In particular, at the receiver we first develop an error concealment strategy that adaptively blends corresponding pixels in the two captured views during DIBR, so that pixels from the more reliable transmitted view are weighted more heavily. We then couple it with a sender-side optimization of reference picture selection (RPS) during real-time video coding, so that blocks containing samples of voxels that are visible in both views are coded more error-resiliently in one view only, given that adaptive blending will erase errors in the other view. Further, the sensitivities of synthesized view distortion to texture versus depth errors are analyzed, so that the relative importance of texture and depth code blocks can be computed for system-wide RPS optimization. Experimental results show that the proposed scheme can outperform the use of a traditional feedback channel by up to 0.82 dB on average at 8% packet loss rate, and by as much as 3 dB for particular frames.
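
The receiver-side blending described in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the function name `blend_pixel` and the per-view reliability scores are hypothetical, and how reliability would be estimated from packet-loss history is not specified here.

```python
def blend_pixel(left_value, right_value, left_reliability, right_reliability):
    """Blend two corresponding pixel values from the left and right views.

    The reliability arguments are hypothetical non-negative scores; the view
    with the higher score contributes more to the synthesized pixel.
    """
    total = left_reliability + right_reliability
    if total == 0:
        # Neither view is trusted: fall back to a plain average.
        return 0.5 * (left_value + right_value)
    w_left = left_reliability / total
    return w_left * left_value + (1.0 - w_left) * right_value
```

With equal scores this reduces to an ordinary average, and a view with a score of zero is ignored entirely, which is the behavior that lets the sender protect a block in only one of the two views.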

    Interactive Vegetation Rendering with Slicing and Blending

    Detailed and interactive 3D rendering of vegetation is one of the challenges of traditional polygon-oriented computer graphics, because even simple plants have large geometric complexity. In this paper we introduce a simplified image-based rendering approach based solely on alpha-blended textured polygons. The simplification is based on the limitations of human perception of complex geometry. Our approach renders dozens of detailed trees in real time on off-the-shelf hardware, while providing significantly improved image quality over existing real-time techniques. The method uses ordinary mesh-based rendering for the solid parts of a tree, its trunk and limbs. The sparse parts of a tree, its twigs and leaves, are instead represented with a set of slices, an image-based representation. A slice is a planar layer, represented with an ordinary alpha or color-keyed texture; a set of parallel slices is a slicing. Rendering from an arbitrary viewpoint in a 360 degree circle around the center of a tree is achieved by blending between the nearest two slicings. In our implementation, only 6 slicings with 5 slices each are sufficient to visualize a tree for a moving or stationary observer with perceptual quality similar to that of the original model.
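
The step of blending between the nearest two slicings can be sketched as follows. This is an illustration only: it assumes the slicings are evenly spaced in angle around the tree and that blend weights vary linearly with the viewing angle, neither of which is stated in the abstract. The function name `nearest_slicings` is hypothetical.

```python
def nearest_slicings(view_deg, num_slicings=6):
    """Return (lower_index, upper_index, lower_weight, upper_weight).

    Assumes slicing k faces the angle k * 360 / num_slicings degrees
    (an assumption made for this sketch).
    """
    step = 360.0 / num_slicings
    pos = (view_deg % 360.0) / step      # fractional slicing index
    lower = int(pos) % num_slicings      # slicing at or just below the view angle
    upper = (lower + 1) % num_slicings   # next slicing, wrapping around the circle
    w_upper = pos - int(pos)             # linear weight of the upper slicing
    return lower, upper, 1.0 - w_upper, w_upper
```

A renderer would draw both selected slicings and use the two weights as their opacities, so that one fades out as the other fades in while the observer moves around the tree.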

    BLADE: Filter Learning for General Purpose Computational Photography

    The Rapid and Accurate Image Super Resolution (RAISR) method of Romano, Isidoro, and Milanfar is a computationally efficient image upscaling method using a trained set of filters. We describe a generalization of RAISR, which we name Best Linear Adaptive Enhancement (BLADE). This approach is a trainable edge-adaptive filtering framework that is general, simple, computationally efficient, and useful for a wide range of problems in computational photography. We show applications to operations that may appear in a camera pipeline, including denoising, demosaicing, and stylization.

    Human Motion Capture Data Tailored Transform Coding

    Human motion capture (mocap) is a widely used technique for digitizing human movements. With growing usage, compressing mocap data has received increasing attention, since compact data size enables efficient storage and transmission. Our analysis shows that mocap data have some unique characteristics that distinguish them from images and videos. Therefore, directly borrowing image or video compression techniques, such as the discrete cosine transform, does not work well. In this paper, we propose a novel mocap-tailored transform coding algorithm that takes advantage of these features. Our algorithm segments the input mocap sequences into clips, which are represented as 2D matrices. Then it computes a set of data-dependent orthogonal bases to transform the matrices to the frequency domain, in which the transform coefficients have significantly less dependency. Finally, compression is obtained by entropy coding of the quantized coefficients and the bases. Our method has low computational cost and can be easily extended to compress mocap databases. It also requires neither training nor complicated parameter setting. Experimental results demonstrate that the proposed scheme significantly outperforms state-of-the-art algorithms in terms of compression performance and speed.

    Removal Of Blocking Artifacts From JPEG-Compressed Images Using An Adaptive Filtering Algorithm

    The aim of this research was to develop an algorithm that produces a considerable improvement in the quality of JPEG images by removing blocking and ringing artifacts, irrespective of the level of compression present in the image. We review multiple published related works, and then present a computationally efficient algorithm for reducing the blocky and Gibbs oscillation artifacts commonly present in JPEG-compressed images. The algorithm alpha-blends a smoothed version of the image with the original image; however, the blending is controlled by a limit factor that considers the amount of compression present and any local edge information derived from the application of a Prewitt filter. In addition, the actual value of the blending coefficient (α) is derived from the local Mean Structural Similarity Index Measure (MSSIM), which is likewise adjusted by a factor that considers the amount of compression present. We also present our results alongside results from a variety of other papers whose authors used other post-compression filtering methods.
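
The idea of edge-limited alpha-blending in this abstract can be illustrated with a simplified sketch. It departs from the described algorithm in labeled ways: α is derived here from the Prewitt edge strength alone, whereas the abstract derives it from local MSSIM and the compression level, and the smoothing is a plain 3x3 box blur. The names `deblock`, `edge_threshold`, and `max_alpha` are hypothetical.

```python
def prewitt_magnitude(img, y, x):
    """Prewitt gradient magnitude at an interior pixel of a 2D list image."""
    gx = sum(img[y + dy][x + 1] - img[y + dy][x - 1] for dy in (-1, 0, 1))
    gy = sum(img[y + 1][x + dx] - img[y - 1][x + dx] for dx in (-1, 0, 1))
    return (gx * gx + gy * gy) ** 0.5


def box_blur(img, y, x):
    """Mean of the 3x3 neighborhood around an interior pixel."""
    return sum(img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0


def deblock(img, edge_threshold=60.0, max_alpha=0.8):
    """Blend a smoothed image with the original, smoothing less near edges.

    Assumption for this sketch: alpha depends only on edge strength.
    Border pixels are copied unchanged.
    """
    height, width = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Edge strength clamped to [0, 1]; strong edges give alpha = 0.
            edge = min(prewitt_magnitude(img, y, x) / edge_threshold, 1.0)
            alpha = max_alpha * (1.0 - edge)
            out[y][x] = img[y][x] + alpha * (box_blur(img, y, x) - img[y][x])
    return out
```

In this form a small intensity step, such as one left at a block boundary, is softened, while a strong step that is likely a real image edge is left untouched.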

    Neural View-Interpolation for Sparse Light Field Video

    We suggest representing light field (LF) videos as "one-off" neural networks (NN), i.e., a learned mapping from view-plus-time coordinates to high-resolution color values, trained on sparse views. Initially, this sounds like a bad idea for three main reasons: First, a NN LF will likely have lower quality than a same-sized pixel basis representation. Second, only a few training examples, e.g., 9 exemplars per frame, are available for sparse LF videos. Third, there is no generalization across LFs, but across view and time instead. Consequently, a network needs to be trained for each LF video. Surprisingly, these problems can turn into substantial advantages: Unlike the linear pixel basis, a NN has to come up with a compact, non-linear, i.e., more intelligent, explanation of color, conditioned on the sparse view and time coordinates. As observed for many NNs, however, this representation is now interpolatable: if the image output for sparse view coordinates is plausible, it is for all intermediate, continuous coordinates as well. Our specific network architecture involves a differentiable occlusion-aware warping step, which leads to a compact set of trainable parameters and consequently fast learning and fast execution.