214 research outputs found

    An object-based approach to plenoptic videos

    Get PDF
    This paper proposes an object-based approach to plenoptic videos, where the plenoptic video sequences are segmented into image-based rendering (IBR) objects each with its image sequence, depth map and other relevant information such as shape information. This allows desirable functionalities such as scalability of contents, error resilience, and interactivity with individual IBR objects to be supported. A portable capturing system consisting of two linear camera arrays, each hosting 6 JVC video cameras, was developed to verify the proposed approach. Rendering and compression results of real-world scenes demonstrate the usefulness and good quality of the proposed approach. © 2005 IEEE.published_or_final_versio

    Real-time refocusing using an FPGA-based standard plenoptic camera

    Get PDF
    Plenoptic cameras are receiving increased attention in scientific and commercial applications because they capture the entire structure of light in a scene, enabling optical transforms (such as focusing) to be applied computationally after the fact, rather than once and for all at the time a picture is taken. In many settings, real-time inter active performance is also desired, which in turn requires significant computational power due to the large amount of data required to represent a plenoptic image. Although GPUs have been shown to provide acceptable performance for real-time plenoptic rendering, their cost and power requirements make them prohibitive for embedded uses (such as in-camera). On the other hand, the computation to accomplish plenoptic rendering is well structured, suggesting the use of specialized hardware. Accordingly, this paper presents an array of switch-driven finite impulse response filters, implemented with FPGA to accomplish high-throughput spatial-domain rendering. The proposed architecture provides a power-efficient rendering hardware design suitable for full-video applications as required in broadcasting or cinematography. A benchmark assessment of the proposed hardware implementation shows that real-time performance can readily be achieved, with a one order of magnitude performance improvement over a GPU implementation and three orders ofmagnitude performance improvement over a general-purpose CPU implementation

    Coherent multi-dimensional segmentation of multiview images using a variational framework and applications to image based rendering

    No full text
    Image Based Rendering (IBR) and in particular light field rendering has attracted a lot of attention for interpolating new viewpoints from a set of multiview images. New images of a scene are interpolated directly from nearby available ones, thus enabling a photorealistic rendering. Sampling theory for light fields has shown that exact geometric information in the scene is often unnecessary for rendering new views. Indeed, the band of the function is approximately limited and new views can be rendered using classical interpolation methods. However, IBR using undersampled light fields suffers from aliasing effects and is difficult particularly when the scene has large depth variations and occlusions. In order to deal with these cases, we study two approaches: New sampling schemes have recently emerged that are able to perfectly reconstruct certain classes of parametric signals that are not bandlimited but characterized by a finite number of parameters. In this context, we derive novel sampling schemes for piecewise sinusoidal and polynomial signals. In particular, we show that a piecewise sinusoidal signal with arbitrarily high frequencies can be exactly recovered given certain conditions. These results are applied to parametric multiview data that are not bandlimited. We also focus on the problem of extracting regions (or layers) in multiview images that can be individually rendered free of aliasing. The problem is posed in a multidimensional variational framework using region competition. In extension to previous methods, layers are considered as multi-dimensional hypervolumes. Therefore the segmentation is done jointly over all the images and coherence is imposed throughout the data. However, instead of propagating active hypersurfaces, we derive a semi-parametric methodology that takes into account the constraints imposed by the camera setup and the occlusion ordering. The resulting framework is a global multi-dimensional region competition that is consistent in all the images and efficiently handles occlusions. We show the validity of the approach with captured light fields. Other special effects such as augmented reality and disocclusion of hidden objects are also demonstrated

    Steered mixture-of-experts for light field images and video : representation and coding

    Get PDF
    Research in light field (LF) processing has heavily increased over the last decade. This is largely driven by the desire to achieve the same level of immersion and navigational freedom for camera-captured scenes as it is currently available for CGI content. Standardization organizations such as MPEG and JPEG continue to follow conventional coding paradigms in which viewpoints are discretely represented on 2-D regular grids. These grids are then further decorrelated through hybrid DPCM/transform techniques. However, these 2-D regular grids are less suited for high-dimensional data, such as LFs. We propose a novel coding framework for higher-dimensional image modalities, called Steered Mixture-of-Experts (SMoE). Coherent areas in the higher-dimensional space are represented by single higher-dimensional entities, called kernels. These kernels hold spatially localized information about light rays at any angle arriving at a certain region. The global model consists thus of a set of kernels which define a continuous approximation of the underlying plenoptic function. We introduce the theory of SMoE and illustrate its application for 2-D images, 4-D LF images, and 5-D LF video. We also propose an efficient coding strategy to convert the model parameters into a bitstream. Even without provisions for high-frequency information, the proposed method performs comparable to the state of the art for low-to-mid range bitrates with respect to subjective visual quality of 4-D LF images. In case of 5-D LF video, we observe superior decorrelation and coding performance with coding gains of a factor of 4x in bitrate for the same quality. At least equally important is the fact that our method inherently has desired functionality for LF rendering which is lacking in other state-of-the-art techniques: (1) full zero-delay random access, (2) light-weight pixel-parallel view reconstruction, and (3) intrinsic view interpolation and super-resolution

    Light Field compression and manipulation via residual convolutional neural network

    Get PDF
    Light field (LF) imaging has gained significant attention due to its recent success in microscopy, 3-dimensional (3D) displaying and rendering, augmented and virtual reality usage. Postprocessing of LF enables us to extract more information from a scene compared to traditional cameras. However, the use of LF is still a research novelty because of the current limitations in capturing high-resolution LF in all of its four dimensions. While researchers are actively improving methods of capturing high-resolution LF\u27s, using simulation, it is possible to explore a high-quality captured LF\u27s properties. The immediate concerns following the LF capture are its storage and processing time. A rich LF occupies a large chunk of memory ---order of multiple gigabytes per LF---. Also, most feature extraction techniques associated with LF postprocessing involve multi-dimensional integration that requires access to the whole LF and is usually time-consuming. Recent advancements in computer processing units made it possible to simulate realistic images using physical-based rendering software. In this work, at first, a transformation function is proposed for building a camera array (CA) to capture the same portion of LF from a scene that a standard plenoptic camera (SPC) can acquire. Using this transformation, LF simulation with similar properties as a plenoptic camera will become trivial in any rendering software. Artificial intelligence (AI) and machine learning (ML) algorithms ---when deployed on the new generation of GPUs--- are faster than ever. It is possible to generate and train large networks with millions of trainable parameters to learn very complex features. Here, residual convolutional neural network (RCNN) structures are employed to build complex networks for compression and feature extraction from an LF. By combining state-of-the-art image compression and RCNN, I have created a compression pipeline. The proposed pipeline\u27s bit per pixel (bpp) ratio is 0.0047 on average. I show that with a 1% compression time cost and 18x speedup for decompression, our methods reconstructed LFs have better structural similarity index metric (SSIM) and comparable peak signal-to-noise ratio (PSNR) compared to the state-of-the-art video compression techniques used to compress LFs. In the end, using RCNN, I created a network called RefNet, for extracting a group of 16 refocused images from a raw LF. The training parameters of the 16 LFs are set to (\alpha=0.125, 0.250, 0.375, ..., 2.0) for training. I show that RefNet is 134x faster than the state-of-the-art refocusing technique. The RefNet is also superior in color prediction compared to the state-of-the-art ---Fourier slice and shift-and-sum--- methods

    Constructing interactive multi-view videos based on image-based rendering

    Get PDF
    [[abstract]]In this paper, we use image-based rendering (IBR) to develop a scene rotation mechanism. We shot several images in the same scene and computed the angles between images. A video is then composed, allowing users to select viewing angles when the video is playing. We made three kinds of assumptions that may affect the resulting video, and proved our assumptions by a series of experiments. Finally, we use video of realistic scenario and produce interactive video by the proposed method. The contribution also includes techniques to compute geometric parameters of the scene from one or more images.[[notice]]補正完

    Image-Based Rendering Of Real Environments For Virtual Reality

    Get PDF

    Non-disruptive use of light fields in image and video processing

    Get PDF
    In the age of computational imaging, cameras capture not only an image but also data. This captured additional data can be best used for photo-realistic renderings facilitating numerous post-processing possibilities such as perspective shift, depth scaling, digital refocus, 3D reconstruction, and much more. In computational photography, the light field imaging technology captures the complete volumetric information of a scene. This technology has the highest potential to accelerate immersive experiences towards close-toreality. It has gained significance in both commercial and research domains. However, due to lack of coding and storage formats and also the incompatibility of the tools to process and enable the data, light fields are not exploited to its full potential. This dissertation approaches the integration of light field data to image and video processing. Towards this goal, the representation of light fields using advanced file formats designed for 2D image assemblies to facilitate asset re-usability and interoperability between applications and devices is addressed. The novel 5D light field acquisition and the on-going research on coding frameworks are presented. Multiple techniques for optimised sequencing of light field data are also proposed. As light fields contain complete 3D information of a scene, large amounts of data is captured and is highly redundant in nature. Hence, by pre-processing the data using the proposed approaches, excellent coding performance can be achieved.Im Zeitalter der computergestützten Bildgebung erfassen Kameras nicht mehr nur ein Bild, sondern vielmehr auch Daten. Diese erfassten Zusatzdaten lassen sich optimal für fotorealistische Renderings nutzen und erlauben zahlreiche Nachbearbeitungsmöglichkeiten, wie Perspektivwechsel, Tiefenskalierung, digitale Nachfokussierung, 3D-Rekonstruktion und vieles mehr. In der computergestützten Fotografie erfasst die Lichtfeld-Abbildungstechnologie die vollständige volumetrische Information einer Szene. Diese Technologie bietet dabei das größte Potenzial, immersive Erlebnisse zu mehr Realitätsnähe zu beschleunigen. Deshalb gewinnt sie sowohl im kommerziellen Sektor als auch im Forschungsbereich zunehmend an Bedeutung. Aufgrund fehlender Kompressions- und Speicherformate sowie der Inkompatibilität derWerkzeuge zur Verarbeitung und Freigabe der Daten, wird das Potenzial der Lichtfelder nicht voll ausgeschöpft. Diese Dissertation ermöglicht die Integration von Lichtfelddaten in die Bild- und Videoverarbeitung. Hierzu wird die Darstellung von Lichtfeldern mit Hilfe von fortschrittlichen für 2D-Bilder entwickelten Dateiformaten erarbeitet, um die Wiederverwendbarkeit von Assets- Dateien und die Kompatibilität zwischen Anwendungen und Geräten zu erleichtern. Die neuartige 5D-Lichtfeldaufnahme und die aktuelle Forschung an Kompressions-Rahmenbedingungen werden vorgestellt. Es werden zudem verschiedene Techniken für eine optimierte Sequenzierung von Lichtfelddaten vorgeschlagen. Da Lichtfelder die vollständige 3D-Information einer Szene beinhalten, wird eine große Menge an Daten, die in hohem Maße redundant sind, erfasst. Die hier vorgeschlagenen Ansätze zur Datenvorverarbeitung erreichen dabei eine ausgezeichnete Komprimierleistung
    • …
    corecore