30 research outputs found

    Baseline and triangulation geometry in a standard plenoptic camera

    In this paper, we demonstrate light field triangulation to determine depth distances and baselines in a plenoptic camera. The advancement of micro lenses and image sensors enabled plenoptic cameras to capture a scene from different viewpoints with sufficient spatial resolution. While object distances can be inferred from disparities in a stereo viewpoint pair using triangulation, this concept remains ambiguous when applied to plenoptic cameras. We present a geometrical light field model allowing the triangulation to be applied to a plenoptic camera in order to predict object distances or to specify baselines as desired. It is shown that distance estimates from our novel method match those of real objects placed in front of the camera. Additional benchmark tests with an optical design software further validate the model’s accuracy with deviations of less than 0.33 % for several main lens types and focus settings. A variety of applications in the automotive and robotics fields can benefit from this estimation model
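
The stereo triangulation that this abstract takes as its starting point can be sketched as follows. This is the classical pinhole relation Z = f * B / d for a rectified stereo pair, not the paper's plenoptic model; the function name, parameter names, and example values are illustrative assumptions.

```python
def depth_from_disparity(focal_length_mm, baseline_mm, disparity_px, pixel_pitch_mm):
    """Classical stereo triangulation for a rectified pair: Z = f * B / d.

    Assumption: an ideal pinhole model. The plenoptic case treated in the
    paper needs the authors' own light field model, which is not shown here.
    """
    # Convert the disparity from pixels to the same unit as f and B.
    disparity_mm = disparity_px * pixel_pitch_mm
    if disparity_mm <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_mm * baseline_mm / disparity_mm


# Hypothetical values: 50 mm lens, 100 mm baseline, 10 px disparity, 5 um pixels.
depth_mm = depth_from_disparity(50.0, 100.0, 10.0, 0.005)
print(f"estimated object distance: {depth_mm / 1000:.1f} m")
```

Because depth is inversely proportional to disparity, a fixed disparity error costs more accuracy at long range, which is one reason the baseline matters.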

    Loss-resilient Coding of Texture and Depth for Free-viewpoint Video Conferencing

    Free-viewpoint video conferencing allows a participant to observe the remote 3D scene from any freely chosen viewpoint. An intermediate virtual viewpoint image is commonly synthesized using two pairs of transmitted texture and depth maps from two neighboring captured viewpoints via depth-image-based rendering (DIBR). To maintain high quality of synthesized images, it is imperative to contain the adverse effects of network packet losses that may arise during texture and depth video transmission. Towards this end, we develop an integrated approach that exploits the representation redundancy inherent in the multiple streamed videos: a voxel in the 3D scene visible to two captured views is sampled and coded twice in the two views. In particular, at the receiver we first develop an error concealment strategy that adaptively blends corresponding pixels in the two captured views during DIBR, so that pixels from the more reliable transmitted view are weighted more heavily. We then couple it with a sender-side optimization of reference picture selection (RPS) during real-time video coding, so that blocks containing samples of voxels that are visible in both views are more error-resiliently coded in one view only, given that adaptive blending will erase errors in the other view. Further, synthesized view distortion sensitivities to texture versus depth errors are analyzed, so that the relative importance of texture and depth code blocks can be computed for system-wide RPS optimization. Experimental results show that the proposed scheme can outperform the use of a traditional feedback channel by up to 0.82 dB on average at 8% packet loss rate, and by as much as 3 dB for particular frames
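
The receiver-side idea of weighting the more reliable view more heavily can be illustrated with a simple normalized weighted average. This is only a sketch of the general principle: the abstract does not give the actual weighting rule, so the reliability scores and the function `blend_pixel` are hypothetical.

```python
def blend_pixel(left_value, right_value, left_reliability, right_reliability):
    """Blend two corresponding pixel values, favoring the more reliable view.

    Assumption: each reliability is a non-negative score (higher means the
    pixel is less likely to be corrupted by packet loss). The paper's actual
    weighting scheme is not described in the abstract.
    """
    total = left_reliability + right_reliability
    if total == 0:
        # No reliability information for either view: fall back to the mean.
        return (left_value + right_value) / 2
    return (left_reliability * left_value + right_reliability * right_value) / total


# The left view is judged three times as reliable as the right view.
print(blend_pixel(100, 200, 0.75, 0.25))  # 125.0
```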

    Quality Assessment of a Stereo Pair Formed From Two Synthesized Views Using Objective Metrics

    When a stereo pair is formed from two synthesized views, it is unclear whether objective 2D quality metrics can provide a good estimation of the perceived quality. In this paper, this problem is addressed considering a 3D video represented in multiview video plus depth format. The performance of different state-of-the-art 2D quality metrics is analyzed in terms of correlation with subjective perception of video quality. A set of subjective data collected through formal subjective evaluation tests is used as benchmark. Results show that some objective metrics, including PSNR, do not predict well perceived quality of synthesized views. On the other hand, metrics such as VIF, VQM, MS-SSIM, or SSIM have a high correlation with perceived quality
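
To make the evaluation idea concrete, the sketch below computes PSNR for a pair of signals and the Pearson correlation between metric scores and subjective scores. Pearson correlation is one common choice for this kind of analysis; the abstract does not state which coefficient the authors used, and all numbers below are made-up illustrative values, not data from the paper.

```python
import math


def psnr(reference, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)


def pearson(xs, ys):
    """Pearson linear correlation coefficient between two score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Illustrative only: hypothetical metric scores versus hypothetical opinion scores.
metric_scores = [30.1, 32.5, 35.0, 38.2]
opinion_scores = [2.0, 3.5, 3.0, 4.5]
print(f"correlation: {pearson(metric_scores, opinion_scores):.3f}")
```

A metric whose scores correlate weakly with the subjective ratings is a poor predictor of perceived quality, which is the criterion the abstract applies to PSNR and the other metrics.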

    Application of augmented reality and robotic technology in broadcasting: A survey

    As an innovative technique, Augmented Reality (AR) has been gradually deployed in the broadcast, videography and cinematography industries. Virtual graphics generated by AR are dynamic and overlaid on the surface of the environment so that the original appearance can be greatly enhanced in comparison with traditional broadcasting. In addition, AR enables broadcasters to interact with augmented virtual 3D models on a broadcasting scene in order to enhance the performance of broadcasting. Recently, advanced robotic technologies have been deployed in a camera shooting system to create a robotic cameraman so that the performance of AR broadcasting could be further improved, which is highlighted in the paper

    Implementation of a Depth from Light Field Algorithm on FPGA

    A light field is a four-dimensional function that captures the intensity of light rays traversing an empty space at each point. The light field can be captured using devices designed specifically for this purpose and it allows one to extract depth information about the scene. Most light-field algorithms require a huge amount of processing power. Fortunately, in recent years, parallel hardware has evolved and enables such volumes of data to be processed. Field programmable gate arrays are one such option. In this paper, we propose two hardware designs that share a common construction block to compute a disparity map from light-field data. The first design employs serial data input into the hardware, while the second employs view-parallel input. These designs focus on performing calculations during data read-in and producing results only a few clock cycles after read-in. Several experiments were conducted. First, the influence of using fixed-point arithmetic on accuracy was tested using synthetic light-field data. Tests on actual light-field data were also performed. The performance was compared to that of a CPU, as well as an embedded processor. Our designs showed similar performance to the former and outperformed the latter. For further comparison, we also discuss the performance difference between our designs and other designs described in the literature
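
The accuracy cost of fixed-point arithmetic mentioned above can be illustrated with a generic quantization round trip. This is a minimal sketch under assumed conditions: the word lengths and the helper names `to_fixed` and `from_fixed` are hypothetical, and the paper's actual number format is not given in the abstract.

```python
def to_fixed(value, frac_bits):
    """Quantize a real value to an integer with `frac_bits` fractional bits."""
    return round(value * (1 << frac_bits))


def from_fixed(raw, frac_bits):
    """Convert a fixed-point integer back to a float."""
    return raw / (1 << frac_bits)


def quantization_error(value, frac_bits):
    """Absolute error introduced by one fixed-point round trip."""
    return abs(value - from_fixed(to_fixed(value, frac_bits), frac_bits))


# With round-to-nearest, the error is bounded by half a least significant bit.
for bits in (4, 8, 12):
    err = quantization_error(0.1, bits)
    print(f"{bits:2d} fractional bits: error {err:.6f}, bound {2 ** -(bits + 1):.6f}")
```

Each additional fractional bit halves the worst-case error but widens the datapath, so hardware designs typically choose the smallest word length that keeps the result accurate enough.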

    Compression and Subjective Quality Assessment of 3D Video

    In recent years, three-dimensional television (3D TV) has been broadly considered as the successor to the existing traditional two-dimensional television (2D TV) sets. With its capability of offering a dynamic and immersive experience, 3D video (3DV) is expected to expand conventional video in several applications in the near future. However, 3D content requires more than a single view to deliver the depth sensation to the viewers and this, inevitably, increases the bitrate compared to the corresponding 2D content. This need drives the research trend in the video compression field towards more advanced and more efficient algorithms. Currently, Advanced Video Coding (H.264/AVC) is the state-of-the-art video coding standard, which has been developed by the Joint Video Team of ISO/IEC MPEG and ITU-T VCEG. This codec has been widely adopted in various applications and products such as TV broadcasting, video conferencing, mobile TV, and Blu-ray Disc. One important extension of H.264/AVC, namely Multiview Video Coding (MVC), was an attempt at multiple-view compression that takes into consideration the inter-view dependency between different views of the same scene. H.264/AVC with its MVC extension (H.264/MVC) can be used for encoding either conventional stereoscopic video, including only two views, or multiview video, including more than two views. In spite of the high performance of H.264/MVC, a typical multiview video sequence requires a huge amount of storage space, which is proportional to the number of offered views. The available views are still limited and research has been devoted to synthesizing an arbitrary number of views using the multiview video and depth map (MVD) format. This process is mandatory for auto-stereoscopic displays (ASDs), where many views are required at the viewer side and there is no way to transmit such a relatively huge number of views with currently available broadcasting technology.
Therefore, to satisfy the growing hunger for 3D-related applications, it is mandatory to further decrease the bitrate by introducing new and more efficient algorithms for compressing multiview video and depth maps. This thesis tackles 3D content compression targeting different formats, i.e., stereoscopic video and depth-enhanced multiview video. Stereoscopic video compression algorithms introduced in this thesis mostly focus on proposing different types of asymmetry between the left and right views. This means reducing the quality of one view compared to the other view, aiming to achieve a better subjective quality than the symmetric case (the reference) under the same bitrate constraint. The proposed algorithms to optimize depth-enhanced multiview video compression include both texture compression schemes and depth map coding tools. Some of the coding schemes proposed for this format include asymmetric quality between the views. Knowing that objective metrics are not able to accurately estimate the subjective quality of stereoscopic content, it is suggested to perform subjective quality assessment to evaluate different codecs. Moreover, when the concept of asymmetry is introduced, the Human Visual System (HVS) performs a fusion process which is not completely understood. Therefore, another important aspect of this thesis is conducting several subjective tests and reporting the subjective ratings to evaluate the perceived quality of the proposed coded content against the references. Statistical analysis is carried out in the thesis to assess the validity of the subjective ratings and determine the best performing test cases

    Photogrammetry as a New Scientific Tool in Archaeology: Worldwide Research Trends

    Archaeology has made significant advances in the last 20 years. This can be seen in the remarkable increase in specialised literature on all archaeology-related disciplines. These advances have made it a science with links to many other sciences, both in the field of experimental sciences and in the use of techniques from other disciplines such as engineering. In the latter area, it is important to highlight the great advance that the use of photogrammetry has brought to archaeology. In this research, through a systematic study with bibliometric techniques, the main institutions and countries carrying out this work and the main interests of the scientific community in archaeology related to photogrammetry have been identified. The main increase in this field has been observed since 2010, notably through the contribution of UAVs, which have reduced the cost of photogrammetric flights over small areas. The main lines of research in photogrammetry applied to archaeology are close-range photogrammetry, aerial photogrammetry (UAV), cultural heritage, excavation, cameras, GPS, laser scan, and virtual reconstruction including 3D printing

    Quality of Experience in Immersive Video Technologies

    Over the last decades, several technological revolutions have impacted the television industry, such as the shifts from black & white to color and from standard to high-definition. Nevertheless, further considerable improvements can still be achieved to provide a better multimedia experience, for example with ultra-high-definition, high dynamic range & wide color gamut, or 3D. These so-called immersive technologies aim at providing better, more realistic, and emotionally stronger experiences. To measure quality of experience (QoE), subjective evaluation is the ultimate means since it relies on a pool of human subjects. However, reliable and meaningful results can only be obtained if experiments are properly designed and conducted following a strict methodology. In this thesis, we build a rigorous framework for subjective evaluation of new types of image and video content. We propose different procedures and analysis tools for measuring QoE in immersive technologies. As immersive technologies capture more information than conventional technologies, they have the ability to provide more details, enhanced depth perception, as well as better color, contrast, and brightness. To measure the impact of immersive technologies on the viewers' QoE, we apply the proposed framework for designing experiments and analyzing collected subjects' ratings. We also analyze eye movements to study human visual attention during immersive content playback. Since immersive content carries more information than conventional content, efficient compression algorithms are needed for storage and transmission using existing infrastructures. To determine the required bandwidth for high-quality transmission of immersive content, we use the proposed framework to conduct meticulous evaluations of recent image and video codecs in the context of immersive technologies. Subjective evaluation is time consuming, expensive, and is not always feasible.
Consequently, researchers have developed objective metrics to automatically predict quality. To measure the performance of objective metrics in assessing immersive content quality, we perform several in-depth benchmarks of state-of-the-art and commonly used objective metrics. For this purpose, we use ground truth quality scores, which are collected under our subjective evaluation framework. To improve QoE, we propose different systems for stereoscopic and autostereoscopic 3D displays in particular. The proposed systems can help reduce the artifacts generated at the visualization stage, which impact picture quality, depth quality, and visual comfort. To demonstrate the effectiveness of these systems, we use the proposed framework to measure viewers' preference between these systems and standard 2D & 3D modes. In summary, this thesis tackles the problems of measuring, predicting, and improving QoE in immersive technologies. To address these problems, we build a rigorous framework and we apply it through several in-depth investigations. We put essential concepts of multimedia QoE under this framework. These concepts are not only of a fundamental nature, but have also shown their impact in very practical applications. In particular, the JPEG, MPEG, and VCEG standardization bodies have adopted these concepts to select technologies that were proposed for standardization and to validate the resulting standards in terms of compression efficiency

    Fusing spatial and temporal components for real-time depth data enhancement of dynamic scenes

    The depth images from consumer depth cameras (e.g., structured-light/ToF devices) exhibit a substantial number of artifacts (e.g., holes, flickering, ghosting) that need to be removed for real-world applications. Existing methods cannot entirely remove them and are slow. This thesis proposes a new real-time spatio-temporal depth image enhancement filter that completely removes flickering and ghosting, and significantly reduces holes. This thesis also presents a novel depth-data capture setup and two data reduction methods to optimize the performance of the proposed enhancement method