
    HEVC based Stereo Video codec

    Development of stereo video codecs in the latest multi-view extension of HEVC (MV-HEVC) with higher compression efficiency has been an active area of research. In this paper, a frame-interleaved stereo video coding scheme based on the standard MV-HEVC codec is proposed. The proposed codec applies a reduced-layer approach to encode the frame-interleaved stereo sequences. A frame interleaving algorithm is developed to reorder the stereo video frames into a monocular video, so that the proposed codec can exploit inter-view and temporal correlations to improve its coding performance. To evaluate the performance of the proposed codec, three standard multi-view test video sequences, named “Poznan_Street”, “Kendo” and “Newspaper1”, were selected and coded using the proposed codec and the standard MV-HEVC codec at different QPs and bitrates. Experimental results show that the proposed codec achieves significantly higher coding performance than the standard MV-HEVC codec at all bitrates.
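    The abstract does not spell out the interleaving order, but the core idea, reordering a stereo pair into one monocular sequence so that a single-layer encoder sees inter-view and temporal neighbours close together, can be sketched as follows (function names and the L0, R0, L1, R1, ... ordering are illustrative assumptions, not taken from the paper):

```python
def interleave_stereo(left_frames, right_frames):
    """Reorder a stereo pair of frame lists into one monocular sequence.

    Alternates L0, R0, L1, R1, ... so a single-layer encoder sees
    temporally and inter-view correlated frames as near neighbours.
    """
    assert len(left_frames) == len(right_frames)
    interleaved = []
    for left, right in zip(left_frames, right_frames):
        interleaved.append(left)   # view 0 at time t
        interleaved.append(right)  # view 1 at time t
    return interleaved


def deinterleave_stereo(frames):
    """Inverse operation on the decoder side: split back into two views."""
    return frames[0::2], frames[1::2]
```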

    3D stereo reconstruction: High resolution satellite video

    Precise high-resolution Digital Elevation Models (DEMs) are essential for creating terrain relief and associated terrain hazard maps, for urban land development and smart cities, and for other applications. The 3D modelling system entitled the UCL Co-registration Ames Stereo Pipeline (ASP) Gotcha Optimised (CASP-GO) was demonstrated on stereo data of Mars, generating 3D models for around 20% of the Martian surface using cloud computing, as reported in 2018. CASP-GO is an automated DEM/DTM processing chain for NASA Mars, lunar and Earth Observation data, including Mars 6 m Context Camera (CTX) and 25 cm High Resolution Imaging Science Experiment (HiRISE) stereo data, as well as 18 m ASTER stereo data acquired on the NASA EOS Terra platform. CASP-GO uses tie-point based multi-resolution image co-registration, combined with sub-pixel refinement and densification. It is based on a combination of the NASA ASP and an adaptive least squares correlation and region growing matcher called Gotcha (Gruen-Otto-Chau). CASP-GO was successfully applied to produce more than 5,300 DTMs of Mars (http://www.i-Mars.eu/web-GIS). This work employs CASP-GO to obtain DEMs from the high-resolution Earth Observation (EO) satellite video system SSTL Carbonite-2. CASP-GO was modified to work with multi-view point-and-stare video data, including subpixel fusion of point clouds. Multi-view stereo video data are distinguished from still-image data by a richer amount of information and noisier water areas.
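    CASP-GO's tie-point based co-registration step is, at heart, a least-squares fit of a geometric transform to matched point pairs. A minimal NumPy sketch of that idea, assuming the tie points are already matched (illustrative only, not the CASP-GO implementation):

```python
import numpy as np

def fit_affine(src_pts, dst_pts):
    """Least-squares affine transform mapping src tie points onto dst.

    src_pts, dst_pts: (N, 2) arrays of matched tie-point coordinates.
    Returns a 2x3 matrix A such that dst ~ A @ [x, y, 1].
    """
    n = src_pts.shape[0]
    # Design matrix for x' = a*x + b*y + c and y' = d*x + e*y + f.
    X = np.hstack([src_pts, np.ones((n, 1))])             # (N, 3)
    coeffs, *_ = np.linalg.lstsq(X, dst_pts, rcond=None)  # (3, 2)
    return coeffs.T                                       # (2, 3)

# Example: three tie points related by a pure shift of (+2, -1).
src = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
dst = src + np.array([2.0, -1.0])
print(fit_affine(src, dst))  # ~[[1, 0, 2], [0, 1, -1]]
```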

    HEVC Based Frame Interleaved Coding Technique for Stereo and Multi-View Videos

    The standard HEVC codec and its extension for coding multiview videos, known as MV-HEVC, have proven to deliver improved visual quality compared to their predecessor, H.264/MPEG-4 AVC’s multiview extension H.264-MVC, at the same frame resolution with up to 50% bitrate savings. MV-HEVC’s framework is similar to that of H.264-MVC, which uses a multi-layer coding approach. Hence, MV-HEVC requires all frames from other reference layers to be decoded prior to decoding a new layer. Thus, the multi-layer coding architecture becomes a bottleneck when it comes to fast frame streaming across different views. In this paper, an HEVC-based Frame Interleaved Stereo/Multiview Video Codec (HEVC-FISMVC) that uses a single-layer encoding approach to encode stereo and multiview video sequences is presented. The frames of stereo or multiview video sequences are interleaved in such a way that encoding the resulting monoscopic video stream maximizes the exploitation of temporal, inter-view, and cross-view correlations, thus improving the overall coding efficiency. The coding performance of the proposed HEVC-FISMVC codec is assessed and compared with that of the standard MV-HEVC for three standard multi-view video sequences, namely “Poznan_Street”, “Kendo” and “Newspaper1”. Experimental results show that the proposed codec provides more substantial coding gains than the anchor MV-HEVC for coding both stereo and multi-view video sequences.
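    As a rough illustration of the single-layer approach, the stereo interleaving sketched earlier generalizes to N views. The paper's exact scan order is not given here, so the time-major ordering below is an assumption:

```python
def interleave_multiview(views):
    """Merge N per-view frame lists into one monoscopic stream.

    views: list of N lists, where views[v][t] is the frame of view v
    at time t. Emits all views for time 0, then time 1, and so on, so
    temporal, inter-view and cross-view neighbours stay close together
    in the resulting single-layer stream.
    """
    length = len(views[0])
    assert all(len(v) == length for v in views)
    return [views[v][t] for t in range(length) for v in range(len(views))]
```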

    H2-Stereo: High-Speed, High-Resolution Stereoscopic Video System

    High-speed, high-resolution stereoscopic (H2-Stereo) video allows us to perceive dynamic 3D content at fine granularity. The acquisition of H2-Stereo video, however, remains challenging with commodity cameras. Existing spatial super-resolution or temporal frame interpolation methods provide compromised solutions that lack temporal or spatial details, respectively. To alleviate this problem, we propose a dual camera system, in which one camera captures high-spatial-resolution low-frame-rate (HSR-LFR) videos with rich spatial details, and the other captures low-spatial-resolution high-frame-rate (LSR-HFR) videos with smooth temporal details. We then devise a Learned Information Fusion network (LIFnet) that exploits the cross-camera redundancies to enhance both camera views to high spatiotemporal resolution (HSTR) for reconstructing the H2-Stereo video effectively. We utilize a disparity network to transfer spatiotemporal information across views even in large disparity scenes, based on which we propose disparity-guided flow-based warping for the LSR-HFR view and complementary warping for the HSR-LFR view. A multi-scale fusion method in the feature domain is proposed to minimize occlusion-induced warping ghosts and holes in the HSR-LFR view. The LIFnet is trained in an end-to-end manner using our collected high-quality Stereo Video dataset from YouTube. Extensive experiments demonstrate that our model outperforms existing state-of-the-art methods for both views on synthetic data and camera-captured real data with large disparity. Ablation studies explore various aspects of our system, including spatiotemporal resolution, camera baseline, camera desynchronization, long/short exposures and applications, to fully understand its capability for potential applications.
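    The flow-based warping at the core of such a system reduces to backward-sampling one view at coordinates displaced by a dense flow or disparity field. A generic sketch of that operation (not the paper's LIFnet code; the bilinear sampler and edge handling are assumptions):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def backward_warp(image, flow):
    """Warp `image` toward a reference view/time using a dense flow field.

    image: (H, W) array; flow: (H, W, 2) array of per-pixel (dx, dy)
    pointing from the reference grid into `image` (backward mapping).
    """
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    sample_x = xs + flow[..., 0]
    sample_y = ys + flow[..., 1]
    # Bilinear sampling; out-of-bounds pixels clamp to edge values.
    return map_coordinates(image, [sample_y, sample_x],
                           order=1, mode='nearest')
```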

    Static scene illumination estimation from video with applications

    We present a system that automatically recovers scene geometry and illumination from a video, providing a basis for various applications. Previous image-based illumination estimation methods require either user interaction or external information in the form of a database. We adopt structure-from-motion and multi-view stereo for initial scene reconstruction, and then estimate an environment map represented by spherical harmonics (as these perform better than other bases). We also demonstrate several video editing applications that exploit the recovered geometry and illumination, including object insertion (e.g., for augmented reality), shadow detection, and video relighting.
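    An environment map in this representation is just a short vector of coefficients. A minimal sketch of projecting uniformly sampled radiance onto the first nine real spherical-harmonic basis functions, using the standard normalization constants (illustrative, not the paper's code):

```python
import numpy as np

def sh_basis(d):
    """Real spherical-harmonic basis (bands 0-2) at unit direction d = (x, y, z).

    Constants follow the standard real SH normalization
    (as in Ramamoorthi & Hanrahan, 2001).
    """
    x, y, z = d
    return np.array([
        0.282095,
        0.488603 * y, 0.488603 * z, 0.488603 * x,
        1.092548 * x * y, 1.092548 * y * z,
        0.315392 * (3.0 * z * z - 1.0),
        1.092548 * x * z, 0.546274 * (x * x - y * y),
    ])

def project_environment(directions, radiances):
    """Monte-Carlo projection of sampled radiance onto 9 SH coefficients.

    directions: (N, 3) unit vectors drawn uniformly on the sphere;
    radiances: (N,) scalar radiance samples along those directions.
    """
    n = len(directions)
    basis = np.stack([sh_basis(d) for d in directions])  # (N, 9)
    # Each uniform sample carries solid angle 4*pi / N.
    return (4.0 * np.pi / n) * basis.T @ radiances
```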

    User Directed Multi-view-stereo

    Depth reconstruction from video footage and image collections is a fundamental part of many modelling and image-based rendering applications. However, real-world scenes often contain limited texture information, repeated elements and other ambiguities which remain challenging for fully automatic algorithms. This paper presents a technique that combines intuitive user constraints with dense multi-view stereo reconstruction. By providing annotations in the form of simple paint strokes, a user can guide a multi-view stereo algorithm and avoid common failure cases. We show how smoothness, discontinuity and depth ordering constraints can be incorporated directly into a variational optimization framework for multi-view stereo. Our method avoids the need for heuristic approaches that edit a depth-map in a sequential process, and avoids requiring the user to accurately segment object boundaries or to directly model geometry. We show how, with a small amount of intuitive input, a user may create improved depth maps in challenging cases for multi-view stereo.
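    A toy version of such an energy makes the constraint mechanism concrete: a data term pulls each pixel toward the matcher's depth, while a per-pixel smoothness weight, which paint strokes could raise or lower, controls diffusion. The sketch below is a deliberate simplification under those assumptions, not the paper's actual variational formulation:

```python
import numpy as np

def refine_depth(depth, data_weight, smooth_weight, iters=200):
    """Jacobi-style minimization of a simple variational energy:

        E(d) = sum_p w_p (d_p - d0_p)^2 + sum_{p,q} s_p (d_p - d_q)^2

    depth: (H, W) initial depth map d0 from the stereo matcher.
    data_weight: (H, W) per-pixel confidence w_p.
    smooth_weight: (H, W) smoothness s_p; a user "discontinuity" stroke
    could set it near 0 to stop smoothing across an edge, while a
    "smooth" stroke raises it to fill ambiguous regions.
    """
    d0 = depth.copy()
    d = depth.copy()
    for _ in range(iters):
        # Average of the 4-neighbourhood (edges replicate).
        padded = np.pad(d, 1, mode='edge')
        neigh = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                 padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
        # Closed-form per-pixel minimizer with neighbours held fixed.
        d = (data_weight * d0 + 4.0 * smooth_weight * neigh) / \
            (data_weight + 4.0 * smooth_weight)
    return d
```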

    FED-NeRF: Achieve High 3D Consistency and Temporal Coherence for Face Video Editing on Dynamic NeRF

    The success of the GAN-NeRF structure has enabled face editing on NeRF to maintain 3D view consistency. However, simultaneously achieving multi-view consistency and temporal coherence while editing video sequences remains a formidable challenge. This paper proposes a novel face video editing architecture built upon the dynamic face GAN-NeRF structure, which effectively utilizes video sequences to restore the latent code and 3D face geometry. By editing the latent code, multi-view consistent editing of the face can be ensured, as validated by multi-view stereo reconstruction on the resulting edited images in our dynamic NeRF. As the estimation of face geometries occurs on a frame-by-frame basis, this may introduce a jittering issue. We propose a stabilizer that maintains temporal coherence by preserving smooth changes of face expressions in consecutive frames. Quantitative and qualitative analyses reveal that our method, as the pioneering 4D face video editor, achieves state-of-the-art performance in comparison to existing 2D or 3D-based approaches that independently address identity and motion. Codes will be released; our code will be available at: https://github.com/ZHANG1023/FED-NeR
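    The stabilizer's goal, smooth changes of per-frame expression estimates, can be illustrated with a simple exponential moving average over the expression codes. The paper's actual stabilizer is presumably more elaborate, so treat this strictly as a sketch of the idea:

```python
import numpy as np

def stabilize(expression_codes, alpha=0.7):
    """Temporal smoothing of per-frame face-expression coefficients.

    expression_codes: (T, D) array, one D-dimensional code per frame.
    alpha: fraction of the previous smoothed code carried forward;
    higher values damp frame-to-frame jitter more aggressively.
    """
    smoothed = expression_codes.astype(np.float64).copy()
    for t in range(1, len(smoothed)):
        smoothed[t] = alpha * smoothed[t - 1] + (1.0 - alpha) * smoothed[t]
    return smoothed
```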

    Multi-Scale 3D Scene Flow from Binocular Stereo Sequences

    Scene flow methods estimate the three-dimensional motion field for points in the world, using multi-camera video data. Such methods combine multi-view reconstruction with motion estimation. This paper describes an alternative formulation for dense scene flow estimation that provides reliable results using only two cameras, by fusing stereo and optical flow estimation into a single coherent framework. Internally, the proposed algorithm generates probability distributions for optical flow and disparity. Taking into account the uncertainty in the intermediate stages allows for more reliable estimation of the 3D scene flow than previous methods allow. To handle the aperture problems inherent in the estimation of optical flow and disparity, a multi-scale method along with a novel region-based technique is used within a regularized solution. This combined approach both preserves discontinuities and prevents over-regularization, two problems commonly associated with basic multi-scale approaches. Experiments with synthetic and real test data demonstrate the strength of the proposed approach. National Science Foundation (CNS-0202067, IIS-0208876); Office of Naval Research (N00014-03-1-0108)
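    For rectified stereo the geometry is compact: disparity d gives depth Z = fB/d, and chaining optical flow with the disparity at the next frame yields a 3D displacement. The point-estimate sketch below shows only that underlying triangulation and deliberately omits the paper's key contribution, the probability distributions over flow and disparity:

```python
import numpy as np

def backproject(x, y, disparity, f, baseline, cx, cy):
    """Pinhole back-projection of pixel (x, y) with known stereo disparity."""
    z = f * baseline / disparity          # depth from disparity
    return np.array([(x - cx) * z / f,    # X
                     (y - cy) * z / f,    # Y
                     z])                  # Z

def scene_flow(x, y, disp_t, flow, disp_t1, f, baseline, cx, cy):
    """3D motion of one point, combining disparity at time t, optical
    flow from t to t+1, and disparity at t+1 sampled at the flowed
    position."""
    p0 = backproject(x, y, disp_t, f, baseline, cx, cy)
    p1 = backproject(x + flow[0], y + flow[1], disp_t1, f, baseline, cx, cy)
    return p1 - p0
```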

    PEER-TO-PEER 3D/MULTI-VIEW VIDEO STREAMING

    The recent advances in stereoscopic video capture, compression and display have made 3D video a visually appealing and affordable technology. More sophisticated multi-view videos have also been demonstrated. Yet their remarkably increased data volume poses greater challenges to conventional client/server systems. The stringent synchronization demands across different views further complicate the system design. In this thesis, we present an initial attempt toward efficient streaming of 3D videos over peer-to-peer networks. We show that the inherent multi-stream nature of 3D video makes playback synchronization more difficult. We address this with a 2-stream buffer, together with a novel segment scheduling scheme. We further extend our system to support multi-view video with view diversity and dynamics. We have evaluated our system under different end-system and network configurations with typical stereo video streams. The simulation results demonstrate the superiority of our system in terms of scalability, streaming quality and handling of view dynamics.
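    The 2-stream buffer idea can be sketched as a playback queue that releases a time slot only once segments for both views have arrived, so the views stall together rather than drift apart. Class and method names below are hypothetical, not taken from the thesis:

```python
from collections import defaultdict

class TwoStreamBuffer:
    """Toy playback buffer for a stereo (left/right) P2P stream.

    A time slot is played out only when the segments of *both* views
    have arrived, keeping the two views synchronized. Illustrative
    only; the thesis pairs such a buffer with its own segment scheduler.
    """
    def __init__(self):
        self.slots = defaultdict(dict)  # slot index -> {view: segment}
        self.play_pos = 0

    def receive(self, slot, view, segment):
        """Store a segment for view 'L' or 'R' at the given time slot."""
        self.slots[slot][view] = segment

    def next_playable(self):
        """Return (left, right) for the current slot, or None if either
        view is still missing (playback stalls rather than desyncs)."""
        slot = self.slots.get(self.play_pos, {})
        if 'L' in slot and 'R' in slot:
            pair = (slot['L'], slot['R'])
            del self.slots[self.play_pos]
            self.play_pos += 1
            return pair
        return None
```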