
    Loss-resilient Coding of Texture and Depth for Free-viewpoint Video Conferencing

    Free-viewpoint video conferencing allows a participant to observe the remote 3D scene from any freely chosen viewpoint. An intermediate virtual viewpoint image is commonly synthesized via depth-image-based rendering (DIBR), using two pairs of transmitted texture and depth maps from two neighboring captured viewpoints. To maintain high quality in the synthesized images, it is imperative to contain the adverse effects of network packet losses that may arise during texture and depth video transmission. Towards this end, we develop an integrated approach that exploits the representation redundancy inherent in the multiple streamed videos: a voxel in the 3D scene visible to two captured views is sampled and coded twice, once in each view. In particular, at the receiver we first develop an error concealment strategy that adaptively blends corresponding pixels in the two captured views during DIBR, so that pixels from the more reliably transmitted view are weighted more heavily. We then couple it with a sender-side optimization of reference picture selection (RPS) during real-time video coding, so that blocks containing samples of voxels visible in both views are more error-resiliently coded in one view only, given that adaptive blending will erase errors in the other view. Further, the sensitivity of synthesized-view distortion to texture versus depth errors is analyzed, so that the relative importance of texture and depth code blocks can be computed for system-wide RPS optimization. Experimental results show that the proposed scheme can outperform the use of a traditional feedback channel by up to 0.82 dB on average at an 8% packet loss rate, and by as much as 3 dB for particular frames.
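The receiver-side blending idea above can be sketched in a few lines. This is a minimal illustration, not the paper's exact algorithm: corresponding pixels from the two decoded views are combined with weights derived from per-view reliabilities, which are assumed here to come from transport-layer packet-loss statistics.

```python
# Sketch of reliability-weighted blending of two corresponding views
# (simplified; the paper couples this with sender-side RPS optimization).

def blend_pixel(p_left, p_right, r_left, r_right):
    """Reliability-weighted blend of two corresponding pixel values."""
    total = r_left + r_right
    if total == 0:          # neither view usable: fall back to the average
        return (p_left + p_right) / 2.0
    w_left = r_left / total
    return w_left * p_left + (1.0 - w_left) * p_right

def blend_row(row_left, row_right, r_left, r_right):
    """Blend one scanline of the synthesized virtual view."""
    return [blend_pixel(a, b, r_left, r_right)
            for a, b in zip(row_left, row_right)]
```

With `r_left = 0.9` and `r_right = 0.3`, for example, the left view dominates the blend, which is the desired behaviour when the right view's packets were lost.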

    Robust and efficient video/image transmission

    The Internet has become a primary medium for information transmission. The unreliability of channel conditions, limited channel bandwidth and the explosive growth of transmission requests, however, hinder its further development. Research on robust and efficient delivery of video/image content is therefore in strong demand. Three aspects of this task are investigated in this dissertation: error burst correction, efficient rate allocation and random error protection. A novel technique, called successive packing, is proposed for combating multi-dimensional (M-D) bursts of errors, and a new concept of a basis interleaving array is introduced. By combining different basis arrays, effective M-D interleaving can be realized. It has been shown that this algorithm needs to be implemented only once and yet is optimal for a set of error bursts of different sizes within a given two-dimensional (2-D) array. To adapt to variable channel conditions, a novel rate allocation technique is proposed for Fine-Granular Scalability (FGS) coded video, in which rate-distortion modeling based on real data is developed, a constant-quality constraint is adopted, and a sliding-window approach is used to track the varying channel. With the proposed technique, constant quality across frames is achieved by solving a set of linear functions, yielding a significant computational simplification compared with state-of-the-art techniques while also reducing overall distortion. To combat random errors during transmission, an unequal error protection (UEP) method and a robust error-concealment strategy are proposed for scalable coded video bitstreams.
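To make the interleaving idea concrete, here is a plain row-column block interleaver. It only illustrates the general principle that motivates successive packing (a burst of consecutive channel errors is dispersed so the corrupted symbols are no longer adjacent after de-interleaving); it is NOT the successive-packing construction itself.

```python
# Row-column interleaving: write row by row, read column by column.
# A length-b burst in the channel is spread across b different rows.

def interleave(data, rows, cols):
    """Reorder a rows*cols sequence from row-major to column-major."""
    assert len(data) == rows * cols
    return [data[r * cols + c] for c in range(cols) for r in range(rows)]

def deinterleave(data, rows, cols):
    """Inverse permutation: restore the original row-major order."""
    assert len(data) == rows * cols
    return [data[c * rows + r] for r in range(rows) for c in range(cols)]
```

For a 3x4 array, a burst corrupting three consecutive transmitted symbols lands on positions 0, 4 and 8 after de-interleaving, i.e. the errors are separated by a full row and become easier to correct or conceal.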

    Efficient Video Transport over Lossy Networks

    Nowadays, packet video is an important application of the Internet. Unfortunately, the capacity of the Internet remains very heterogeneous because it connects high-bandwidth ATM networks as well as low-bandwidth ISDN dial-in lines. The MPEG-2 and MPEG-4 video compression standards provide efficient video encoding for high- and low-bandwidth media streams. In particular, they include two paradigms that make these standards suitable for video transmission over heterogeneous networks: both support layered video streams, and MPEG-4 additionally allows the independent coding of video objects. In this paper we discuss these two paradigms, give an overview of the MPEG video compression standards, and describe transport protocols for real-time media transport over lossy networks. Furthermore, we propose a real-time segmentation approach for extracting video objects in teleteaching scenarios.

    An efficient error resilience scheme based on Wyner-Ziv coding for region-of-interest protection of wavelet-based video transmission

    In this paper, we propose a bandwidth-efficient error resilience scheme for wavelet-based video transmission over wireless channels, in which an additional Wyner-Ziv (WZ) stream protects a region of interest (ROI) in each frame. In the proposed architecture, the main video stream is compressed by a generic wavelet-domain coding structure and passed through the error-prone channel without any protection. Meanwhile, the wavelet coefficients related to the predefined ROI, obtained after an integer wavelet transform, are specially protected by a WZ codec in an additional channel during transmission. At the decoder side, the error-prone ROI-related wavelet coefficients are used as side information to help decode the WZ stream. WZ bit streams of different sizes can be applied to meet different bandwidth conditions and end-user requirements. The simulation results clearly show that, compared with applying an FEC algorithm to the whole video stream, the proposed scheme has distinct advantages in saving bandwidth, while still offering robust transmission over error-prone channels for certain video applications.

    Image analysis using visual saliency with applications in hazmat sign detection and recognition

    Visual saliency is the perceptual process that makes attractive objects stand out from their surroundings in the low-level human visual system. It has been modeled as a preprocessing step of the human visual system for selecting the important visual information in a scene. We investigate bottom-up visual saliency using spectral analysis approaches. We present separate and composite model families that generalize existing frequency-domain visual saliency models, and propose several frequency-domain models that generate saliency maps using new spectrum processing methods and an entropy-based saliency map selection approach. A group of saliency map candidates is obtained by inverse transform, and a final saliency map is selected among the candidates by minimizing their entropy. The proposed models based on the separate and composite model families are also extended to various color spaces. We develop an evaluation tool for benchmarking visual saliency models. Experimental results show that the proposed models are more accurate and efficient than most state-of-the-art visual saliency models in predicting eye fixations.

    We use the above visual saliency models to detect the location of hazardous material (hazmat) signs in complex scenes, and develop a hazmat sign location detection and content recognition system based on visual saliency. Saliency maps are employed to extract salient regions that are likely to contain hazmat sign candidates, and a Fourier-descriptor-based contour matching method is then used to locate the borders of hazmat signs in these regions. This visual saliency based approach increases the accuracy of sign location detection, reduces the number of false positives, and speeds up the overall image analysis process. We also propose a color recognition method to interpret the color inside the detected hazmat sign. Experimental results show that the proposed method is capable of detecting and recognizing projectively distorted, blurred, and shaded hazmat signs at various distances.

    In other work, we investigate error concealment for scalable video coding (SVC). When video compressed with SVC is transmitted over loss-prone networks, the decompressed video can suffer severe visual degradation across multiple frames. To enhance the visual quality, we propose an inter-layer error concealment method using motion vector averaging and slice interleaving to deal with burst packet losses and error propagation. Experimental results show that the proposed error concealment methods outperform two existing methods.
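The motion-vector-averaging step can be sketched as follows. This is a simplified, assumed form of the technique (the paper's inter-layer method additionally uses slice interleaving): the motion vector of a lost block is estimated as the component-wise average of its correctly received neighbours' vectors, and the block is then copied from the motion-shifted position in the previous frame.

```python
# Sketch of motion-vector-averaging error concealment (simplified).

def average_mv(neighbour_mvs):
    """Component-wise average of the neighbours' (dx, dy) motion vectors."""
    if not neighbour_mvs:
        return (0, 0)              # no neighbours available: assume zero motion
    n = len(neighbour_mvs)
    dx = round(sum(mv[0] for mv in neighbour_mvs) / n)
    dy = round(sum(mv[1] for mv in neighbour_mvs) / n)
    return (dx, dy)

def conceal_block(prev_frame, x, y, size, mv):
    """Copy a size x size block from the previous frame, shifted by mv."""
    dx, dy = mv
    return [[prev_frame[y + dy + j][x + dx + i] for i in range(size)]
            for j in range(size)]
```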

    Quasi-Bezier curves integrating localised information

    Bezier curves (BC) have become fundamental tools in many challenging and varied applications, ranging from computer-aided geometric design to generic object shape descriptors. A major limitation of the classical Bezier curve, however, is that only global information about its control points (CP) is considered, so there can often be a large gap between the curve and its control polygon, leading to large distortion in shape representation. While strategies such as degree elevation, composite BC, refinement and subdivision reduce this gap, they also increase the number of CP, and hence the bit-rate and computational complexity. This paper presents novel contributions to BC theory, with the introduction of quasi-Bezier curves (QBC), which seamlessly integrate localised CP information into the inherent global Bezier framework, with no increase in either the number of CP or the order of computational complexity. QBC crucially retains the core properties of the classical BC, such as geometric continuity and affine invariance, and can be embedded into the vertex-based shape coding and shape descriptor framework to enhance rate-distortion performance. The performance of QBC has been empirically tested upon a number of natural and synthetically shaped objects, with both qualitative and quantitative results confirming its consistently superior approximation performance in comparison with both the classical BC and other established BC-based shape descriptor methods.
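The "global information" limitation mentioned above is easy to see in the classical evaluation of a Bezier curve by de Casteljau's algorithm: every control point contributes to every point on the curve, which is exactly what leaves a gap between curve and control polygon. A minimal sketch of the classical baseline:

```python
# Classical Bezier curve evaluation via de Casteljau's algorithm.
# Every control point influences every curve point (global behaviour).

def de_casteljau(control_points, t):
    """Evaluate a Bezier curve at parameter t in [0, 1]."""
    pts = [tuple(p) for p in control_points]
    while len(pts) > 1:
        # Repeated linear interpolation between consecutive points.
        pts = [tuple((1 - t) * a + t * b for a, b in zip(p, q))
               for p, q in zip(pts, pts[1:])]
    return pts[0]
```

For the quadratic curve with CP (0,0), (1,2), (2,0), the midpoint of the curve is (1, 1), well below the control polygon's apex (1, 2), illustrating the curve/polygon gap that QBC is designed to narrow.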

    MPEG-4 natural video coding - An overview

    This paper describes the MPEG-4 standard, as defined in ISO/IEC 14496-2. The MPEG-4 visual standard was developed to provide users with a new level of interaction with visual content. It provides technologies to view, access and manipulate objects rather than pixels, with great error robustness over a large range of bit-rates. Application areas range from digital television and streaming video to mobile multimedia and games. The MPEG-4 natural video standard consists of a collection of tools that support these application areas. The standard provides tools for shape coding, motion estimation and compensation, texture coding, error resilience, sprite coding and scalability. Conformance points in the form of object types, profiles and levels provide the basis for interoperability. Shape coding can be performed in binary mode, where the shape of each object is described by a binary mask, or in gray-scale mode, where the shape is described in a form similar to an alpha channel, allowing transparency and reducing aliasing. Motion compensation is block based, with appropriate modifications for object boundaries. The block size can be 16×16 or 8×8, with half-pixel resolution, and MPEG-4 also provides a mode for overlapped motion compensation. Texture coding is based on the 8×8 DCT, with appropriate modifications for object boundary blocks. Coefficient prediction is possible to improve coding efficiency. Static textures can be encoded using a wavelet transform. Error resilience is provided by resynchronization markers, data partitioning, header extension codes, and reversible variable length codes. Scalability is provided for both spatial and temporal resolution enhancement. MPEG-4 provides scalability on an object basis, with the restriction that the object shape has to be rectangular. MPEG-4 conformance points are defined at the Simple Profile, the Core Profile, and the Main Profile. The Simple and Core Profiles address typical scene sizes of QCIF and CIF, with bit-rates of 64, 128 and 384 kbit/s, and 2 Mbit/s. The Main Profile addresses typical scene sizes of CIF, ITU-R 601 and HD, with bit-rates of 2, 15 and 38.4 Mbit/s.
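The 8×8 DCT at the heart of MPEG-4 texture coding can be written directly from its textbook definition. The following is an unoptimized reference sketch; real encoders use fast factorizations rather than this quadruple loop.

```python
import math

# Reference (unoptimized) forward 2-D DCT-II for an 8x8 block, as used for
# texture coding in block-based codecs such as MPEG-4.

N = 8

def alpha(u):
    """DCT normalization factor."""
    return math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N)

def dct_8x8(block):
    """Forward 2-D DCT-II of an 8x8 block of sample values."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out
```

A flat block compacts all its energy into the DC coefficient, which is the property that makes the subsequent quantization and entropy coding effective.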

    Geometric distortion measurement for shape coding: a contemporary review

    Geometric distortion measurement and the associated metrics are integral to the rate-distortion (RD) shape coding framework, and the efficacy of these metrics is strongly influenced by the underlying measurement strategy. This has been the catalyst for many different techniques, and this paper presents a comprehensive review of geometric distortion measurement, the diverse metrics applied, and their impact on shape coding. The respective performance of these measurement strategies is analysed from both an RD and a complexity perspective, with a recent distortion measurement technique based on arc-length parameterisation being comparatively evaluated. Some contemporary research challenges are also investigated, including schemes to effectively quantify shape deformation.

    Dynamic Bezier curves for variable rate-distortion

    Bezier curves (BC) are important tools in a wide range of diverse and challenging applications, from computer-aided design to generic object shape descriptors. A major constraint of the classical BC is that only global information concerning control points (CP) is considered, so there may be a sizeable gap between the BC and its control polygon (CtrlPoly), leading to a large distortion in shape representation. While BC variants like degree elevation, composite BC, and refinement and subdivision narrow this gap, they increase the number of CP and thereby both the required bit-rate and the computational complexity. In addition, while quasi-Bezier curves (QBC) close the gap without increasing the number of CP, they reduce the underlying distortion by only a fixed amount. This paper presents a novel contribution to BC theory, with the introduction of a dynamic Bezier curve (DBC) model, which embeds variable localised CP information into the inherently global Bezier framework by strategically moving BC points towards the CtrlPoly. A shifting parameter (SP) is defined that enables curves lying within the region between the BC and CtrlPoly to be generated, with no commensurate increase in CP. DBC provides a flexible rate-distortion (RD) criterion for shape coding applications, and a theoretical model for determining the optimal SP value for any admissible distortion is formulated. Crucially, DBC retains core properties of the classical BC, including the convex hull property and affine invariance, and can be seamlessly integrated into both the vertex-based shape coding and shape descriptor frameworks to improve their RD performance. DBC has been empirically tested upon a number of natural and synthetically shaped objects, with qualitative and quantitative results confirming its consistently superior shape approximation performance compared with the classical BC, QBC and other established BC-based shape descriptor techniques.
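One natural reading of the shifting-parameter idea can be sketched as follows. This is an illustrative interpretation, not the paper's exact DBC formulation: each curve point B(t) is moved linearly toward a corresponding control-polygon point P(t), giving C(t) = (1 - s)·B(t) + s·P(t), so s = 0 recovers the classical Bezier curve and s = 1 degenerates to the control polygon, with intermediate s values sweeping the region between them.

```python
# Hypothetical shifting-parameter sketch: interpolate between a Bezier
# curve point and the corresponding control-polygon point.

def bezier_point(cps, t):
    """Classical Bezier evaluation (de Casteljau reduction)."""
    pts = [tuple(p) for p in cps]
    while len(pts) > 1:
        pts = [tuple((1 - t) * a + t * b for a, b in zip(p, q))
               for p, q in zip(pts, pts[1:])]
    return pts[0]

def polygon_point(cps, t):
    """Point at parameter t on the control polygon (uniform by segment)."""
    n = len(cps) - 1
    k = min(int(t * n), n - 1)           # segment index
    u = t * n - k                        # local parameter within the segment
    return tuple((1 - u) * a + u * b for a, b in zip(cps[k], cps[k + 1]))

def shifted_point(cps, t, s):
    """Curve point moved toward the control polygon by shift s in [0, 1]."""
    b, p = bezier_point(cps, t), polygon_point(cps, t)
    return tuple((1 - s) * bi + s * pi for bi, pi in zip(b, p))
```

Note that the curve's endpoint interpolation survives for every s, since B(t) and P(t) coincide at t = 0 and t = 1.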

    Depth-based Multi-View 3D Video Coding
