
    MASCOT : metadata for advanced scalable video coding tools : final report

    The goal of the MASCOT project was to develop new video coding schemes and tools that provide both increased coding efficiency and extended scalability compared with the technology available at the beginning of the project. Towards that goal, the following tools were to be used: metadata-based coding tools, new spatiotemporal decompositions, and new prediction schemes. Although the initial aim was to develop a single codec architecture able to combine all of the new coding tools foreseen when the project was formulated, it became clear that this would limit the selection of new tools. The consortium therefore decided to develop two codec frameworks within the project, a standard hybrid DCT-based codec and a 3D wavelet-based codec, which together are able to accommodate all of the tools developed during the course of the project.

    COSMOS-7: Video-oriented MPEG-7 scheme for modelling and filtering of semantic content

    MPEG-7 prescribes a format for semantic content models for multimedia to ensure interoperability across a multitude of platforms and application domains. However, the standard leaves open how the models should be used and how their content should be filtered. Filtering is a technique used to retrieve only the content relevant to user requirements, thereby reducing the content-sifting effort required of the user. This paper proposes an MPEG-7 scheme that can be deployed for semantic content modelling and filtering of digital video. The proposed scheme, COSMOS-7, produces rich and multi-faceted semantic content models and supports a content-based filtering approach that only analyses content relating directly to the user's preferred content requirements.
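
    To make the filtering idea concrete, the sketch below keeps only the video segments whose semantic annotations overlap a user's stated preferences. The data layout, class names and example annotations are illustrative assumptions made for this listing; they are not the actual COSMOS-7 description schemes or tooling.

```python
# Minimal sketch of content-based filtering over a semantic video model,
# in the spirit of (but not identical to) COSMOS-7. All names and the
# segment layout are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Segment:
    start: float                                  # seconds
    end: float
    objects: set = field(default_factory=set)     # e.g. {"goalkeeper", "ball"}
    events: set = field(default_factory=set)      # e.g. {"goal", "free kick"}

def filter_segments(model, preferred_objects=(), preferred_events=()):
    """Return only the segments whose annotations overlap the user's preferences."""
    wanted_obj, wanted_evt = set(preferred_objects), set(preferred_events)
    return [s for s in model if (s.objects & wanted_obj) or (s.events & wanted_evt)]

model = [
    Segment(0.0, 12.5, objects={"presenter"}, events={"introduction"}),
    Segment(12.5, 47.0, objects={"ball", "goalkeeper"}, events={"goal"}),
]
print(filter_segments(model, preferred_events=["goal"]))   # -> second segment only
```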

    Video summarisation: A conceptual framework and survey of the state of the art

    This is the post-print (final draft post-refereeing) version of the article. Copyright © 2007 Elsevier Inc. Video summaries provide condensed and succinct representations of the content of a video stream through a combination of still images, video segments, graphical representations and textual descriptors. This paper presents a conceptual framework for video summarisation derived from the research literature and used as a means for surveying that literature. The framework distinguishes between video summarisation techniques (the methods used to process content from a source video stream to achieve a summarisation of that stream) and video summaries (the outputs of video summarisation techniques). Video summarisation techniques are considered within three broad categories: internal (analyse information sourced directly from the video stream), external (analyse information not sourced directly from the video stream) and hybrid (analyse a combination of internal and external information). Video summaries are considered as a function of the type of content they are derived from (object, event, perception or feature based) and the functionality offered to the user for their consumption (interactive or static, personalised or generic). It is argued that video summarisation would benefit from greater incorporation of external information, particularly unobtrusively sourced user-based information, in order to overcome longstanding challenges such as the semantic gap and to provide video summaries that have greater relevance to individual users.
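
    As a rough illustration of the framework rather than of any particular technique, the categories named in the abstract can be written down as a small set of enumerations; bundling them into a single profile object is an assumption made here for compactness, not a structure proposed by the survey.

```python
# The survey's conceptual framework expressed as Python enums (illustrative only).
from dataclasses import dataclass
from enum import Enum

class Technique(Enum):
    INTERNAL = "information sourced directly from the video stream"
    EXTERNAL = "information not sourced directly from the video stream"
    HYBRID = "a combination of internal and external information"

class ContentBasis(Enum):
    OBJECT = "object"
    EVENT = "event"
    PERCEPTION = "perception"
    FEATURE = "feature"

@dataclass
class SummaryProfile:
    technique: Technique
    content_basis: ContentBasis
    interactive: bool        # interactive vs. static consumption
    personalised: bool       # personalised vs. generic output

profile = SummaryProfile(Technique.HYBRID, ContentBasis.EVENT,
                         interactive=True, personalised=True)
print(profile)
```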

    Audiovisual head orientation estimation with particle filtering in multisensor scenarios

    This article presents a multimodal approach to head pose estimation of individuals in environments equipped with multiple cameras and microphones, such as SmartRooms or automatic video conferencing. Determining an individual's head orientation is the basis for many forms of more sophisticated interaction between humans and technical devices and can also be used for automatic sensor selection (camera, microphone) in communications or video surveillance systems. The use of particle filters as a unified framework for the estimation of head orientation in both monomodal and multimodal cases is proposed. In video, head orientation is estimated from colour information by exploiting spatial redundancy among cameras. Audio information is processed to estimate the direction of the voice produced by a speaker, making use of the directivity characteristics of the head radiation pattern. Furthermore, two particle filter multimodal information fusion schemes for combining the audio and video streams are analysed in terms of accuracy and robustness. In the first, fusion is performed at the decision level by combining each monomodal head pose estimate, while the second uses a joint estimation system combining information at the data level. Experimental results conducted over the CLEAR 2006 evaluation database are reported, and the comparison of the proposed multimodal head pose estimation algorithms with the reference monomodal approaches proves the effectiveness of the proposed approach.
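
    As a rough, self-contained illustration of the data-level fusion scheme, the sketch below runs a particle filter over a single pan angle and weights each particle by the product of an audio likelihood and a video likelihood. The Gaussian observation models, noise levels and measurements are invented placeholders; the paper's colour-based and head-radiation-based likelihoods are considerably richer.

```python
# Toy particle filter fusing one video and one audio bearing measurement
# at the data level (joint likelihood = product of modality likelihoods).
import numpy as np

rng = np.random.default_rng(0)
N = 500
particles = rng.uniform(-180.0, 180.0, N)    # pan-angle hypotheses in degrees
weights = np.full(N, 1.0 / N)

def angular_likelihood(measured, hypotheses, sigma):
    # Wrap the angular error into [-180, 180) before scoring it with a Gaussian.
    d = (hypotheses - measured + 180.0) % 360.0 - 180.0
    return np.exp(-0.5 * (d / sigma) ** 2)

def fuse_step(video_meas, audio_meas, sigma_video=15.0, sigma_audio=30.0):
    global particles, weights
    # Propagate with a random-walk motion model.
    particles = (particles + rng.normal(0.0, 5.0, N) + 180.0) % 360.0 - 180.0
    # Data-level fusion: multiply the likelihoods of both modalities.
    weights = weights * angular_likelihood(video_meas, particles, sigma_video) \
                      * angular_likelihood(audio_meas, particles, sigma_audio)
    weights = weights / weights.sum()
    # Resample to avoid weight degeneracy.
    idx = rng.choice(N, size=N, p=weights)
    particles, weights = particles[idx], np.full(N, 1.0 / N)
    # Report the circular mean of the particle set as the pose estimate.
    rad = np.radians(particles)
    return float(np.degrees(np.arctan2(np.sin(rad).mean(), np.cos(rad).mean())))

print(fuse_step(video_meas=42.0, audio_meas=50.0))   # estimate falls between the two cues
```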

    No-reference depth map quality evaluation model based on depth map edge confidence measurement in immersive video applications

    When it comes to evaluating the perceptual quality of digital media for overall quality of experience assessment in immersive video applications, two main approaches typically stand out: subjective and objective quality evaluation. On one hand, subjective quality evaluation offers the best representation of perceived video quality as assessed by real viewers. On the other hand, it consumes a significant amount of time and effort, due to the involvement of real users in lengthy and laborious assessment procedures. Thus, it is essential that an objective quality evaluation model be developed. The speed-up offered by an objective quality evaluation model, which can predict the quality of rendered virtual views based on the depth maps used in the rendering process, allows for faster quality assessments in immersive video applications. This is particularly important given the lack of a suitable reference or ground truth for comparing the available depth maps, especially when live content services are offered in those applications. This paper presents a no-reference depth map quality evaluation model based on a proposed depth map edge confidence measurement technique to assist in accurately estimating the quality of rendered (virtual) views in immersive multi-view video content. The model is applied to depth image-based rendering in multi-view video format, providing evaluation results comparable to those in the literature, and often exceeding their performance.
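
    As a hedged illustration of what an edge-confidence style score might look like, the sketch below rates a depth map by the fraction of its strong edges that are supported by edges in the corresponding colour (luma) view. The thresholds and the measure itself are simplified assumptions for this listing and do not reproduce the confidence measurement or quality model proposed in the paper.

```python
# Simplified depth-edge confidence: how many strong depth edges coincide
# with strong edges in the associated texture view (illustrative only).
import numpy as np
from scipy import ndimage

def edge_magnitude(img):
    gx = ndimage.sobel(img.astype(float), axis=1)
    gy = ndimage.sobel(img.astype(float), axis=0)
    return np.hypot(gx, gy)

def depth_edge_confidence(depth, luma, depth_thresh=40.0, luma_thresh=40.0):
    depth_edges = edge_magnitude(depth) > depth_thresh
    # Dilate the texture edges slightly to tolerate one-pixel misalignments.
    luma_edges = ndimage.binary_dilation(edge_magnitude(luma) > luma_thresh)
    if not depth_edges.any():
        return 1.0                      # no depth edges to distrust
    return float((depth_edges & luma_edges).sum() / depth_edges.sum())

depth = np.tile(np.repeat([60.0, 200.0], 32), (64, 1))              # toy depth step
luma = depth + np.random.default_rng(1).normal(0, 2, depth.shape)   # aligned texture
print(depth_edge_confidence(depth, luma))   # close to 1.0 for well-aligned edges
```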

    An MPEG-7 scheme for semantic content modelling and filtering of digital video

    Part 5 of the MPEG-7 standard specifies Multimedia Description Schemes (MDS); that is, the format multimedia content models should conform to in order to ensure interoperability across multiple platforms and applications. However, the standard does not specify how the content or the associated model may be filtered. This paper proposes an MPEG-7 scheme which can be deployed for digital video content modelling and filtering. The proposed scheme, COSMOS-7, produces rich and multi-faceted semantic content models and supports a content-based filtering approach that only analyses content relating directly to the user's preferred content requirements. We present details of the scheme, the front-end systems used for content modelling and filtering, and experiences with a number of users.

    A Study on the Usage of Cross-Layer Power Control and Forward Error Correction for Embedded Video Transmission over Wireless Links

    Cross-layering is a design paradigm for overcoming the limitations deriving from the ISO/OSI layering principle, thus improving the performance of communications in specific scenarios such as wireless multimedia communications. However, most available solutions are based on empirical considerations and do not provide a theoretical background supporting such approaches. This paper provides an analytical framework for the study of single-hop video delivery over a wireless link, enabling cross-layer interactions for performance optimization using power control and FEC, and offers a useful tool to determine the potential gain deriving from this design paradigm. The analysis is performed using rate-distortion information of an embedded video bitstream jointly with a Lagrangian power minimization approach. Simulation results show that cross-layering can provide relevant improvements in specific environments and that the proposed approach is able to capitalize on the advantage deriving from its deployment.
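
    A toy version of the Lagrangian step is sketched below: for each layer of an embedded bitstream, pick the (transmit power, FEC rate) pair that minimises distortion plus a Lagrange multiplier times power. The loss model, candidate sets and numbers are invented for illustration and merely stand in for the paper's analytical framework.

```python
# Illustrative Lagrangian allocation of power and FEC per embedded layer.
import itertools

POWERS = [5.0, 10.0, 20.0]       # mW, candidate transmit powers (assumed)
FEC_RATES = [1.0, 3 / 4, 1 / 2]  # code rates; lower rate = stronger protection

def loss_prob(power, fec_rate):
    """Crude placeholder: losses fall with more power and with stronger FEC."""
    return min(1.0, 0.3 * fec_rate * (10.0 / power))

def expected_distortion(layer_gain, power, fec_rate):
    """A layer's distortion reduction is forfeited whenever the layer is lost."""
    return layer_gain * loss_prob(power, fec_rate)

def allocate(layer_gains, lam=0.3):
    plan = []
    for gain in layer_gains:     # layers ordered base -> enhancement
        best = min(itertools.product(POWERS, FEC_RATES),
                   key=lambda pf: expected_distortion(gain, *pf) + lam * pf[0])
        plan.append(best)
    return plan

# The base layer (largest distortion gain) attracts more transmit power than
# the least important enhancement layer.
print(allocate([40.0, 15.0, 5.0]))
```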

    A Novel Block-based Watermarking Scheme Using the SVD Transform

    In this paper, a block-based watermarking scheme based on the Singular Value Decomposition (SVD) is proposed. Our watermark, a pseudo-random Gaussian sequence, is embedded by modifying the angles formed by the right singular vectors of each block of the original image. The orthogonality property of the right singular vector matrix is preserved during the embedding process. Several experiments have been carried out to test the performance of the proposed scheme against different attack scenarios. We conclude that the proposed scheme is resistant against common signal processing operations and attacks, while preserving the quality of the original image.
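
    One possible reading of the angle-based embedding rule is sketched below: each watermark sample applies a small Givens rotation to the first two right singular vectors of a block, which keeps the right singular vector matrix orthogonal by construction. The rotation plane, strength parameter and block size are assumptions made here for illustration; this is not the authors' exact algorithm.

```python
# Simplified block-SVD watermark embedding via rotations of right singular
# vectors (illustrative reading of the scheme, not the authors' exact method).
import numpy as np

def embed(image, key=42, strength=0.02, block=8):
    rng = np.random.default_rng(key)
    out = image.astype(float).copy()
    h, w = out.shape
    for y in range(0, h - h % block, block):
        for x in range(0, w - w % block, block):
            B = out[y:y + block, x:x + block]
            U, S, Vt = np.linalg.svd(B)
            theta = strength * rng.standard_normal()   # Gaussian watermark sample
            G = np.eye(block)                          # Givens rotation acting on
            G[0, 0] = G[1, 1] = np.cos(theta)          # the first two right
            G[0, 1], G[1, 0] = -np.sin(theta), np.sin(theta)   # singular vectors
            out[y:y + block, x:x + block] = U @ np.diag(S) @ (G @ Vt)
    return out

img = np.random.default_rng(0).uniform(0, 255, (64, 64))
marked = embed(img)
print(np.abs(marked - img).mean())   # small for small `strength` (imperceptibility)
```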

    Rate-distortion optimization for stereoscopic video streaming with unequal error protection

    We consider an error-resilient stereoscopic streaming system that uses an H.264-based multiview video codec and a rateless Raptor code for recovery from packet losses. One aim of the present work is to suggest a heuristic methodology for modeling the end-to-end rate-distortion (RD) characteristic of such a system. Another aim is to show how to use such a model to optimally select the parameters of the video codec and the Raptor code so as to minimize the overall distortion. Specifically, the proposed system models the RD curve of the video encoder and the performance of the channel codec to jointly derive the optimal encoder bit rates and unequal error protection (UEP) rates specific to layered stereoscopic video streaming. We define an analytical RD curve model for each layer that includes the interdependency of the layers. A heuristic analytical model of the performance of Raptor codes is also defined. Furthermore, the distortion in stereoscopic video quality caused by packet losses is estimated. Finally, the analytical models and the estimated single-packet loss distortions are used to minimize the end-to-end distortion and to obtain the optimal encoder bit rates and UEP rates. Simulation results clearly demonstrate a significant quality gain over the non-optimized schemes.
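
    To illustrate the final optimization step, the sketch below performs a brute-force search over the base-layer bit rate and the per-layer Raptor overheads of a two-layer (base plus enhancement) stream, minimizing expected end-to-end distortion under a total rate budget. The RD and decoding-failure models are placeholder functions, not the analytical models fitted in the paper.

```python
# Toy joint selection of encoder bit rates and UEP overheads for two layers.
import itertools

TOTAL_KBPS = 2000.0

def layer_distortion(rate_kbps, d0, k):
    return d0 / (1.0 + k * rate_kbps)           # placeholder convex RD model

def raptor_fail_prob(overhead):
    return max(1e-4, 0.5 ** (1.0 + 50.0 * overhead))   # placeholder failure model

def expected_distortion(r_base, o_base, r_enh, o_enh):
    p_b, p_e = raptor_fail_prob(o_base), raptor_fail_prob(o_enh)
    d_full = layer_distortion(r_base, 80.0, 0.01) + layer_distortion(r_enh, 40.0, 0.01)
    d_base_only = layer_distortion(r_base, 80.0, 0.01) + 40.0   # enhancement lost
    d_none = 120.0                                              # everything lost
    return ((1 - p_b) * (1 - p_e) * d_full
            + (1 - p_b) * p_e * d_base_only
            + p_b * d_none)

best = None
for r_base, o_base, o_enh in itertools.product(range(400, 1400, 100),
                                               [0.05, 0.1, 0.2], [0.05, 0.1, 0.2]):
    r_enh = (TOTAL_KBPS - r_base * (1 + o_base)) / (1 + o_enh)  # rate left for layer 2
    if r_enh <= 0:
        continue
    cand = (expected_distortion(r_base, o_base, r_enh, o_enh),
            r_base, o_base, round(r_enh, 1), o_enh)
    if best is None or cand < best:
        best = cand

print(best)   # (expected distortion, base rate, base UEP, enh rate, enh UEP)
```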