
    MPEG-4 tools and applications: an overview

    In this paper we present an overview of the software tools currently available for the creation and display of MPEG-4 content. We first describe tools for encoding raw video into MPEG-4 compliant bitstreams. We then describe how this content may be used to create a complete MPEG-4 scene containing both graphical and interactive elements in addition to the more usual video and audio elements. Clearly, MPEG-4 content cannot be viewed without appropriate decoders and players, and these are addressed in the third section of this paper. Finally, we demonstrate how these tools may be combined to create MPEG-4 applications by presenting the details of two sample applications we have developed.

    Region-based segmentation of images using syntactic visual features

    This paper presents a robust and efficient method for segmentation of images into large regions that reflect the real-world objects present in the scene. We propose an extension to the well-known Recursive Shortest Spanning Tree (RSST) algorithm based on a new color model and so-called syntactic features [1]. We introduce practical solutions, integrated within the RSST framework, to structure analysis based on the shape and spatial configuration of image regions. We demonstrate that syntactic features provide a reliable basis for region-merging criteria which prevent the formation of regions spanning more than one semantic object, thereby significantly improving the perceptual quality of the output segmentation. Experiments indicate that the proposed features are generic in nature and allow satisfactory segmentation of real-world images from various sources without adjustment of the algorithm parameters.
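    As a rough, hypothetical illustration of the kind of region-merging loop that RSST-style segmentation builds on, the sketch below repeatedly merges the pair of neighbouring regions with the smallest mean-color difference until a target region count is reached. The merging cost is a placeholder assumption; the syntactic shape and spatial-configuration criteria described in the paper are not reproduced here.

```python
import numpy as np

def rsst_like_merge(image, n_regions):
    """Greedy region merging in the spirit of RSST: start from single-pixel
    regions and repeatedly merge the adjacent pair with the smallest
    mean-color difference. Placeholder cost; no syntactic features."""
    h, w, _ = image.shape
    labels = np.arange(h * w).reshape(h, w)          # one region per pixel
    mean = {int(labels[y, x]): image[y, x].astype(float)
            for y in range(h) for x in range(w)}
    size = {int(l): 1 for l in labels.ravel()}

    def neighbours():
        # collect unique adjacent-region pairs (4-connectivity)
        pairs = set()
        for y in range(h):
            for x in range(w):
                for dy, dx in ((0, 1), (1, 0)):
                    if y + dy < h and x + dx < w:
                        a, b = int(labels[y, x]), int(labels[y + dy, x + dx])
                        if a != b:
                            pairs.add((min(a, b), max(a, b)))
        return pairs

    while len(size) > n_regions:
        pairs = neighbours()
        if not pairs:
            break
        # merging cost: Euclidean distance between region mean colors
        a, b = min(pairs, key=lambda p: np.linalg.norm(mean[p[0]] - mean[p[1]]))
        mean[a] = (mean[a] * size[a] + mean[b] * size[b]) / (size[a] + size[b])
        size[a] += size[b]
        labels[labels == b] = a
        del mean[b], size[b]
    return labels

# Example: segment a tiny synthetic two-tone image into two regions.
img = np.zeros((8, 8, 3), dtype=np.uint8)
img[:, 4:] = 255
print(np.unique(rsst_like_merge(img, 2)))   # -> two region labels, one per half
```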

    Dialogue scene detection in movies using low and mid-level visual features

    This paper describes an approach for detecting dialogue scenes in movies. The approach uses automatically extracted low- and mid-level visual features that characterise the visual content of individual shots, which are then combined using a state transition machine that models the shot-level temporal characteristics of the scene under investigation. The choice of visual features is motivated by a consideration of formal film syntax. The system is designed so that the analysis may be applied to detect different types of scene, although in this paper we focus on dialogue sequences, as these are the most prevalent scenes in the movies considered to date.
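    To make the state-transition idea concrete, here is a minimal, hypothetical sketch of a machine that walks over per-shot labels (for example "face" versus "other", as might be derived from low- and mid-level features) and emits a dialogue scene once enough face shots accumulate. The states, labels and threshold are illustrative assumptions, not the machine defined in the paper.

```python
def detect_dialogue_scenes(shot_labels, min_face_shots=3):
    """Toy state machine over shot-level labels.
    shot_labels: one string per shot, e.g. "face" or "other".
    Returns (start, end) shot indices of candidate dialogue scenes."""
    scenes = []
    state = "IDLE"                 # IDLE -> CANDIDATE -> DIALOGUE
    start, face_shots = None, 0
    for i, label in enumerate(shot_labels):
        if state == "IDLE":
            if label == "face":
                state, start, face_shots = "CANDIDATE", i, 1
        else:                      # CANDIDATE or DIALOGUE
            if label == "face":
                face_shots += 1
                if face_shots >= min_face_shots:
                    state = "DIALOGUE"
            else:
                # a non-face shot closes the current run
                if state == "DIALOGUE":
                    scenes.append((start, i - 1))
                state, start, face_shots = "IDLE", None, 0
    if state == "DIALOGUE":
        scenes.append((start, len(shot_labels) - 1))
    return scenes

print(detect_dialogue_scenes(
    ["other", "face", "face", "face", "other", "face"]))
# -> [(1, 3)]
```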

    Complexity adaptation in H.264/AVC video coder for static cameras

    H.264/AVC uses variable block size motion estimation (VBSME) to improve coding gain. However, its complexity is significant and fixed, regardless of the required quality or the scene characteristics. In this paper, we propose an adaptive complexity algorithm based on the Walsh Hadamard Transform (WHT). Automatic VBS partitioning and skip mode detection algorithms are also proposed. Experimental results show that only 5% to 70% of the computation of H.264/AVC is required to achieve the same PSNR.
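    The following is a minimal sketch of how a WHT-based measure can gate complexity for static-camera content: the Walsh Hadamard Transform of the zero-motion residual gives a SATD-style cost, and a low cost flags the block as a skip candidate so that full VBSME can be avoided. The threshold and the skip test are assumptions for illustration, not the paper's algorithm.

```python
import numpy as np

# 4x4 Hadamard matrix (unnormalised), the core of the WHT used for SATD
H4 = np.array([[1,  1,  1,  1],
               [1, -1,  1, -1],
               [1,  1, -1, -1],
               [1, -1, -1,  1]])

def wht4x4(block):
    """Separable 4x4 Walsh-Hadamard transform of a residual block."""
    return H4 @ block @ H4.T

def early_skip(current, reference, threshold=64):
    """Hypothetical early skip test: if the WHT energy (SATD) of the
    zero-motion residual is below a threshold, flag the block as a SKIP
    candidate and avoid full variable-block-size motion estimation.
    The threshold is an illustrative assumption."""
    residual = current.astype(int) - reference.astype(int)
    satd = np.abs(wht4x4(residual)).sum()
    return bool(satd < threshold), int(satd)

cur = np.full((4, 4), 120, dtype=np.uint8)
ref = cur.copy()
ref[0, 0] += 2                       # near-static content
print(early_skip(cur, ref))          # -> (True, 32)
```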

    Using Dempster-Shafer theory to fuse multiple information sources in region-based segmentation

    This paper presents a new method for segmentation of images into large regions that reflect the real-world objects present in a scene. It explores the feasibility of utilizing the spatial configuration of regions and their geometric properties (the so-called Syntactic Visual Features [1]) for improving the correspondence of segmentation results produced by the well-known Recursive Shortest Spanning Tree (RSST) algorithm [2] to the semantic objects present in the scene. The main contribution of this paper is a novel framework for integrating evidence from multiple sources into the region merging process, based on Dempster-Shafer (DS) theory [3], which allows the integration of sources providing evidence with different accuracy and reliability. Extensive experiments indicate that the proposed solution limits the formation of regions spanning more than one semantic object.
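    As a small worked example of the evidence-combination step that DS theory provides, the sketch below implements Dempster's rule of combination over a two-hypothesis frame ({merge, keep}) and fuses two cues of different reliability. The cue names and mass values are hypothetical; only the combination rule itself is standard.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule of combination over a small frame of discernment.
    Masses are dicts mapping frozensets of hypotheses to belief mass."""
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb
    # normalise by the non-conflicting mass
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

# Hypothetical example: two cues voting on whether to merge two regions.
# Frame of discernment: {"merge", "keep"}; the full set represents ignorance.
THETA = frozenset({"merge", "keep"})
colour_cue = {frozenset({"merge"}): 0.6, THETA: 0.4}      # more reliable cue
syntactic_cue = {frozenset({"keep"}): 0.3, THETA: 0.7}    # weaker cue

print(dempster_combine(colour_cue, syntactic_cue))
```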

    Fast intra prediction in the transform domain

    In this paper, we present a fast intra prediction method based on separating the transformed coefficients. The prediction block can be obtained from the transformed and quantized neighboring blocks, generating minimum distortion for the DC and AC coefficients independently. Two prediction methods are proposed, full block search prediction (FBSP) and edge-based distance prediction (EBDP), which find the best-matching transformed coefficients in additional neighboring blocks. Experimental results show that the use of transform coefficients greatly enhances the efficiency of intra prediction whilst keeping complexity low compared to H.264/AVC.
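    The sketch below is a simplified stand-in for the idea of predicting in the transform domain with DC and AC handled independently: for each of the two coefficient groups it picks the neighbouring (already transformed and quantised) candidate block with the lowest absolute distortion and assembles the prediction from the two winners. It is not the FBSP or EBDP search described in the paper, and all names and data are assumptions.

```python
import numpy as np

def predict_in_transform_domain(current_coeffs, neighbour_coeffs):
    """Choose, independently for the DC coefficient and for the AC
    coefficients, the neighbouring transformed block that minimises absolute
    coefficient distortion, then combine the two choices into one prediction."""
    # DC is coefficient (0, 0); everything else is AC
    dc_errors = [abs(n[0, 0] - current_coeffs[0, 0]) for n in neighbour_coeffs]
    ac_errors = [np.abs(n - current_coeffs).sum()
                 - abs(n[0, 0] - current_coeffs[0, 0])
                 for n in neighbour_coeffs]
    best_dc = neighbour_coeffs[int(np.argmin(dc_errors))]
    best_ac = neighbour_coeffs[int(np.argmin(ac_errors))]
    prediction = best_ac.copy()
    prediction[0, 0] = best_dc[0, 0]
    return prediction

# Toy data: one current 4x4 coefficient block and three neighbouring candidates.
rng = np.random.default_rng(0)
cur = rng.integers(-8, 8, (4, 4))
neighbours = [rng.integers(-8, 8, (4, 4)) for _ in range(3)]
print(predict_in_transform_domain(cur, neighbours))
```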

    Low computational complexity variable block size (VBS) partitioning for motion estimation using the Walsh Hadamard transform (WHT)

    Variable Block Size (VBS) motion estimation has been adopted in state-of-the-art video coding standards such as H.264/AVC and VC-1. However, a low-complexity H.264/AVC encoder cannot take advantage of VBS due to its power consumption requirements. In this paper, we present a VBS partitioning algorithm based on a binary motion edge map that requires neither initial motion estimation nor Rate-Distortion (R-D) optimization for mode selection. The proposed algorithm uses the Walsh Hadamard Transform (WHT) to create the binary edge map, which is computationally cost-effective compared to other lightweight segmentation methods typically used to detect the required region.
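    As a hedged sketch of the general idea, the code below builds a binary motion edge map by thresholding the WHT AC energy of the frame difference in each 4x4 sub-block, and then maps the number of flagged sub-blocks in a macroblock to a coarse partition choice. The threshold, energy measure and partition rule are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

H4 = np.array([[1,  1,  1,  1],
               [1, -1,  1, -1],
               [1,  1, -1, -1],
               [1, -1, -1,  1]])

def binary_motion_edge_map(curr, prev, threshold=32):
    """Flag each 4x4 sub-block whose WHT AC energy of the frame difference
    exceeds a threshold (illustrative measure and threshold)."""
    diff = curr.astype(int) - prev.astype(int)
    h, w = diff.shape
    edge = np.zeros((h // 4, w // 4), dtype=bool)
    for by in range(h // 4):
        for bx in range(w // 4):
            blk = diff[by * 4:(by + 1) * 4, bx * 4:(bx + 1) * 4]
            coeffs = H4 @ blk @ H4.T
            ac_energy = np.abs(coeffs).sum() - abs(coeffs[0, 0])
            edge[by, bx] = ac_energy > threshold
    return edge

def choose_partition(edge_map_16x16):
    """Pick a coarse VBS mode for one 16x16 macroblock from its 4x4 flags."""
    if not edge_map_16x16.any():
        return "16x16 (or SKIP)"
    if edge_map_16x16.sum() <= 4:
        return "16x8 / 8x16"
    return "8x8 and below"

prev = np.zeros((16, 16), dtype=np.uint8)
curr = prev.copy()
curr[2:6, 2:6] = 80                      # a small moving patch
edges = binary_motion_edge_map(curr, prev)
print(edges.astype(int))                 # flags sub-blocks touching the patch edge
print(choose_partition(edges))
```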

    Scalable virtual viewpoint image synthesis for multiple camera environments

    One of the main aims of emerging audio-visual (AV) applications is to provide interactive navigation within a captured event or scene. This paper presents a view synthesis algorithm that provides a scalable and flexible approach to virtual viewpoint synthesis in multiple camera environments. The multi-view synthesis (MVS) process consists of four phases that are described in detail: surface identification, surface selection, surface boundary blending and surface reconstruction. MVS identifies and selects only the best quality surface areas from the set of available reference images, thereby reducing perceptual errors in virtual view reconstruction. The approach is independent of the camera setup and scalable, as virtual views can be created from 1 to N of the available video inputs. Thus, MVS provides interactive AV applications with a means of handling scenarios where camera inputs increase or decrease over time.
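    To illustrate the surface-selection phase in isolation, the sketch below composes a virtual view by taking, at each pixel, the value from whichever available reference view has the highest quality score there; it works with any number of inputs, which is the sense in which the approach scales. The per-pixel quality maps are assumed to come from an earlier identification step, and nothing here reproduces the actual MVS geometry or blending.

```python
import numpy as np

def select_best_surfaces(reference_views, quality_maps):
    """Illustrative stand-in for surface selection: per pixel, copy the value
    from the reference view with the highest quality score. Accepts 1..N
    reference views and matching quality maps."""
    views = np.stack(reference_views)       # (N, H, W)
    quality = np.stack(quality_maps)        # (N, H, W)
    best = np.argmax(quality, axis=0)       # per-pixel winning camera index
    rows, cols = np.indices(best.shape)
    return views[best, rows, cols]

# Two toy reference views; each is trusted on its own half of the image.
view_a = np.full((4, 4), 10)
view_b = np.full((4, 4), 20)
qual_a = np.zeros((4, 4)); qual_a[:, :2] = 1.0
qual_b = np.zeros((4, 4)); qual_b[:, 2:] = 1.0
print(select_best_surfaces([view_a, view_b], [qual_a, qual_b]))
```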

    Toward next generation coaching tools for court based racquet sports

    Even with today’s advances in automatic indexing of multimedia content, existing coaching tools for court sports lack the ability to automatically index a competitive match into key events. This paper proposes an automatic event indexing and event retrieval system for tennis, which can be used to coach players from beginner level upwards. Event indexing is possible using either visual or inertial sensing, with the latter potentially providing system portability. To achieve maximum event indexing performance, multi-sensor data integration is implemented, where data from both sensors is merged to automatically index key tennis events. A complete event retrieval system is also presented to allow coaches to build advanced queries which existing sports coaching solutions cannot facilitate without an inordinate amount of manual indexing.
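    As a minimal sketch of one way visual and inertial detections could be merged into a single event index, the code below keeps an event when the two sensors agree on its label within a small time tolerance. The event format, labels and tolerance are hypothetical; the paper's actual integration scheme is not reproduced.

```python
def fuse_event_indexes(visual_events, inertial_events, tolerance=0.25):
    """Toy multi-sensor fusion: keep a tennis event when a visually detected
    event and an inertially detected event carry the same label and occur
    within `tolerance` seconds of each other."""
    fused = []
    for v_time, v_label in visual_events:
        for i_time, i_label in inertial_events:
            if v_label == i_label and abs(v_time - i_time) <= tolerance:
                fused.append((round((v_time + i_time) / 2, 3), v_label))
                break
    return fused

# Hypothetical detections from each sensing modality (time in seconds, label).
visual = [(12.40, "serve"), (15.10, "forehand"), (21.70, "backhand")]
inertial = [(12.52, "serve"), (15.05, "forehand"), (30.00, "serve")]
print(fuse_event_indexes(visual, inertial))
# -> [(12.46, 'serve'), (15.075, 'forehand')]
```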

    An intuitive user interface for visual sports coaching

    This paper describes a dynamic multi-video user interface for sports coaching. It is intended that sports coaches could use this split screen to minimise and maximise multiple video streams of an athlete on one side of the split screen, while playing an additional video source, such as a clip from a professional athlete, on the other side. This split-screen approach allows users to contrast movements in the athlete's videos with those of a professional. Users can also add video overlays and text input, and use screen capture technology to record the application display so that an athlete can review a coaching session at a later date.