    Recent Advances in MPEG-7 Cameras

    We propose a smart camera that performs video analysis and generates an MPEG-7 compliant stream. By producing a content-based metadata description of the scene, the MPEG-7 camera extends the capabilities of conventional cameras: the metadata is directly interpretable by a machine. This is especially helpful in applications such as video surveillance, augmented reality and quality control. As a use case, we describe an algorithm that identifies moving objects and produces the corresponding MPEG-7 description. The algorithm runs in real time on a Matrox Iris P300C camera.
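
    As a rough illustration of the kind of output such a camera could emit, the following sketch generates a minimal MPEG-7-style XML fragment for detected moving regions. The element names only loosely follow the MPEG-7 vocabulary; the exact structure and the bounding-box encoding are assumptions made for illustration, not the camera's actual output format.

        # Minimal sketch: emit an MPEG-7-style description of detected moving
        # objects. Element names loosely follow MPEG-7; treat the exact
        # structure as illustrative, not as the camera's real output schema.
        import xml.etree.ElementTree as ET

        def describe_moving_objects(objects):
            """objects: list of dicts with an "id" and a bounding box (x, y, w, h)."""
            root = ET.Element("Mpeg7")
            desc = ET.SubElement(root, "Description")
            for obj in objects:
                region = ET.SubElement(desc, "MovingRegion", id=str(obj["id"]))
                box = ET.SubElement(region, "Box")
                box.text = " ".join(str(v) for v in obj["bbox"])  # "x y w h"
            return ET.tostring(root, encoding="unicode")

        # Example: two moving objects found in the current frame.
        print(describe_moving_objects([
            {"id": 1, "bbox": (12, 40, 64, 128)},
            {"id": 2, "bbox": (200, 80, 32, 48)},
        ]))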

    Smart Camera for MPEG-7

    While a first generation of video coding techniques proposed to remove the redundancies in and between image frames to obtain smaller bitstreams, second-generation schemes like MPEG-4 and MPEG-7 aim at content-based coding and interactivity. To reach this goal, tools for the extraction and description of semantic objects need to be developed. In this work, we propose an algorithm for the extraction and tracking of semantic objects and an MPEG-7 compliant descriptor set for generic objects; together, they can be seen as a smart camera for automatic scene description. Parts of the proposed system are tested in software. The tracking algorithm is designed to follow generic objects through scenes that include partial occlusions and merging. To do this, we first localize each moving object of the scene using a change-detection mask. Then, a number of representative points, called centroids, are assigned to the objects by a fuzzy C-means algorithm. For each centroid of the current frame, we find the closest centroid in the previous frame. Once these pairs are found, each object can be labelled according to its corresponding previous centroids. The description structure is a subset of the DDL language used in MPEG-7. The main concern was to find a simple but flexible descriptor set for generic objects. A corresponding C structure for software implementations is also proposed and partially tested.
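
    A minimal sketch of the matching step described above, under assumptions: foreground pixels from a change-detection mask are clustered with a small hand-rolled fuzzy C-means, and each centroid of the current frame is linked to the nearest centroid of the previous frame. The threshold, cluster count and fuzziness exponent are illustrative, not the paper's parameters.

        # Sketch: cluster foreground pixels with fuzzy C-means, then link each
        # current centroid to its nearest predecessor. Parameters illustrative.
        import numpy as np

        def change_mask(prev_frame, curr_frame, thresh=25):
            """Boolean mask of pixels whose grey value changed noticeably."""
            return np.abs(curr_frame.astype(int) - prev_frame.astype(int)) > thresh

        def fuzzy_cmeans(points, c=3, m=2.0, iters=50):
            """Return c centroids of an (N, 2) point set; m is the fuzziness exponent."""
            rng = np.random.default_rng(0)
            centroids = points[rng.choice(len(points), c, replace=False)].astype(float)
            for _ in range(iters):
                d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2) + 1e-9
                u = (1.0 / d) ** (2.0 / (m - 1.0))
                u /= u.sum(axis=1, keepdims=True)  # membership of each point in each cluster
                w = u ** m
                centroids = (w.T @ points) / w.sum(axis=0)[:, None]
            return centroids

        def match_centroids(prev_centroids, curr_centroids):
            """For each current centroid, the index of the closest previous centroid."""
            d = np.linalg.norm(curr_centroids[:, None, :] - prev_centroids[None, :, :], axis=2)
            return d.argmin(axis=1)

        # Example: a bright blob moves a few pixels between two synthetic frames.
        prev = np.zeros((64, 64), np.uint8); prev[10:20, 10:20] = 255
        curr = np.zeros((64, 64), np.uint8); curr[14:24, 14:24] = 255
        pts = np.argwhere(change_mask(prev, curr))
        print(fuzzy_cmeans(pts, c=2))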

    Reliable camera motion estimation from compressed MPEG videos using machine learning approach

    As an important feature in characterizing video content, camera motion has been widely applied in various multimedia and computer vision applications. A novel method for fast and reliable estimation of camera motion from MPEG videos is proposed, using a support vector machine in a regression model trained on a synthesized sequence. Experiments conducted on real sequences show that the proposed method yields much improved results in estimating camera motion while avoiding the difficulty of selecting valid macroblocks and motion vectors.
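
    As a hedged sketch of the regression setup (not the paper's exact pipeline): simple statistics of a frame's macroblock motion-vector field are fed to a support vector regressor trained on synthesized data with known camera motion. The feature choice and the pan-only target are assumptions made for the example.

        # Sketch: regress a camera-motion parameter (horizontal pan) from the
        # mean/std of a frame's motion-vector field with an SVM regressor.
        # Features and synthetic training data are illustrative assumptions.
        import numpy as np
        from sklearn.svm import SVR

        def mv_features(mvs):
            """mvs: (N, 2) array of macroblock motion vectors for one frame."""
            return np.concatenate([mvs.mean(axis=0), mvs.std(axis=0)])

        # Synthesize training frames: a global pan plus per-macroblock noise.
        rng = np.random.default_rng(0)
        pans = rng.uniform(-8, 8, size=200)
        X = np.array([mv_features(np.column_stack([np.full(100, p), np.zeros(100)])
                                  + rng.normal(0, 1, (100, 2))) for p in pans])

        model = SVR(kernel="rbf", C=10.0)
        model.fit(X, pans)

        # Estimate the pan of an unseen frame from its motion vectors.
        test = np.column_stack([np.full(100, 3.0), np.zeros(100)]) + rng.normal(0, 1, (100, 2))
        print(model.predict([mv_features(test)]))  # expected to be close to 3.0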

    Indexing, browsing and searching of digital video

    Video is a communications medium that normally brings together moving pictures with a synchronised audio track into a discrete piece or pieces of information. Depending on its size, a "piece" of video can variously be referred to as a frame, a shot, a scene, a clip, a programme or an episode, and these are distinguished by their lengths and by their composition. We shall return to the definition of each of these in section 4 of this chapter. In modern society, video is ver…

    Keyframe detection in visual lifelogs

    The SenseCam is a wearable camera that passively captures images; it therefore requires no conscious effort by a user in taking a photo. A Visual Diary from such a source could prove to be a valuable tool in assisting the elderly or individuals with neurodegenerative diseases or other traumas. One issue with Visual Lifelogs is the large volume of image data generated. In previous work we split a day's worth of images into more manageable segments, i.e. into distinct events or activities. However, each event could still consist of 80-100 images. Thus, in this paper we propose a novel approach to selecting the key images within an event using a combination of MPEG-7 and Scale Invariant Feature Transform (SIFT) features.
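
    One simple way to realise "key image within an event", sketched under assumptions: represent each image by a feature vector (standing in for the MPEG-7 and SIFT features the paper combines) and pick the image closest to the event's mean vector. This is a generic medoid-style selection, not necessarily the paper's method.

        # Sketch: choose the keyframe of an event as the image whose feature
        # vector is closest to the event mean. The random features stand in
        # for the MPEG-7/SIFT descriptors the paper combines.
        import numpy as np

        def select_keyframe(features):
            """features: (n_images, d) array, one descriptor per image in the event."""
            mean = features.mean(axis=0)
            dists = np.linalg.norm(features - mean, axis=1)
            return int(dists.argmin())  # index of the most representative image

        # Example: an event of 90 SenseCam images with 64-dimensional descriptors.
        event = np.random.default_rng(0).normal(size=(90, 64))
        print("keyframe index:", select_keyframe(event))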

    The DICEMAN description schemes for still images and video sequences

    To address the problem of visual content description, two Description Schemes (DSs) developed within the context of a European ACTS project known as DICEMAN are presented. The DSs, designed by analogy with well-known tools for document description, describe both the structure and semantics of still images and video sequences. The overall structure of both DSs, including the various sub-DSs and descriptors (Ds) of which they are composed, is described. In each case, the hierarchical sub-DS describing structure can be constructed using automatic (or semi-automatic) image/video analysis tools. The hierarchical sub-DSs describing semantics, however, are constructed by a user. The integration of the two DSs into a video indexing application currently under development in DICEMAN is also briefly described.
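
    To make the document analogy concrete, here is a toy model of a hierarchical Description Scheme: a tree of sub-DSs whose nodes carry descriptors. The class name and fields are assumptions for illustration; the actual DICEMAN DSs are defined in the MPEG-7 DDL.

        # Toy model of a hierarchical DS: a tree of sub-DSs carrying descriptors
        # (Ds). Names and fields are illustrative; the real DICEMAN DSs are
        # specified in the MPEG-7 DDL.
        from dataclasses import dataclass, field

        @dataclass
        class DescriptionScheme:
            name: str
            descriptors: dict = field(default_factory=dict)  # D name -> value
            children: list = field(default_factory=list)     # sub-DSs

        # A structural description of a video: segment -> shot -> region.
        video = DescriptionScheme("VideoSegment", {"duration": "00:02:10"}, [
            DescriptionScheme("Shot", {"start": 0, "end": 120}, [
                DescriptionScheme("Region", {"dominant_color": "#804020"}),
            ]),
        ])

        def walk(ds, depth=0):
            print("  " * depth + ds.name, ds.descriptors)
            for child in ds.children:
                walk(child, depth + 1)

        walk(video)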

    Semantic web technologies for video surveillance metadata

    Video surveillance systems are growing in size and complexity. Such systems typically consist of integrated modules from different vendors to cope with the increasing demands on network and storage capacity, intelligent video analytics, picture quality, and enhanced visual interfaces. Within a surveillance system, relevant information (such as technical details of the video sequences, or analysis results of the monitored environment) is described using metadata standards. However, different modules typically use different standards, resulting in metadata interoperability problems. In this paper, we introduce the application of Semantic Web technologies to overcome such problems. We present a semantic, layered metadata model and integrate it within a video surveillance system. Besides addressing the metadata interoperability problem, the advantages of using Semantic Web technologies and their inherent rule support are shown. A practical use case scenario is presented to illustrate the benefits of our novel approach.
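
    As a hedged illustration of expressing such metadata with Semantic Web technologies: below, an analysis result from one module is stored as RDF triples and queried with SPARQL via rdflib. The namespace and property names are invented for the example and do not reproduce the paper's ontology.

        # Sketch: describe a surveillance analysis result as RDF and query it
        # with SPARQL via rdflib. The vocabulary is invented for the example.
        from rdflib import Graph, Literal, Namespace, RDF

        SURV = Namespace("http://example.org/surveillance#")  # hypothetical vocabulary
        g = Graph()
        g.bind("surv", SURV)

        # One vendor's module reports a person detected by camera 3.
        event = SURV["event42"]
        g.add((event, RDF.type, SURV.Detection))
        g.add((event, SURV.camera, Literal("camera-3")))
        g.add((event, SURV.objectClass, Literal("person")))
        g.add((event, SURV.timestamp, Literal("2009-06-01T12:00:00")))

        # Another module (or a rule engine) can query across vendors' metadata.
        q = """
        PREFIX surv: <http://example.org/surveillance#>
        SELECT ?cam ?t WHERE {
          ?e a surv:Detection ; surv:objectClass "person" ;
             surv:camera ?cam ; surv:timestamp ?t .
        }
        """
        for cam, t in g.query(q):
            print(cam, t)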

    Combining textual and visual information processing for interactive video retrieval: SCHEMA's participation in TRECVID 2004

    In this paper, the two applications based on the Schema Reference System that were developed by the SCHEMA NoE for participation in the search task of TRECVID 2004 are presented. The first application, named "Schema-Text", is an interactive retrieval application that employs only textual information, while the second, named "Schema-XM", is an extension of the former, employing algorithms and methods for combining textual, visual and higher-level information. Two runs were submitted for each application: I_A_2_SCHEMA-Text_3 and I_A_2_SCHEMA-Text_4 for Schema-Text, and I_A_2_SCHEMA-XM_1 and I_A_2_SCHEMA-XM_2 for Schema-XM. The comparison of the two applications in terms of retrieval efficiency revealed that combining information from different data sources can provide higher efficiency for retrieval systems. Experimental testing additionally revealed that first performing a text-based query and then proceeding with a visual similarity search, using one of the returned relevant keyframes as an example image, is a good scheme for combining visual and textual information.
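
    The combination scheme the authors found effective, a text query followed by visual similarity search seeded with a relevant returned keyframe, can be sketched as follows. The toy word-overlap text matching and cosine similarity over keyframe features are stand-ins, not the Schema Reference System's implementation.

        # Sketch of the two-step scheme: a text query first, then a visual
        # re-ranking seeded by one relevant keyframe. Toy matching functions
        # stand in for the Schema Reference System's actual components.
        import numpy as np

        def text_search(query, captions):
            """Rank shots by how many query words their caption contains."""
            words = set(query.lower().split())
            scores = [len(words & set(c.lower().split())) for c in captions]
            return sorted(range(len(captions)), key=lambda i: -scores[i])

        def visual_search(example_idx, features):
            """Rank shots by cosine similarity to the example keyframe's features."""
            f = features / np.linalg.norm(features, axis=1, keepdims=True)
            sims = f @ f[example_idx]
            return sorted(range(len(features)), key=lambda i: -sims[i])

        captions = ["crowd at a basketball game", "news anchor in studio", "basketball dunk"]
        features = np.random.default_rng(0).normal(size=(3, 16))  # stand-in keyframe features

        hits = text_search("basketball game", captions)  # step 1: text query
        seed = hits[0]                                   # user marks a relevant keyframe
        print(visual_search(seed, features))             # step 2: visual similarity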