Recent Advances in MPEG-7 Cameras
We propose a smart camera that performs video analysis and generates an MPEG-7 compliant stream. By producing a content-based metadata description of the scene, the MPEG-7 camera extends the capabilities of conventional cameras: the metadata is directly interpretable by a machine. This is especially helpful in applications such as video surveillance, augmented reality and quality control. As a use case, we describe an algorithm that identifies moving objects and produces the corresponding MPEG-7 description. The algorithm runs in real time on a Matrox Iris P300C camera.
Smart Camera for MPEG-7
While a first generation of video coding techniques removed redundancies within and between image frames to obtain smaller bitstreams, second-generation schemes such as MPEG-4 and MPEG-7 aim at content-based coding and interactivity. To reach this goal, tools for the extraction and description of semantic objects need to be developed. In this work, we propose an algorithm for the extraction and tracking of semantic objects and an MPEG-7 compliant descriptor set for generic objects; together, they can be seen as a smart camera for automatic scene description. Parts of the proposed system have been tested in software. The tracking algorithm is designed to follow generic objects in scenes that include partial occlusions and merging. To do this, we first localise each moving object in the scene using a change-detection mask. A number of representative points, called centroids, is then assigned to each object by a fuzzy C-means algorithm. For each centroid of the current frame, we search for the closest centroid in the previous frame; once these pairs are found, each object can be labelled according to its corresponding previous centroids. The description structure is a subset of the DDL language used in MPEG-7. The main concern was to find a simple but flexible descriptor set for generic objects. A corresponding C structure for software implementations is also proposed and partially tested.
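The centroid-pairing step of the tracking algorithm described above can be sketched as follows. This is a minimal NumPy illustration of matching each current-frame centroid to its nearest previous-frame centroid; the fuzzy C-means stage that produces the centroids is omitted, and all names are ours, not the paper's.

```python
import numpy as np

def match_centroids(curr, prev):
    """For each centroid in the current frame, return the index of the
    closest centroid in the previous frame (Euclidean distance)."""
    # curr: (N, 2), prev: (M, 2) arrays of (x, y) centroid coordinates
    dists = np.linalg.norm(curr[:, None, :] - prev[None, :, :], axis=2)
    return dists.argmin(axis=1)

prev = np.array([[10.0, 10.0], [50.0, 40.0]])
curr = np.array([[52.0, 41.0], [11.0, 12.0]])
print(match_centroids(curr, prev))  # → [1 0]
```

Once each current centroid is paired with a previous one, an object label can be propagated by majority vote over its centroids' matches, which is what allows labels to survive partial occlusions and merging.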
Reliable camera motion estimation from compressed MPEG videos using machine learning approach
As an important feature in characterising video content, camera motion has been widely applied in various multimedia and computer vision applications. A novel method for fast and reliable estimation of camera motion from MPEG videos is proposed, using a support vector machine in a regression model trained on a synthesised sequence. Experiments conducted on real sequences show that the proposed method yields much improved results in estimating camera motion, while the difficulty of selecting valid macroblocks and motion vectors is avoided.
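The regression step might look like the following sketch using scikit-learn's SVR. The feature choice (per-frame motion-vector statistics) and the synthetic training target are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
# Hypothetical features: e.g. mean horizontal/vertical macroblock
# motion-vector components per frame (synthesised, as in the paper's
# training-on-synthetic-sequences idea).
X = rng.normal(size=(200, 2))
# Synthetic target: horizontal pan speed as a noisy function of feature 0
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

# Support vector regression with an RBF kernel
model = SVR(kernel="rbf", C=10.0).fit(X, y)
pred = model.predict([[1.0, 0.0]])  # estimate pan for one new frame
```

Training on synthetic data with known ground-truth motion sidesteps the need for hand-labelled real sequences, which is what makes the regression formulation attractive here.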
Indexing, browsing and searching of digital video
Video is a communications medium that normally brings together moving pictures with a synchronised audio track into a discrete piece or pieces of information. The size of a “piece” of video can variously be referred to as a frame, a shot, a scene, a clip, a programme or an episode, and these are distinguished by their lengths and by their composition. We shall return to the definition of each of these in section 4 of this chapter. In modern society, video is ver
Keyframe detection in visual lifelogs
The SenseCam is a wearable camera that passively captures images, requiring no conscious effort from the user to take a photo. A Visual Diary from such a source could prove to be a valuable tool in assisting the elderly and individuals with neurodegenerative diseases or other traumas. One issue with Visual Lifelogs is the large volume of image data generated. In previous work we split a day's worth of images into more manageable segments, i.e. into distinct events or activities. However, each event could still consist of 80-100 images. In this paper we therefore propose a novel approach to selecting the key images within an event, using a combination of MPEG-7 and Scale Invariant Feature Transform (SIFT) features.
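A common way to pick a key image from an event is to choose the frame whose feature vector lies closest to the event's mean. The sketch below illustrates that idea with plain NumPy; it is a simple stand-in for the paper's MPEG-7 + SIFT selection, and the toy feature vectors are our own.

```python
import numpy as np

def select_keyframe(features):
    """Return the index of the image whose feature vector is closest
    to the mean feature vector of the event (its most 'typical' frame)."""
    feats = np.asarray(features, dtype=float)
    mean = feats.mean(axis=0)
    return int(np.linalg.norm(feats - mean, axis=1).argmin())

# Three images in one event, each described by a 2-D feature vector
event = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]
print(select_keyframe(event))  # → 1
```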
The DICEMAN description schemes for still images and video sequences
To address the problem of visual content description, two Description Schemes (DSs) developed within the context of a European ACTS project known as DICEMAN are presented. The DSs, designed by analogy with well-known tools for document description, describe both the structure and semantics of still images and video sequences. The overall structure of both DSs, including the various sub-DSs and descriptors (Ds) of which they are composed, is described. In each case, the hierarchical sub-DS describing structure can be constructed using automatic (or semi-automatic) image/video analysis tools; the hierarchical sub-DSs describing semantics, however, are constructed by a user. The integration of the two DSs into a video indexing application currently under development in DICEMAN is also briefly described.
Semantic web technologies for video surveillance metadata
Video surveillance systems are growing in size and complexity. Such systems typically consist of integrated modules from different vendors to cope with the increasing demands on network and storage capacity, intelligent video analytics, picture quality, and enhanced visual interfaces. Within a surveillance system, relevant information (like technical details on the video sequences, or analysis results of the monitored environment) is described using metadata standards. However, different modules typically use different standards, resulting in metadata interoperability problems. In this paper, we introduce the application of Semantic Web Technologies to overcome such problems. We present a semantic, layered metadata model and integrate it within a video surveillance system. Besides dealing with the metadata interoperability problem, the advantages of using Semantic Web Technologies and the inherent rule support are shown. A practical use case scenario is presented to illustrate the benefits of our novel approach.
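The combination of a triple-based metadata model with rule support can be illustrated with a small Python sketch: metadata from different modules is expressed as RDF-style (subject, predicate, object) triples over a shared vocabulary, and a rule derives new facts from them. The vocabulary and rule below are hypothetical, not the paper's actual ontology.

```python
# RDF-style triples describing one video sequence and one analysis result;
# "ex:" names are illustrative, not taken from the paper.
triples = {
    ("ex:seq42", "rdf:type", "ex:VideoSequence"),
    ("ex:seq42", "ex:codec", "H.264"),
    ("ex:seq42", "ex:hasAnalysisResult", "ex:event7"),
    ("ex:event7", "rdf:type", "ex:IntrusionEvent"),
}

def infer_alarms(triples):
    """Toy rule: any sequence linked to an IntrusionEvent raises an alarm."""
    events = {s for s, p, o in triples
              if p == "rdf:type" and o == "ex:IntrusionEvent"}
    return sorted(s for s, p, o in triples
                  if p == "ex:hasAnalysisResult" and o in events)

print(infer_alarms(triples))  # → ['ex:seq42']
```

Because every module writes triples against the same shared vocabulary, a rule like this works regardless of which vendor's analytics module produced the event, which is the interoperability point the paper makes.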
Combining textual and visual information processing for interactive video retrieval: SCHEMA's participation in TRECVID 2004
In this paper, the two different applications based on the Schema Reference System that were developed by the SCHEMA NoE for participation in the search task of TRECVID 2004 are illustrated. The first application, named "Schema-Text", is an interactive retrieval application that employs only textual information, while the second, named "Schema-XM", is an extension of the former, employing algorithms and methods for combining textual, visual and higher-level information. Two runs were submitted for each application: I A 2 SCHEMA-Text 3 and I A 2 SCHEMA-Text 4 for Schema-Text, and I A 2 SCHEMA-XM 1 and I A 2 SCHEMA-XM 2 for Schema-XM. The comparison of these two applications in terms of retrieval efficiency revealed that combining information from different data sources can provide higher efficiency for retrieval systems. Experimental testing additionally revealed that first performing a text-based query and then proceeding with a visual similarity search, using one of the returned relevant keyframes as an example image, is a good scheme for combining visual and textual information.
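The two-stage scheme described above (text query first, then visual similarity to a user-picked relevant keyframe) can be sketched as follows. The term-overlap text scoring and cosine visual similarity are our simplifying assumptions, not the system's actual ranking functions.

```python
import numpy as np

def text_then_visual(query, docs, feats, picked, k=2):
    """Two-stage retrieval sketch: keep the top-k text matches, then
    order them by visual similarity to a user-picked keyframe."""
    # Stage 1: score shots by how many query terms their text contains
    text_scores = [len(query & terms) for terms in docs]
    shortlist = sorted(range(len(docs)), key=lambda i: -text_scores[i])[:k]
    # Stage 2: cosine similarity to the picked keyframe's feature vector
    f = np.asarray(feats, dtype=float)
    f = f / np.linalg.norm(f, axis=1, keepdims=True)  # unit-normalise rows
    sims = f @ f[picked]
    return sorted(shortlist, key=lambda i: -sims[i])

docs = [{"car", "road"}, {"car"}, {"sky"}]          # per-shot text terms
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]        # per-shot visual features
print(text_then_visual({"car"}, docs, feats, picked=1))  # → [1, 0]
```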
Multimedia broadcast and internet satellite system design and user trial results
The EU-funded project System for Advanced Multimedia Broadcast and IT Services (SAMBITS) has created an enhanced, synchronised multimedia terminal for merging satellite broadcast and internet telecommunication services in a way that efficiently combines the large bandwidth of the broadcast channel with the interactivity of the internet. This paper proposes a novel broadcast and internet service concept, illustrates this concept with two service scenarios, and develops a system architecture to demonstrate the range of key benefits provided by these new technologies. It then describes the interactive multimedia terminal that was used for consuming this new service concept. Finally, the results of the user trials on the terminal are presented and discussed.