10,948 research outputs found

    Event detection in field sports video using audio-visual features and a support vector machine

    Get PDF
    In this paper, we propose a novel audio-visual feature-based framework for event detection in broadcast video of multiple different field sports. Features indicating significant events are selected and robust detectors built. These features are rooted in characteristics common to all genres of field sports. The evidence gathered by the feature detectors is combined by means of a support vector machine, which infers the occurrence of an event based on a model generated during a training phase. The system is tested generically across multiple genres of field sports including soccer, rugby, hockey, and Gaelic football and the results suggest that high event retrieval and content rejection statistics are achievable

    Combining textual and visual information processing for interactive video retrieval: SCHEMA's participation in TRECVID 2004

    Get PDF
    In this paper, the two different applications based on the Schema Reference System that were developed by the SCHEMA NoE for participation to the search task of TRECVID 2004 are illustrated. The first application, named ”Schema-Text”, is an interactive retrieval application that employs only textual information while the second one, named ”Schema-XM”, is an extension of the former, employing algorithms and methods for combining textual, visual and higher level information. Two runs for each application were submitted, I A 2 SCHEMA-Text 3, I A 2 SCHEMA-Text 4 for Schema-Text and I A 2 SCHEMA-XM 1, I A 2 SCHEMA-XM 2 for Schema-XM. The comparison of these two applications in terms of retrieval efficiency revealed that the combination of information from different data sources can provide higher efficiency for retrieval systems. Experimental testing additionally revealed that initially performing a text-based query and subsequently proceeding with visual similarity search using one of the returned relevant keyframes as an example image is a good scheme for combining visual and textual information

    Indexing, browsing and searching of digital video

    Get PDF
    Video is a communications medium that normally brings together moving pictures with a synchronised audio track into a discrete piece or pieces of information. The size of a “piece ” of video can variously be referred to as a frame, a shot, a scene, a clip, a programme or an episode, and these are distinguished by their lengths and by their composition. We shall return to the definition of each of these in section 4 this chapter. In modern society, video is ver

    The DICEMAN description schemes for still images and video sequences

    Get PDF
    To address the problem of visual content description, two Description Schemes (DSs) developed within the context of a European ACTS project known as DICEMAN, are presented. The DSs, designed based on an analogy with well-known tools for document description, describe both the structure and semantics of still images and video sequences. The overall structure of both DSs including the various sub-DSs and descriptors (Ds) of which they are composed is described. In each case, the hierarchical sub-DS for describing structure can be constructed using automatic (or semi-automatic) image/video analysis tools. The hierarchical sub-DSs for describing the semantics, however, are constructed by a user. The integration of the two DSs into a video indexing application currently under development in DICEMAN is also briefly described.Peer ReviewedPostprint (published version

    A framework for automatic semantic video annotation

    Get PDF
    The rapidly increasing quantity of publicly available videos has driven research into developing automatic tools for indexing, rating, searching and retrieval. Textual semantic representations, such as tagging, labelling and annotation, are often important factors in the process of indexing any video, because of their user-friendly way of representing the semantics appropriate for search and retrieval. Ideally, this annotation should be inspired by the human cognitive way of perceiving and of describing videos. The difference between the low-level visual contents and the corresponding human perception is referred to as the ‘semantic gap’. Tackling this gap is even harder in the case of unconstrained videos, mainly due to the lack of any previous information about the analyzed video on the one hand, and the huge amount of generic knowledge required on the other. This paper introduces a framework for the Automatic Semantic Annotation of unconstrained videos. The proposed framework utilizes two non-domain-specific layers: low-level visual similarity matching, and an annotation analysis that employs commonsense knowledgebases. Commonsense ontology is created by incorporating multiple-structured semantic relationships. Experiments and black-box tests are carried out on standard video databases for action recognition and video information retrieval. White-box tests examine the performance of the individual intermediate layers of the framework, and the evaluation of the results and the statistical analysis show that integrating visual similarity matching with commonsense semantic relationships provides an effective approach to automated video annotation

    A video object generation tool allowing friendly user interaction

    Get PDF
    In this paper we describe an interactive video object segmentation tool developed in the framework of the ACTS-AC098 MOMUSYS project. The Video Object Generator with User Environment (VOGUE) combines three different sets of automatic and semi-automatic-tool (spatial segmentation, object tracking and temporal segmentation) with general purpose tools for user interaction. The result is an integrated environment allowing the user-assisted segmentation of any sort of video sequences in a friendly and efficient manner.Peer ReviewedPostprint (published version

    Segmentation and tracking of video objects for a content-based video indexing context

    Get PDF
    This paper examines the problem of segmentation and tracking of video objects for content-based information retrieval. Segmentation and tracking of video objects plays an important role in index creation and user request definition steps. The object is initially selected using a semi-automatic approach. For this purpose, a user-based selection is required to define roughly the object to be tracked. In this paper, we propose two different methods to allow an accurate contour definition from the user selection. The first one is based on an active contour model which progressively refines the selection by fitting the natural edges of the object while the second used a binary partition tree with aPeer ReviewedPostprint (published version
    • 

    corecore