7 research outputs found

    CLIPS and NII at TRECvid: Shot segmentation and feature extraction

    This paper presents the systems used by the CLIPS-IMAG laboratory. We participated in the shot segmentation and high-level feature extraction tasks. This year we focused on the High-Level Feature Extraction task, based on key-frame classification. We propose an original and promising framework for incorporating contextual information (from image content) into the concept detection process. The proposed method combines local and global classifiers with stacking, using SVMs. We study the effect of topological and semantic context on concept detection performance and propose solutions for handling the large number of dimensions involved in the classified data.
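
    As a rough illustration of the stacking scheme this abstract describes, the sketch below trains separate local and global SVMs and stacks their out-of-fold scores with a second SVM. It assumes scikit-learn; the feature shapes, parameters, and synthetic data are placeholders, not details taken from the paper.

```python
# Sketch of stacking a local and a global SVM classifier, as outlined above.
# All names and data are illustrative; the abstract does not specify the
# actual features or parameters used by CLIPS-IMAG.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
X_local = rng.normal(size=(200, 32))    # e.g. per-block colour features
X_global = rng.normal(size=(200, 16))   # e.g. whole-key-frame features
y = rng.integers(0, 2, size=200)        # concept present / absent

# Level-0: out-of-fold probability estimates avoid leaking training labels
# into the stacked (level-1) classifier.
local_clf = SVC(probability=True)
global_clf = SVC(probability=True)
p_local = cross_val_predict(local_clf, X_local, y, cv=5,
                            method="predict_proba")[:, 1]
p_global = cross_val_predict(global_clf, X_global, y, cv=5,
                             method="predict_proba")[:, 1]

# Level-1: an SVM stacked on the two score streams.
meta = SVC(probability=True)
meta.fit(np.column_stack([p_local, p_global]), y)
```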

    Using Topic Concepts for Semantic Video Shots Classification

    Automatic semantic classification of video databases is very useful for searching and browsing, but it is also a very challenging research problem. Combining the visual and text modalities is one of the key issues in bridging the semantic gap between signal and semantics. In this paper, we propose to enhance the classification of high-level concepts using intermediate topic concepts, and we study various fusion strategies for combining topic concepts with visual features in order to outperform unimodal classifiers. We conducted several experiments on the TRECVID'05 collection and show that several intermediate topic classifiers can bridge parts of the semantic gap and help to detect high-level concepts.
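
    The abstract does not spell out the fusion strategies, so the sketch below shows two common variants that fit its description: early fusion (concatenating topic-concept scores with visual features) and a weighted late fusion of unimodal scores. scikit-learn, the fusion weight, the topic-score stand-in, and all data are assumptions for illustration only.

```python
# Illustrative sketch of fusing intermediate topic-concept scores with
# visual features; shapes, weights, and data are invented placeholders.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
visual = rng.normal(size=(300, 64))          # visual features per shot
topic_scores = rng.uniform(size=(300, 10))   # scores from 10 topic classifiers
y = rng.integers(0, 2, size=300)             # high-level concept label

# Early fusion: one classifier over the concatenated representation.
fused = np.hstack([visual, topic_scores])
early_clf = make_pipeline(StandardScaler(), SVC(probability=True)).fit(fused, y)

# Late fusion alternative: combine unimodal scores with a fixed weight.
alpha = 0.6  # assumed weight; the paper compares several such strategies
vis_clf = make_pipeline(StandardScaler(), SVC(probability=True)).fit(visual, y)
p_visual = vis_clf.predict_proba(visual)[:, 1]
p_topic = topic_scores.mean(axis=1)          # stand-in for a topic-based score
p_fused = alpha * p_visual + (1 - alpha) * p_topic
```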

    Classifier Fusion for SVM-Based Multimedia Semantic Indexing

    Concept indexing in multimedia libraries is very useful for searching and browsing, but it is also a very challenging research problem. Combining several modalities, features, or concepts is one of the key issues in bridging the gap between signal and semantics. In this paper, we present three fusion schemes inspired by the classical early and late fusion schemes. First, we present a kernel-based fusion scheme that takes advantage of the kernel basis of classifiers such as SVMs. Second, we integrate a new normalization process into the early fusion scheme. Third, we present a contextual late fusion scheme that merges the classification scores of several concepts. We conducted experiments in the framework of the official TRECVID'06 evaluation campaign and obtained significant improvements with the proposed fusion schemes relative to the usual fusion schemes.
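
    A minimal sketch of the kernel-based fusion idea follows, under the assumption that the modality kernels are RBF and equally weighted (the paper's actual kernels and weights are not given here). The key property it relies on is that a weighted sum of positive semi-definite kernels is itself a valid kernel, so one SVM can be trained on the fused Gram matrix.

```python
# Sketch of kernel-level fusion: sum two modality-specific Gram matrices
# and train a single SVM on the result. Kernel choices, weights, and data
# are illustrative assumptions.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(2)
X_text = rng.normal(size=(150, 40))
X_visual = rng.normal(size=(150, 64))
y = rng.integers(0, 2, size=150)

# Sum of PSD kernels is PSD, which is what makes fusion inside the SVM valid.
K = 0.5 * rbf_kernel(X_text) + 0.5 * rbf_kernel(X_visual)
svm = SVC(kernel="precomputed").fit(K, y)

# Scoring new shots requires cross-kernels against the training set.
X_text_new = rng.normal(size=(5, 40))
X_visual_new = rng.normal(size=(5, 64))
K_new = 0.5 * rbf_kernel(X_text_new, X_text) \
      + 0.5 * rbf_kernel(X_visual_new, X_visual)
pred = svm.predict(K_new)
```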

    Semantics of video shots for content-based retrieval

    Content-based video retrieval research combines expertise from many different areas, such as signal processing, machine learning, pattern recognition, and computer vision. As video extends into both the spatial and the temporal domain, we require techniques for the temporal decomposition of footage so that specific content can be accessed. This content may then be semantically classified - ideally in an automated process - to enable filtering, browsing, and searching. An important aspect that must be considered is that pictorial representation of information may be interpreted differently by individual users because it is less specific than its textual representation. In this thesis, we address several fundamental issues of content-based video retrieval for effective handling of digital footage.

    Temporal segmentation, the common first step in handling digital video, is the decomposition of video streams into smaller, semantically coherent entities. This is usually performed by detecting the transitions that separate single camera takes. While abrupt transitions - cuts - can be detected relatively well with existing techniques, effective detection of gradual transitions remains difficult. We present our approach to temporal video segmentation, proposing a novel algorithm that evaluates sets of frames using a relatively simple histogram feature. Our technique has been shown to range among the best existing shot segmentation algorithms in large-scale evaluations.

    The next step is semantic classification of each video segment to generate an index for content-based retrieval in video databases. Machine learning techniques can be applied effectively to classify video content. However, these techniques require manually classified examples for training before automatic classification of unseen content can be carried out. Manually classifying training examples is not trivial because of the implied ambiguity of visual content. We propose an unsupervised learning approach based on latent class modelling in which we obtain multiple judgements per video shot and model the users' response behaviour over a large collection of shots. This technique yields a more generic classification of the visual content. Moreover, it enables the quality assessment of the classification, and maximises the number of training examples by resolving disagreement. We apply this approach to data from a large-scale, collaborative annotation effort and present ways to improve the effectiveness of manual annotation of visual content by better design and specification of the process.

    Automatic speech recognition techniques along with semantic classification of video content can be used to implement video search using textual queries. This requires the application of text search techniques to video and the combination of different information sources. We explore several text-based query expansion techniques for speech-based video retrieval, and propose a fusion method to improve overall effectiveness. To combine both text and visual search approaches, we explore a fusion technique that combines spoken information and visual information using semantic keywords automatically assigned to the footage based on the visual content.

    The techniques that we propose help to facilitate effective content-based video retrieval and highlight the importance of considering different user interpretations of visual content. This allows better understanding of video content and a more holistic approach to multimedia retrieval in the future.
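
    As a hedged sketch of histogram-based cut detection in the spirit of the segmentation work described above: note that the thesis evaluates sets of frames, whereas this simplification compares consecutive frames only. OpenCV is assumed, and the histogram setup and threshold are invented placeholders rather than the thesis algorithm.

```python
# Simplified histogram-based cut detector: flag a cut wherever the colour
# histogram distance between consecutive frames jumps above a threshold.
import cv2

def detect_cuts(video_path: str, threshold: float = 0.5) -> list[int]:
    """Return frame indices where the colour-histogram distance jumps."""
    cap = cv2.VideoCapture(video_path)
    cuts, prev_hist, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # 8x8x8-bin BGR histogram, L2-normalised for comparability.
        hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8],
                            [0, 256, 0, 256, 0, 256])
        hist = cv2.normalize(hist, hist).flatten()
        if prev_hist is not None:
            # Bhattacharyya distance: near 0 for similar frames, near 1 at cuts.
            d = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA)
            if d > threshold:
                cuts.append(idx)
        prev_hist, idx = hist, idx + 1
    cap.release()
    return cuts
```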

    CLIPS-LSR-NII Experiments at TRECVID 2005 (author manuscript, published in the TREC Workshop on Video Retrieval Evaluation, 2005)

    This paper presents the systems used by the CLIPS-IMAG laboratory. We participated in the shot segmentation and high-level feature extraction tasks. This year we focused on the High-Level Feature Extraction task, based on key-frame classification. We propose an original and promising framework for incorporating contextual information (from image content) into the concept detection process. The proposed method combines local and global classifiers with stacking, using SVMs. We study the effect of topological and semantic context on concept detection performance and propose solutions for handling the large number of dimensions involved in the classified data.