5 research outputs found

    Unsupervised mining of audiovisually consistent segments in videos with application to structure analysis

    In this paper, a multimodal event mining technique is proposed to discover repeating video segments exhibiting audio and visual consistency in a totally unsupervised manner. The mining strategy first exploits independent audio and visual cluster analysis to provide segments that are consistent in both their visual and audio modalities, and thus likely to correspond to a unique underlying event. A subsequent modeling stage using discriminative models enables accurate detection of the underlying event throughout the video. Event mining is applied to unsupervised video structure analysis, using simple heuristics on the occurrence patterns of the discovered events to select those relevant to the video structure. Results on TV programs ranging from news to talk shows and games show that structurally relevant events are discovered with precisions ranging from 87% to 98% and recalls from 59% to 94%.

    Content-based discovery of multiple structures from episodes of recurrent TV programs based on grammatical inference

    TV program structuring is essential for program indexing and retrieval. In practice, different types of programs lead to a diversity of program structures. In addition, several episodes of a recurrent program might exhibit different structures. Previous work mostly relies on supervised approaches that adopt prior knowledge about program structures. In this paper, we address the problem of unsupervised program structuring with minimal prior knowledge about the programs. We propose an approach to identify multiple structures and infer structural grammars for recurrent TV programs of different types. It involves three sub-problems: i) we determine the structural elements contained in programs with minimal knowledge about which types of elements may be present; ii) we identify multiple structures for the programs, if any, and model the structures of the programs; iii) we generate the structural grammar for each corresponding structure. Finally, we present use cases on real recurrent programs of three different types to demonstrate the effectiveness of the proposed approach.

    Inférence de la grammaire structurelle d’une émission TV récurrente à partir du contenu

    TV program structuring has emerged as a major theme over the last decade for the task of high-quality indexing. In this thesis, we address the problem of unsupervised TV program structuring from the point of view of grammatical inference, i.e., discovering a common structural model shared by a collection of episodes of a recurrent program. Using grammatical inference makes it possible to rely on only minimal domain knowledge. In particular, apart from the recurrence, we assume no prior knowledge of the structural elements that might be present in a recurrent program and very limited knowledge of the program type (e.g., to name structural elements). Under this assumption, we propose an unsupervised framework operating in two stages. The first stage aims at determining the structural elements that are relevant to the structure of a program. We address this issue by making use of the repetitiveness of elements in recurrent programs, leveraging temporal density analysis to filter out irrelevant events and determine valid elements. Having discovered the structural elements, the second stage infers a grammar of the program. We explore two inference techniques based either on multiple sequence alignment or on uniform resampling. A model of the structure is derived from the grammars and used to predict the structure of new episodes. Evaluations are performed on a selection of four different types of recurrent programs. Focusing on structural element determination, we analyze how the threshold applied to the density function and the size of the collection of episodes affect the number of structural elements determined. For structural grammar inference, we discuss the quality of the grammars obtained and show that they accurately reflect the structure of the program. We also demonstrate that the models obtained by grammatical inference can accurately predict the structure of unseen episodes, conducting a quantitative and comparative evaluation of the two methods by segmenting the new episodes into their structural components. Finally, considering the limitations of our work, we discuss a number of open issues in structure discovery and propose three new research directions to address in future work.
    In this thesis, we address the problem of unsupervised TV program structuring from the point of view of grammatical inference, focusing on discovering the structure of recurrent programs from a homogeneous collection. We aim to discover the structural elements that are relevant to the structure of the program and to infer the grammar of that structure. Experiments show that grammatical inference makes it possible to discover the structure of programs using minimal prior domain knowledge.
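The temporal density filtering step mentioned in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes each candidate event is described by its occurrence times across episodes, normalized to [0, 1], and that an event is kept when the peak of a Gaussian kernel density estimate reaches a threshold. The function names, the bandwidth, and the threshold value are all hypothetical.

```python
import math


def kernel_density(positions, grid_size=101, bandwidth=0.02):
    """Gaussian kernel density estimate on a regular grid over [0, 1].

    `positions` are occurrence times of one candidate event, pooled over
    episodes and normalized by episode duration (an assumption of this sketch).
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    norm = 1.0 / (len(positions) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - p) / bandwidth) ** 2) for p in positions)
        for g in grid
    ]


def is_structural(positions, threshold, bandwidth=0.02):
    """Keep an event if its occurrences concentrate at some relative time."""
    if not positions:
        return False
    return max(kernel_density(positions, bandwidth=bandwidth)) >= threshold


# Hypothetical data: one event recurs near the 10% mark of every episode,
# the other occurs at unrelated positions.
recurring = [0.10, 0.11, 0.10, 0.09, 0.10]
scattered = [0.05, 0.30, 0.50, 0.70, 0.95]
print(is_structural(recurring, threshold=10.0))
print(is_structural(scattered, threshold=10.0))
```

In this sketch, raising the threshold keeps fewer events and a larger episode collection gives a smoother density estimate, which is the kind of trade-off the evaluation in the abstract examines.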

    Unsupervised mining of multiple audiovisually consistent clusters for video structure analysis

    We address the problem of detecting multiple audiovisual events related to the edit structure of a video by incorporating an unsupervised cluster analysis technique into a cluster selection method designed to measure coherence between audio and visual segments. First, a mutual information measure is used to select audio-visually consistent clusters from two dendrograms representing hierarchical clustering results for the audio and visual modalities, respectively. A cluster analysis technique is then applied to define events from the audio-visual (AV) clusters whose segments co-occur frequently. Candidate events are then characterized by groups of AV clusters, from which models are built by automatically selecting positive and negative examples. Experiments on the standard Canal9 data set demonstrate that our method is capable of discovering multiple audiovisual events in a totally unsupervised manner.
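As an illustration of the cluster selection idea in this abstract, the sketch below scores pairs of audio and visual clusters by the mutual information between their membership indicators. It is a simplified stand-in, not the paper's method: it assumes the video is cut into time units, that each cluster is given as a binary sequence marking the units it covers, and that clusters have already been extracted from the dendrograms. The names `mutual_information` and `best_av_pair` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(a, v):
    """Mutual information, in bits, between two equal-length label sequences."""
    if len(a) != len(v):
        raise ValueError("sequences must have the same length")
    n = len(a)
    joint = Counter(zip(a, v))
    count_a = Counter(a)
    count_v = Counter(v)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) / (p(x) p(y)) simplifies to c * n / (count_a[x] * count_v[y])
        mi += (c / n) * math.log2(c * n / (count_a[x] * count_v[y]))
    return mi


def best_av_pair(audio_clusters, visual_clusters):
    """Return (audio_name, visual_name, score) for the most consistent pair."""
    return max(
        (
            (a_name, v_name, mutual_information(a, v))
            for a_name, a in audio_clusters.items()
            for v_name, v in visual_clusters.items()
        ),
        key=lambda item: item[2],
    )


# Hypothetical membership indicators over eight time units.
audio = {"a1": [1, 1, 0, 0, 1, 1, 0, 0], "a2": [1, 0, 1, 0, 1, 0, 1, 0]}
visual = {"v1": [1, 1, 0, 0, 1, 1, 0, 0], "v2": [0, 0, 0, 1, 0, 0, 0, 1]}
print(best_av_pair(audio, visual))
```

A high score means that knowing whether a time unit belongs to the audio cluster tells us a lot about whether it belongs to the visual cluster, which is the sense of audio-visual consistency used here.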