351 research outputs found

Unsupervised mining of audiovisually consistent segments in videos with application to structure analysis

Author: Ben Mathieu
Gravier Guillaume
Publication venue: HAL CCSD
Publication date: 01/01/2011

International audience. In this paper, a multimodal event mining technique is proposed to discover repeating video segments exhibiting audio and visual consistency in a fully unsupervised manner. The mining strategy first exploits independent audio and visual cluster analysis to provide segments which are consistent in both their visual and audio modalities, and thus likely to correspond to a unique underlying event. A subsequent modeling stage using discriminative models enables accurate detection of the underlying event throughout the video. Event mining is applied to unsupervised video structure analysis, using simple heuristics on occurrence patterns of the discovered events to select those relevant to the video structure. Results on TV programs ranging from news to talk shows and games show that structurally relevant events are discovered with precisions ranging from 87% to 98% and recalls from 59% to 94%.

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Rennes 1

Hierarchical topic structuring: from dense segmentation to topically focused fragments via burst analysis

Author: Gravier Guillaume
Simon Anca
Sébillot Pascale
Publication venue: HAL CCSD
Publication date: 01/01/2015

International audience. Topic segmentation traditionally relies on lexical cohesion measured through word re-occurrences to output a dense segmentation, either linear or hierarchical. In this paper, a novel organization of the topical structure of textual content is proposed. Rather than searching for topic shifts to yield a dense segmentation, we propose an algorithm to extract topically focused fragments organized in a hierarchical manner. This is achieved by leveraging the temporal distribution of word re-occurrences, searching for bursts, to skirt the limits imposed by a global counting of lexical re-occurrences within segments. Comparison to a reference dense segmentation on varied datasets indicates that we can achieve a better topic focus while retrieving all of the important aspects of a text.

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Rennes 1

Zero-resource audio-only spoken term detection based on a combination of template matching techniques

Author: Bimbot Frédéric
Gravier Guillaume
Muscariello Armando
Publication venue: HAL CCSD
Publication date: 27/08/2011

Keywords: spoken term detection, template matching, unsupervised learning, posterior features. International audience. Spoken term detection is a well-known information retrieval task that seeks to extract meaningful information from audio by locating occurrences of known query words of interest. This paper describes a zero-resource approach to this task based on pattern matching of spoken term queries at the acoustic level. The template matching module consists of a cascade of a segmental variant of dynamic time warping and a self-similarity matrix comparison to further improve robustness to speech variability. This solution notably differs from more traditional train-and-test methods that, while shown to be very accurate, rely upon the availability of large amounts of linguistic resources. We evaluate our framework on different parameterizations of the speech templates: raw MFCC features, Gaussian posteriorgrams, and French and English phonetic posteriorgrams output by two different state-of-the-art phoneme recognizers.

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Rennes 1
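
The abstract of the entry above mentions dynamic time warping (DTW) as one stage of its template matching cascade. As a rough illustration only, the sketch below implements the textbook form of DTW between two sequences of feature vectors; it is not the segmental variant described in the paper, and the function name `dtw_distance` and the Euclidean frame distance are assumptions made for this example.

```python
import math


def dtw_distance(query, reference):
    """Textbook DTW cost between two sequences of equal-dimension vectors.

    Hypothetical helper for illustration; not the paper's segmental variant.
    """
    n, m = len(query), len(reference)
    # cost[i][j] = best cumulative cost aligning query[:i] with reference[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local distance between the two frames (Euclidean, an assumption).
            d = math.dist(query[i - 1], reference[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(
                cost[i - 1][j],      # query frame repeated against reference
                cost[i][j - 1],      # reference frame repeated against query
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]
```

A low cost suggests that the query template and the reference stretch follow a similar trajectory even when they are spoken at different speeds, which is the property that makes DTW attractive when no training resources are available.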

L'adaptation thématique d'un modèle de langue fait-elle apparaître des mots thématiques? [Does topic adaptation of a language model bring out topic-specific words?]

Author: Gravier Guillaume
Lécorvé Gwénolé
Sébillot Pascale
Publication venue: HAL CCSD
Publication date: 25/05/2010

International audience. Whereas topic-based adaptation of language models (LM) is claimed to increase the accuracy of topic-specific words in automatic speech recognition, this paper investigates why this expectation is not always met. After outlining the mechanisms of LM adaptation and automatic speech recognition, diagnostic elements are proposed along with solutions. In addition to better accuracy on topic-specific words, results show better graph error rates and word error rates on a set of spoken documents covering various topics.

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Rennes 1

Audio Event Detection in Movies using Multiple Audio Words and Contextual Bayesian Networks

Author: Demarty Claire-Hélène
Gravier Guillaume
Gros Patrick
Penet Cédric
Publication venue: HAL CCSD
Publication date: 17/06/2013

International audience. This article investigates a novel use of the well-known audio word representation to detect specific audio events, namely gunshots and explosions, in order to gain robustness to soundtrack variability in Hollywood movies. An audio stream is processed as a sequence of stationary segments. Each segment is described by one or several audio words obtained by applying product quantization to standard features. Such a representation, using multiple audio words constructed via product quantization, is one of the novelties described in this work. Based on this representation, Bayesian networks are used to exploit contextual information in order to detect audio events. Experiments are performed on a comprehensive set of 15 movies, made publicly available. Results are comparable to the state-of-the-art results obtained on the same dataset and show increased robustness to decision thresholds, although this limits the range of possible operating points in some conditions. Late fusion provides a solution to this issue.

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Rennes 1
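
The abstract of the entry above describes each audio segment by several audio words obtained through product quantization. A minimal sketch of the encoding step, under stated assumptions, is given below: the feature vector is split into equal-length sub-vectors, and each sub-vector is replaced by the index of its nearest centroid in a per-subspace codebook. The function name `pq_encode`, the Euclidean distance, and the pre-computed codebooks are assumptions for this example; in practice the codebooks would be learned beforehand (typically with k-means), and the paper's exact configuration is not given in the abstract.

```python
import math


def pq_encode(vector, codebooks):
    """Encode a feature vector as one codeword index per sub-vector.

    `codebooks` holds one list of centroids per subspace (hypothetical,
    assumed to be learned elsewhere). Each index plays the role of one
    "audio word" for the segment.
    """
    n_sub = len(codebooks)
    if len(vector) % n_sub != 0:
        raise ValueError("vector length must be divisible by the number of codebooks")
    sub_dim = len(vector) // n_sub
    codes = []
    for s, codebook in enumerate(codebooks):
        # Slice out the s-th sub-vector.
        sub = vector[s * sub_dim:(s + 1) * sub_dim]
        # Pick the index of the nearest centroid in this subspace.
        best = min(range(len(codebook)), key=lambda k: math.dist(sub, codebook[k]))
        codes.append(best)
    return codes


# Toy usage: two subspaces of dimension 2, two centroids each.
toy_codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [5.0, 5.0]],
]
toy_codes = pq_encode([0.9, 1.1, 4.0, 4.5], toy_codebooks)
```

Because every subspace is quantized independently, a segment receives several small-vocabulary words instead of a single word from one very large vocabulary.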

The Model of Reading : Modelling principles, Definitions, Schema, Alignments

Author: Antonini Alessio
Gravier Guillaume
Ouvry-Vial Brigitte
Vignale François
Publication venue: HAL CCSD
Publication date: 01/01/2019

READ-IT Model of Reading, V2. Executive Summary: This technical report introduces the data model developed to address the systematic collection and use of reading experiences in the READ-IT project. The model of reading presented in this document is meant to inform the development of the READ-IT database and tools. This document describes the methodological approach and design principles adopted in the development of the model of reading. Furthermore, this technical report describes the content of the first version of the data model of the reading experience, including a preliminary analysis of the alignments between the READ-IT model of reading and CIDOC-CRM, FRBRoo, FoaF and Schema.org.

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1

Irisa MediaEval 2011 Spoken Web Search System

Author: Gravier Guillaume
Muscariello Armando
Publication venue: HAL CCSD
Publication date: 01/01/2011

These working notes describe the main aspects of the IRISA submission for the Spoken Web Search task at the MediaEval 2011 campaign. We test a language-independent audio-only system based on a combination of template matching techniques. A brief overview of the main components of the architecture is followed by a report on the evaluation on the development and test data provided by the organizers.

INRIA a CCSD electronic archive server

Recommended from our members

The Model of Reading: Modelling principles, Definitions, Schema, Alignments

Author: Antonini Alessio
Brigitte Ouvry-Vial
Guillaume Gravier
Vignale François
Publication venue: READ-IT
Publication date: 01/01/2019

This technical report introduces the data model developed to address the systematic collection and use of reading experiences in the READ-IT project. The model of reading presented in this document is meant to inform the development of the READ-IT database and tools. This document describes the methodological approach and design principles adopted in the development of the model of reading. Furthermore, this technical report describes the content of the first version of the data model of the reading experience, including a preliminary analysis of the alignments between the READ-IT model of reading and CIDOC-CRM, FRBRoo, FoaF and Schema.org.

Open Research Online (The Open University)

De la détection d'évènements sonores violents par SVM dans les films [On SVM-based detection of violent audio events in movies]

Author: Demarty Claire-Hélène
Gravier Guillaume
Gros Patrick
Penet Cédric
Publication venue: HAL CCSD
Publication date: 01/06/2011

National audience. This article studies the behaviour of a state-of-the-art support vector machine approach to audio event detection, applied to violent event detection in movies. The events we are trying to detect are screams, gunshots and explosions. Contrary to other studies, we show that the state-of-the-art approach does not lead to good results on this task. A study of the distribution of samples into subsets in a cross-validation protocol helps explain those results and highlights a generalisation problem due to the polymorphism of the classes considered. This polymorphism is demonstrated by computing the divergence between the samples of the test database and those of the training database.

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Rennes 1

Investigating domain-independent NLP techniques for precise target selection in video hyperlinking

Author: Gravier Guillaume
Guinaudeau Camille
Simon Anca-Roxana
Sébillot Pascale
Publication venue: HAL CCSD
Publication date: 01/09/2014

International audience. Automatic generation of hyperlinks in multimedia video data is a subject of growing interest, as demonstrated by recent work undertaken in the framework of the Search and Hyperlinking task within the MediaEval benchmark initiative. In this paper, we compare NLP-based strategies for precise target selection in video hyperlinking exploiting speech material, with the goal of providing hyperlinks from a specified anchor to help information retrieval. We experimentally compare two approaches for selecting short portions of videos which are relevant and possibly complementary with respect to the anchor. The first approach exploits a bipartite graph relating utterances and words to find the most relevant utterances. The second one uses explicit topic segmentation, whether hierarchical or not, to select the target segments. Experimental results are reported on the MediaEval 2013 Search and Hyperlinking dataset, which consists of BBC videos, demonstrating the interest of hierarchical topic segmentation for precise target selection.

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1