
    Unsupervised Speaker Identification using Overlaid Texts in TV Broadcast

    Poster Session: Speaker Recognition III

    We propose an approach for unsupervised speaker identification in TV broadcast videos, combining acoustic speaker diarization with person names obtained via video OCR from overlaid texts. Three methods for propagating the overlaid names to the speech turns are compared, taking into account the co-occurrence duration between the speaker clusters and the names provided by the video OCR, and using a task-adapted variant of the TF-IDF information-retrieval coefficient. These methods were tested on the REPERE dry-run evaluation corpus, containing 3 hours of annotated videos. Our best unsupervised system reaches an F-measure of 70.2% when considering all speakers, and 81.7% if anchor speakers are left out. By comparison, a mono-modal, supervised speaker identification system with 535 speaker models, trained on matching development data and additional TV and radio data, only reached a 57.5% F-measure over all speakers and 45.7% without anchors.
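    The propagation idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's exact method: the task-adapted TF-IDF variant is not specified in the abstract, so the sketch uses a plain TF-IDF-like score where the "TF" term is the co-occurrence duration normalized by the name's total on-screen duration and the "IDF" term penalizes names that co-occur with many speaker clusters. The function name and data layout are hypothetical.

    ```python
    from collections import defaultdict

    def propagate_names(cooccurrence, total_name_duration, n_clusters):
        """Assign to each speaker cluster the overlaid name with the
        highest TF-IDF-like score (an illustrative sketch only).

        cooccurrence[cluster][name]: seconds the OCR'd name is on
        screen while that speaker cluster is talking.
        total_name_duration[name]: total on-screen seconds of the name.
        """
        # "Document frequency": with how many clusters each name co-occurs.
        df = defaultdict(int)
        for durations in cooccurrence.values():
            for name in durations:
                df[name] += 1

        assignment = {}
        for cluster, durations in cooccurrence.items():
            best_name, best_score = None, 0.0
            for name, duration in durations.items():
                tf = duration / total_name_duration[name]
                idf = n_clusters / df[name]
                score = tf * idf
                if score > best_score:
                    best_name, best_score = name, score
            assignment[cluster] = best_name
        return assignment
    ```

    For example, a name shown mostly while one cluster speaks is assigned to that cluster, while a name that appears alongside every cluster (e.g. a show title misread as a name) is down-weighted by the IDF-like term.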

    Unsupervised Speaker Identification in TV Broadcast Based on Written Names

    Identifying speakers in TV broadcast in an unsupervised way (i.e. without biometric models) avoids costly annotations. Existing methods usually use pronounced names as a source of names for identifying the speech clusters produced by a diarization step, but this source is too imprecise to provide sufficient confidence. To overcome this issue, another source of names can be used: the names written in title blocks in the image track. We first compared these two sources on their ability to provide the names of the speakers in TV broadcast. This study shows that written names, thanks to their high precision, are more useful for identifying the current speaker. We also propose two approaches for finding speaker identities based only on names written in the image track. With the "late naming" approach, we propose different propagations of written names onto clusters. Our second proposal, "early naming", modifies the speaker diarization module (agglomerative clustering) by adding constraints that prevent two clusters with different associated written names from being merged. These methods were tested on the REPERE corpus, phase 1, containing 3 hours of annotated videos. Our best "late naming" system reaches an F-measure of 73.1%; "early naming" improves over this result both in terms of identification error rate and of stability of the clustering stopping criterion. By comparison, a mono-modal, supervised speaker identification system with 535 speaker models, trained on matching development data and additional TV and radio data, only reached a 57.2% F-measure.

    Towards a better integration of written names for unsupervised speakers identification in videos

    Existing methods for unsupervised identification of speakers in TV broadcast usually rely on the output of a speaker diarization module and try to name each cluster using names provided by another source of information; we call this "late naming". Written names extracted from title blocks tend to yield high-precision identification, but they cannot correct errors made during the clustering step. In this paper, we extend our previous "late naming" approach in two ways: "integrated naming" and "early naming". While "late naming" relies on a speaker diarization module optimized for diarization alone, "integrated naming" jointly optimizes speaker diarization and name propagation in terms of identification errors. "Early naming" modifies the speaker diarization module by adding constraints that prevent two clusters with different written names from being merged. While "integrated naming" yields identification performance similar to "late naming" (with better precision), "early naming" improves over this baseline both in terms of identification error rate and of stability of the clustering stopping criterion.
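    The cannot-link idea behind "early naming" can be sketched as constrained agglomerative clustering: a merge between two clusters is simply forbidden when they carry conflicting written names. This is a minimal Python sketch under simplifying assumptions (average linkage over precomputed pairwise distances, a plain distance threshold as stopping criterion); the paper's actual features, linkage, and stopping criterion are not specified in the abstract.

    ```python
    import itertools

    def constrained_agglomerative(pair_dist, written_name, threshold):
        """Greedy average-linkage agglomerative clustering with a
        cannot-link constraint (illustrative sketch of "early naming").

        pair_dist[(i, j)] with i < j: distance between segments i and j.
        written_name[i]: overlaid name seen during segment i, or None.
        """
        clusters = [{i} for i in written_name]

        def names_of(cluster):
            return {written_name[i] for i in cluster
                    if written_name[i] is not None}

        def linkage(a, b):
            # Average of pairwise segment distances between the clusters.
            return sum(pair_dist[min(i, j), max(i, j)]
                       for i in a for j in b) / (len(a) * len(b))

        while len(clusters) > 1:
            best = None
            for a, b in itertools.combinations(clusters, 2):
                na, nb = names_of(a), names_of(b)
                if na and nb and na != nb:
                    continue  # constraint: conflicting written names
                d = linkage(a, b)
                if best is None or d < best[0]:
                    best = (d, a, b)
            # Stop when no merge is allowed or the closest pair is too far.
            if best is None or best[0] > threshold:
                break
            _, a, b = best
            clusters.remove(a)
            clusters.remove(b)
            clusters.append(a | b)
        return clusters
    ```

    With this constraint, two acoustically similar clusters tagged "Alice" and "Bob" stay separate even when their distance is below the stopping threshold, which is exactly the clustering error that pure diarization cannot avoid.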

    Unsupervised naming of speakers in broadcast TV: using written names, pronounced names or both?

    Person identification in TV broadcast video is a valuable tool for indexing. However, using biometric models is not a very sustainable option without a priori knowledge of the people present in the videos. Pronounced names (PN) and written names (WN) on screen can provide hypothesis names for speakers. We propose an experimental comparison of the potential of these two modalities for extracting the true names of the speakers. Pronounced names offer many instances of citation, but transcription and named-entity detection errors halve the potential of this modality. In contrast, written name detection benefits from improvements in video quality and is nowadays rather robust and efficient for naming speakers. Oracle experiments on the mapping between written names and speakers also show the complementarity of the PN and WN modalities.

    QCompere @ REPERE 2013

    We describe the QCompere consortium submissions to the REPERE 2013 evaluation campaign. The REPERE challenge aims at gathering four communities (face recognition, speaker identification, optical character recognition and named entity detection) towards the same goal: multimodal person recognition in TV broadcast. First, four mono-modal components, one for each of these communities, are introduced as the elementary building blocks of our various submissions. Then, depending on the target modality (speaker or face recognition) and on the task (supervised or unsupervised recognition), four different fusion techniques are introduced; they can be summarized as propagation-, classifier-, rule- or graph-based approaches. Finally, their performance is evaluated on the REPERE 2013 test set and their advantages and limitations are discussed.

    Collaborative Annotation for Person Identification in TV Shows

    This paper presents a collaborative annotation framework for person identification in TV shows. The web annotation front-end will be demonstrated during the Show and Tell session. All the annotation code is available on GitHub, and the tool can also be used in a crowd-sourcing environment.

    LIG at MediaEval 2015 Multimodal Person Discovery in Broadcast TV Task

    In this working-notes paper, we present the contribution of the LIG team (a partnership between Univ. Grenoble Alpes and Ozyegin University) to the Multimodal Person Discovery in Broadcast TV task at MediaEval 2015. The task focused on unsupervised learning techniques. The team submitted two different approaches. In the first, new features for the face and speech modalities were tested; in the second, an alternative way to compute the distance between face tracks and speech segments is presented. The latter also achieved a competitive MAP score and was able to beat the baseline.