
68 research outputs found

Using data-driven and phonetic units for speaker verification

Author: El Hannani Asmaa
Hennebert Jean
Montero-Asenjo Alberto
Petrovska-Delacrétaz Dijana
Toledano Doroteo T.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2006

Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

A. E. Hannani, D. T. Toledano, D. Petrovska-Delacrétaz, A. Montero-Asenjo, J. Hennebert, "Using Data-driven and Phonetic Units for Speaker Verification" in Odyssey: The Speaker and Language Recognition Workshop, San Juan (Puerto Rico), 2006, pp. 1-6.

Recognition of speaker identity based on modeling the streams produced by phonetic decoders (phonetic speaker recognition) has gained popularity during the past few years. Two of the major problems that arise when phone-based systems are being developed are the possible mismatches between the development and evaluation data and the lack of transcribed databases. Data-driven segmentation techniques provide a potential solution to these problems because they do not use transcribed data and can easily be applied to development data, minimizing the mismatches. In this paper we compare speaker recognition results using phonetic and data-driven decoders. To this end, we have compared the results obtained with a speaker recognition system based on data-driven acoustic units and phonetic speaker recognition systems trained on Spanish and English data. Results obtained on the NIST 2005 Speaker Recognition Evaluation data show that the data-driven approach outperforms the phonetic one and that further improvements can be achieved by combining both approaches.

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Biblos-e Archivo

Using data-driven and phonetic units for speaker verification

Author: Alberto Montero-Asenjo
Asmaa El Hannani
Dijana Petrovska-Delacrétaz
Doroteo T Toledano
Jean Hennebert
Publication date: 01/01/2006

Abstract: Recognition of speaker identity based on modeling the streams produced by phonetic decoders (phonetic speaker recognition) has gained popularity during the past few years. Two of the major problems that arise when phone-based systems are being developed are the possible mismatches between the development and evaluation data and the lack of transcribed databases. Data-driven segmentation techniques provide a potential solution to these problems because they do not use transcribed data and can easily be applied to development data, minimizing the mismatches. In this paper we compare speaker recognition results using phonetic and data-driven decoders. To this end, we have compared the results obtained with a speaker recognition system based on data-driven acoustic units and phonetic speaker recognition systems trained on Spanish and English data. Results obtained on the NIST 2005 Speaker Recognition Evaluation data show that the data-driven approach outperforms the phonetic one and that further improvements can be achieved by combining both approaches.

CiteSeerX

Automatic detection of known advertisements in radio broadcast with data-driven ALISP transcriptions

Author: A Wang
C Neves
Dijana Petrovska-Delacrétaz
Gérard Chollet
Houssemeddine Khemiri
JM Makhoul
P Cano
SF Altschul
V Levenshtein
Y Linde
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Image-based Search and Retrieval for Biface Artefacts using Features Capturing Archaeologically Significant Characteristics

Author: A Chalechale
A Jain
A Vargha
AA Evans
AH Sayce
Andrew Lewis
BS Manjunath
C Zhu
Christopher Power
D Petrovska-Delacrétaz
Ekta Walia
G Odell
GS Hesse
H Beyer
J Arróspide
JA Barceló
L Grosman
L Zhang
Mark Eramian
MS Sarfraz
P Kovesi
Paul Cairns
PH Lewis
S Xie
SC Lin
W Andrefsky Jr
W Böhler
Z Guo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Archaeologists are currently producing huge numbers of digitized photographs to record and preserve artefact finds. These images are used to identify and categorize artefacts, reason about connections between artefacts, and perform outreach to the public. However, finding specific types of images within collections remains a major challenge. Often, the metadata associated with images is sparse or inconsistent. This makes keyword-based exploratory search difficult, leaving researchers to rely on serendipity and slowing down the research process. We present an image-based retrieval system that addresses this problem for biface artefacts. In order to identify artefact characteristics that need to be captured by image features, we conducted a contextual inquiry study with experts in bifaces. We then devised several descriptors for matching images of bifaces with similar artefacts. We evaluated the performance of these descriptors using measures that specifically look at the differences between the sets of images returned by the search system using different descriptors. Through this nuanced approach, we have provided a comprehensive analysis of the strengths and weaknesses of the different descriptors and identified implications for design in the search systems for archaeology.

Crossref

Springer - Publisher Connector

White Rose Research Online

An audio-visual corpus for multimodal automatic speech recognition

Author: A Czyzewski
AG Chitu
Andrzej Czyzewski
Bozena Kostek
D Petrovska-Delacrétaz
D Stewart
E Trentin
H Lane
H McGurk
Jozef Kotus
K Noda
M Cooke
Marcin Szykulski
P Zelasko
Piotr Bratoszewski
RS Bolia
S Pigeon
YW Wong
Publication venue: 'Springer Science and Business Media LLC'

Crossref

IRIM at TRECVID 2012: Semantic Indexing and Instance Search

The IRIM group is a consortium of French teams working on Multimedia Indexing and Retrieval. This paper describes its participation in the TRECVID 2012 semantic indexing and instance search tasks. For the semantic indexing task, our approach uses a six-stage processing pipeline for computing scores for the likelihood of a video shot to contain a target concept. These scores are then used for producing a ranked list of images or shots that are the most likely to contain the target concept. The pipeline is composed of the following steps: descriptor extraction, descriptor optimization, classification, fusion of descriptor variants, higher-level fusion, and re-ranking. We evaluated a number of different descriptors and tried different fusion strategies. The best IRIM run has a Mean Inferred Average Precision of 0.2378, which ranked us 4th out of 16 participants. For the instance search task, our approach uses two steps. First, individual methods of participants are used to compute similarity between an example image of an instance and keyframes of a video clip. Then a two-step fusion method is used to combine these individual results and obtain a score for the likelihood of an instance to appear in a video clip. These scores are used to obtain a ranked list of the clips most likely to contain the queried instance. The best IRIM run has a MAP of 0.1192, which ranked us 29th out of 79 fully automatic runs.

HAL-CentraleSupelec

Hal - Université Grenoble Alpes

HAL AMU

INRIA a CCSD electronic archive server

HAL

HAL Université de Savoie

HAL-CEA

Hal-Diderot

HAL-Rennes 1