Detecting replay attacks in audiovisual identity verification
We describe an algorithm that detects a lack of correspondence between speech and lip motion by detecting and monitoring the degree of synchrony between live audio and visual signals. It is simple, effective, and computationally inexpensive, providing a useful degree of robustness against basic replay attacks and against speech or image forgeries. The method is based on a cross-correlation analysis between two streams of features, one from the audio signal and the other from the image sequence. We argue that such an algorithm forms an effective first barrier against several kinds of replay attack that would defeat existing verification systems based on standard multimodal fusion techniques. In order to provide an evaluation mechanism for the new technique, we have augmented the protocols that accompany the BANCA multimedia corpus by defining new scenarios. We obtain a 0% equal-error rate (EER) on the simplest scenario and 35% on a more challenging one.
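The cross-correlation idea above can be sketched in a few lines. This is an illustrative toy, not the paper's exact feature pipeline: the feature names, lag range, and threshold are assumptions, and real audio/lip features would replace the synthetic signals.

```python
import numpy as np

def synchrony_score(audio_feat, visual_feat, max_lag=5):
    """Peak normalized cross-correlation between two equal-length 1-D
    feature streams, searched over a small range of frame lags."""
    a = (audio_feat - audio_feat.mean()) / (audio_feat.std() + 1e-8)
    v = (visual_feat - visual_feat.mean()) / (visual_feat.std() + 1e-8)
    n = len(a)
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = a[lag:], v[:n - lag]
        else:
            x, y = a[:n + lag], v[-lag:]
        best = max(best, float(np.dot(x, y)) / len(x))
    return best

rng = np.random.default_rng(0)
sig = rng.standard_normal(200)
# "Live" case: visual stream is a noisy, slightly delayed copy of the audio
live = synchrony_score(sig[:-2], sig[2:] + 0.1 * rng.standard_normal(198))
# "Replay" case: the two streams are unrelated
replay = synchrony_score(sig[:100], rng.standard_normal(100))
```

A liveness decision would then threshold the score: the synchronous pair yields a correlation near 1, while unrelated streams stay near 0.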
The CAMOMILE collaborative annotation platform for multi-modal, multi-lingual and multi-media documents
In this paper, we describe the organization and the implementation of the CAMOMILE collaborative annotation framework for multimodal, multimedia, multilingual (3M) data. Given the versatile nature of the analysis which can be performed on 3M data, the structure of the server was kept intentionally simple in order to preserve its genericity, relying on standard Web technologies. Layers of annotations, defined as data associated with a media fragment from the corpus, are stored in a database and can be managed through standard interfaces with authentication. Interfaces tailored specifically to the task at hand can then be developed in an agile way, relying on simple but reliable services for the management of the centralized annotations. We then present our implementation of an active learning scenario for person annotation in video, relying on the CAMOMILE server; during a dry-run experiment, the manual annotation of 716 speech segments was thus propagated to 3504 labeled tracks. The code of the CAMOMILE framework is distributed as open source.
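The core data model described above, annotations as data attached to a media fragment, grouped into layers, can be mirrored in a minimal in-memory sketch. The class names and payload shapes here are assumptions for illustration, not the CAMOMILE server's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Annotation:
    medium: str    # identifier of the media file
    start: float   # fragment start, seconds
    end: float     # fragment end, seconds
    data: dict     # free-form payload, e.g. {"speaker": "anchor"}

@dataclass
class Layer:
    name: str
    annotations: list = field(default_factory=list)

    def add(self, medium, start, end, data):
        self.annotations.append(Annotation(medium, start, end, data))

    def in_fragment(self, medium, t0, t1):
        """All annotations of this layer overlapping [t0, t1] on a medium."""
        return [a for a in self.annotations
                if a.medium == medium and a.start < t1 and a.end > t0]

layer = Layer("speakers")
layer.add("news.mp4", 0.0, 4.2, {"speaker": "anchor"})
layer.add("news.mp4", 4.2, 9.8, {"speaker": "guest"})
hits = layer.in_fragment("news.mp4", 3.0, 5.0)  # overlaps both annotations
```

In the real framework these objects live in a database behind authenticated Web services, which is what lets task-specific interfaces stay thin.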
Low-latency speaker spotting with online diarization and detection
This paper introduces a new task termed low-latency speaker spotting (LLSS). Related to security and intelligence applications, the task involves the detection, as soon as possible, of known speakers within multi-speaker audio streams. The paper describes differences to the established fields of speaker diarization and automatic speaker verification and proposes a new protocol and metrics to support exploration of LLSS. These can be used together with an existing, publicly available database to assess the performance of the LLSS solutions also proposed in the paper, which combine online diarization and speaker detection systems. Diarization systems include a naive over-segmentation approach and fully-fledged online diarization using segmental i-vectors. Speaker detection is performed using Gaussian mixture models, i-vectors, or neural speaker embeddings. Metrics reflect different approaches to characterising latency in addition to detection performance. The relative performance of each solution is dependent on latency. When higher latency is admissible, i-vector solutions perform well; embeddings excel when latency must be kept to a minimum. With a need to improve the reliability of online diarization and detection, the proposed LLSS framework provides a vehicle to fuel future research in both areas. In this respect, we embrace a reproducible research policy; results can be readily reproduced using publicly available resources and open-source code.
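One natural way to characterise latency in this setting, an assumption for illustration, not necessarily the paper's exact metric, is the gap between the target speaker's first speech and the system's first alarm at or after it:

```python
def absolute_latency(first_target_time, alarm_times):
    """Seconds from the target speaker's first speech to the first alarm
    at or after that instant; None if the speaker is never detected.
    Alarms before the target speaks (false alarms) are ignored here."""
    later = [t for t in alarm_times if t >= first_target_time]
    return (min(later) - first_target_time) if later else None

# Target first speaks at 12.5 s; the system fires at 3.0, 14.0 and 20.0 s.
lat = absolute_latency(12.5, [3.0, 14.0, 20.0])  # first valid alarm at 14.0 s
```

A full LLSS evaluation would trade such latency figures against detection accuracy, which is why the relative ranking of i-vector and embedding systems changes with the admissible latency.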
The CAMOMILE collaborative annotation platform for multi-modal, multi-lingual and multi-media documents
In this paper, we describe the organization and the implementation of the CAMOMILE collaborative annotation framework for multimodal, multimedia, multilingual (3M) data. Given the versatile nature of the analysis which can be performed on 3M data, the structure of the server was kept intentionally simple in order to preserve its genericity, relying on standard Web technologies. Layers of annotations, defined as data associated to a media fragment from the corpus, are stored in a database and can be managed through standard interfaces with authentication. Interfaces tailored specifically to the needed task can then be developed in an agile way, relying on simple but reliable services for the management of the centralized annotations. We then present our implementation of an active learning scenario for person annotation in video, relying on the CAMOMILE server; during a dry run experiment, the manual annotation of 716 speech segments was thus propagated to 3504 labeled tracks. The code of the CAMOMILE framework is distributed in open source.Peer Reviewe
The Speed Submission to DIHARD II: Contributions & Lessons Learned
This paper describes the speaker diarization systems developed for the Second DIHARD Speech Diarization Challenge (DIHARD II) by the Speed team. Besides describing the system, which considerably outperformed the challenge baselines, we also focus on the lessons learned from the numerous approaches that we tried for single- and multi-channel systems. We present several components of our diarization system, including categorization of domains, speech enhancement, speech activity detection, speaker embeddings, clustering methods, resegmentation, and system fusion. We analyze and discuss the effect of each such component on the overall diarization performance within the realistic settings of the challenge.
Towards large scale multimedia indexing: a case study on person discovery in broadcast news
The rapid growth of multimedia databases and the human interest in their peers make indices representing the location and identity of people in audio-visual documents essential for searching archives. Person discovery in the absence of prior identity knowledge requires accurate association of audio-visual cues and detected names. To this end, we present three different strategies to approach this problem: clustering-based naming, verification-based naming, and graph-based naming. Each of these strategies utilizes different recent advances in unsupervised face/speech representation, verification, and optimization. To give a better understanding of the approaches, this paper also provides a quantitative and qualitative comparative study using the associated corpus of the Person Discovery challenge at MediaEval 2016. From the results of our experiments, we can observe the pros and cons of each approach, thus paving the way for promising future research directions.
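The first strategy, clustering-based naming, can be sketched as a voting scheme: each audio-visual cluster inherits the detected name that co-occurs with it most often. The shot identifiers, names, and voting rule below are invented for illustration and are not the paper's exact formulation:

```python
from collections import Counter

def name_clusters(cluster_of_shot, names_in_shot):
    """Map each cluster id to the detected name co-occurring with it most.
    cluster_of_shot: {shot_id: cluster_id}; names_in_shot: {shot_id: [names]}."""
    votes = {}
    for shot, cluster in cluster_of_shot.items():
        for name in names_in_shot.get(shot, []):
            votes.setdefault(cluster, Counter())[name] += 1
    return {c: cnt.most_common(1)[0][0] for c, cnt in votes.items()}

clusters = {"shot1": 0, "shot2": 0, "shot3": 1}
names = {"shot1": ["A. Merkel"],
         "shot2": ["A. Merkel", "F. Hollande"],
         "shot3": ["F. Hollande"]}
naming = name_clusters(clusters, names)  # cluster 0 gets the majority name
```

Verification-based and graph-based naming replace this majority vote with, respectively, pairwise identity verification against named exemplars and joint optimization over a name/cluster graph.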