172 research outputs found

New time-frequency derived cepstral coefficients for automatic speech recognition

Author: Chollet Gérard
Wassner Hubert
Publication date: 10/03/2006

The goal is to improve the recognition rate by optimising Mel Frequency Cepstral Coefficients (MFCCs): the modifications concern the time-frequency representation used to estimate these coefficients. There are many ways to obtain a spectrum from a signal, which differ in the method itself (Fourier, wavelets, ...) and in the normalisation. We show here that noise-resistant cepstral coefficients can be obtained for speaker-independent connected word recognition. The recognition system is based on a continuous whole-word hidden Markov model. An error reduction rate of approximately 50% is achieved. Moreover, evaluation tests demonstrate that these results can be obtained with smaller databases: halving the training database has only a small effect on recognition rates (which is not the case with traditional MFCCs).

Infoscience - École polytechnique fédérale de Lausanne
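
For orientation, the conventional MFCC computation that the abstract above takes as its baseline can be sketched as follows. This is a minimal illustration, not the authors' method: the function names, the filter count, the FFT size and the Hamming window are all assumptions. The line marked "spectrum estimate" is the step where an alternative time-frequency representation would be substituted.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale (assumed layout)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):     # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def mfcc_frame(frame, sample_rate, n_filters=24, n_ceps=12, n_fft=512):
    """Cepstral coefficients for one frame of samples."""
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed, n_fft)) ** 2   # spectrum estimate
    energies = mel_filterbank(n_filters, n_fft, sample_rate) @ power
    log_energies = np.log(energies + 1e-10)             # small floor avoids log(0)
    # Unnormalised DCT-II of the log filterbank energies
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), np.arange(n_filters) + 0.5) / n_filters)
    return basis @ log_energies

# Example: a 25 ms frame of a 440 Hz tone sampled at 16 kHz
frame = np.sin(2 * np.pi * 440.0 * np.arange(400) / 16000.0)
coeffs = mfcc_frame(frame, 16000)
```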

Quantification de séquences spectrales de longueurs variables pour le codage de la parole à très bas débit

Author: BAUDOIN Geneviève
CERNOCKY Jan
CHOLLET Gérard
Publication venue: GRETSI, Groupe d’Etudes du Traitement du Signal et des Images
Publication date: 01/01/1997

This paper addresses the coding of spectral parameters for very low bit rate speech coding. We present a new interpretation of research previously published by Chou-Lookabaugh and Cernocky-Baudoin-Chollet on the quantization of variable-length spectral sequences, under the respective names "Variable to Variable length Vector Quantization" (VVVQ) and multigram quantization (MGQ). We also studied the influence of limiting the delay introduced by the method, and propose a technique for optimizing performance under an imposed maximum delay. We found that a delay of 400 ms is generally sufficient. Finally, we propose introducing long sequences into the codebook by linear interpolation of short sequences.

I-Revues
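
The last idea in the abstract above, building long codebook sequences by linearly interpolating short ones, can be illustrated with a small sketch. The function name `stretch_sequence` and the uniform time warping are assumptions made for illustration; the paper's actual interpolation scheme is not given in the abstract.

```python
import numpy as np

def stretch_sequence(seq, new_len):
    """Linearly interpolate a (frames x dims) sequence of spectral vectors to new_len frames."""
    seq = np.asarray(seq, dtype=float)
    old_len, dim = seq.shape
    # Positions of the new frames on the time axis of the original sequence
    src = np.linspace(0.0, old_len - 1, new_len)
    columns = [np.interp(src, np.arange(old_len), seq[:, d]) for d in range(dim)]
    return np.stack(columns, axis=1)

# A 2-frame sequence of 2-dimensional vectors stretched to 3 frames
short = [[0.0, 10.0], [2.0, 20.0]]
longer = stretch_sequence(short, 3)
```

Here the middle frame of `longer` is the midpoint of the two original vectors, and the first and last frames are unchanged.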

Privacy Preserving Personal Assistant with On-Device Diarization and Spoken Dialogue System for Home and Beyond

Author: Boudy Jérôme
Chollet Gérard
Hariz Mossaab
Lohr Christophe
Sansen Hugues
Tevissen Yannis
Yassa Fathy
Publication date: 02/01/2024

In the age of personal voice assistants, the question of privacy arises. These digital companions often lack memory of past interactions while relying heavily on the internet for speech processing, which raises privacy concerns. Modern smartphones now enable on-device speech processing, making cloud-based solutions unnecessary. Personal assistants for the elderly should excel at memory recall, especially in medical examinations. The e-VITA project developed a versatile conversational application with local processing and speaker recognition. This paper highlights the importance of speaker diarization enriched with sensor data fusion for contextualized conversation preservation. The use cases applied to the e-VITA project have shown that truly personalized dialogue is pivotal for individual voice assistants. Secure local processing and sensor data fusion ensure virtual companions meet individual user needs without compromising privacy or data security.

Comment: 10 pages, 1 figure, to be presented at https://ihiet-ai.org/, Lausanne in April 202

arXiv.org e-Print Archive

Home monitoring for frailty detection through sound and speaker diarization analysis

Author: Boudy Jérôme
Boutamine Sami
Chollet Gérard
Istrate Dan
Petitpont Frédéric
Tevissen Yannis
Zalc Vincent
Publication date: 17/08/2023

As the French, European and worldwide populations are aging, there is strong interest in new systems that guarantee reliable and privacy-preserving home monitoring for frailty prevention. This work is part of a global environmental audio analysis system which aims to help identify Activities of Daily Life (ADL) through recognition of human and everyday life sounds, detection of speech presence, and detection of the number of speakers. The focus here is on detecting the number of speakers. In this article, we present how recent advances in sound processing and speaker diarization can improve existing embedded systems. We study the performance of two new methods and discuss the benefits of DNN-based approaches, which improve performance by about 100%.

Comment: JETSAN, Jun 2023, Aubervilliers & Paris, France

arXiv.org e-Print Archive

Combining methods to improve speaker verification decision

Author: Bimbot Frédéric
Chollet Gérard
Genoud Dominique
Gravier Guillaume
Publication venue: IDIAP
Publication date: 10/03/2006

The aim of this paper is to describe how the combination of speaker verification algorithms with a priori decision thresholds can improve the overall robustness of a real application. The evaluation is performed in the context of a field application where each client is verified from a 7-digit PIN code. This paper demonstrates that it is possible to increase the global performance of the system by combining the results of several algorithms.

Infoscience - École polytechnique fédérale de Lausanne
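
One simple way to combine several verification algorithms that each carry an a priori threshold, as the abstract above describes in general terms, is to fuse their margins. The sketch below is an assumption for illustration only: the `Verifier` class, the weighted-mean rule and the example scores are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class Verifier:
    name: str
    threshold: float      # a priori decision threshold, fixed before deployment
    weight: float = 1.0   # relative trust in this algorithm (assumed)

def fuse(scores, verifiers):
    """Accept the claimed identity if the weighted mean margin (score - threshold) is >= 0."""
    total_weight = sum(v.weight for v in verifiers)
    margin = sum(v.weight * (scores[v.name] - v.threshold) for v in verifiers) / total_weight
    return margin >= 0.0, margin

# Two hypothetical algorithms scoring the same access attempt
verifiers = [Verifier("algo_a", threshold=0.5), Verifier("algo_b", threshold=0.0)]
accepted, margin = fuse({"algo_a": 0.7, "algo_b": -0.1}, verifiers)
```

In this example the second algorithm alone would reject the attempt, but the combined margin is positive, so the fused decision is to accept.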

Combining methods to improve speaker verification decision

Author: Bimbot Frédéric
Chollet Gérard
Genoud Dominique
Gravier Guillaume
Publication venue: International Speech Communication Association
Publication date: 10/03/2006

Infoscience - École polytechnique fédérale de Lausanne

Towards a Practical Silent Speech Interface Based on Vocal Tract Imaging

Author: Cai Jun
Chollet Gérard
Crevier-Buchman Lise
Denby Bruce
Dreyfus Gérard
Hueber Thomas
Manitsaris Sotiris
Pillot-Loiseau Claire
Roussel Pierre
Stone Maureen
Publication venue: HAL CCSD
Publication date: 20/07/2011

The full proceedings of this conference are available at: http://www.issp2011.uqam.ca/upload/files/proceedings.pdf

The paper describes advances in the development of an ultrasound silent speech interface for use in silent communications applications or as a speaking aid for persons who have undergone a laryngectomy. It reports some first steps towards making such a device lightweight, portable, interactive, and practical to use. Simple experimental tests of an interactive silent speech interface for everyday applications are described. Possible future improvements, including extension to continuous speech and real-time operation, are discussed.

Hal - Université Grenoble Alpes

Secured vocal access to telephone servers

Author: Bornet Olivier
Chollet Gérard
Cochard Jean-Luc
Constantinescu Andrei
Genoud Dominique
Publication venue: IDIAP / CNRS
Publication date: 10/03/2006

A number of applications of man-machine interaction over the telephone require a combination of speech recognition and speaker verification. This paper describes current work carried out at IDIAP in the framework of national and European projects. A generic Interactive Voice Server (IVS) is described by means of a graphical formalism. It includes speech recognition based on speaker-independent flexible vocabulary technology, and speaker verification performed by a number of techniques executed in parallel and combined for an optimal decision.

Infoscience - École polytechnique fédérale de Lausanne

Swiss French PolyPhone and PolyVar: telephone speech databases to model inter- and intra-speaker variability

Author: Chollet Gérard
Cochard Jean-Luc
Constantinescu Andrei
Jaboulet Cédric
Langlais Philippe
Publication venue: IDIAP
Publication date: 10/03/2006

Following the demand of the speech technology market, a number of companies and research laboratories joined forces to produce valuable and reusable resources, especially speech databases. The collected databases are used for developing, testing, enhancing and evaluating speech technology products, such as interactive voice servers, listening typewriters, and speaker verification and identification systems. The PolyVar database was designed and recorded at IDIAP specifically to capture intra-speaker variability, as a complement to the Swiss French PolyPhone database, which addresses inter-speaker variability. We detail the specific problems of speech database collection (sampling the speaker population, selection of vocabulary items, ...) and present the development work we carried out at IDIAP through the PolyPhone and PolyVar databases.

Infoscience - École polytechnique fédérale de Lausanne