New time-frequency derived cepstral coefficients for automatic speech recognition
The goal is to improve the recognition rate by optimising Mel Frequency Cepstral Coefficients (MFCCs): the modifications concern the time-frequency representation used to estimate these coefficients. There are many ways to obtain a spectrum from a signal, which differ in the method itself (Fourier, wavelets, ...) and in the normalisation. We show here that we can obtain noise-resistant cepstral coefficients for speaker-independent connected word recognition. The recognition system is based on a continuous whole-word hidden Markov model. An error reduction rate of approximately 50% is achieved. Moreover, evaluation tests demonstrate that these results can be obtained with smaller databases: halving the training database has only a small effect on recognition rates (which is not the case with traditional MFCCs).
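For orientation, the conventional MFCC pipeline that this abstract takes as its baseline can be sketched as follows. This is a minimal illustration and not the authors' modified coefficients: the Hamming window, the 2595·log10(1 + f/700) mel formula, the filter counts, and every function name are assumptions chosen for the example. The abstract suggests that the modifications concern the spectral-estimation step, which corresponds to `power_spectrum` here.

```python
import cmath
import math


def hz_to_mel(freq_hz):
    """Common mel-scale formula (an assumption; several variants exist)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Naive DFT power spectrum of one frame (bins 0..N/2)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centres equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    mels = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):      # rising edge
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):     # falling edge
            row[k] = (right - k) / (right - centre)
        bank.append(row)
    return bank


def dct2(values, n_coeffs):
    """Unnormalised DCT-II, keeping the first n_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(n_coeffs)]


def mfcc(frame, sample_rate, n_filters=20, n_coeffs=12):
    """Window -> power spectrum -> mel energies -> log -> DCT."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * t / (n - 1)))
                for t, x in enumerate(frame)]
    spectrum = power_spectrum(windowed)
    energies = [sum(w * p for w, p in zip(row, spectrum))
                for row in mel_filterbank(n_filters, n, sample_rate)]
    # Floor the energies so empty filters do not produce log(0).
    log_energies = [math.log(max(e, 1e-10)) for e in energies]
    return dct2(log_energies, n_coeffs)


# Example: one 64-sample frame of a 1 kHz tone sampled at 8 kHz.
frame = [math.sin(2.0 * math.pi * 1000.0 * t / 8000.0) for t in range(64)]
coeffs = mfcc(frame, sample_rate=8000, n_filters=10, n_coeffs=5)
```

A production system would use an FFT and a tested library rather than the naive DFT above; the sketch only makes the order of the processing steps explicit.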
Quantization of variable-length spectral sequences for very low bit rate speech coding
This paper addresses the coding of spectral parameters for very low bit rate speech coding. We present a new interpretation of research previously published by Chou-Lockabaugh and Cemocky-Baudoin-Chollet on the quantization of variable-length spectral sequences, under the respective names "Variable to Variable length Vector Quantization" (VVVQ) and multigram quantization (MGQ). We also studied the influence of limiting the delay introduced by the method and proposed a technique for optimising performance under an imposed maximum delay. We found that a delay of 400 ms is generally sufficient. Finally, we propose introducing long sequences into the codebook by linear interpolation of short sequences.
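The last idea in this abstract, building long codebook sequences by linear interpolation of short ones, can be illustrated with a small sketch. It is only an assumption about how such stretching might look: the function name `stretch_sequence`, the list-of-vectors representation, and the uniform time warping are hypothetical and not taken from the paper.

```python
def stretch_sequence(sequence, new_length):
    """Resample a sequence of spectral vectors to new_length frames
    by linear interpolation along the time axis."""
    if new_length < 2 or len(sequence) < 2:
        raise ValueError("need at least two frames in and out")
    last = len(sequence) - 1
    stretched = []
    for j in range(new_length):
        # Fractional position of output frame j on the input time axis.
        pos = j * last / (new_length - 1)
        lo = min(int(pos), last - 1)
        frac = pos - lo
        stretched.append([(1.0 - frac) * a + frac * b
                          for a, b in zip(sequence[lo], sequence[lo + 1])])
    return stretched


# Example: a 3-frame sequence of 2-dimensional vectors stretched to 5 frames.
short_seq = [[0.0, 1.0], [2.0, 3.0], [4.0, 1.0]]
long_seq = stretch_sequence(short_seq, 5)
```

The first and last frames are preserved, and the intermediate frames lie on straight lines between neighbouring input frames.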
Privacy Preserving Personal Assistant with On-Device Diarization and Spoken Dialogue System for Home and Beyond
In the age of personal voice assistants, the question of privacy arises.
These digital companions often lack memory of past interactions, while relying
heavily on the internet for speech processing, raising privacy concerns. Modern
smartphones now enable on-device speech processing, making cloud-based
solutions unnecessary. Personal assistants for the elderly should excel at
memory recall, especially in medical examinations. The e-VITA project developed
a versatile conversational application with local processing and speaker
recognition. This paper highlights the importance of speaker diarization
enriched with sensor data fusion for contextualized conversation preservation.
The use cases applied to the e-VITA project have shown that truly personalized
dialogue is pivotal for individual voice assistants. Secure local processing
and sensor data fusion ensure virtual companions meet individual user needs
without compromising privacy or data security. Comment: 10 pages, 1 figure, to be presented at https://ihiet-ai.org/,
Lausanne in April 202
Home monitoring for frailty detection through sound and speaker diarization analysis
As the French, European and worldwide populations are aging, there is a
strong interest in new systems that guarantee reliable, privacy-preserving
home monitoring for frailty prevention. This work is part of a
global environmental audio analysis system that aims to help identify
Activities of Daily Life (ADL) through recognition of human and everyday-life
sounds, speech presence detection, and detection of the number of speakers.
The focus here is on detecting the number of speakers. In this article, we
present how recent advances in sound processing and speaker diarization can
improve existing embedded systems. We study the performance of two new methods
and discuss the benefits of DNN-based approaches, which improve performance by
about 100%. Comment: JETSAN, Jun 2023, Aubervilliers & Paris, France
Combining methods to improve speaker verification decision
The aim of this paper is to describe how combining speaker verification algorithms that use a priori decision thresholds can improve the overall robustness of a real application. The evaluation is performed in the context of a field application in which each client is verified from a 7-digit PIN code. The paper demonstrates that the global performance of the system can be increased by combining the results of several algorithms.
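One simple way to combine several verification algorithms, each with its own a priori threshold, is sketched below. The abstract does not say which combination rule the authors used, so the majority vote with a mean-margin tie-break, the function name `combine_decisions`, and the example scores are all assumptions made for illustration.

```python
def combine_decisions(scores, thresholds):
    """Accept the claimed identity if most algorithms accept it.

    scores[i] is the output of algorithm i (higher means more likely the
    true client) and thresholds[i] is its a priori decision threshold.
    Ties are broken by the mean margin (score minus threshold).
    """
    if not scores or len(scores) != len(thresholds):
        raise ValueError("scores and thresholds must be non-empty and aligned")
    margins = [s - t for s, t in zip(scores, thresholds)]
    accepts = sum(1 for m in margins if m >= 0.0)
    rejects = len(margins) - accepts
    if accepts != rejects:
        return accepts > rejects
    return sum(margins) / len(margins) >= 0.0


# Example: three hypothetical algorithms, two of which accept the speaker.
accepted = combine_decisions([0.75, 0.25, 0.5], [0.5, 0.5, 0.25])
```

The tie-break compares raw margins across algorithms, which is only meaningful if the scores have first been normalised to a common scale.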
Towards a Practical Silent Speech Interface Based on Vocal Tract Imaging
The paper describes advances in the development of an ultrasound silent speech interface for use in silent communication applications or as a speaking aid for persons who have undergone a laryngectomy. It reports some first steps towards making such a device lightweight, portable, interactive, and practical to use. Simple experimental tests of an interactive silent speech interface for everyday applications are described. Possible future improvements, including extension to continuous speech and real-time operation, are discussed. The full proceedings of this conference are available at http://www.issp2011.uqam.ca/upload/files/proceedings.pdf
Secured vocal access to telephone servers
A number of applications of man-machine interaction over the telephone require a combination of speech recognition and speaker verification. This paper describes current work carried out at IDIAP in the framework of national and European projects. A generic Interactive Voice Server (IVS) is described by means of a graphical formalism. It includes speech recognition based on speaker-independent flexible vocabulary technology, and speaker verification performed by a number of techniques executed in parallel and combined for an optimal decision.
Swiss French PolyPhone and PolyVar: telephone speech databases to model inter- and intra-speaker variability
Following the demand of the speech technology market, a number of companies and research laboratories joined forces to produce valuable and reusable resources, especially speech databases. The collected databases are used for developing, testing, enhancing and evaluating speech technology products such as interactive voice servers, listening typewriters, and speaker verification and identification systems. The PolyVar database was designed and recorded at IDIAP specifically to capture intra-speaker variability, as a complement to the Swiss French PolyPhone database, which addresses inter-speaker variability. We detail the specific problems of speech database collection (sampling the speaker population, selection of vocabulary items, ...) and present the current development work we carried out at IDIAP through the PolyPhone and PolyVar databases.