172 research outputs found

    New time-frequency derived cepstral coefficients for automatic speech recognition

    The goal is to improve the recognition rate by optimising Mel Frequency Cepstral Coefficients (MFCCs): the modifications concern the time-frequency representation used to estimate these coefficients. There are many ways to obtain a spectrum from a signal, which differ in the method itself (Fourier, wavelets, ...) and in the normalisation. We show here that we can obtain noise-resistant cepstral coefficients for speaker-independent connected word recognition. The recognition system is based on a continuous whole-word hidden Markov model. An error reduction rate of approximately 50% is achieved. Moreover, evaluation tests demonstrate that these results can be obtained with smaller databases: halving the training database has only a small effect on recognition rates (which is not the case with traditional MFCCs).

    Quantization of variable-length spectral sequences for very low bit rate speech coding

    This paper addresses the coding of spectral parameters for very low bit rate speech coding. We present a new interpretation of research previously published by Chou-Lookabaugh and Cernocky-Baudoin-Chollet on the quantization of variable-length spectral sequences, under the respective names "Variable to Variable length Vector Quantization" (VVVQ) and multigram quantization (MGQ). We also studied the influence of limiting the delay introduced by the method, and we propose a technique to optimise performance when a maximum delay is imposed. We found that a delay of 400 ms is generally sufficient. Finally, we propose introducing long sequences into the codebook by linear interpolation of short sequences.
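The last idea in this abstract, building long codebook entries by linear interpolation of short ones, can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so the function name `stretch_sequence`, the frame-by-dimension array layout, and the uniform time axis are all assumptions made here for illustration only.

```python
import numpy as np

def stretch_sequence(short_seq, new_len):
    """Linearly interpolate a (frames, dims) sequence to new_len frames.

    Hypothetical helper: each spectral dimension is interpolated
    independently along a normalised time axis running from 0 to 1.
    """
    short_seq = np.asarray(short_seq, dtype=float)
    old_len, dims = short_seq.shape
    old_t = np.linspace(0.0, 1.0, old_len)   # time stamps of the short sequence
    new_t = np.linspace(0.0, 1.0, new_len)   # time stamps of the long sequence
    columns = [np.interp(new_t, old_t, short_seq[:, d]) for d in range(dims)]
    return np.stack(columns, axis=1)

# A two-frame, two-dimensional entry stretched to three frames.
short_entry = np.array([[0.0, 1.0], [2.0, 3.0]])
long_entry = stretch_sequence(short_entry, 3)
```

Here the middle frame of `long_entry` is the midpoint of the two original frames. A real codec would operate on its own spectral parameterisation and codebook structure, which the abstract does not specify.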

    Privacy Preserving Personal Assistant with On-Device Diarization and Spoken Dialogue System for Home and Beyond

    In the age of personal voice assistants, the question of privacy arises. These digital companions often lack memory of past interactions, while relying heavily on the internet for speech processing, raising privacy concerns. Modern smartphones now enable on-device speech processing, making cloud-based solutions unnecessary. Personal assistants for the elderly should excel at memory recall, especially in medical examinations. The e-VITA project developed a versatile conversational application with local processing and speaker recognition. This paper highlights the importance of speaker diarization enriched with sensor data fusion for contextualized conversation preservation. The use cases applied to the e-VITA project have shown that truly personalized dialogue is pivotal for individual voice assistants. Secure local processing and sensor data fusion ensure virtual companions meet individual user needs without compromising privacy or data security. Comment: 10 pages, 1 figure, to be presented at https://ihiet-ai.org/, Lausanne in April 202

    Home monitoring for frailty detection through sound and speaker diarization analysis

    As the French, European and worldwide populations age, there is strong interest in new systems that guarantee reliable and privacy-preserving home monitoring for frailty prevention. This work is part of a global environmental audio analysis system that aims to help identify Activities of Daily Life (ADL) through recognition of human and everyday sounds, speech presence detection, and detection of the number of speakers. The focus here is on detecting the number of speakers. In this article, we present how recent advances in sound processing and speaker diarization can improve existing embedded systems. We study the performance of two new methods and discuss the benefits of DNN-based approaches, which improve performance by about 100%. Comment: JETSAN, Jun 2023, Aubervilliers & Paris, France

    Combining methods to improve speaker verification decision

    The aim of this paper is to describe how combining speaker verification algorithms with a priori decision thresholds can improve the overall robustness of a real application. The evaluation is performed in the context of a field application where each client is verified from a 7-digit PIN code. This paper demonstrates that it is possible to increase the overall performance of the system by combining the results of several algorithms.
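As a rough illustration of combining several verifiers that each have an a priori threshold, the sketch below uses a strict majority vote. The abstract does not say which combination rule the authors used, so the majority vote, the function name `combine_decisions`, and the example scores and thresholds are assumptions, not the paper's method.

```python
def combine_decisions(scores, thresholds):
    """Accept the claimed identity if a strict majority of verifiers accept.

    Hypothetical rule: verifier i accepts when scores[i] >= thresholds[i],
    where each threshold was fixed in advance (a priori).
    """
    if len(scores) != len(thresholds):
        raise ValueError("need exactly one threshold per verifier score")
    votes = [score >= threshold for score, threshold in zip(scores, thresholds)]
    return sum(votes) > len(votes) / 2

# Three illustrative verifiers: two accept, one rejects.
accepted = combine_decisions([0.8, 0.4, 0.7], [0.5, 0.5, 0.6])
```

With these made-up values two of the three verifiers accept, so the combined decision is to accept. Other rules, such as a weighted sum of normalised scores, would fit the same interface.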


    Towards a Practical Silent Speech Interface Based on Vocal Tract Imaging

    The paper describes advances in the development of an ultrasound silent speech interface for use in silent communications applications or as a speaking aid for persons who have undergone a laryngectomy. It reports some first steps towards making such a device lightweight, portable, interactive, and practical to use. Simple experimental tests of an interactive silent speech interface for everyday applications are described. Possible future improvements, including extension to continuous speech and real-time operation, are discussed. (The full proceedings of this conference are available at the following link: http://www.issp2011.uqam.ca/upload/files/proceedings.pdf)

    Secured vocal access to telephone servers

    A number of applications of man-machine interaction over the telephone require a combination of speech recognition and speaker verification. This paper describes current work carried out at IDIAP in the framework of national and European projects. A generic Interactive Voice Server (IVS) is described by means of a graphical formalism. It includes speech recognition based on speaker-independent flexible vocabulary technology and speaker verification performed by a number of techniques executed in parallel and combined for optimal decision.

    Swiss French PolyPhone and PolyVar: telephone speech databases to model inter- and intra-speaker variability

    Following the demand of the speech technology market, a number of companies and research laboratories joined forces to produce valuable and reusable resources, especially speech databases. The collected databases are used for developing, testing, enhancing and evaluating speech technology products, such as interactive voice servers, listening typewriters, and speaker verification and identification systems. The PolyVar database was designed and recorded at IDIAP specifically to capture intra-speaker variability, as a complement to the Swiss French PolyPhone database, which addresses inter-speaker variability. In what follows, we detail the specific problems of speech database collection (sampling the speaker population, selection of vocabulary items, ...) and present the development work we carried out at IDIAP through the PolyPhone and PolyVar databases.