
    Source and Filter Estimation for Throat-Microphone Speech Enhancement

    In this paper, we propose a new statistical enhancement system for throat-microphone recordings based on source and filter separation. Throat microphones (TM) are skin-attached piezoelectric sensors that capture speech as tissue vibrations. Due to their limited bandwidth, TM-recorded speech suffers from reduced intelligibility and naturalness. We investigate learning phone-dependent Gaussian mixture model (GMM)-based statistical mappings, trained on parallel acoustic-microphone (AM) and TM recordings, to enhance the spectral envelope and excitation signals of TM speech. The proposed mappings address the phone-dependent variability of tissue conduction in TM recordings. The spectral envelope mapping estimates the line spectral frequency (LSF) representation of the AM signal from TM recordings, while the excitation mapping is built on the spectral energy difference (SED) between the AM and TM excitation signals; excitation enhancement is thus modeled as estimation of the SED features from the TM signal. The proposed system is evaluated using both objective and subjective tests. Objective evaluations are performed with the log-spectral distortion (LSD), wideband perceptual evaluation of speech quality (PESQ), and mean-squared error (MSE) metrics; subjective evaluations are performed with an A/B comparison test. Experimental results indicate that the proposed phone-dependent mappings outperform phone-independent mappings. Furthermore, enhancing the TM excitation through statistical mapping of the SED features yields significant objective and subjective improvements. ©2015 IEEE
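    One of the objective metrics named above, log-spectral distortion, measures the RMS difference in dB between reference and estimated magnitude spectra. A minimal sketch of the metric (the function name and frame layout are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def log_spectral_distortion(spec_ref, spec_est, eps=1e-12):
    """Mean frame-wise log-spectral distortion (dB) between two
    magnitude-spectrum arrays of shape (frames, bins)."""
    log_diff = 20.0 * np.log10((spec_ref + eps) / (spec_est + eps))
    # RMS over frequency bins, then average over frames.
    return float(np.mean(np.sqrt(np.mean(log_diff ** 2, axis=-1))))

# Identical spectra give zero distortion.
a = np.abs(np.random.default_rng(0).standard_normal((4, 129))) + 0.1
print(log_spectral_distortion(a, a))  # 0.0
```

    Scaling a spectrum by a constant factor of 10 shifts every bin by 20 dB, so the LSD between a spectrum and its scaled copy is 20 dB, which is a quick sanity check for any implementation.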

    EMG-to-Speech: Direct Generation of Speech from Facial Electromyographic Signals

    The general objective of this work is the design, implementation, improvement, and evaluation of a system that uses surface electromyographic (EMG) signals to directly synthesize an audible speech output: EMG-to-speech.

    Glottal-synchronous speech processing

    Glottal-synchronous speech processing is a field of speech science in which the pseudoperiodicity of voiced speech is exploited. Traditionally, speech processing involves segmenting and processing short speech frames of predefined length; this may fail to exploit the inherent periodic structure of voiced speech, which glottal-synchronous frames can harness. Glottal-synchronous frames are often derived from the glottal closure instants (GCIs) and glottal opening instants (GOIs). The SIGMA algorithm was developed for the detection of GCIs and GOIs from the electroglottograph signal, with a measured accuracy of up to 99.59%. For GCI and GOI detection from speech signals, the YAGA algorithm provides a measured accuracy of up to 99.84%. Multichannel speech-based approaches are shown to be more robust to reverberation than single-channel algorithms. The GCIs are applied to real-world applications including speech dereverberation, where SNR is improved by up to 5 dB, and prosodic manipulation, where subjective testing demonstrates the importance of voicing detection in glottal-synchronous algorithms. The GCIs are further exploited in a new area of data-driven speech modelling, providing new insights into speech production and a set of tools to aid deployment in real-world applications. The technique is shown to be applicable in areas of speech coding, identification, and artificial bandwidth extension of telephone speech.
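    Once GCIs are available, glottal-synchronous framing replaces fixed-length windows with frames anchored at closure instants. A minimal sketch of that slicing step (function name, two-period frame span, and the toy signal are illustrative assumptions, not the SIGMA or YAGA algorithms themselves):

```python
import numpy as np

def glottal_synchronous_frames(signal, gcis, periods=2):
    """Slice a speech signal into frames spanning `periods` pitch periods,
    each starting at a glottal closure instant (GCI).

    signal: 1-D sample array; gcis: sorted sample indices of detected GCIs.
    Voiced speech is only pseudoperiodic, so frame lengths may differ.
    """
    frames = []
    for i in range(len(gcis) - periods):
        frames.append(signal[gcis[i]:gcis[i + periods]])
    return frames

# Toy example: a 100 Hz voiced tone at 8 kHz has GCIs every 80 samples.
fs = 8000
x = np.sin(2 * np.pi * 100 * np.arange(fs // 10) / fs)
gcis = list(range(0, len(x), 80))
frames = glottal_synchronous_frames(x, gcis)
print(len(frames), len(frames[0]))  # 8 160
```

    For a real signal the GCI spacing varies with the pitch contour, which is exactly what fixed-length framing cannot track.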

    Speaker recognition: current state and experiment

    In this thesis, the operation of speaker recognition systems is described and the state of the art of the main building blocks is studied. All research papers consulted are listed in the References. As the voice is unique to the individual, it has emerged as a viable authentication method. Several problems must be considered, such as the presence of environmental noise and changes in the speakers' voices due, for example, to sickness. These systems combine knowledge from signal processing for the feature extraction stage and signal modelling for the classification and decision stage. There are several techniques for the feature extraction and pattern matching blocks, so it is difficult to establish a single optimal solution. MFCC and DTW are the most common techniques for each block, respectively. They are discussed in this document, with special emphasis on their drawbacks, which motivate the newer techniques also presented here. An Internet search was carried out for working commercial implementations, which proved quite rare; a basic introduction to Praat is then presented. Finally, some intra-speaker and inter-speaker tests are performed using this software. (The original abstract is also given in Spanish and Catalan.)
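    DTW, one of the two standard blocks mentioned in the abstract above, aligns two feature sequences of different lengths by a dynamic program over cumulative alignment cost. A minimal sketch on scalar sequences (a real recognizer would compare per-frame MFCC vectors with a Euclidean cost instead):

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences,
    using the classic O(len(a)*len(b)) dynamic program with unit steps."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Best of insertion, deletion, and match predecessors.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

# A sequence aligns perfectly with a time-stretched copy of itself,
# which is why DTW tolerates differences in speaking rate.
print(dtw_distance([1, 2, 3], [1, 1, 2, 2, 3, 3]))  # 0.0
print(dtw_distance([1, 2, 3], [1, 2, 4]))  # 1.0
```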

    Models and analysis of vocal emissions for biomedical applications

    This book of proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA 2003), held 10-12 December 2003 in Firenze, Italy. The workshop is organised every two years and aims to stimulate contact between specialists active in research and industrial development in the area of voice analysis for biomedical applications. The scope of the workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies.

    Multimodal Wearable Sensors for Human-Machine Interfaces

    Certain areas of the body, such as the hands, eyes and organs of speech production, provide high-bandwidth information channels from the conscious mind to the outside world. The objective of this research was to develop an innovative wearable sensor device that records signals from these areas more conveniently than has previously been possible, so that they can be harnessed for communication. A novel bioelectrical and biomechanical sensing device, the wearable endogenous biosignal sensor (WEBS), was developed and tested in various communication and clinical measurement applications. One ground-breaking feature of the WEBS system is that it digitises biopotentials almost at the point of measurement. Its electrode connects directly to a high-resolution analog-to-digital converter. A second major advance is that, unlike previous active biopotential electrodes, the WEBS electrode connects to a shared data bus, allowing a large or small number of them to work together with relatively few physical interconnections. Another unique feature is its ability to switch dynamically between recording and signal source modes. An accelerometer within the device captures real-time information about its physical movement, not only facilitating the measurement of biomechanical signals of interest, but also allowing motion artefacts in the bioelectrical signal to be detected. Each of these innovative features has potentially far-reaching implications in biopotential measurement, both in clinical recording and in other applications. Weighing under 0.45 g and being remarkably low-cost, the WEBS is ideally suited for integration into disposable electrodes. Several such devices can be combined to form an inexpensive digital body sensor network, with shorter set-up time than conventional equipment, more flexible topology, and fewer physical interconnections. One phase of this study evaluated areas of the body as communication channels. 
    The throat was selected for detailed study since it yields a range of voluntarily controllable signals, including laryngeal vibrations and gross movements associated with vocal-tract articulation. A WEBS device recorded these signals, and several novel methods of human-to-machine communication were demonstrated. To evaluate the performance of the WEBS system, its recordings were validated against a high-end biopotential recording system for a number of biopotential signal types. To demonstrate a clinical application, the WEBS system was used to record a 12-lead electrocardiogram augmented with mechanical movement information.
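    The accelerometer-based artefact detection described above can be sketched as a simple moving-variance test on the acceleration magnitude; the function name, window length, and threshold below are illustrative assumptions, not values from the thesis:

```python
import numpy as np

def flag_motion_artefacts(accel, window=8, threshold=0.5):
    """Flag bioelectrical samples as motion-contaminated when the
    co-located accelerometer shows high short-term activity.

    accel: (N, 3) accelerometer samples; returns a boolean mask of length N
    that is True where the moving standard deviation of the acceleration
    magnitude over `window` samples exceeds `threshold`.
    """
    mag = np.linalg.norm(accel, axis=1)
    pad = window // 2
    padded = np.pad(mag, pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, window)
    local_std = windows.std(axis=1)[: len(mag)]
    return local_std > threshold

# A stationary sensor produces no flags; an abrupt movement does.
accel = np.zeros((200, 3))
accel[100:, 0] = 2.0  # step change along one axis
mask = flag_motion_artefacts(accel)
print(mask[:50].any(), mask[95:105].any())  # False True
```

    In a body sensor network like the one described, such a mask could accompany each digitised biopotential channel so that downstream processing knows which segments to distrust.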