419 research outputs found

    Transfer Learning for Speech and Language Processing

    Transfer learning is a vital technique that generalizes models trained for one setting or task to other settings or tasks. For example, in speech recognition, an acoustic model trained for one language can be used to recognize speech in another language with little or no re-training data. Transfer learning is closely related to multi-task learning (cross-lingual vs. multilingual) and has traditionally been studied under the name of `model adaptation'. Recent advances in deep learning show that transfer learning becomes much easier and more effective with the high-level abstract features learned by deep models, and that the `transfer' can be conducted not only between data distributions and data types, but also between model structures (e.g., shallow nets and deep nets) or even model types (e.g., Bayesian models and neural models). This review paper summarizes some recent prominent research in this direction, particularly for speech and language processing. We also report some results from our group and highlight the potential of this very interesting research field. Comment: 13 pages, APSIPA 201
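    The cross-lingual transfer described above can be sketched in miniature: freeze a feature extractor standing in for a model trained on a resource-rich source language, and retrain only a small output layer on scarce target-language data. Everything below (dimensions, the random data, the single frozen layer) is our illustrative assumption, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend these weights were learned on a resource-rich source language.
W_shared = rng.normal(size=(40, 16))      # frozen feature-extractor layer

def features(x):
    """High-level abstract features from the frozen source-language layer."""
    return np.tanh(x @ W_shared)

# Tiny target-language set: 20 frames, 40-dim acoustic features, 3 phone classes.
X = rng.normal(size=(20, 40))
y = rng.integers(0, 3, size=20)

# Only the new softmax output layer is trained ("little re-training data").
W_out = np.zeros((16, 3))
for _ in range(200):
    logits = features(X) @ W_out
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    p[np.arange(20), y] -= 1.0            # softmax cross-entropy gradient
    W_out -= 0.1 * features(X).T @ p / 20

acc = (np.argmax(features(X) @ W_out, axis=1) == y).mean()
```

    The point of the sketch is the split: the shared representation is reused unchanged, and only a tiny language-specific head sees target data.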

    Robust speaker identification using artificial neural networks

    This research focuses on recognizing speakers from their speech samples. Numerous text-dependent and text-independent algorithms have been developed to date to recognize a speaker from his/her speech. In this thesis, we concentrate on recognition of the speaker from fixed text, i.e., the text-dependent case. The possibility of extending the method to variable text, i.e., the text-independent case, is also analyzed. Different feature extraction algorithms are employed, and their performance with Artificial Neural Networks as a data classifier on a fixed training set is analyzed. We find a way to combine all these individual feature extraction algorithms by incorporating their interdependence. The efficiency of these algorithms is determined after the input speech is classified using the Back Propagation Algorithm of Artificial Neural Networks. A special case of the Back Propagation Algorithm which improves the efficiency of the classification is also discussed.
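    The idea of combining several individual feature-extraction algorithms can be illustrated with a simple score-level fusion. The front-end names, scores, and weights below are hypothetical examples of ours, not the thesis' actual method, which additionally models the interdependence of the front-ends.

```python
import numpy as np

def fuse_scores(score_matrix, weights):
    """score_matrix: (n_frontends, n_speakers); returns fused per-speaker scores."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                       # normalize to a convex combination
    return w @ np.asarray(score_matrix, dtype=float)

# Three front-ends (e.g. MFCC, LPC, PLP) each scoring four enrolled speakers.
scores = [[0.9, 0.2, 0.1, 0.3],
          [0.7, 0.4, 0.2, 0.1],
          [0.8, 0.1, 0.3, 0.2]]
fused = fuse_scores(scores, weights=[0.5, 0.25, 0.25])
identified = int(np.argmax(fused))        # speaker with the highest fused score
```

    A single weak front-end can misrank speakers; fusing complementary front-ends makes the final decision more robust, which is the motivation for combining them.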

    Automatic Speech Recognition for Low-resource Languages and Accents Using Multilingual and Crosslingual Information

    This thesis explores methods to rapidly bootstrap automatic speech recognition systems for languages that lack resources for speech and language processing. We focus on finding approaches that allow using data from multiple languages to improve performance for those languages at different levels, such as feature extraction, acoustic modeling and language modeling. On the application side, this thesis also includes research work on non-native and code-switching speech.

    Anonymizing Speech: Evaluating and Designing Speaker Anonymization Techniques

    The growing use of voice user interfaces has led to a surge in the collection and storage of speech data. While data collection allows for the development of efficient tools powering most speech services, it also poses serious privacy issues for users, as centralized storage makes personal speech data vulnerable to cyber threats. With the increasing use of voice-based digital assistants such as Amazon's Alexa, Google Home, and Apple's Siri, and with the increasing ease with which personal speech data can be collected, the risk of malicious use of voice cloning and of speaker, gender, or pathology recognition has grown. This thesis proposes solutions for anonymizing speech and for evaluating the degree of anonymization. In this work, anonymization refers to making personal speech data unlinkable to an identity while maintaining the usefulness (utility) of the speech signal (e.g., access to linguistic content). We start by identifying several challenges that evaluation protocols need to consider in order to properly evaluate the degree of privacy protection. We clarify how anonymization systems must be configured for evaluation purposes and highlight that many practical deployment configurations do not permit privacy evaluation. Furthermore, we examine the most common voice-conversion-based anonymization system and identify its weak points before suggesting new methods to overcome some of its limitations. We isolate all components of the anonymization system to evaluate the degree of speaker PPI associated with each of them. Then, we propose several transformation methods for each component to reduce speaker PPI as much as possible while maintaining utility. We promote anonymization algorithms based on quantization-based transformation as an alternative to the most-used and well-known noise-based approach. Finally, we devise a new attack method to invert anonymization. Comment: PhD thesis, Pierre Champion, Université de Lorraine - INRIA Nancy; for associated source code, see https://github.com/deep-privacy/SA-toolki
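    The quantization-based transformation the abstract advocates can be illustrated on a toy speaker embedding: snapping each coordinate to a coarse grid makes many distinct speakers collide on the same quantized vector, reducing the speaker information it carries. The embeddings and step size below are our hypothetical example, not values from the thesis.

```python
import numpy as np

def quantize(embedding, step=0.5):
    """Snap each coordinate to the nearest multiple of `step`."""
    return np.round(np.asarray(embedding) / step) * step

emb_a = np.array([0.21, -0.48, 0.90])   # hypothetical speaker embedding
emb_b = np.array([0.19, -0.52, 0.95])   # a nearby, different speaker
q_a, q_b = quantize(emb_a), quantize(emb_b)
# Both quantize to the same vector, so the two speakers become
# indistinguishable (unlinkable) at this resolution.
```

    Unlike additive noise, whose effect an attacker may partly average out over many utterances, quantization deterministically discards the fine-grained detail, which is one motivation for preferring it to the noise-based approach.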

    Analysis, implementation and evaluation of blind separation algorithms from audio sources

    The purpose of this bachelor thesis (Trabajo de Fin de Grado, TFG) is to build a test bench of different methods used for Blind Source Separation (BSS) of people's voices, to be used in acoustic localization algorithms to improve their accuracy. For this purpose, the MATLAB software tool is used to implement and evaluate the different proposed systems. We consider that most related systems in the scientific literature consist of three phases: moving the audio to the time-frequency (TF) domain by performing the Short Time Fourier Transform (STFT) on the audio mix, separating it into the different sources by applying BSS techniques, and a final reconstruction of the obtained signals back into the time domain using the Inverse Short Time Fourier Transform (ISTFT). To reach these objectives, the work has been divided into the following tasks: design of the blocks responsible for carrying out the STFT and the ISTFT, which are common to all BSS methods; development of the bank of BSS methods; and development of the filtering stage, using a Wiener filter or similar, which is also common to all methods. Finally, the complete system is tested and evaluated using audio mixes obtained in environments similar to the one in which the system is to be applied in order to improve the localization of the various sources. Grado en Ingeniería en Tecnologías de Telecomunicació
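    The three-phase pipeline described in this abstract (STFT, TF-domain separation, ISTFT) can be sketched minimally. As simplifying assumptions of ours, a non-overlapping frame DFT stands in for the STFT and a binary frequency mask stands in for the BSS stage; real systems use overlapping windows and a Wiener-style filter.

```python
import numpy as np

def stft(x, frame=64):
    """Toy STFT: DFT of non-overlapping frames (perfect reconstruction)."""
    return np.fft.rfft(x.reshape(-1, frame), axis=1)

def istft(X, frame=64):
    """Inverse of the toy STFT above."""
    return np.fft.irfft(X, n=frame, axis=1).reshape(-1)

fs = 8000
t = np.arange(fs) / fs
low = np.sin(2 * np.pi * 250 * t)           # "source 1": low-frequency tone
high = np.sin(2 * np.pi * 1500 * t)         # "source 2": high-frequency tone
mix = low + high

# Phase 1: to the TF domain.  Phase 2: "separate" by masking the TF bins.
X = stft(mix)
freqs = np.fft.rfftfreq(64, d=1 / fs)
mask = freqs < 800                           # crude separation: keep low band
# Phase 3: back to the time domain.
low_est = istft(X * mask)

err = np.mean((low_est - low) ** 2)          # near-zero for these pure tones
```

    Real BSS replaces the hand-picked mask with masks estimated blindly from the mixture statistics, but the surrounding STFT/ISTFT scaffolding is exactly the common block structure the thesis describes.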

    Articulatory features for conversational speech recognition
