Transfer Learning for Speech and Language Processing
Transfer learning is a vital technique that generalizes models trained for
one setting or task to other settings or tasks. For example, in speech
recognition, an acoustic model trained for one language can be used to
recognize speech in another language, with little or no re-training data.
Transfer learning is closely related to multi-task learning (cross-lingual vs.
multilingual), and has traditionally been studied under the name of `model
adaptation'. Recent advances in deep learning show that transfer learning becomes much
easier and more effective with high-level abstract features learned by deep
models, and the `transfer' can be conducted not only between data distributions
and data types, but also between model structures (e.g., shallow nets and deep
nets) or even model types (e.g., Bayesian models and neural models). This
review paper summarizes some recent prominent research towards this direction,
particularly for speech and language processing. We also report some results
from our group and highlight the potential of this very interesting research
field.
Comment: 13 pages, APSIPA 201
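The cross-lingual scenario this abstract describes can be caricatured in a few lines of NumPy: a feature extractor trained elsewhere is kept frozen, and only a small output layer is re-trained on scarce target data. Everything below is synthetic and illustrative — `W_pre` merely stands in for deep acoustic-model layers trained on a source language, and the "target language" is random data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "pretrained" feature extractor: a frozen hidden layer that
# stands in for deep acoustic-model layers trained on a source language.
W_pre = rng.normal(size=(20, 16)) / np.sqrt(20)

def features(x):
    # High-level abstract features; kept frozen during transfer.
    return np.tanh(x @ W_pre)

# A small amount of target-language data (synthetic here).
X = rng.normal(size=(200, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Transfer: train only a new logistic-regression output layer on top of the
# frozen features, instead of training a whole model from scratch.
H = features(X)
w, b = np.zeros(16), 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(H @ w + b)))   # sigmoid output
    w -= 0.5 * (H.T @ (p - y) / len(y))  # logistic-loss gradient step
    b -= 0.5 * (p - y).mean()

acc = ((H @ w + b > 0).astype(int) == y).mean()
print(f"target-task accuracy with frozen features: {acc:.2f}")
```

With good pretrained features, fitting only the small output layer is cheap and needs little target data, which is the practical appeal of transfer in low-resource settings.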
Robust speaker identification using artificial neural networks
This research focuses on recognizing speakers from their speech samples. Numerous text-dependent and text-independent algorithms have been developed to date to recognize a speaker from his/her speech. In this thesis, we concentrate on recognizing the speaker from fixed text, i.e. the text-dependent case. The possibility of extending this method to variable text, i.e. the text-independent case, is also analyzed. Different feature extraction algorithms are employed, and their performance with Artificial Neural Networks as a data classifier on a fixed training set is analyzed. We find a way to combine these individual feature extraction algorithms by incorporating their interdependence. The efficiency of these algorithms is determined after the input speech is classified using the Back Propagation algorithm of Artificial Neural Networks. A special case of the Back Propagation algorithm which improves the efficiency of the classification is also discussed.
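The pipeline the abstract describes — extract a feature vector per utterance, then classify it with a backpropagation-trained network — can be sketched with scikit-learn's `MLPClassifier` (which trains by backpropagation). The feature vectors here are synthetic stand-ins for per-utterance acoustic features such as averaged MFCCs; the speaker "voices" are random cluster centers.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Synthetic stand-in for per-utterance feature vectors (e.g. averaged MFCCs);
# a real system would extract these from the fixed-text speech samples.
n_speakers, per_spk, dim = 5, 40, 13
centers = rng.normal(scale=3.0, size=(n_speakers, dim))  # one "voice" per speaker
X = np.vstack([centers[s] + rng.normal(size=(per_spk, dim))
               for s in range(n_speakers)])
y = np.repeat(np.arange(n_speakers), per_spk)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Feed-forward network trained with backpropagation as the data classifier.
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
clf.fit(X_tr, y_tr)
print(f"speaker identification accuracy: {clf.score(X_te, y_te):.2f}")
```

Combining multiple feature extractors, as the thesis proposes, would amount to concatenating (or otherwise fusing) several such feature vectors before classification.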
Automatic Speech Recognition for Low-resource Languages and Accents Using Multilingual and Crosslingual Information
This thesis explores methods to rapidly bootstrap automatic speech recognition systems for languages that lack resources for speech and language processing. We focus on approaches that allow using data from multiple languages to improve performance for those languages at different levels, such as feature extraction, acoustic modeling, and language modeling. On the application side, this thesis also includes research on non-native and code-switching speech.
Anonymizing Speech: Evaluating and Designing Speaker Anonymization Techniques
The growing use of voice user interfaces has led to a surge in the collection
and storage of speech data. While data collection allows for the development of
efficient tools powering most speech services, it also poses serious privacy
issues for users as centralized storage makes private personal speech data
vulnerable to cyber threats. With the increasing use of voice-based digital
assistants like Amazon's Alexa, Google Home, and Apple's Siri, and the
increasing ease with which personal speech data can be collected, the risk of
malicious voice cloning and of speaker, gender, or pathology recognition has
grown.
This thesis proposes solutions for anonymizing speech and evaluating the
degree of anonymization. In this work, anonymization refers to making
personal speech data unlinkable to an identity while maintaining the usefulness
(utility) of the speech signal (e.g., access to linguistic content). We start
by identifying several challenges that evaluation protocols need to consider to
evaluate the degree of privacy protection properly. We clarify how
anonymization systems must be configured for evaluation purposes and highlight
that many practical deployment configurations do not permit privacy evaluation.
Furthermore, we examine the most common voice conversion-based
anonymization system and identify its weak points before suggesting new methods
to overcome some of its limitations. We isolate each component of the
anonymization system to evaluate the degree of speaker PPI associated with it.
Then, we propose several transformation methods for each component to reduce
speaker PPI as much as possible while maintaining utility. We promote
quantization-based transformations as an alternative to the widely used
noise-based approach. Finally, we propose a new attack method to invert
anonymization.
Comment: PhD Thesis, Pierre Champion | Université de Lorraine - INRIA Nancy |
for associated source code, see https://github.com/deep-privacy/SA-toolki
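A quantization-based transformation of the kind the abstract promotes can be sketched as vector quantization of a speaker embedding. This is a hypothetical toy, not the thesis's exact design: the embedding is mapped to the nearest entry of a small shared codebook, so fine-grained speaker detail is discarded while a coarse "pseudo-voice" remains available for downstream synthesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shared codebook of 8 pseudo-voices (4-dim toy embeddings);
# a real system would use learned codewords over x-vector-like embeddings.
codebook = rng.normal(size=(8, 4))

def anonymize(embedding):
    """Replace an embedding by its nearest codebook entry (vector quantization)."""
    dists = np.linalg.norm(codebook - embedding, axis=1)
    return codebook[np.argmin(dists)]

spk = rng.normal(size=4)   # stand-in for a real speaker embedding
anon = anonymize(spk)

# The output always lies on the shared codebook, never on the original voice,
# so the quantization step itself bounds how much speaker detail can leak.
print("moved away from original voice:", not np.allclose(anon, spk))
```

Compared with adding noise, quantization gives a hard guarantee on the output alphabet: an attacker inverting the mapping can at best recover the codeword, not the original embedding.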
Analysis, implementation and evaluation of blind separation algorithms from audio sources
The purpose of this bachelor thesis is to build a test bench of different methods used
for Blind Source Separation (BSS) of audio sources, to be used in acoustic localization algorithms to
improve their accuracy.
For this purpose, the MATLAB software tool is used to implement and evaluate the different
proposed systems. We consider that most related systems in the scientific literature have three phases:
moving the audio mix to the time-frequency (TF) domain by performing the Short Time Fourier Transform
(STFT), separating the audio into the different sources by applying BSS techniques, and a
final reconstruction of the obtained signals and transition back to the time domain using the Inverse
Short Time Fourier Transform (ISTFT).
To reach the objectives described, the project has been divided into the following tasks: design
of the blocks responsible for carrying out the STFT and the ISTFT, which are common to all BSS
methods; and development of the bank of BSS methods and of the filtering stage, using a Wiener filter
or similar, which is also common to all methods. Finally, the complete system is tested and evaluated
using audio mixes obtained in environments similar to the one in which the system is to be applied,
to improve the localization of the various sources.
Grado en Ingeniería en Tecnologías de Telecomunicació
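The separation stage at the heart of this pipeline can be sketched with scikit-learn's `FastICA`. This is a simplified illustration, not the thesis's implementation: it separates an instantaneous two-channel mixture directly in the time domain, whereas the thesis operates on STFT frames and adds a Wiener filtering stage. The signals and the mixing matrix below are invented for the demo.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Two synthetic "sources" (stand-ins for voices) and a 2-microphone mixture.
t = np.linspace(0, 1, 8000)
s1 = np.sin(2 * np.pi * 440 * t)          # tone
s2 = np.sign(np.sin(2 * np.pi * 3 * t))   # square wave
S = np.c_[s1, s2]

A = np.array([[1.0, 0.6],                 # mixing matrix: unknown to the
              [0.5, 1.0]])                # separation algorithm ("blind")
X = S @ A.T                               # observed microphone signals

# Blind separation: recover the sources up to permutation and scale.
ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)

# Each estimated source should correlate strongly with one true source.
corr = np.abs(np.corrcoef(S.T, S_hat.T)[:2, 2:])
print(corr.max(axis=1))   # one value near 1.0 per source
```

In the full system described above, this separation step would sit between the STFT and ISTFT blocks, followed by the common Wiener (or similar) filtering stage before the separated signals feed the acoustic localizer.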
- …