Transfer Learning for Speech and Language Processing
Transfer learning is a vital technique that generalizes models trained for
one setting or task to other settings or tasks. For example, in speech
recognition, an acoustic model trained for one language can be used to
recognize speech in another language, with little or no re-training data.
Transfer learning is closely related to multi-task learning (cross-lingual vs.
multilingual), and has traditionally been studied under the name of `model
adaptation'. Recent advances in deep learning show that transfer learning becomes much
easier and more effective with high-level abstract features learned by deep
models, and the `transfer' can be conducted not only between data distributions
and data types, but also between model structures (e.g., shallow nets and deep
nets) or even model types (e.g., Bayesian models and neural models). This
review paper summarizes some recent prominent research towards this direction,
particularly for speech and language processing. We also report some results
from our group and highlight the potential of this very interesting research
field.
Comment: 13 pages, APSIPA 201
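The cross-lingual scenario this abstract describes can be caricatured in a few lines of NumPy: a feature extractor trained elsewhere is kept frozen, and only a small output layer is re-trained on scarce target data. Everything below is synthetic and illustrative — `W_pre` merely stands in for deep acoustic-model layers trained on a source language, and the "target language" is random data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "pretrained" feature extractor: a frozen hidden layer that
# stands in for deep acoustic-model layers trained on a source language.
W_pre = rng.normal(size=(20, 16)) / np.sqrt(20)

def features(x):
    # High-level abstract features; kept frozen during transfer.
    return np.tanh(x @ W_pre)

# A small amount of target-language data (synthetic here).
X = rng.normal(size=(200, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Transfer: train only a new logistic-regression output layer on top of the
# frozen features, instead of training a whole model from scratch.
H = features(X)
w, b = np.zeros(16), 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(H @ w + b)))   # sigmoid output
    w -= 0.5 * (H.T @ (p - y) / len(y))  # logistic-loss gradient step
    b -= 0.5 * (p - y).mean()

acc = ((H @ w + b > 0).astype(int) == y).mean()
print(f"target-task accuracy with frozen features: {acc:.2f}")
```

With good pretrained features, fitting only the small output layer is cheap and needs little target data, which is the practical appeal of transfer in low-resource settings.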
Robust speaker identification using artificial neural networks
This research focuses on recognizing speakers from their speech samples. Numerous text-dependent and text-independent algorithms have been developed to date to recognize a speaker from his/her speech. In this thesis, we concentrate on recognizing the speaker from fixed text, i.e. the text-dependent case. The possibility of extending this method to variable text, i.e. the text-independent case, is also analyzed. Different feature extraction algorithms are employed, and their performance with Artificial Neural Networks as a data classifier on a fixed training set is analyzed. We find a way to combine these individual feature extraction algorithms by incorporating their interdependence. The efficiency of these algorithms is determined after the input speech is classified using the Back Propagation algorithm of Artificial Neural Networks. A special case of the Back Propagation algorithm which improves the efficiency of the classification is also discussed.
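The pipeline the abstract describes — extract a feature vector per utterance, then classify it with a backpropagation-trained network — can be sketched with scikit-learn's `MLPClassifier` (which trains by backpropagation). The feature vectors here are synthetic stand-ins for per-utterance acoustic features such as averaged MFCCs; the speaker "voices" are random cluster centers.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Synthetic stand-in for per-utterance feature vectors (e.g. averaged MFCCs);
# a real system would extract these from the fixed-text speech samples.
n_speakers, per_spk, dim = 5, 40, 13
centers = rng.normal(scale=3.0, size=(n_speakers, dim))  # one "voice" per speaker
X = np.vstack([centers[s] + rng.normal(size=(per_spk, dim))
               for s in range(n_speakers)])
y = np.repeat(np.arange(n_speakers), per_spk)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Feed-forward network trained with backpropagation as the data classifier.
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
clf.fit(X_tr, y_tr)
print(f"speaker identification accuracy: {clf.score(X_te, y_te):.2f}")
```

Combining multiple feature extractors, as the thesis proposes, would amount to concatenating (or otherwise fusing) several such feature vectors before classification.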
Automatic Speech Recognition for Low-resource Languages and Accents Using Multilingual and Crosslingual Information
This thesis explores methods to rapidly bootstrap automatic speech recognition systems for languages that lack resources for speech and language processing. We focus on approaches that allow using data from multiple languages to improve performance for those languages at different levels, such as feature extraction, acoustic modeling, and language modeling. On the application side, this thesis also includes research on non-native and code-switching speech.
Anonymizing Speech: Evaluating and Designing Speaker Anonymization Techniques
The growing use of voice user interfaces has led to a surge in the collection
and storage of speech data. While data collection allows for the development of
efficient tools powering most speech services, it also poses serious privacy
issues for users as centralized storage makes private personal speech data
vulnerable to cyber threats. With the increasing use of voice-based digital
assistants like Amazon's Alexa, Google Home, and Apple's Siri, and the
increasing ease with which personal speech data can be collected, the risk of
malicious voice cloning and of speaker, gender, or pathology recognition has
grown.
This thesis proposes solutions for anonymizing speech and evaluating the
degree of anonymization. In this work, anonymization refers to making
personal speech data unlinkable to an identity while maintaining the usefulness
(utility) of the speech signal (e.g., access to linguistic content). We start
by identifying several challenges that evaluation protocols need to consider to
evaluate the degree of privacy protection properly. We clarify how
anonymization systems must be configured for evaluation purposes and highlight
that many practical deployment configurations do not permit privacy evaluation.
Furthermore, we examine the most common voice conversion-based
anonymization system and identify its weak points before suggesting new methods
to overcome some of its limitations. We isolate each component of the
anonymization system to evaluate the degree of speaker PPI associated with it.
Then, we propose several transformation methods for each component to reduce
speaker PPI as much as possible while maintaining utility. We promote
quantization-based transformations as an alternative to the widely used
noise-based approach. Finally, we propose a new attack method to invert
anonymization.
Comment: PhD Thesis, Pierre Champion | Université de Lorraine - INRIA Nancy |
for associated source code, see https://github.com/deep-privacy/SA-toolki
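A quantization-based transformation of the kind the abstract promotes can be sketched as vector quantization of a speaker embedding. This is a hypothetical toy, not the thesis's exact design: the embedding is mapped to the nearest entry of a small shared codebook, so fine-grained speaker detail is discarded while a coarse "pseudo-voice" remains available for downstream synthesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shared codebook of 8 pseudo-voices (4-dim toy embeddings);
# a real system would use learned codewords over x-vector-like embeddings.
codebook = rng.normal(size=(8, 4))

def anonymize(embedding):
    """Replace an embedding by its nearest codebook entry (vector quantization)."""
    dists = np.linalg.norm(codebook - embedding, axis=1)
    return codebook[np.argmin(dists)]

spk = rng.normal(size=4)   # stand-in for a real speaker embedding
anon = anonymize(spk)

# The output always lies on the shared codebook, never on the original voice,
# so the quantization step itself bounds how much speaker detail can leak.
print("moved away from original voice:", not np.allclose(anon, spk))
```

Compared with adding noise, quantization gives a hard guarantee on the output alphabet: an attacker inverting the mapping can at best recover the codeword, not the original embedding.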
Analysis, implementation and evaluation of blind separation algorithms from audio sources
The purpose of this bachelor thesis is to build a test bench of different methods used
for Blind Source Separation (BSS) of audio sources, to be used in acoustic localization algorithms to
improve their accuracy.
For this purpose, the MATLAB software tool is used to implement and evaluate the different
proposed systems. We consider that most related systems in the scientific literature have three phases:
moving the audio mix to the time-frequency (TF) domain by performing the Short Time Fourier Transform
(STFT), separating the audio into the different sources by applying BSS techniques, and a
final reconstruction of the obtained signals and transition back to the time domain using the Inverse
Short Time Fourier Transform (ISTFT).
To reach the objectives described, the project has been divided into the following tasks: design
of the blocks responsible for carrying out the STFT and the ISTFT, which are common to all BSS
methods; and development of the bank of BSS methods and of the filtering stage, using a Wiener filter
or similar, which is also common to all methods. Finally, the complete system is tested and evaluated
using audio mixes obtained in environments similar to the one in which the system is to be applied,
to improve the localization of the various sources.
Grado en Ingeniería en Tecnologías de Telecomunicació
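The separation stage at the heart of this pipeline can be sketched with scikit-learn's `FastICA`. This is a simplified illustration, not the thesis's implementation: it separates an instantaneous two-channel mixture directly in the time domain, whereas the thesis operates on STFT frames and adds a Wiener filtering stage. The signals and the mixing matrix below are invented for the demo.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Two synthetic "sources" (stand-ins for voices) and a 2-microphone mixture.
t = np.linspace(0, 1, 8000)
s1 = np.sin(2 * np.pi * 440 * t)          # tone
s2 = np.sign(np.sin(2 * np.pi * 3 * t))   # square wave
S = np.c_[s1, s2]

A = np.array([[1.0, 0.6],                 # mixing matrix: unknown to the
              [0.5, 1.0]])                # separation algorithm ("blind")
X = S @ A.T                               # observed microphone signals

# Blind separation: recover the sources up to permutation and scale.
ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)

# Each estimated source should correlate strongly with one true source.
corr = np.abs(np.corrcoef(S.T, S_hat.T)[:2, 2:])
print(corr.max(axis=1))   # one value near 1.0 per source
```

In the full system described above, this separation step would sit between the STFT and ISTFT blocks, followed by the common Wiener (or similar) filtering stage before the separated signals feed the acoustic localizer.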
- …