2,908 research outputs found
Speaker recognition by means of restricted Boltzmann machine adaptation
Restricted Boltzmann Machines (RBMs) have shown success in speaker recognition. In this paper, RBMs are investigated in a framework comprising universal model training and model adaptation. Taking advantage of the RBM unsupervised learning algorithm, a global model is trained on all available background data. This general speaker-independent model, referred to as URBM, is further adapted to the data of a specific speaker to build a speaker-dependent model. To show its effectiveness, we have applied this framework to two different tasks. It has been used to discriminatively model target and impostor spectral features for classification. It has also been used to produce a vector-based representation for speakers. This vector-based representation, similar to an i-vector, can be further used for speaker recognition using either cosine scoring or Probabilistic Linear Discriminant Analysis (PLDA). The evaluation is performed on the core test condition of the NIST SRE 2006 database. Peer reviewed. Postprint (author's final draft)
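The cosine scoring mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `cosine_score` and `verify`, the toy vectors, and the threshold of 0.5 are all assumptions made for illustration, and a real system would score vectors extracted from speech.

```python
import math


def cosine_score(enrol_vec, test_vec):
    """Cosine similarity between two speaker vectors (hypothetical helper)."""
    dot = sum(x * y for x, y in zip(enrol_vec, test_vec))
    norm_enrol = math.sqrt(sum(x * x for x in enrol_vec))
    norm_test = math.sqrt(sum(y * y for y in test_vec))
    if norm_enrol == 0.0 or norm_test == 0.0:
        raise ValueError("cosine score is undefined for a zero vector")
    return dot / (norm_enrol * norm_test)


def verify(enrol_vec, test_vec, threshold=0.5):
    """Accept the trial if the score reaches the threshold.

    The threshold value is an assumption; in practice it is tuned on
    development data.
    """
    return cosine_score(enrol_vec, test_vec) >= threshold


if __name__ == "__main__":
    enrolment = [0.2, 0.7, -0.1]   # toy speaker vector, not real data
    test = [0.25, 0.6, -0.05]      # toy test-utterance vector
    print(cosine_score(enrolment, test), verify(enrolment, test))
```

Cosine scoring needs no trained back-end, which is why it is often used as a simple baseline alongside PLDA.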
Anti-spoofing Methods for Automatic Speaker Verification System
Growing interest in automatic speaker verification (ASV) systems has led to
significant quality improvement of spoofing attacks on them. Many research
works confirm that, despite a low equal error rate (EER), ASV systems are still
vulnerable to spoofing attacks. In this work we survey different acoustic
feature spaces and classifiers to determine reliable and robust countermeasures
against spoofing attacks. We compared several spoofing detection systems
presented so far on the development and evaluation datasets of the Automatic
Speaker Verification Spoofing and Countermeasures (ASVspoof) Challenge 2015.
Experimental results presented in this paper demonstrate that the combination
of magnitude and phase information contributes substantially to the efficiency
of spoofing detection systems. Wavelet-based features also show impressive
results in terms of equal error rate. In our overview we compare spoofing
detection performance for systems based on different classifiers. Comparison
results demonstrate that the linear SVM classifier outperforms the conventional
GMM approach. However, many researchers, inspired by the great success of deep
neural network (DNN) approaches in automatic speech recognition, applied DNNs
to the spoofing detection task and obtained quite low EER for known and
unknown types of spoofing attacks.
Comment: 12 pages, 0 figures, published in Springer Communications in Computer
and Information Science (CCIS) vol. 66
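The equal error rate (EER) used in the abstract above is the operating point where the false acceptance rate and the false rejection rate coincide. The following is a minimal sketch of one way to estimate it by sweeping a threshold over the observed scores; the function name `equal_error_rate` and the toy scores are assumptions for illustration, not the evaluation code of the ASVspoof challenge.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= the threshold. For spoofing
    detection, "target" means genuine speech and "impostor" means spoofed
    speech. Returns the mean of FAR and FRR at the threshold where the two
    rates are closest.
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best_gap = None
    eer = None
    for threshold in thresholds:
        # False rejection: a genuine trial scored below the threshold.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        # False acceptance: an impostor trial scored at or above it.
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            eer = (far + frr) / 2.0
    return eer


if __name__ == "__main__":
    genuine = [0.9, 0.6, 0.4, 0.8]   # toy scores, not real results
    spoofed = [0.1, 0.5, 0.3, 0.7]
    print(equal_error_rate(genuine, spoofed))
```

With few trials the two rates may never be exactly equal, so this sketch reports the midpoint at the closest threshold; interpolating along the error curve is a common alternative.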
VoxCeleb2: Deep Speaker Recognition
The objective of this paper is speaker recognition under noisy and
unconstrained conditions.
We make two key contributions. First, we introduce a very large-scale
audio-visual speaker recognition dataset collected from open-source media.
Using a fully automated pipeline, we curate VoxCeleb2 which contains over a
million utterances from over 6,000 speakers. This is several times larger than
any publicly available speaker recognition dataset.
Second, we develop and compare Convolutional Neural Network (CNN) models and
training strategies that can effectively recognise identities from voice under
various conditions. The models trained on the VoxCeleb2 dataset surpass the
performance of previous works on a benchmark dataset by a significant margin.
Comment: To appear in Interspeech 2018. The audio-visual dataset can be
downloaded from http://www.robots.ox.ac.uk/~vgg/data/voxceleb2 .
1806.05622v2: minor fixes; 5 page
Automated Testing of Speech-to-Speech Machine Translation in Telecom Networks
In the globalizing world, the ability to communicate across language barriers is increasingly important. Learning languages is laborious, which is why there is a strong desire to develop automatic machine translation applications. Ericsson has developed a speech-to-speech translation prototype called the Real-Time Interpretation System (RTIS). The service runs in a mobile network and translates travel phrases between two languages in speech format.
State-of-the-art machine translation systems suffer from relatively poor performance, and therefore evaluation plays a big role in machine translation development. The purpose of evaluation is to ensure that the system preserves translational equivalence and, in the case of a speech-to-speech system, speech quality. The evaluation is most reliably done by human judges. However, human-conducted evaluation is costly and subjective.
In this thesis, a test environment for the Ericsson Real-Time Interpretation System prototype is designed and analyzed. The goals are to investigate whether RTIS verification can be conducted automatically, and whether the test environment can truthfully measure the end-to-end performance of the system.
The results show that methods used in end-to-end speech quality verification in mobile networks cannot be optimally adapted for machine translation evaluation. With current knowledge, human-conducted evaluation is the only method that can truthfully measure translational equivalence and speech intelligibility. Automating machine translation evaluation needs further research; until then, human-conducted evaluation should remain the preferred method in RTIS verification.