Search CORE

236 research outputs found

Patrol team language identification system for DARPA RATS P1 evaluation

Author: D'haro Enríquez Luis Fernando
Dehak Najim
Glembek Ondřej
Grézl František
Ma Jeff
Matsoukas Spyros
Matějka Pavel
Plchot Oldřich
Souﬁfar Mehdi
Veselý Karel
Publication venue: E.T.S.I. Telecomunicación (UPM)
Publication date: 01/01/2012
Field of study

This paper describes the language identification (LID) system developed by the Patrol team for the first phase of the DARPA RATS (Robust Automatic Transcription of Speech) program, which seeks to advance state of the art detection capabilities on audio from highly degraded communication channels. We show that techniques originally developed for LID on telephone speech (e.g., for the NIST language recognition evaluations) remain effective on the noisy RATS data, provided that careful consideration is applied when designing the training and development sets. In addition, we show significant improvements from the use of Wiener filtering, neural network based and language dependent i-vector modeling, and fusion

Archivo Digital UPM

Adversarial Network Bottleneck Features for Noise Robust Speaker Verification

Author: Guo Jun
Ma Zhanyu
Tan Zheng-Hua
Yu Hong
Publication venue
Publication date: 01/01/2017
Field of study

In this paper, we propose a noise robust bottleneck feature representation which is generated by an adversarial network (AN). The AN includes two cascade connected networks, an encoding network (EN) and a discriminative network (DN). Mel-frequency cepstral coefficients (MFCCs) of clean and noisy speech are used as input to the EN and the output of the EN is used as the noise robust feature. The EN and DN are trained in turn, namely, when training the DN, noise types are selected as the training labels and when training the EN, all labels are set as the same, i.e., the clean speech label, which aims to make the AN features invariant to noise and thus achieve noise robustness. We evaluate the performance of the proposed feature on a Gaussian Mixture Model-Universal Background Model based speaker verification system, and make comparison to MFCC features of speech enhanced by short-time spectral amplitude minimum mean square error (STSA-MMSE) and deep neural network-based speech enhancement (DNN-SE) methods. Experimental results on the RSR2015 database show that the proposed AN bottleneck feature (AN-BN) dramatically outperforms the STSA-MMSE and DNN-SE based MFCCs for different noise types and signal-to-noise ratios. Furthermore, the AN-BN feature is able to improve the speaker verification performance under the clean condition

arXiv.org e-Print Archive

Crossref

VBN

Two-Dimensional Convolutional Recurrent Neural Networks for Speech Activity Detection

Author: Chen Liming
Fanioudakis Eleftherios
Giakoumis Dimitrios
Hamzaoui Raouf
Potamitis Ilyas
Tzovaras Dimitrios
Vafeiadis Anastasios
Votis Konstantinos
Publication venue: 'International Speech Communication Association'
Publication date: 17/06/2019
Field of study

Speech Activity Detection (SAD) plays an important role in mobile communications and automatic speech recognition (ASR). Developing efficient SAD systems for real-world applications is a challenging task due to the presence of noise. We propose a new approach to SAD where we treat it as a two-dimensional multilabel image classification problem. To classify the audio segments, we compute their Short-time Fourier Transform spectrograms and classify them with a Convolutional Recurrent Neural Network (CRNN), traditionally used in image recognition. Our CRNN uses a sigmoid activation function, max-pooling in the frequency domain, and a convolutional operation as a moving average filter to remove misclassified spikes. On the development set of Task 1 of the 2019 Fearless Steps Challenge, our system achieved a decision cost function (DCF) of 2.89%, a 66.4% improvement over the baseline. Moreover, it achieved a DCF score of 3.318% on the evaluation dataset of the challenge, ranking first among all submissions

Crossref

De Montfort University Open Research Archive

On the use of i-vector posterior distributions in Probabilistic Linear Discriminant Analysis

Author: Cumani Sandro
Laface Pietro
Plchot O.
Publication venue: IEEE - INST ELECTRICAL ELECTRONICS ENGINEERS INC
Publication date: 01/01/2014
Field of study

The i-vector extraction process is affected by several factors such as the noise level, the acoustic content of the observed features, the channel mismatch between the training conditions and the test data, and the duration of the analyzed speech segment. These factors influence both the i-vector estimate and its uncertainty, represented by the i-vector posterior covariance. This paper presents a new PLDA model that, unlike the standard one, exploits the intrinsic i-vector uncertainty. Since the recognition accuracy is known to decrease for short speech segments, and their length is one of the main factors affecting the i-vector covariance, we designed a set of experiments aiming at comparing the standard and the new PLDA models on short speech cuts of variable duration, randomly extracted from the conversations included in the NIST SRE 2010 extended dataset, both from interviews and telephone conversations. Our results on NIST SRE 2010 evaluation data show that in different conditions the new model outperforms the standard PLDA by more than 10% relative when tested on short segments with duration mismatches, and is able to keep the accuracy of the standard model for long enough speaker segments. This technique has also been successfully tested in the NIST SRE 2012 evaluation

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Fast Scoring of Full Posterior PLDA Models

Author: Cumani Sandro
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2015
Field of study

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Text-Independent Speaker Identification Using the Histogram Transform Model

Author: Guo Jun
Ma Zhanyu
Tan Zheng-Hua
Yu Hong
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016
Field of study

VBN

Workshop on Extracting and Using Constructions in Computational Linguistics

Author: Knutsson Ola
Sahlgren Magnus
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2010
Field of study

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive