Search CORE

2,647 research outputs found

Stimulated training for automatic speech recognition and keyword search in limited resource conditions

Author: Gales MJF
Knill KM
Ragni A
Vasilakes J
Wu C
Publication venue: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Publication date: 19/06/2017
Field of study

© 2017 IEEE. Training neural network acoustic models on limited quantities of data is a challenging task. A number of techniques have been proposed to improve generalisation. This paper investigates one such technique called stimulated training. It enables standard criteria such as cross-entropy to enforce spatial constraints on activations originating from different units. Having different regions being active depending on the input unit may help network to discriminate better and as a consequence yield lower error rates. This paper investigates stimulated training for automatic speech recognition of a number of languages representing different families, alphabets, phone sets and vocabulary sizes. In particular, it looks at ensembles of stimulated networks to ensure that improved generalisation will withstand system combination effects. In order to assess stimulated training beyond 1-best transcription accuracy, this paper looks at keyword search as a proxy for assessing quality of lattices. Experiments are conducted on IARPA Babel program languages including the surprise language of OpenKWS 2016 competition

Crossref

Apollo (Cambridge)

White Rose Research Online

Recommended from our members

Improving Interpretability and Regularization in Deep Learning

Author: Gales MJF
Karanasou P
Ragni A
Sim KC
Wu C
Publication venue: IEEE/ACM Transactions on Audio Speech and Language Processing
Publication date: 01/01/2018
Field of study

IEEE Deep learning approaches yield state-of-the-art performance in a range of tasks, including automatic speech recognition. However, the highly distributed representation in a deep neural network (DNN) or other network variations are difficult to analyse, making further parameter interpretation and regularisation challenging. This paper presents a regularisation scheme acting on the activation function output to improve the network interpretability and regularisation. The proposed approach, referred to as activation regularisation, encourages activation function outputs to satisfy a target pattern. By defining appropriate target patterns, different learning concepts can be imposed on the network. This method can aid network interpretability and also has the potential to reduce over-fitting. The scheme is evaluated on several continuous speech recognition tasks: the Wall Street Journal continuous speech recognition task, eight conversational telephone speech tasks from the IARPA Babel program and a U.S. English broadcast news task. On all the tasks, the activation regularisation achieved consistent performance gains over the standard DNN baselines

Apollo (Cambridge)

White Rose Research Online

Improving interpretability and regularization in deep learning

Author: Gales M.J.F.
Karanasou P.
Ragni A.
Sim K.C.
Wu C.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2006
Field of study

Deep learning approaches yield state-of-the-art performance in a range of tasks, including automatic speech recognition. However, the highly distributed representation in a deep neural network (DNN) or other network variations is difficult to analyze, making further parameter interpretation and regularization challenging. This paper presents a regularization scheme acting on the activation function output to improve the network interpretability and regularization. The proposed approach, referred to as activation regularization, encourages activation function outputs to satisfy a target pattern. By defining appropriate target patterns, different learning concepts can be imposed on the network. This method can aid network interpretability and also has the potential to reduce overfitting. The scheme is evaluated on several continuous speech recognition tasks: the Wall Street Journal continuous speech recognition task, eight conversational telephone speech tasks from the IARPA Babel program and a U.S. English broadcast news task. On all the tasks, the activation regularization achieved consistent performance gains over the standard DNN baselines

OPUS Augsburg

Crossref

White Rose Research Online

Attentive filtering networks for audio replay attack detection

Author: Abad Alberto
Dehak Najim
King Simon
Lai Cheng-I
Richmond Korin
Yamagishi Junichi
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 30/10/2018
Field of study

An attacker may use a variety of techniques to fool an automatic speaker verification system into accepting them as a genuine user. Anti-spoofing methods meanwhile aim to make the system robust against such attacks. The ASVspoof 2017 Challenge focused specifically on replay attacks, with the intention of measuring the limits of replay attack detection as well as developing countermeasures against them. In this work, we propose our replay attacks detection system - Attentive Filtering Network, which is composed of an attention-based filtering mechanism that enhances feature representations in both the frequency and time domains, and a ResNet-based classifier. We show that the network enables us to visualize the automatically acquired feature representations that are helpful for spoofing detection. Attentive Filtering Network attains an evaluation EER of 8.99

\%

on the ASVspoof 2017 Version 2.0 dataset. With system fusion, our best system further obtains a 30

\%

relative improvement over the ASVspoof 2017 enhanced baseline system.Comment: Submitted to ICASSP 201

arXiv.org e-Print Archive

Crossref

Edinburgh Research Explorer

Spoken Term Detection on Low Resource Languages

Author: Reddy B Naresh
Publication venue
Publication date: 01/01/2015
Field of study

Developing efficient speech processing systems for low-resource languages is an immensely challenging problem. One potentially effective approach to address the lack of resources for any particular language, is to employ data from multiple languages for building speech processing sub-systems. This thesis investigates possible methodologies for Spoken Term Detection (STD) from low- resource Indian languages. The task of STD intend to search for a query keyword, given in text form, from a considerably large speech database. This is usually done by matching templates of feature vectors, representing sequence of phonemes from the query word and the continuous speech from the database. Typical set of features used to represent speech signals in most of the speech processing systems are the mel frequency cepstral coefficients (MFCC). As speech is a very complexsignal, holding information about the textual message, speaker identity, emotional and health state of the speaker, etc., the MFCC features derived from it will also contain information about all these factors. For eficient template matching, we need to neutralize the speaker variability in features and stabilize them to represent the speech variability alone

Research Archive of Indian Institute of Technology Hyderabad

Cross-language evaluation forum : objectives, results, achievements

Author: Braschler Martin
Peters Carol
Publication venue: Springer
Publication date: 01/01/2004
Field of study

ZHAW digitalcollection

Engineering data compendium. Human perception and performance. User's guide

Author: Boff Kenneth R.
Lincoln Janet E.
Publication venue
Publication date
Field of study

The concept underlying the Engineering Data Compendium was the product of a research and development program (Integrated Perceptual Information for Designers project) aimed at facilitating the application of basic research findings in human performance to the design and military crew systems. The principal objective was to develop a workable strategy for: (1) identifying and distilling information of potential value to system design from the existing research literature, and (2) presenting this technical information in a way that would aid its accessibility, interpretability, and applicability by systems designers. The present four volumes of the Engineering Data Compendium represent the first implementation of this strategy. This is the first volume, the User's Guide, containing a description of the program and instructions for its use

NASA Technical Reports Server