Search CORE

558 research outputs found

SVMs for Automatic Speech Recognition: a Survey

Author: Díaz de María Fernando
Gallardo Antolín Ascensión
Martín Iglesias D.
Padrell Sendra J.
Peláez Moreno Carmen
Solera Ureña R.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2007
Field of study

Hidden Markov Models (HMMs) are, undoubtedly, the most employed core technique for Automatic Speech Recognition (ASR). Nevertheless, we are still far from achieving high-performance ASR systems. Some alternative approaches, most of them based on Artificial Neural Networks (ANNs), were proposed during the late eighties and early nineties. Some of them tackled the ASR problem using predictive ANNs, while others proposed hybrid HMM/ANN systems. However, despite some achievements, nowadays, the preponderance of Markov Models is a fact. During the last decade, however, a new tool appeared in the field of machine learning that has proved to be able to cope with hard classification problems in several fields of application: the Support Vector Machines (SVMs). The SVMs are effective discriminative classifiers with several outstanding characteristics, namely: their solution is that with maximum margin; they are capable to deal with samples of a very higher dimensionality; and their convergence to the minimum of the associated cost function is guaranteed. These characteristics have made SVMs very popular and successful. In this chapter we discuss their strengths and weakness in the ASR context and make a review of the current state-of-the-art techniques. We organize the contributions in two parts: isolated-word recognition and continuous speech recognition. Within the first part we review several techniques to produce the fixed-dimension vectors needed for original SVMs. Afterwards we explore more sophisticated techniques based on the use of kernels capable to deal with sequences of different length. Among them is the DTAK kernel, simple and effective, which rescues an old technique of speech recognition: Dynamic Time Warping (DTW). Within the second part, we describe some recent approaches to tackle more complex tasks like connected digit recognition or continuous speech recognition using SVMs. Finally we draw some conclusions and outline several ongoing lines of research

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo

Robust ASR using Support Vector Machines

Author: A. Gallardo-Antolín
Allwein
Bengio
Bourlard
Burges
C. Peláez-Moreno
Clarkson
Crammer
D. Martín-Iglesias
F. Díaz-de-María
Fürnkranz
Ganapathiraju
Glass
Hsu
Jiang
Joachims
Navia-Vázquez
R. Solera-Ureña
Rabiner
Schölkopf
Shimodaira
Thubthong
Trentin
Vapnik
Vapnik
Vicente-Peña
Weiss
Wu
Publication venue: 'Elsevier BV'
Publication date: 01/01/2007
Field of study

The improved theoretical properties of Support Vector Machines with respect to other machine learning alternatives due to their max-margin training paradigm have led us to suggest them as a good technique for robust speech recognition. However, important shortcomings have had to be circumvented, the most important being the normalisation of the time duration of different realisations of the acoustic speech units. In this paper, we have compared two approaches in noisy environments: first, a hybrid HMM–SVM solution where a fixed number of frames is selected by means of an HMM segmentation and second, a normalisation kernel called Dynamic Time Alignment Kernel (DTAK) first introduced in Shimodaira et al. [Shimodaira, H., Noma, K., Nakai, M., Sagayama, S., 2001. Support vector machine with dynamic time-alignment kernel for speech recognition. In: Proc. Eurospeech, Aalborg, Denmark, pp. 1841–1844] and based on DTW (Dynamic Time Warping). Special attention has been paid to the adaptation of both alternatives to noisy environments, comparing two types of parameterisations and performing suitable feature normalisation operations. The results show that the DTA Kernel provides important advantages over the baseline HMM system in medium to bad noise conditions, also outperforming the results of the hybrid system.Publicad

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Universidad Carlos III de Madrid e-Archivo

AudioPairBank: Towards A Large-Scale Tag-Pair-Based Audio Content Analysis

Author: Borth Damian
Elizalde Benjamin
Lane Ian
Raj Bhiksha
Sager Sebastian
Schulze Christian
Publication venue
Publication date: 08/01/2018
Field of study

Recently, sound recognition has been used to identify sounds, such as car and river. However, sounds have nuances that may be better described by adjective-noun pairs such as slow car, and verb-noun pairs such as flying insects, which are under explored. Therefore, in this work we investigate the relation between audio content and both adjective-noun pairs and verb-noun pairs. Due to the lack of datasets with these kinds of annotations, we collected and processed the AudioPairBank corpus consisting of a combined total of 1,123 pairs and over 33,000 audio files. One contribution is the previously unavailable documentation of the challenges and implications of collecting audio recordings with these type of labels. A second contribution is to show the degree of correlation between the audio content and the labels through sound recognition experiments, which yielded results of 70% accuracy, hence also providing a performance benchmark. The results and study in this paper encourage further exploration of the nuances in audio and are meant to complement similar research performed on images and text in multimedia analysis.Comment: This paper is a revised version of "AudioSentibank: Large-scale Semantic Ontology of Acoustic Concepts for Audio Content Analysis

arXiv.org e-Print Archive

Directory of Open Access Journals

On the Existence of an MVU Estimator for Target Localization with Censored, Noise Free Binary Detectors

Author: Seyedi Alireza
Shoari Arian
Publication venue
Publication date: 18/05/2015
Field of study

The problem of target localization with censored noise free binary detectors is considered. In this setting only the detecting sensors report their locations to the fusion center. It is proven that if the radius of detection is not known to the fusion center, a minimum variance unbiased (MVU) estimator does not exist. Also it is shown that when the radius is known the center of mass of the possible target region is the MVU estimator. In addition, a sub-optimum estimator is introduced whose performance is close to the MVU estimator but is preferred computationally. Furthermore, minimal sufficient statistics have been provided, both when the detection radius is known and when it is not. Simulations confirmed that the derived MVU estimator outperforms several heuristic location estimators.Comment: 25 pages, 9 figure

arXiv.org e-Print Archive

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

A time-frequency approach to blind separation of under-determined mixture of sources

Author: Kawamoto M.
Mansour Ali
Puntonet C.
Publication venue: IASTED
Publication date: 01/01/2003
Field of study

espace@Curtin

Recommended from our members

Operating System Based Perceptual Evaluation of Call Quality in Radio Telecommunications Networks. Development of call quality assessment at mobile terminals using the Symbian operating system, comparison with traditional approaches and proposals for a tariff regime relating call charging to perceived speech quality.

Author: Aburas Akram
Publication venue: School of Engineering, Design and Technology
Publication date: 01/01/2012
Field of study

Call quality has been crucial from the inception of telecommunication networks. Operators need to monitor call quality from the end-user¿s perspective, in order to retain subscribers and reduce subscriber ¿churn¿. Operators worry not only about call quality and interconnect revenue loss, but also about network connectivity issues in areas where mobile network gateways are prevalent. Bandwidth quality as experienced by the end-user is equally important in helping operators to reduce churn. The parameters that network operators use to improve call quality are mainly from the end-user¿s perspective. These parameters are usually ASR (answer seizure ratio), PDD (postdial delay), NER (network efficiency ratio), the number of calls for which these parameters have been analyzed and successful calls. Operators use these parameters to evaluate and optimize the network to meet their quality requirements. Analysis of speech quality is a major arena for research. Traditionally, users¿ perception of speech quality has been measured offline using subjective listening tests. Such tests are, however, slow, tedious and costly. An alternative method is therefore needed; one that can be automatically computed on the subscriber¿s handset, be available to the operator as well as to subscribers and, at the same time, provide results that are comparable with conventional subjective scores. QMeter® ¿ a set of tools for signal and bandwidth measurement that have been developed bearing in mind all the parameters that influence call and bandwidth quality experienced by the end-user ¿ addresses these issues and, additionally, facilitates dynamic tariff propositions which enhance the credibility of the operator. This research focuses on call quality parameters from the end-user¿s perspective. The call parameters used in the research are signal strength, successful call rate, normal drop call rate, and hand-over drop rate. Signal strength is measured for every five milliseconds of an active call and average signal strength is calculated for each successful call. The successful call rate, normal drop rate and hand-over drop rate are used to achieve a measurement of the overall call quality. Call quality with respect to bundles of 10 calls is proposed. An attempt is made to visualize these parameters for better understanding of where the quality is bad, good and excellent. This will help operators, as well as user groups, to measure quality and coverage. Operators boast about their bandwidth but in reality, to know the locations where speed has to be improved, they need a tool that can effectively measure speed from the end-user¿s perspective. BM (bandwidth meter), a tool developed as a part of this research, measures the average speed of data sessions and stores the information for analysis at different locations. To address issues of quality in the subscriber segment, this research proposes the varying of tariffs based on call and bandwidth quality. Call charging based on call quality as perceived by the end-user is proposed, both to satisfy subscribers and help operators to improve customer satisfaction and increase average revenue per user. Tariff redemption procedures are put forward for bundles of 10 calls and 10 data sessions. In addition to the varying of tariffs, quality escalation processes are proposed. Deploying such tools on selected or random samples of users will result in substantial improvement in user loyalty which, in turn, will bring operational and economic advantages

Bradford Scholars