2,484 research outputs found

    Robust ASR using Support Vector Machines

    Get PDF
    The improved theoretical properties of Support Vector Machines with respect to other machine learning alternatives due to their max-margin training paradigm have led us to suggest them as a good technique for robust speech recognition. However, important shortcomings have had to be circumvented, the most important being the normalisation of the time duration of different realisations of the acoustic speech units. In this paper, we have compared two approaches in noisy environments: first, a hybrid HMM–SVM solution where a fixed number of frames is selected by means of an HMM segmentation and second, a normalisation kernel called Dynamic Time Alignment Kernel (DTAK) first introduced in Shimodaira et al. [Shimodaira, H., Noma, K., Nakai, M., Sagayama, S., 2001. Support vector machine with dynamic time-alignment kernel for speech recognition. In: Proc. Eurospeech, Aalborg, Denmark, pp. 1841–1844] and based on DTW (Dynamic Time Warping). Special attention has been paid to the adaptation of both alternatives to noisy environments, comparing two types of parameterisations and performing suitable feature normalisation operations. The results show that the DTA Kernel provides important advantages over the baseline HMM system in medium to bad noise conditions, also outperforming the results of the hybrid system.Publicad

    SVMs for Automatic Speech Recognition: a Survey

    Get PDF
    Hidden Markov Models (HMMs) are, undoubtedly, the most employed core technique for Automatic Speech Recognition (ASR). Nevertheless, we are still far from achieving high-performance ASR systems. Some alternative approaches, most of them based on Artificial Neural Networks (ANNs), were proposed during the late eighties and early nineties. Some of them tackled the ASR problem using predictive ANNs, while others proposed hybrid HMM/ANN systems. However, despite some achievements, nowadays, the preponderance of Markov Models is a fact. During the last decade, however, a new tool appeared in the field of machine learning that has proved to be able to cope with hard classification problems in several fields of application: the Support Vector Machines (SVMs). The SVMs are effective discriminative classifiers with several outstanding characteristics, namely: their solution is that with maximum margin; they are capable to deal with samples of a very higher dimensionality; and their convergence to the minimum of the associated cost function is guaranteed. These characteristics have made SVMs very popular and successful. In this chapter we discuss their strengths and weakness in the ASR context and make a review of the current state-of-the-art techniques. We organize the contributions in two parts: isolated-word recognition and continuous speech recognition. Within the first part we review several techniques to produce the fixed-dimension vectors needed for original SVMs. Afterwards we explore more sophisticated techniques based on the use of kernels capable to deal with sequences of different length. Among them is the DTAK kernel, simple and effective, which rescues an old technique of speech recognition: Dynamic Time Warping (DTW). Within the second part, we describe some recent approaches to tackle more complex tasks like connected digit recognition or continuous speech recognition using SVMs. Finally we draw some conclusions and outline several ongoing lines of research

    A review of domain adaptation without target labels

    Full text link
    Domain adaptation has become a prominent problem setting in machine learning and related fields. This review asks the question: how can a classifier learn from a source domain and generalize to a target domain? We present a categorization of approaches, divided into, what we refer to as, sample-based, feature-based and inference-based methods. Sample-based methods focus on weighting individual observations during training based on their importance to the target domain. Feature-based methods revolve around on mapping, projecting and representing features such that a source classifier performs well on the target domain and inference-based methods incorporate adaptation into the parameter estimation procedure, for instance through constraints on the optimization procedure. Additionally, we review a number of conditions that allow for formulating bounds on the cross-domain generalization error. Our categorization highlights recurring ideas and raises questions important to further research.Comment: 20 pages, 5 figure

    Robust Sound Event Classification using Deep Neural Networks

    Get PDF
    The automatic recognition of sound events by computers is an important aspect of emerging applications such as automated surveillance, machine hearing and auditory scene understanding. Recent advances in machine learning, as well as in computational models of the human auditory system, have contributed to advances in this increasingly popular research field. Robust sound event classification, the ability to recognise sounds under real-world noisy conditions, is an especially challenging task. Classification methods translated from the speech recognition domain, using features such as mel-frequency cepstral coefficients, have been shown to perform reasonably well for the sound event classification task, although spectrogram-based or auditory image analysis techniques reportedly achieve superior performance in noise. This paper outlines a sound event classification framework that compares auditory image front end features with spectrogram image-based front end features, using support vector machine and deep neural network classifiers. Performance is evaluated on a standard robust classification task in different levels of corrupting noise, and with several system enhancements, and shown to compare very well with current state-of-the-art classification techniques

    Locality and Structure Regularized Low Rank Representation for Hyperspectral Image Classification

    Full text link
    Hyperspectral image (HSI) classification, which aims to assign an accurate label for hyperspectral pixels, has drawn great interest in recent years. Although low rank representation (LRR) has been used to classify HSI, its ability to segment each class from the whole HSI data has not been exploited fully yet. LRR has a good capacity to capture the underlying lowdimensional subspaces embedded in original data. However, there are still two drawbacks for LRR. First, LRR does not consider the local geometric structure within data, which makes the local correlation among neighboring data easily ignored. Second, the representation obtained by solving LRR is not discriminative enough to separate different data. In this paper, a novel locality and structure regularized low rank representation (LSLRR) model is proposed for HSI classification. To overcome the above limitations, we present locality constraint criterion (LCC) and structure preserving strategy (SPS) to improve the classical LRR. Specifically, we introduce a new distance metric, which combines both spatial and spectral features, to explore the local similarity of pixels. Thus, the global and local structures of HSI data can be exploited sufficiently. Besides, we propose a structure constraint to make the representation have a near block-diagonal structure. This helps to determine the final classification labels directly. Extensive experiments have been conducted on three popular HSI datasets. And the experimental results demonstrate that the proposed LSLRR outperforms other state-of-the-art methods.Comment: 14 pages, 7 figures, TGRS201
    corecore