A Speech Recognizer based on Multiclass SVMs with HMM-Guided Segmentation
Automatic Speech Recognition (ASR) is essentially a problem of pattern
classification; however, the time dimension of the speech signal has
prevented ASR from being posed as a simple static classification problem. Support
Vector Machine (SVM) classifiers could provide an appropriate solution,
since they are very well adapted to high-dimensional classification problems.
Nevertheless, the use of SVMs for ASR is by no means straightforward,
mainly because SVM classifiers require a fixed-dimension input.
In this paper we study the use of an HMM-based segmentation as a means to
get the fixed-dimension input vectors required by SVMs, in a problem of
isolated-digit recognition. Different configurations for all the parameters
involved have been tested. We also deal with the problem of multi-class
classification (as SVMs are initially binary classifiers), studying two of the
most popular approaches: 1-vs-all and 1-vs-1.
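The two multi-class decompositions named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `scores` dictionary and the `pairwise_winner` callback are hypothetical stand-ins for the outputs of already-trained binary SVMs.

```python
from itertools import combinations


def n_binary_classifiers(k):
    """Number of binary classifiers each decomposition needs for k classes."""
    return {"1-vs-all": k, "1-vs-1": k * (k - 1) // 2}


def one_vs_all_predict(scores):
    """Pick the class whose 'class vs. rest' classifier gives the largest output.

    scores: dict mapping class label -> real-valued classifier output
    (hypothetical input; a real system would obtain it from trained SVMs).
    """
    return max(scores, key=scores.get)


def one_vs_one_predict(classes, pairwise_winner):
    """Majority vote over all pairwise classifiers.

    pairwise_winner(a, b) is a hypothetical callback returning a or b,
    i.e. the decision of the binary classifier trained on classes a and b.
    Ties are resolved in favour of the class listed first.
    """
    votes = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):
        votes[pairwise_winner(a, b)] += 1
    return max(votes, key=votes.get)
```

For a ten-digit task, `n_binary_classifiers(10)` shows the trade-off: 10 classifiers for 1-vs-all against 45 for 1-vs-1, each of the latter trained on only two classes.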
Benchmark of structured machine learning methods for microbial identification from mass-spectrometry data
Microbial identification is a central issue in microbiology, in particular in
the fields of infectious diseases diagnosis and industrial quality control. The
concept of species is tightly linked to the concept of biological and clinical
classification where the proximity between species is generally measured in
terms of evolutionary distances and/or clinical phenotypes. Surprisingly, the
information provided by this well-known hierarchical structure is rarely used
by machine learning-based automatic microbial identification systems.
Structured machine learning methods were recently proposed for taking into
account the structure embedded in a hierarchy and using it as additional a
priori information, and could therefore improve microbial
identification systems. We test and compare several state-of-the-art machine
learning methods for microbial identification on a new Matrix-Assisted Laser
Desorption/Ionization Time-of-Flight mass spectrometry (MALDI-TOF MS) dataset.
We include in the benchmark both standard methods and structured methods that leverage the
knowledge of the underlying hierarchical structure in the learning process. Our
results show that although some methods perform better than others, structured
methods do not consistently perform better than their "flat" counterparts. We
postulate that this is partly due to the fact that standard methods already
reach a high level of accuracy in this context, and that they mainly confuse
species close to each other in the tree, a case where using the known hierarchy
is not helpful.
SVMs for Automatic Speech Recognition: a Survey
Hidden Markov Models (HMMs) are, undoubtedly, the most widely used core technique for Automatic Speech Recognition (ASR). Nevertheless, we are still far from achieving high-performance ASR systems. Some alternative approaches, most of them based on Artificial Neural Networks (ANNs), were proposed during the late eighties and early nineties. Some of them tackled the ASR problem using predictive ANNs, while others proposed hybrid HMM/ANN systems. However, despite some achievements, the preponderance of Markov Models remains a fact.
During the last decade, however, a new tool has appeared in the field of machine learning that has proved able to cope with hard classification problems in several fields of application: the Support Vector Machine (SVM). SVMs are effective discriminative classifiers with several outstanding characteristics, namely: their solution is the one with maximum margin; they can deal with samples of very high dimensionality; and their convergence to the minimum of the associated cost function is guaranteed.
These characteristics have made SVMs very popular and successful. In this chapter we discuss their strengths and weaknesses in the ASR context and review the current state-of-the-art techniques. We organize the contributions in two parts: isolated-word recognition and continuous speech recognition. Within the first part we review several techniques to produce the fixed-dimension vectors needed for original SVMs. Afterwards we explore more sophisticated techniques based on the use of kernels capable of dealing with sequences of different length. Among them is the DTAK kernel, simple and effective, which revives an old speech recognition technique: Dynamic Time Warping (DTW). Within the second part, we describe some recent approaches to tackle more complex tasks like connected digit recognition or continuous speech recognition using SVMs. Finally we draw some conclusions and outline several ongoing lines of research.
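Dynamic Time Warping, mentioned above as the basis of the DTAK kernel, can be illustrated with the classic dynamic-programming recursion. The sketch below is plain DTW, not the DTAK kernel itself, and it makes two simplifying assumptions: each frame is a single number rather than a feature vector, and the local cost is the absolute difference.

```python
import math


def dtw_distance(a, b):
    """Cost of the best monotonic alignment between sequences a and b.

    a, b: non-empty sequences of numbers, possibly of different length.
    Local cost is |a[i] - b[j]| (an assumption made for this sketch).
    """
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays on the same frame
                cost[i][j - 1],      # b advances, a stays on the same frame
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

Because the alignment may repeat frames, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is zero even though the two sequences differ in length, which is the property that makes the technique attractive for utterances of varying duration.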
Detection of Dispersed Radio Pulses: A machine learning approach to candidate identification and classification
Searching for extraterrestrial, transient signals in astronomical data sets
is an active area of current research. However, machine learning techniques are
lacking in the literature concerning single-pulse detection. This paper
presents a new, two-stage approach for identifying and classifying dispersed
pulse groups (DPGs) in single-pulse search output. The first stage identified
DPGs and extracted features to characterize them using a new peak
identification algorithm which tracks sloping tendencies around local maxima in
plots of signal-to-noise ratio vs. dispersion measure. The second stage used
supervised machine learning to classify DPGs. We created four benchmark data
sets: one unbalanced and three balanced versions using three different
imbalance treatments. We empirically evaluated 48 classifiers by training and
testing binary and multiclass versions of six machine learning algorithms on
each of the four benchmark versions. While each classifier had advantages and
disadvantages, all classifiers with imbalance treatments had higher recall
values than those with unbalanced data, regardless of the machine learning
algorithm used. Based on the benchmarking results, we selected a subset of
classifiers to classify the full, unlabelled data set of over 1.5 million DPGs
identified in 42,405 observations made by the Green Bank Telescope. Overall,
the classifiers using a multiclass ensemble tree learner in combination with
two oversampling imbalance treatments were the most efficient; they identified
additional known pulsars not in the benchmark data set and provided six
potential discoveries, with significantly fewer false positives than the other
classifiers.
Comment: 13 pages, accepted for publication in MNRAS, ref. MN-15-1713-MJ.R
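The abstract does not name the oversampling treatments it used. As one illustration of the general idea, the sketch below implements plain random oversampling with replacement; this is an assumption chosen for simplicity, and the function name and `seed` parameter are hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until all classes match
    the size of the largest class. Returns new (samples, labels) lists."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_samples, out_labels = [], []
    for y, xs in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_samples.append(x)
            out_labels.append(y)
    return out_samples, out_labels


samples = [[0.1], [0.2], [0.3], [0.4], [0.5], [9.0]]
labels = ["noise", "noise", "noise", "noise", "noise", "pulsar"]
balanced_samples, balanced_labels = random_oversample(samples, labels)
class_counts = Counter(balanced_labels)
```

After balancing, both classes in this toy example have five samples. Balancing is normally applied to the training split only, so that test-set statistics still reflect the original class proportions.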
Robust ASR using Support Vector Machines
The improved theoretical properties of Support Vector Machines with respect to other machine learning alternatives due to their max-margin training paradigm have led us to suggest them as a good technique for robust speech recognition. However, important shortcomings have had to be circumvented, the most important being the normalisation of the time duration of different realisations of the acoustic speech units.
In this paper, we have compared two approaches in noisy environments: first, a hybrid HMM–SVM solution where a fixed number of frames is selected by means of an HMM segmentation and, second, a normalisation kernel called Dynamic Time Alignment Kernel (DTAK), first introduced in Shimodaira et al. [Shimodaira, H., Noma, K., Nakai, M., Sagayama, S., 2001. Support vector machine with dynamic time-alignment kernel for speech recognition. In: Proc. Eurospeech, Aalborg, Denmark, pp. 1841–1844] and based on DTW (Dynamic Time Warping). Special attention has been paid to the adaptation of both alternatives to noisy environments, comparing two types of parameterisations and performing suitable feature normalisation operations. The results show that the DTA Kernel provides important advantages over the baseline HMM system in medium to bad noise conditions, also outperforming the results of the hybrid system.
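The idea of selecting a fixed number of frames from a segmented utterance can be sketched as below. This is a hedged illustration, not the procedure of the paper: the segment boundaries are assumed to come from some HMM alignment that is not shown, and the function name, the evenly spaced selection rule and the `frames_per_segment` parameter are all hypothetical.

```python
def fixed_length_vector(frames, boundaries, frames_per_segment=1):
    """Concatenate a fixed number of frames per segment into one vector.

    frames: list of per-frame feature vectors (lists of numbers).
    boundaries: list of (start, end) frame indices, end exclusive, one
        pair per segment; every segment is assumed to be non-empty.
    The output length depends only on the number of segments, the frames
    kept per segment and the feature dimension, not on utterance length.
    """
    vector = []
    for start, end in boundaries:
        length = end - start
        for k in range(frames_per_segment):
            # Evenly spaced positions inside the segment.
            idx = start + int((k + 0.5) * length / frames_per_segment)
            vector.extend(frames[min(idx, end - 1)])
    return vector


short_utt = [[i, i * 10] for i in range(7)]
long_utt = [[i, i * 10] for i in range(10)]
v_short = fixed_length_vector(short_utt, [(0, 2), (2, 7)])
v_long = fixed_length_vector(long_utt, [(0, 4), (4, 10)])
```

Both utterances map to vectors of the same dimension despite having 7 and 10 frames, which is what a standard SVM requires of its input.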