SVMs for Automatic Speech Recognition: a Survey
Hidden Markov Models (HMMs) are, undoubtedly, the most employed core technique for Automatic Speech Recognition (ASR). Nevertheless, we are still far from achieving high-performance ASR systems. Some alternative approaches, most of them based on Artificial Neural Networks (ANNs), were proposed during the late eighties and early nineties. Some of them tackled the ASR problem using predictive ANNs, while others proposed hybrid HMM/ANN systems. However, despite some achievements, nowadays, the preponderance of Markov Models is a fact.
During the last decade, however, a new tool appeared in the field of machine learning that has proved able to cope with hard classification problems in several fields of application: Support Vector Machines (SVMs). SVMs are effective discriminative classifiers with several outstanding characteristics, namely: their solution is the one with maximum margin; they are capable of dealing with samples of very high dimensionality; and their convergence to the minimum of the associated cost function is guaranteed.
These characteristics have made SVMs very popular and successful. In this chapter we discuss their strengths and weaknesses in the ASR context and review the current state-of-the-art techniques. We organize the contributions in two parts: isolated-word recognition and continuous speech recognition. Within the first part we review several techniques to produce the fixed-dimension vectors needed for original SVMs. Afterwards we explore more sophisticated techniques based on the use of kernels capable of dealing with sequences of different lengths. Among them is the DTAK kernel, simple and effective, which rescues an old technique of speech recognition: Dynamic Time Warping (DTW). Within the second part, we describe some recent approaches to tackle more complex tasks like connected digit recognition or continuous speech recognition using SVMs. Finally we draw some conclusions and outline several ongoing lines of research.
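As a rough illustration of the Dynamic Time Warping idea mentioned in the abstract above, the following minimal sketch aligns two sequences of different lengths and returns the accumulated alignment cost. It is plain textbook DTW, not the DTAK kernel itself, and it makes several simplifying assumptions: the inputs are sequences of scalars rather than speech feature vectors, the local cost is an absolute difference, and the function name `dtw_distance` is hypothetical.

```python
def dtw_distance(a, b):
    """Accumulated DTW cost between two numeric sequences.

    Assumption: scalar frames and absolute-difference local cost;
    real speech front ends would use feature vectors and a vector distance.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Allowed steps: advance in a, advance in b, or advance in both
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


# A stretched copy of a sequence aligns with zero cost
same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
# Two constant sequences one unit apart accumulate one unit per frame
offset = dtw_distance([0, 0], [1, 1])
```

Because the warping path absorbs differences in length, the two inputs do not need to be resampled to a fixed dimension first, which is the property that makes DTW-style alignment attractive for sequence kernels.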
Advances in the application of support vector machines as probabilistic estimators for continuous automatic speech recognition
Unpublished doctoral thesis. Universidad Autónoma de Madrid, Escuela Politécnica Superior, November 200
End-to-End Multiview Gesture Recognition for Autonomous Car Parking System
The use of hand gestures can be the most intuitive human-machine interaction medium.
The early approaches for hand gesture recognition used device-based methods. These
methods use mechanical or optical sensors attached to a glove or markers, which hinders
the natural human-machine communication. On the other hand, vision-based methods are
not restrictive and allow for a more spontaneous communication without the need of an
intermediary between human and machine. Therefore, vision gesture recognition has been
a popular area of research for the past thirty years.
Hand gesture recognition finds its application in many areas, particularly the automotive
industry where advanced automotive human-machine interface (HMI) designers are
using gesture recognition to improve driver and vehicle safety. However, technology advances
go beyond active/passive safety and into convenience and comfort. In this context,
one of America’s big three automakers has partnered with the Centre of Pattern Analysis
and Machine Intelligence (CPAMI) at the University of Waterloo to investigate expanding
their product segment through machine learning to provide increased driver convenience
and comfort with the particular application of hand gesture recognition for autonomous
car parking.
In this thesis, we leverage state-of-the-art deep learning and optimization techniques
to develop a vision-based multiview dynamic hand gesture recognizer for a self-parking system.
We propose a 3DCNN gesture model architecture that we train on a publicly available
hand gesture database. We apply transfer learning methods to fine-tune the pre-trained
gesture model on a custom-made dataset, which significantly improved the performance of the
proposed system in real-world environments. We adapt the architecture of the end-to-end solution
to expand the state-of-the-art video classifier from a single image as input (fed by a
monocular camera) to a multiview 360 feed, offered by a six-camera module. Finally, we
optimize the proposed solution to work on a limited-resource embedded platform (Nvidia
Jetson TX2) that is used by automakers for vehicle-based features, without sacrificing the
accuracy, robustness, and real-time functionality of the system.
Support vector machines to detect physiological patterns for EEG and EMG-based human-computer interaction: a review
Support vector machines (SVMs) are widely used classifiers for detecting physiological patterns in human-computer interaction (HCI). Their success is due to their versatility, robustness and the wide availability of free dedicated toolboxes. Frequently in the literature, insufficient details about the SVM implementation and/or parameter selection are reported, making it impossible to reproduce the analysis and results of a study. In order to perform an optimized classification and report a proper description of the results, it is necessary to have a comprehensive critical overview of the applications of SVM. The aim of this paper is to provide a review of the usage of SVM in the determination of brain and muscle patterns for HCI, by focusing on electroencephalography (EEG) and electromyography (EMG) techniques. In particular, an overview of the basic principles of SVM theory is outlined, together with a description of several relevant literature implementations. Furthermore, details concerning reviewed papers are listed in tables and statistics of SVM use in the literature are presented. Suitability of SVM for HCI is discussed and critical comparisons with other classifiers are reported.
Sarcasm recognition survey and application based on Reddit comments
Social media platforms are continuously increasing their number of users, and every day enormous amounts of data are produced online. Machine Learning (ML) techniques in the form of speech recognition are applied to analyze the polarity of this unstructured text data.
However, sarcasm is widely used on these platforms, which reduces the accuracy of such systems, because the intention of the message expressed does not match the polarity that is measured.
Throughout the development of this work a survey considering three different algorithms will be performed. These algorithms are Logistic Regression, Neural Networks and Support Vector Machines.
This final degree project proposes a preliminary analysis of the data using a sarcasm recognition classifier implemented with a support vector machine algorithm, with a mean accuracy of 71.21% and an F1-Score around 60%.
Finally, an analysis of the planning and the costs is performed, proposing future work that could complement this bachelor thesis.
A Comparative Study of Machine Learning Models for Tabular Data Through Challenge of Monitoring Parkinson's Disease Progression Using Voice Recordings
People with Parkinson's disease must be regularly monitored by their
physician to observe how the disease is progressing and potentially adjust
treatment plans to mitigate the symptoms. Monitoring the progression of the
disease through a voice recording captured by the patient at their own home can
make the process faster and less stressful. Using a dataset of voice recordings
of 42 people with early-stage Parkinson's disease over a time span of 6 months,
we applied multiple machine learning techniques to find a correlation between
the voice recording and the patient's motor UPDRS score. We approached this
problem using a multitude of both regression and classification techniques.
Much of this paper is dedicated to mapping the voice data to motor UPDRS scores
using regression techniques in order to obtain a more precise value for unknown
instances. Through this comparative study of various machine learning methods,
we found that some older machine learning methods, such as trees, outperform
cutting-edge deep learning models on numerous tabular datasets.
Comment: Accepted at "HIMS'20 - The 6th Int'l Conf on Health Informatics and
Medical Systems"; https://americancse.org/events/csce2020/conferences/hims2
Voice Assessments for Detecting Patients with Parkinson’s Diseases in Different Stages
Recently, a wide range of speech signal processing algorithms (dysphonia measures) have been proposed to detect patients with Parkinson’s disease (PD). We computed 19 dysphonia measures from sustained vowels in 375 voice samples collected from healthy subjects and people suffering from PD. All the features are analysed and the most relevant ones are selected by principal component analysis (PCA) to classify the subjects into 4 classes according to the UPDRS (Unified Parkinson’s Disease Rating Scale) score. We used k-fold cross-validation with k=4 (75% for training and 25% for testing), along with Support Vector Machines (SVMs) with different types of kernels. The best result obtained was 92.5%, using PCA and the linear SVM.
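The 75%/25% split quoted in the abstract above follows directly from k=4: each fold is held out once for testing while the other three are used for training. The sketch below shows that arithmetic for 375 samples. It is only an illustration under stated assumptions: the folds are contiguous and unshuffled, no stratification by class is applied, and the helper name `k_fold_indices` is hypothetical rather than taken from the paper.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds.

    Assumption: no shuffling or stratification; the paper does not say
    how its folds were drawn.
    """
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


splits = list(k_fold_indices(375, 4))
fold_sizes = [len(test) for _, test in splits]
train_fractions = [len(train) / 375 for train, _ in splits]
```

Since 375 is not divisible by 4, the held-out folds differ in size by one sample, so the training share of each split is close to, but not exactly, 75%.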