34 research outputs found

    SVMs for Automatic Speech Recognition: a Survey

    Get PDF
    Hidden Markov Models (HMMs) are, undoubtedly, the most employed core technique for Automatic Speech Recognition (ASR). Nevertheless, we are still far from achieving high-performance ASR systems. Some alternative approaches, most of them based on Artificial Neural Networks (ANNs), were proposed during the late eighties and early nineties. Some of them tackled the ASR problem using predictive ANNs, while others proposed hybrid HMM/ANN systems. However, despite some achievements, nowadays, the preponderance of Markov Models is a fact. During the last decade, however, a new tool appeared in the field of machine learning that has proved to be able to cope with hard classification problems in several fields of application: the Support Vector Machines (SVMs). The SVMs are effective discriminative classifiers with several outstanding characteristics, namely: their solution is that with maximum margin; they are capable to deal with samples of a very higher dimensionality; and their convergence to the minimum of the associated cost function is guaranteed. These characteristics have made SVMs very popular and successful. In this chapter we discuss their strengths and weakness in the ASR context and make a review of the current state-of-the-art techniques. We organize the contributions in two parts: isolated-word recognition and continuous speech recognition. Within the first part we review several techniques to produce the fixed-dimension vectors needed for original SVMs. Afterwards we explore more sophisticated techniques based on the use of kernels capable to deal with sequences of different length. Among them is the DTAK kernel, simple and effective, which rescues an old technique of speech recognition: Dynamic Time Warping (DTW). Within the second part, we describe some recent approaches to tackle more complex tasks like connected digit recognition or continuous speech recognition using SVMs. Finally we draw some conclusions and outline several ongoing lines of research

    A detection-based pattern recognition framework and its applications

    Get PDF
    The objective of this dissertation is to present a detection-based pattern recognition framework and demonstrate its applications in automatic speech recognition and broadcast news video story segmentation. Inspired by the studies of modern cognitive psychology and real-world pattern recognition systems, a detection-based pattern recognition framework is proposed to provide an alternative solution for some complicated pattern recognition problems. The primitive features are first detected and the task-specific knowledge hierarchy is constructed level by level; then a variety of heterogeneous information sources are combined together and the high-level context is incorporated as additional information at certain stages. A detection-based framework is a â divide-and-conquerâ design paradigm for pattern recognition problems, which will decompose a conceptually difficult problem into many elementary sub-problems that can be handled directly and reliably. Some information fusion strategies will be employed to integrate the evidence from a lower level to form the evidence at a higher level. Such a fusion procedure continues until reaching the top level. Generally, a detection-based framework has many advantages: (1) more flexibility in both detector design and fusion strategies, as these two parts can be optimized separately; (2) parallel and distributed computational components in primitive feature detection. In such a component-based framework, any primitive component can be replaced by a new one while other components remain unchanged; (3) incremental information integration; (4) high level context information as additional information sources, which can be combined with bottom-up processing at any stage. This dissertation presents the basic principles, criteria, and techniques for detector design and hypothesis verification based on the statistical detection and decision theory. In addition, evidence fusion strategies were investigated in this dissertation. Several novel detection algorithms and evidence fusion methods were proposed and their effectiveness was justified in automatic speech recognition and broadcast news video segmentation system. We believe such a detection-based framework can be employed in more applications in the future.Ph.D.Committee Chair: Lee, Chin-Hui; Committee Member: Clements, Mark; Committee Member: Ghovanloo, Maysam; Committee Member: Romberg, Justin; Committee Member: Yuan, Min

    Smartphone-based human activity recognition

    Get PDF
    Cotutela Universitat Politècnica de Catalunya i Università degli Studi di GenovaHuman Activity Recognition (HAR) is a multidisciplinary research field that aims to gather data regarding people's behavior and their interaction with the environment in order to deliver valuable context-aware information. It has nowadays contributed to develop human-centered areas of study such as Ambient Intelligence and Ambient Assisted Living, which concentrate on the improvement of people's Quality of Life. The first stage to accomplish HAR requires to make observations from ambient or wearable sensor technologies. However, in the second case, the search for pervasive, unobtrusive, low-powered, and low-cost devices for achieving this challenging task still has not been fully addressed. In this thesis, we explore the use of smartphones as an alternative approach for performing the identification of physical activities. These self-contained devices, which are widely available in the market, are provided with embedded sensors, powerful computing capabilities and wireless communication technologies that make them highly suitable for this application. This work presents a series of contributions regarding the development of HAR systems with smartphones. In the first place we propose a fully operational system that recognizes in real-time six physical activities while also takes into account the effects of postural transitions that may occur between them. For achieving this, we cover some research topics from signal processing and feature selection of inertial data, to Machine Learning approaches for classification. We employ two sensors (the accelerometer and the gyroscope) for collecting inertial data. Their raw signals are the input of the system and are conditioned through filtering in order to reduce noise and allow the extraction of informative activity features. We also emphasize on the study of Support Vector Machines (SVMs), which are one of the state-of-the-art Machine Learning techniques for classification, and reformulate various of the standard multiclass linear and non-linear methods to find the best trade off between recognition performance, computational costs and energy requirements, which are essential aspects in battery-operated devices such as smartphones. In particular, we propose two multiclass SVMs for activity classification:one linear algorithm which allows to control over dimensionality reduction and system accuracy; and also a non-linear hardware-friendly algorithm that only uses fixed-point arithmetic in the prediction phase and enables a model complexity reduction while maintaining the system performance. The efficiency of the proposed system is verified through extensive experimentation over a HAR dataset which we have generated and made publicly available. It is composed of inertial data collected from a group of 30 participants which performed a set of common daily activities while carrying a smartphone as a wearable device. The results achieved in this research show that it is possible to perform HAR in real-time with a precision near 97\% with smartphones. In this way, we can employ the proposed methodology in several higher-level applications that require HAR such as ambulatory monitoring of the disabled and the elderly during periods above five days without the need of a battery recharge. Moreover, the proposed algorithms can be adapted to other commercial wearable devices recently introduced in the market (e.g. smartwatches, phablets, and glasses). This will open up new opportunities for developing practical and innovative HAR applications.El Reconocimiento de Actividades Humanas (RAH) es un campo de investigación multidisciplinario que busca recopilar información sobre el comportamiento de las personas y su interacción con el entorno con el propósito de ofrecer información contextual de alta significancia sobre las acciones que ellas realizan. Recientemente, el RAH ha contribuido en el desarrollo de áreas de estudio enfocadas a la mejora de la calidad de vida del hombre tales como: la inteligència ambiental (Ambient Intelligence) y la vida cotidiana asistida por el entorno para personas dependientes (Ambient Assisted Living). El primer paso para conseguir el RAH consiste en realizar observaciones mediante el uso de sensores fijos localizados en el ambiente, o bien portátiles incorporados de forma vestible en el cuerpo humano. Sin embargo, para el segundo caso, aún se dificulta encontrar dispositivos poco invasivos, de bajo consumo energético, que permitan ser llevados a cualquier lugar, y de bajo costo. En esta tesis, nosotros exploramos el uso de teléfonos móviles inteligentes (Smartphones) como una alternativa para el RAH. Estos dispositivos, de uso cotidiano y fácilmente asequibles en el mercado, están dotados de sensores embebidos, potentes capacidades de cómputo y diversas tecnologías de comunicación inalámbrica que los hacen apropiados para esta aplicación. Nuestro trabajo presenta una serie de contribuciones en relación al desarrollo de sistemas para el RAH con Smartphones. En primera instancia proponemos un sistema que permite la detección de seis actividades físicas en tiempo real y que, además, tiene en cuenta las transiciones posturales que puedan ocurrir entre ellas. Con este fin, hemos contribuido en distintos ámbitos que van desde el procesamiento de señales y la selección de características, hasta algoritmos de Aprendizaje Automático (AA). Nosotros utilizamos dos sensores inerciales (el acelerómetro y el giroscopio) para la captura de las señales de movimiento de los usuarios. Estas han de ser procesadas a través de técnicas de filtrado para la reducción de ruido, segmentación y obtención de características relevantes en la detección de actividad. También hacemos énfasis en el estudio de Máquinas de soporte vectorial (MSV) que son uno de los algoritmos de AA más usados en la actualidad. Para ello reformulamos varios de sus métodos estándar (lineales y no lineales) con el propósito de encontrar la mejor combinación de variables que garanticen un buen desempeño del sistema en cuanto a precisión, coste computacional y requerimientos de energía, los cuales son aspectos esenciales en dispositivos portátiles con suministro de energía mediante baterías. En concreto, proponemos dos MSV multiclase para la clasificación de actividad: un algoritmo lineal que permite el balance entre la reducción de la dimensionalidad y la precisión del sistema; y asimismo presentamos un algoritmo no lineal conveniente para dispositivos con limitaciones de hardware que solo utiliza aritmética de punto fijo en la fase de predicción y que permite reducir la complejidad del modelo de aprendizaje mientras mantiene el rendimiento del sistema. La eficacia del sistema propuesto es verificada a través de una experimentación extensiva sobre la base de datos RAH que hemos generado y hecho pública en la red. Esta contiene la información inercial obtenida de un grupo de 30 participantes que realizaron una serie de actividades de la vida cotidiana en un ambiente controlado mientras tenían sujeto a su cintura un smartphone que capturaba su movimiento. Los resultados obtenidos en esta investigación demuestran que es posible realizar el RAH en tiempo real con una precisión cercana al 97%. De esta manera, podemos emplear la metodología propuesta en aplicaciones de alto nivel que requieran el RAH tales como monitorizaciones ambulatorias para personas dependientes (ej. ancianos o discapacitados) durante periodos mayores a cinco días sin la necesidad de recarga de baterías.Postprint (published version

    Design of reservoir computing systems for the recognition of noise corrupted speech and handwriting

    Get PDF

    Child's play: activity recognition for monitoring children's developmental progress with augmented toys

    Get PDF
    The way in which infants play with objects can be indicative of their developmental progress and may serve as an early indicator for developmental delays. However, the observation of children interacting with toys for the purpose of quantitative analysis can be a difficult task. To better quantify how play may serve as an early indicator, researchers have conducted retrospective studies examining the differences in object play behaviors among infants. However, such studies require that researchers repeatedly inspect videos of play often at speeds much slower than real-time to indicate points of interest. The research presented in this dissertation examines whether a combination of sensors embedded within toys and automatic pattern recognition of object play behaviors can help expedite this process. For my dissertation, I developed the Child'sPlay system which uses augmented toys and statistical models to automatically provide quantitative measures of object play interactions, as well as, provide the PlayView interface to view annotated play data for later analysis. In this dissertation, I examine the hypothesis that sensors embedded in objects can provide sufficient data for automatic recognition of certain exploratory, relational, and functional object play behaviors in semi-naturalistic environments and that a continuum of recognition accuracy exists which allows automatic indexing to be useful for retrospective review. I designed several augmented toys and used them to collect object play data from more than fifty play sessions. I conducted pattern recognition experiments over this data to produce statistical models that automatically classify children's object play behaviors. In addition, I conducted a user study with twenty participants to determine if annotations automatically generated from these models help improve performance in retrospective review tasks. My results indicate that these statistical models increase user performance and decrease perceived effort when combined with the PlayView interface during retrospective review. The presence of high quality annotations are preferred by users and promotes an increase in the effective retrieval rates of object play behaviors.Ph.D.Committee Chair: Starner, Thad E.; Committee Co-Chair: Abowd, Gregory D.; Committee Member: Arriaga, Rosa; Committee Member: Jackson, Melody Moore; Committee Member: Lukowicz, Paul; Committee Member: Rehg, James M

    Adaptive combinations of classifiers with application to on-line handwritten character recognition

    Get PDF
    Classifier combining is an effective way of improving classification performance. User adaptation is clearly another valid approach for improving performance in a user-dependent system, and even though adaptation is usually performed on the classifier level, also adaptive committees can be very effective. Adaptive committees have the distinct ability of performing adaptation without detailed knowledge of the classifiers. Adaptation can therefore be used even with classification systems that intrinsically are not suited for adaptation, whether that be due to lack of access to the workings of the classifier or simply a classification scheme not suitable for continuous learning. This thesis proposes methods for adaptive combination of classifiers in the setting of on-line handwritten character recognition. The focal part of the work introduces adaptive classifier combination schemes, of which the two most prominent ones are the Dynamically Expanding Context (DEC) committee and the Class-Confidence Critic Combining (CCCC) committee. Both have been shown to be capable of successful adaptation to the user in the task of on-line handwritten character recognition. Particularly the highly modular CCCC framework has shown impressive performance also in a doubly-adaptive setting of combining adaptive classifiers by using an adaptive committee. In support of this main topic of the thesis, some discussion on a methodology for deducing correct character labeling from user actions is presented. Proper labeling is paramount for effective adaptation, and deducing the labels from the user's actions is necessary to perform adaptation transparently to the user. In that way, the user does not need to give explicit feedback on the correctness of the recognition results. Also, an overview is presented of adaptive classification methods for single-classifier adaptation in handwritten character recognition developed at the Laboratory of Computer and Information Science of the Helsinki University of Technology, CIS-HCR. Classifiers based on the CIS-HCR system have been used in the adaptive committee experiments as both member classifiers and to provide a reference level. Finally, two distinct approaches for improving the performance of committee classifiers further are discussed. Firstly, methods for committee rejection are presented and evaluated. Secondly, measures of classifier diversity for classifier selection, based on the concept of diversity of errors, are presented and evaluated. The topic of this thesis hence covers three important aspects of pattern recognition: on-line adaptation, combining classifiers, and a practical evaluation setting of handwritten character recognition. A novel approach combining these three core ideas has been developed and is presented in the introductory text and the included publications. To reiterate, the main contributions of this thesis are: 1) introduction of novel adaptive committee classification methods, 2) introduction of novel methods for measuring classifier diversity, 3) presentation of some methods for implementing committee rejection, 4) discussion and introduction of a method for effective label deduction from on-line user actions, and as a side-product, 5) an overview of the CIS-HCR adaptive on-line handwritten character recognition system.Luokittimien yhdistäminen komitealuokittimella on tehokas keino luokitustarkkuuden parantamiseen. Laskentatehon jatkuva kasvu tekee myös useiden luokittimien yhtäaikaisesta käytöstä yhä varteenotettavamman vaihtoehdon. Järjestelmän adaptoituminen (mukautuminen) käyttäjään on toinen hyvä keino käyttäjäriippumattoman järjestelmän tarkkuuden parantantamiseksi. Vaikka adaptaatio yleensä toteutetaan luokittimen tasolla, myös adaptiiviset komitealuokittimet voivat olla hyvin tehokkaita. Adaptiiviset komiteat voivat adaptoitua ilman yksityiskohtaista tietoa jäsenluokittimista. Adaptaatiota voidaan näin käyttää myös luokittelujärjestelmissä, jotka eivät ole itsessään sopivia adaptaatioon. Adaptaatioon sopimattomuus voi johtua esimerkiksi siitä, että luokittimen totetutusta ei voida muuttaa, tai siitä, että käytetään luokittelumenetelmää, joka ei sovellu jatkuvaan oppimiseen. Tämä väitöskirja käsittelee menetelmiä luokittimien adaptiiviseen yhdistämiseen käyttäen sovelluskohteena käsinkirjoitettujen merkkien on-line-tunnistusta. Keskeisin osa työtä esittelee uusia adaptiivisia luokittimien yhdistämismenetelmiä, joista kaksi huomattavinta ovat Dynamically Expanding Context (DEC) -komitea sekä Class-Confidence Critic Combining (CCCC) -komitea. Molemmat näistä ovat osoittautuneet kykeneviksi tehokkaaseen käyttäjä-adaptaatioon käsinkirjoitettujen merkkien on-line-tunnistuksessa. Erityisesti hyvin modulaarisella CCCC järjestelmällä on saatu hyviä tuloksia myös kaksinkertaisesti adaptiivisessa asetelmassa, jossa yhdistetään adaptiivisia jäsenluokittimia adaptiivisen komitean avulla. Väitöskirjan pääteeman tukena esitetään myös malli ja käytännön esimerkki siitä, miten käyttäjän toimista merkeille voidaan päätellä oikeat luokat. Merkkien todellisen luokan onnistunut päättely on elintärkeää tehokkaalle adaptaatiolle. Jotta adaptaatio voitaisiin suorittaa käyttäjälle läpinäkyvästi, merkkien todelliset luokat on kyettävä päättelemään käyttäjän toimista. Tällä tavalla käyttäjän ei tarvitse antaa suoraa palautetta tunnistustuloksen oikeellisuudesta. Työssä esitetään myös yleiskatsaus Teknillisen korkeakoulun Informaatiotekniikan laboratoriossa kehitettyyn adaptiiviseen käsinkirjoitettujen merkkien tunnistusjärjestelmään. Tähän järjestelmään perustuvia luokittimia on käytetty adaptiivisten komitealuokittimien kokeissa sekä jäsenluokittimina että vertailutasona. Lopuksi esitellään kaksi erillistä menetelmää komitealuokittimen tarkkuuden edelleen parantamiseksi. Näistä ensimmäinen on joukko menetelmiä komitealuokittimen rejektion (hylkäyksen) toteuttamiseksi. Toinen esiteltävä menetelmä on käyttää luokittimien erilaisuuden mittoja jäsenluokittimien valintaa varten. Ehdotetut uudet erilaisuusmitat perustuvat käsitteeseen, jota kutsumme virheiden erilaisuudeksi. Väitöskirjan aihe kattaa kolme hahmontunnistuksen tärkeää osa-aluetta: online-adaptaation, luokittimien yhdistämisen ja käytännön sovellusalana käsinkirjoitettujen merkkien tunnistuksen. Näistä kolmesta lähtökohdasta on kehitetty uudenlainen synteesi, joka esitetään johdantotekstissä sekä liitteenä olevissa julkaisuissa. Tämän väitöskirjan oleellisimmat kontribuutiot ovat siten: 1) uusien adaptiivisten komitealuokittimien esittely, 2) uudenlaisten menetelmien esittely luokittimien erilaisuuden mittaamiseksi, 3) joidenkin komitearejektiomenetelmien esittely, 4) pohdinnan ja erään toteutustavan esittely syötettyjen merkkien todellisen luokan päättelemiseksi käyttäjän toimista, sekä sivutuotteena 5) kattava yleiskatsaus CIS-HCR adaptiiviseen on-line käsinkirjoitettujen merkkien tunnistusjärjestelmään.reviewe

    A Comparative Analysis of NLP Algorithms for Implementing AI Conversational Assistants

    Get PDF
    The rapid adoption of low-code/no-code software systems has reshaped the landscape of software development, but it also brings challenges in usability and accessibility, particularly for those unfamiliar with the specific components and templates of these platforms. This thesis targets improving the developer experience in Nokia Corporation's low-code/no-code software system for network management through the incorporation of Natural Language Interfaces (NLIs) using Natural Language Processing (NLP) algorithms. Focused on key NLP tasks like entity extraction and intent classification, we analyzed a variety of algorithms, including MaxEnt Classifier with NLTK, Spacy, Conditional Random Fields with Stanford NER for entity recognition, and SVM Classifier, Logistic Regression, Naïve Bayes, Decision Tree, Random Forest, and RASA DIET for intent classification. Each algorithm's performance was rigorously evaluated using a dataset generated from network-related utterances. The evaluation metrics included not only performance metrics but also system metrics. Our research uncovers significant trade-offs in algorithmic selection, elucidating the balance between computational cost and predictive accuracy. It reveals that while some models, like RASA DIET, excel in accuracy, they require extensive computational resources, making them less suitable for lightweight systems. In contrast, simpler models like Spacy and StanfordNER provide a balanced performance but require careful consideration for specific entity types. While the study is limited by dataset size and focuses on simpler algorithms, it offers an empirically grounded framework for practitioners and decision-makers at Nokia and similar corporations. The findings point towards future research directions, including the exploration of ensemble methods, the fine-tuning of existing models, and the real-world implementation and scalability of these algorithms in low-code/no-code platforms
    corecore