20 research outputs found

    Multimodal interaction with mobile devices : fusing a broad spectrum of modality combinations

    This dissertation presents a multimodal architecture for use in mobile scenarios such as shopping and navigation. It also analyses a wide range of feasible modality input combinations for these contexts. For this purpose, two interlinked demonstrators were designed for stand-alone use on mobile devices. Of particular importance was the design and implementation of a modality fusion module capable of combining input from a range of communication modes such as speech, handwriting, and gesture. The implementation is able to account for confidence value biases arising within and between modalities and also provides a method for resolving semantically overlapped input. Tangible interaction with real-world objects and symmetric multimodality are two further themes addressed in this work. The work concludes with the results from two usability field studies that provide insight into user preference and modality intuition for different modality combinations, as well as user acceptance of anthropomorphized objects.

    Using contour information and segmentation for object registration, modeling and retrieval

    This thesis considers different aspects of the utilization of contour information and syntactic and semantic image segmentation for object registration, modeling and retrieval in the context of content-based indexing and retrieval in large collections of images. Target applications include retrieval in collections of closed silhouettes, holistic word recognition in handwritten historical manuscripts and shape registration. The thesis also explores the feasibility of contour-based syntactic features for improving the correspondence of the output of bottom-up segmentation to semantic objects present in the scene, and discusses the feasibility of different strategies for image analysis utilizing contour information, e.g. segmentation driven by visual features versus segmentation driven by shape models or semi-automatic segmentation in selected application scenarios. There are three contributions in this thesis. The first contribution considers structure analysis based on the shape and spatial configuration of image regions (so-called syntactic visual features) and their utilization for automatic image segmentation. The second contribution is the study of novel shape features, matching algorithms and similarity measures. Various applications of the proposed solutions are presented throughout the thesis, providing the basis for the third contribution, which is a discussion of the feasibility of different recognition strategies utilizing contour information. In each case, the performance and generality of the proposed approach have been analyzed through extensive, rigorous experimentation using test collections that are as large as possible.

    Adaptive combinations of classifiers with application to on-line handwritten character recognition

    Classifier combining is an effective way of improving classification performance. The continuous growth of computing power also makes the simultaneous use of several classifiers an increasingly viable option. User adaptation is another valid approach for improving performance in a user-dependent system, and even though adaptation is usually performed on the classifier level, adaptive committees can also be very effective. Adaptive committees have the distinct ability to perform adaptation without detailed knowledge of the classifiers. Adaptation can therefore be used even with classification systems that are intrinsically not suited for adaptation, whether due to lack of access to the workings of the classifier or simply because the classification scheme is not suitable for continuous learning. This thesis proposes methods for adaptive combination of classifiers in the setting of on-line handwritten character recognition. The focal part of the work introduces adaptive classifier combination schemes, of which the two most prominent are the Dynamically Expanding Context (DEC) committee and the Class-Confidence Critic Combining (CCCC) committee. Both have been shown to be capable of successful adaptation to the user in the task of on-line handwritten character recognition. In particular, the highly modular CCCC framework has also shown impressive performance in a doubly-adaptive setting of combining adaptive classifiers by using an adaptive committee. In support of this main topic of the thesis, some discussion on a methodology for deducing correct character labeling from user actions is presented. Proper labeling is paramount for effective adaptation, and deducing the labels from the user's actions is necessary to perform adaptation transparently to the user. In that way, the user does not need to give explicit feedback on the correctness of the recognition results. Also, an overview is presented of adaptive classification methods for single-classifier adaptation in handwritten character recognition developed at the Laboratory of Computer and Information Science of the Helsinki University of Technology, CIS-HCR. Classifiers based on the CIS-HCR system have been used in the adaptive committee experiments both as member classifiers and to provide a reference level. Finally, two distinct approaches for further improving the performance of committee classifiers are discussed. Firstly, methods for committee rejection are presented and evaluated. Secondly, measures of classifier diversity for classifier selection, based on the concept of diversity of errors, are presented and evaluated. The topic of this thesis hence covers three important aspects of pattern recognition: on-line adaptation, combining classifiers, and a practical evaluation setting of handwritten character recognition. A novel approach combining these three core ideas has been developed and is presented in the introductory text and the included publications. To reiterate, the main contributions of this thesis are: 1) introduction of novel adaptive committee classification methods, 2) introduction of novel methods for measuring classifier diversity, 3) presentation of some methods for implementing committee rejection, 4) discussion and introduction of a method for effective label deduction from on-line user actions, and as a side-product, 5) an overview of the CIS-HCR adaptive on-line handwritten character recognition system.

    The use of belief networks in natural language understanding and dialog modeling.

    Wai, Chi Man Carmen. Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. Includes bibliographical references (leaves 129-136). Abstracts in English and Chinese.

    Chapter 1 --- Introduction
    Chapter 1.1 --- Overview
    Chapter 1.2 --- Natural Language Understanding
    Chapter 1.3 --- BNs for Handling Speech Recognition Errors
    Chapter 1.4 --- BNs for Dialog Modeling
    Chapter 1.5 --- Thesis Goals
    Chapter 1.6 --- Thesis Outline
    Chapter 2 --- Background
    Chapter 2.1 --- Natural Language Understanding
    Chapter 2.1.1 --- Rule-based Approaches
    Chapter 2.1.2 --- Stochastic Approaches
    Chapter 2.1.3 --- Phrase-Spotting Approaches
    Chapter 2.2 --- Handling Recognition Errors in Spoken Queries
    Chapter 2.3 --- Spoken Dialog Systems
    Chapter 2.3.1 --- Finite-State Networks
    Chapter 2.3.2 --- The Form-based Approaches
    Chapter 2.3.3 --- Sequential Decision Approaches
    Chapter 2.3.4 --- Machine Learning Approaches
    Chapter 2.4 --- Belief Networks
    Chapter 2.4.1 --- Introduction
    Chapter 2.4.2 --- Bayesian Inference
    Chapter 2.4.3 --- Applications of the Belief Networks
    Chapter 2.5 --- Chapter Summary
    Chapter 3 --- Belief Networks for Natural Language Understanding
    Chapter 3.1 --- The ATIS Domain
    Chapter 3.2 --- Problem Formulation
    Chapter 3.3 --- Semantic Tagging
    Chapter 3.4 --- Belief Networks Development
    Chapter 3.4.1 --- Concept Selection
    Chapter 3.4.2 --- Bayesian Inferencing
    Chapter 3.4.3 --- Thresholding
    Chapter 3.4.4 --- Goal Identification
    Chapter 3.5 --- Experiments on Natural Language Understanding
    Chapter 3.5.1 --- Comparison between Mutual Information and Information Gain
    Chapter 3.5.2 --- Varying the Input Dimensionality
    Chapter 3.5.3 --- Multiple Goals and Rejection
    Chapter 3.5.4 --- Comparing Grammars
    Chapter 3.6 --- Benchmark with Decision Trees
    Chapter 3.7 --- Performance on Natural Language Understanding
    Chapter 3.8 --- Handling Speech Recognition Errors in Spoken Queries
    Chapter 3.8.1 --- Corpus Preparation
    Chapter 3.8.2 --- Enhanced Belief Network Topology
    Chapter 3.8.3 --- BNs for Handling Speech Recognition Errors
    Chapter 3.8.4 --- Experiments on Handling Speech Recognition Errors
    Chapter 3.8.5 --- Significance Testing
    Chapter 3.8.6 --- Error Analysis
    Chapter 3.9 --- Chapter Summary
    Chapter 4 --- Belief Networks for Mixed-Initiative Dialog Modeling
    Chapter 4.1 --- The CU FOREX Domain
    Chapter 4.1.1 --- Domain-Specific Constraints
    Chapter 4.1.2 --- Two Interaction Modalities
    Chapter 4.2 --- The Belief Networks
    Chapter 4.2.1 --- Informational Goal Inference
    Chapter 4.2.2 --- Detection of Missing / Spurious Concepts
    Chapter 4.3 --- Integrating Two Interaction Modalities
    Chapter 4.4 --- Incorporating Out-of-Vocabulary Words
    Chapter 4.4.1 --- Natural Language Queries
    Chapter 4.4.2 --- Directed Queries
    Chapter 4.5 --- Evaluation of the BN-based Dialog Model
    Chapter 4.6 --- Chapter Summary
    Chapter 5 --- Scalability and Portability of Belief Network-based Dialog Model
    Chapter 5.1 --- Migration to the ATIS Domain
    Chapter 5.2 --- Scalability of the BN-based Dialog Model
    Chapter 5.2.1 --- Informational Goal Inference
    Chapter 5.2.2 --- Detection of Missing / Spurious Concepts
    Chapter 5.2.3 --- Context Inheritance
    Chapter 5.3 --- Portability of the BN-based Dialog Model
    Chapter 5.3.1 --- General Principles for Probability Assignment
    Chapter 5.3.2 --- Performance of the BN-based Dialog Model with Hand-Assigned Probabilities
    Chapter 5.3.3 --- Error Analysis
    Chapter 5.4 --- Enhancements for Discourse Query Understanding
    Chapter 5.4.1 --- Combining Trained and Handcrafted Probabilities
    Chapter 5.4.2 --- Handcrafted Topology for BNs
    Chapter 5.4.3 --- Performance of the Enhanced BN-based Dialog Model
    Chapter 5.5 --- Chapter Summary
    Chapter 6 --- Conclusions
    Chapter 6.1 --- Summary
    Chapter 6.2 --- Contributions
    Chapter 6.3 --- Future Work
    Bibliography
    Chapter A --- The Two Original SQL Query
    Chapter B --- The Two Grammars, GH and GsA
    Chapter C --- Probability Propagation in Belief Networks
    Chapter C.1 --- Computing the a posteriori probability of P*(G) based on input concepts
    Chapter C.2 --- Computing the a posteriori probability of P*(Cj) by backward inference
    Chapter D --- Total 23 Concepts for the Handcrafted BN

    Efficient and Robust Methods for Audio and Video Signal Analysis

    This thesis presents my research concerning audio and video signal processing and machine learning. Specifically, the topics of my research include computationally efficient classifier compounds, automatic speech recognition (ASR), music dereverberation, video cut point detection and video classification. The computational efficiency of information retrieval based on multiple measurement modalities has been considered in this thesis. Specifically, a cascade processing framework, including a training algorithm to set its parameters, has been developed for combining multiple detectors or binary classifiers in a computationally efficient way. The developed cascade processing framework has been applied to the video information retrieval tasks of video cut point detection and video classification. The results in video classification, compared to others found in the literature, indicate that the developed framework is capable of both accurate and computationally efficient classification. The idea of cascade processing has additionally been adapted for the ASR task. A procedure for combining multiple speech state likelihood estimation methods within an ASR framework in a cascaded manner has been developed. The results obtained show that the computational load of ASR can be reduced using the cascaded speech state likelihood estimation process without impairing the transcription accuracy. Additionally, this thesis presents my work on noise robustness of ASR using a nonnegative matrix factorization (NMF)-based approach. Specifically, methods for transforming sparse NMF features into speech state likelihoods have been explored. The results reveal that learned transformations from NMF activations to speech state likelihoods provide better ASR transcription accuracy than dictionary-label-based transformations. The results, compared to others in a noisy speech recognition challenge, show that NMF-based processing is an efficient strategy for noise robustness in ASR. The thesis also presents my work on audio signal enhancement, specifically on removing the detrimental effect of reverberation from music audio. In this work, a linear-prediction-based dereverberation algorithm, originally developed for speech signal enhancement, was applied to music. The results obtained show that the algorithm performs well with music signals and indicate that dynamic compression of music does not impair the dereverberation performance.

    Decision-Theoretic Planning for User-Adaptive Systems: Dealing With Multiple Goals and Resource Limitations

    While there exist a number of user-adaptive systems that use decision-theoretic methods to make individual decisions, decision-theoretic planning has hardly been exploited in the context of user-adaptive systems so far. This thesis focuses on the application of decision-theoretic planning in user-adaptive systems and demonstrates how competing goals and resource limitations of the user can be considered in such an approach. The approach is illustrated with examples from the following domains: user-adaptive assistance for operating a technical device, user-adaptive navigation recommendations in an airport scenario, and user-adaptive and location-aware shopping assistance. With the shopping assistant, we have analyzed usability issues of a system based on decision-theoretic planning in two user studies. We describe how hard time constraints, as induced, for example, by the boarding of the passenger in an airport navigation scenario, can be considered in a decision-theoretic approach. Moreover, we propose a hierarchical decision-theoretic planning approach based on goal prioritization, which keeps the complexity of dealing with realistic problems tractable. Furthermore, we specify the general workflow for the development and application of Markov decision processes in user-adaptive systems, and we describe possibilities for enhancing a user-adaptive system based on decision-theoretic planning with an explanation component.