9 research outputs found

    The use of multiple speech recognition hypotheses for natural language understanding.

    Wang Ying. Thesis (M.Phil.)--Chinese University of Hong Kong, 2003. Includes bibliographical references (leaves 102-104). Abstracts in English and Chinese.
    Contents:
    Chapter 1. Introduction: 1.1 Overview; 1.2 Thesis Goals; 1.3 Thesis Outline.
    Chapter 2. Background: 2.1 Speech Recognition; 2.2 Natural Language Understanding (2.2.1 Rule-based Approach; 2.2.2 Corpus-based Approach); 2.3 Integration of Speech Recognition with NLU (2.3.1 Word Graph; 2.3.2 N-best List); 2.4 The ATIS Domain; 2.5 Chapter Summary.
    Chapter 3. Generation of Speech Recognition Hypotheses: 3.1 Grammar Development for the OpenSpeech Recognizer; 3.2 Generation of Speech Recognition Hypotheses; 3.3 Evaluation of Speech Recognition Hypotheses (3.3.1 Recognition Accuracy; 3.3.2 Concept Accuracy); 3.4 Results and Analysis; 3.5 Chapter Summary.
    Chapter 4. Belief Networks for NLU: 4.1 Problem Formulation; 4.2 The Original NLU Framework (4.2.1 Semantic Tagging; 4.2.2 Concept Selection; 4.2.3 Bayesian Inference; 4.2.4 Thresholding; 4.2.5 Goal Identification); 4.3 Evaluation Method of Goal Identification Performance; 4.4 Baseline Result; 4.5 Chapter Summary.
    Chapter 5. The Effects of Recognition Errors on NLU: 5.1 Experiments (5.1.1 Perfect Case: The Use of Transcripts; 5.1.2 Train on Recognition Hypotheses; 5.1.3 Test on Recognition Hypotheses; 5.1.4 Train and Test on Recognition Hypotheses); 5.2 Analysis of Results; 5.3 Chapter Summary.
    Chapter 6. The Use of Multiple Speech Recognition Hypotheses for NLU: 6.1 The Extended NLU Framework (6.1.1 Semantic Tagging; 6.1.2 Recognition Confidence Score Normalization; 6.1.3 Concept Selection; 6.1.4 Bayesian Inference; 6.1.5 Combination with Confidence Scores; 6.1.6 Thresholding; 6.1.7 Goal Identification); 6.2 Experiments (6.2.1 The Use of First Best Recognition Hypothesis; 6.2.2 Train on Multiple Recognition Hypotheses; 6.2.3 Test on Multiple Recognition Hypotheses; 6.2.4 Train and Test on Multiple Recognition Hypotheses); 6.3 Significance Testing; 6.4 Result Analysis; 6.5 Chapter Summary.
    Chapter 7. Conclusions and Future Work: 7.1 Conclusions; 7.2 Contribution; 7.3 Future Work.
    Bibliography.
    Appendices: A. Speech Recognition Hypotheses Distribution; B. Recognition Errors in Three Kinds of Queries; C. The Effects of Recognition Errors in N-Best list on NLU; D. Training on Multiple Recognition Hypotheses; E. Testing on Multiple Recognition Hypotheses; F. Hand-designed Grammar For ATIS.

    Understanding user state and preferences for robust spoken dialog systems and location-aware assistive technology

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science; and, (S.M. in Technology and Policy)--Massachusetts Institute of Technology, Engineering Systems Division, Technology and Policy Program, 2012. Cataloged from PDF version of thesis. Includes bibliographical references (p. 119-125). By William Li. This research focuses on improving the performance of spoken dialog systems (SDS) in the domain of assistive technology for people with disabilities. Automatic speech recognition (ASR) has compelling potential applications as a means of enabling people with physical disabilities to enjoy greater levels of independence and participation. This thesis describes the development and evaluation of a spoken dialog system modeled as a partially observable Markov decision process (SDS-POMDP). The SDS-POMDP can understand commands related to making phone calls and providing information about weather, activities, and menus in a specialized-care residence setting. Labeled utterance data was used to train observation and utterance confidence models. With a user simulator, the SDS-POMDP reward function parameters were optimized, and the SDS-POMDP is shown to outperform simpler threshold-based dialog strategies. These simulations were validated in experiments with human participants, with the SDS-POMDP resulting in more successful dialogs and faster dialog completion times, particularly for speakers with high word-error rates. This thesis also explores the social and ethical implications of deploying location-based assistive technology in specialized-care settings. These technologies could have substantial potential benefit to residents and caregivers in such environments, but they may also raise issues related to user safety, independence, autonomy, or privacy. As one example, location-aware mobile devices are potentially useful to increase the safety of individuals in a specialized-care setting who may be at risk of unknowingly wandering, but they raise important questions about privacy and informed consent. This thesis provides a survey of U.S. legislation related to the participation of individuals who have questionable capacity to provide informed consent in research studies. Overall, it seeks to precisely describe and define the key issues that arise as a result of new, unforeseen technologies that may have both benefits and costs to the elderly and people with disabilities.

    Recognition confidence scoring and its use in speech understanding systems

    In this paper we present an approach to recognition confidence scoring and a set of techniques for integrating confidence scores into the understanding and dialogue components of a speech understanding system. The recognition component uses a multi-tiered approach where confidence scores are computed at the phonetic, word, and utterance levels. The scores are produced by extracting confidence features from the computation of the recognition hypotheses and processing these features using an accept/reject classifier for word and utterance hypotheses. The scores generated by the confidence classifier can then be passed on to the language understanding and dialogue modeling components of the system. In these components the confidence scores can be combined with linguistic scores and pragmatic constraints before the system makes a final decision about the appropriate action to be taken. To evaluate the system, experiments were conducted using the JUPITER weather information system. An evaluation of the confidence classifier at the word level shows that the system detects 66% of the recognizer's errors with a false detection rate on correctly recognized words of only 5%. An evaluation was also performed at the understanding level using key-value pair concept error rate as the evaluation metric. When confidence scores were integrated into the understanding component of the system, a relative reduction of 35% in concept error rate was achieved. © 2002 Academic Press.

    Natural language understanding across application domains and languages.

    Tsui Wai-Ching. Thesis (M.Phil.)--Chinese University of Hong Kong, 2002. Includes bibliographical references (leaves 115-122). Abstracts in English and Chinese.
    Contents:
    Chapter 1. Introduction: 1.1 Overview; 1.2 Natural Language Understanding Using Belief Networks; 1.3 Integrating Speech Recognition with Natural Language Understanding; 1.4 Thesis Goals; 1.5 Thesis Organization.
    Chapter 2. Background: 2.1 Natural Language Understanding Approaches (2.1.1 Rule-based Approaches; 2.1.2 Stochastic Approaches; 2.1.3 Mixed Approaches); 2.2 Portability of Natural Language Understanding Frameworks (2.2.1 Portability across Domains; 2.2.2 Portability across Languages; 2.2.3 Portability across both Domains and Languages); 2.3 Spoken Language Understanding (2.3.1 Integration of Speech Recognition Confidence into Natural Language Understanding; 2.3.2 Integration of Other Potential Confidence Features into Natural Language Understanding); 2.4 Belief Networks (2.4.1 Overview; 2.4.2 Bayesian Inference); 2.5 Transformation-based Parsing Technique; 2.6 Chapter Summary.
    Chapter 3. Portability of the Natural Language Understanding Framework across Application Domains and Languages: 3.1 Natural Language Understanding Framework (3.1.1 Semantic Tagging; 3.1.2 Informational Goal Inference with Belief Networks); 3.2 The ISIS Stocks Domain; 3.3 A Unified Framework for English and Chinese (3.3.1 Semantic Tagging for the ISIS domain; 3.3.2 Transformation-based Parsing; 3.3.3 Informational Goal Inference with Belief Networks for the ISIS domain); 3.4 Experiments (3.4.1 Goal Identification Experiments; 3.4.2 A Cross-language Experiment); 3.5 Chapter Summary.
    Chapter 4. Enhancement in the Belief Networks for Informational Goal Inference: 4.1 Semantic Concept Selection in Belief Networks (4.1.1 Selection of Positive Evidence; 4.1.2 Selection of Negative Evidence); 4.2 Estimation of Statistical Probabilities in the Enhanced Belief Networks (4.2.1 Estimation of Prior Probabilities; 4.2.2 Estimation of Posterior Probabilities); 4.3 Experiments (4.3.1 Belief Networks Developed with Positive Evidence; 4.3.2 Belief Networks with the Injection of Negative Evidence); 4.4 Chapter Summary.
    Chapter 5. Integration between Speech Recognition and Natural Language Understanding: 5.1 The Speech Corpus for the Chinese ISIS Stocks Domain; 5.2 Our Extended Natural Language Understanding Framework for Spoken Language Understanding (5.2.1 Integrated Scoring for Chinese Speech Recognition and Natural Language Understanding); 5.3 Experiments (5.3.1 Training and Testing on the Perfect Reference Data Sets; 5.3.2 Mismatched Training and Testing Conditions: Perfect Reference versus Imperfect Hypotheses; 5.3.3 Comparing Goal Identification between the Use of Single-best versus N-best Recognition Hypotheses; 5.3.4 Integration of Speech Recognition Confidence Scores into Natural Language Understanding; 5.3.5 Feasibility of Our Approach for Spoken Language Understanding; 5.3.6 Justification of Using Max-of-max Classifier in Our Single Goal Identification Scheme); 5.4 Chapter Summary.
    Chapter 6. Conclusions and Future Work: 6.1 Conclusions; 6.2 Contributions; 6.3 Future Work.
    Bibliography.
    Appendices: A. Semantic Frames for Chinese; B. Semantic Frames for English; C. The Concept Set of Positive Evidence for the Nine Goals in English; D. The Concept Set of Positive Evidence for the Ten Goals in Chinese; E. The Complete Concept Set including Both the Positive and Negative Evidence for the Ten Goals in English; F. The Complete Concept Set including Both the Positive and Negative Evidence for the Ten Goals in Chinese; G. The Assignment of Statistical Probabilities for Each Selected Concept under the Corresponding Goals in Chinese; H. The Assignment of Statistical Probabilities for Each Selected Concept under the Corresponding Goals in English.

    Crowd-supervised training of spoken language systems

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012. Cataloged from PDF version of thesis. Includes bibliographical references (p. 155-166). By Ian C. McGraw. Spoken language systems are often deployed with static speech recognizers. Only rarely are parameters in the underlying language, lexical, or acoustic models updated on the fly. In the few instances where parameters are learned in an online fashion, developers traditionally resort to unsupervised training techniques, which are known to be inferior to their supervised counterparts. These realities make the development of spoken language interfaces a difficult and somewhat ad hoc engineering task, since models for each new domain must be built from scratch or adapted from a previous domain. This thesis explores an alternative approach that makes use of human computation to provide crowd-supervised training for spoken language systems. We explore human-in-the-loop algorithms that leverage the collective intelligence of crowds of non-expert individuals to provide valuable training data at a very low cost for actively deployed spoken language systems. We also show that in some domains the crowd can be incentivized to provide training data for free, as a byproduct of interacting with the system itself. Through the automation of crowdsourcing tasks, we construct and demonstrate organic spoken language systems that grow and improve without the aid of an expert. Techniques that rely on collecting data remotely from non-expert users, however, are subject to the problem of noise. This noise can sometimes be heard in audio collected from poor microphones or muddled acoustic environments. Alternatively, noise can take the form of corrupt data from a worker trying to game the system; for example, a paid worker tasked with transcribing audio may leave transcripts blank in hopes of receiving a speedy payment. We develop strategies to mitigate the effects of noise in crowd-collected data and analyze their efficacy. This research spans a number of different application domains of widely deployed spoken language interfaces, but maintains the common thread of improving the speech recognizer's underlying models with crowd-supervised training algorithms. We experiment with three central components of a speech recognizer: the language model, the lexicon, and the acoustic model. For each component, we demonstrate the utility of a crowd-supervised training framework. For the language model and lexicon, we explicitly show that this framework can be used hands-free, in two organic spoken language systems.

    Sistemas de diálogo basados en modelos estocásticos (Dialogue systems based on stochastic models)

    This thesis, titled Sistemas de diálogo basados en modelos estocásticos, surveys the state of the art in dialogue systems and presents the work carried out in designing and implementing the modules of a specific dialogue system. The thesis focuses on dialogue management from a statistical perspective. It contributes a complete dialogue system (with text input and output, in Spanish, for a task with a restricted semantic domain, the one defined in the BASURDE research project). The system comprises modules for natural language understanding, dialogue management, and natural language response generation. Given the central aim of the thesis, the development of the dialogue manager module was the main work and is therefore described in the greatest detail in this report. The limited size of the dialogue corpus for the BASURDE task posed a severe difficulty for developing a dialogue manager based exclusively on statistical models. The dialogue manager finally implemented determines its dialogue strategy by combining several knowledge sources: some stochastic, namely the models learned from the corpus; others heuristic, namely rules that incorporate pragmatic and semantic knowledge, whether generic or task-specific. Finally, user simulation was considered as an alternative technique for purposes such as evaluating the behavior of the dialogue system, enlarging the corpus with synthetic dialogues, or dynamically learning the stochastic dialogue models. The corresponding user simulator modules were designed and implemented, and the possibilities of this technique were studied.
    Torres Goterris, F. (2006). Sistemas de diálogo basados en modelos estocásticos [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/1901