    SCREEN: Learning a Flat Syntactic and Semantic Spoken Language Analysis Using Artificial Neural Networks

    In this paper, we describe a so-called screening approach for learning robust processing of spontaneously spoken language. A screening approach is a flat analysis which uses shallow sequences of category representations for analyzing an utterance at various syntactic, semantic and dialog levels. Rather than using a deeply structured symbolic analysis, we use a flat connectionist analysis. This screening approach aims at supporting speech and language processing by using (1) data-driven learning and (2) robustness of connectionist networks. In order to test this approach, we have developed the SCREEN system which is based on this new robust, learned and flat analysis. In this paper, we focus on a detailed description of SCREEN's architecture, the flat syntactic and semantic analysis, the interaction with a speech recognizer, and a detailed evaluation analysis of the robustness under the influence of noisy or incomplete input. The main result of this paper is that flat representations allow more robust processing of spontaneous spoken language than deeply structured representations. In particular, we show how the fault-tolerance and learning capability of connectionist networks can support a flat analysis for providing more robust spoken-language processing within an overall hybrid symbolic/connectionist framework.
    Comment: 51 pages, Postscript. To be published in Journal of Artificial Intelligence Research 6(1), 199

    A framework for improving error detection and correction in spoken dialog systems

    Despite the recent improvements in performance and reliability of the different components of dialog systems, it is still crucial to devise strategies to avoid error propagation from one component to another. In this paper, we contribute a framework for improved error detection and correction in spoken conversational interfaces. The framework combines user behavior and error modeling to estimate the probability of the presence of errors in the user utterance. This estimation is forwarded to the dialog manager and used to compute whether it is necessary to correct possible errors. We have designed a strategy differentiating between the main misunderstanding and non-understanding scenarios, so that the dialog manager can provide an acceptable tailored response when entering the error correction state. As a proof of concept, we have applied our proposal to a customer support dialog system. Our results show the appropriateness of our technique to correctly detect and react to errors, enhancing the system performance and user satisfaction.
    This work was supported in part by Projects MINECO TEC2012-37832-C02-01, CICYT TEC2011-28626-C02-02, CAM CONTEXTS (S2009/TIC-1485)

    Spoken language processing in the hybrid connectionist architecture SCREEN

    In this paper we describe a robust, learning approach to spoken language understanding. Since interactively spoken and computationally analyzed language often contains many errors, robust connectionist networks are used for providing a flat screening analysis. A screening analysis is a shallow flat analysis based on category sequences at various syntactic, semantic and dialog levels. Rather than using tree or graph representations, a screening analysis uses category sequences in order to support robustness and learning. This flat screening analysis is examined in the context of the system SCREEN (Symbolic Connectionist Robust EnterprisE for Natural language). Starting with the word hypotheses generated by a speech recognizer, we give an overview of the architecture, and illustrate the flat robust processing at the levels of syntax, semantics, and dialog acts. While early connectionist models were often limited to a single network and a small task, the hybrid connectionist SCREEN system is an important step towards exploring connectionist techniques in larger hybrid symbolic/connectionist environments and for real-world problems. Based on our experience with SCREEN, hybrid connectionist techniques show a lot of potential for supporting robustness in interactive spoken language processing.

    A Robust Architecture For Human Language Technology Systems

    Early human language technology systems were designed in a monolithic fashion. As these systems became more complex, this design became untenable. In its place, the concept of distributed processing evolved wherein the monolithic structure was decomposed into a number of functional components that could interact through a common protocol. This distributed framework was readily accepted by the research community and has been the cornerstone for the advancement in cutting-edge human language technology prototype systems. The Defense Advanced Research Projects Agency (DARPA) Communicator program has been highly successful in implementing this approach. The program has fueled the design and development of impressive human language technology applications. Its distributed framework has offered numerous benefits to the research community, including reduced prototype development time, sharing of components across sites, and provision of a standard evaluation platform. It has also enabled development of client-server applications with complex inter-process communication between modules. However, this latter feature, though beneficial, introduces complexities which reduce overall system robustness to failure. In addition, the ability to handle multiple users and multiple applications from a common interface is not innately supported. This thesis describes the enhancements to the original Communicator architecture that address robustness issues and provide a multiple multi-user application environment by enabling automated server startup, error detection and correction. Extensive experimentation and analysis were performed to measure improvements in robustness due to the enhancements to the DARPA architecture. A 7.2% improvement in robustness was achieved on the address querying task, which is the most complex task in the human language technology system.

    Audiovisual prosody in interaction


    Characterizing and recognizing spoken corrections in human-computer dialog

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998. Includes bibliographical references (p. 103-106).
    Miscommunication in human-computer spoken language systems is unavoidable. Recognition failures on the part of the system necessitate frequent correction attempts by the user. Unfortunately and counterintuitively, users' attempts to speak more clearly in the face of recognition errors actually lead to decreased recognition accuracy. The difficulty of correcting these errors, in turn, leads to user frustration and poor assessments of system quality. Most current approaches to identifying corrections rely on detecting violations of task or belief models that are ineffective where such constraints are weak and recognition results inaccurate or unavailable. In contrast, the approach pursued in this thesis uses the acoustic contrasts between original inputs and repeat corrections to identify corrections in a more content- and context-independent fashion. This thesis quantifies and builds upon the observation that suprasegmental features, such as duration, pause, and pitch, play a crucial role in distinguishing corrections from other forms of input to spoken language systems. These features can also be used to identify spoken corrections and explain reductions in recognition accuracy for these utterances. By providing a detailed characterization of acoustic-prosodic changes in corrections relative to original inputs in a voice-only system, this thesis contributes to natural language processing and spoken language understanding. We present a treatment of systematic acoustic variability in speech recognizer input as a source of new information, to interpret the speaker's corrective intent, rather than simply as noise or user error.
    We demonstrate the application of a machine-learning technique, decision trees, for identifying spoken corrections and achieve accuracy rates close to human levels of performance for corrections of misrecognition errors, using acoustic-prosodic information. This process is simple and local and depends neither on perfect transcription of the recognition string nor complex reasoning based on the full conversation. We further extend the conventional analysis of speaking styles beyond a 'read' versus 'conversational' contrast to extreme clear speech, describing divergence from phonological and durational models for words in this style.
    by Gina-Anne Levow. Ph.D.