
    Acquiring and Maintaining Knowledge by Natural Multimodal Dialog


    Bringing context-aware access to the web through spoken interaction

    The web has become the largest repository of multimedia information, and its convergence with telecommunications is now bringing the benefits of web technology to hand-held devices. To optimize data access using these devices and provide services which meet the user's needs through intelligent information retrieval, the system must sense and interpret the user environment and the communication context. In addition, natural spoken conversation with handheld devices makes it possible to use these applications in environments where GUI interfaces are not effective, provides a more natural human-computer interaction, and facilitates access to the web for people with visual or motor disabilities, allowing their integration and the elimination of barriers to Internet access. In this paper, we present an architecture for the design of context-aware systems that use speech to access web services. Our contribution focuses specifically on the use of context information to improve the effectiveness of providing web services by using a spoken dialog system for the user-system interaction. We also describe an application of our proposal to develop a context-aware railway information system, and provide a detailed evaluation of the influence of context information on the quality of the services supplied. Research funded by projects CICYT TIN2011-28620-C02-01, CICYT TEC2011-28626-C02-02, CAM CONTEXTS (S2009/TIC-1485), and DPS2008-07029-C02-02.

    Emotion Processing in Alzheimer's Disease: The Clinical Implications

    The purpose of this study is to extend the literature on recognition and identification of non-verbal communicative signals of emotion in those suffering from Alzheimer's disease. To date, there have been few studies in this area, yet emotion processing deficits may have an important effect on the quality of life of Alzheimer's patients and their families. The experimental condition consisted of a set of tasks involving face and prosody discrimination problems in which participants were asked to choose between a number of stimuli presented on cards (facial cues) or on audio-tape (prosody cues). In addition, a measure of general cognitive ability was taken. Firstly, it was found that, relative to a group of healthy older adults, performance on cognitive tasks was depressed, while performance on emotion processing tasks was not depressed to the same extent. Thus, the ability to recognise and identify non-verbal affect cues in emotional facial expression and emotional prosody was relatively preserved in patients with Alzheimer's disease. Secondly, no relationship was found in the Alzheimer's disease group between performance on face recognition and prosody tasks. This evidence is consistent with the notion that the mechanisms responsible for discriminating emotional facial expression are dissociated from those involved in discriminating emotional prosody. However, these findings need to be interpreted with caution in view of the small sample size and low statistical power. Lastly, a number of post-study hypotheses were generated in relation to the Alzheimer's disease group. These related to the number and type of errors made on tasks of face and prosody discrimination, and suggestions were made regarding further investigation in this area.
Finally, limitations of the study, implications for clinical practice, such as assessment and intervention focussing on preserved emotion processing ability, and suggestions for future research are considered. The BRACE Centre Memory Clinic and Frenchay Healthcare Trust.

    Language Production: A complex dynamic system with a chronometric footprint

    In this paper we outline a new approach to the study of language production. Central to this approach is the assumption that communication takes place in a dynamic environment in which cognitive resources are deployed to achieve ‘Right-Time’ as distinct from ‘Fast-as-Possible’ solutions. The approach further assumes that language production involves a single, integrated, interactive process that recruits and coordinates information from a variety of internal, external and interactive sources to build each speech segment. The output of this process is reflected in the longer of the two log-normal pause duration distributions observed in spontaneous speech (Kirsner, Dunn, Hird, Parkin & Clark, 2002). The methodology described here permits the inspection of temporally defined processes under natural speaking conditions. The procedures do not rely on the assumption that language is the product of independent components that can be studied under static, de-contextualised conditions. Results from aphasia, amnesia and bilingualism are used to illustrate the new paradigm.
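    The claim about two log-normal pause duration distributions can be made concrete with a small sketch: fitting a two-component mixture to log-transformed pause durations by expectation-maximisation, so that the longer-pause mode can be separated from short articulatory pauses. This is an illustrative reconstruction under assumed settings, not Kirsner et al.'s actual procedure; the function name and initialisation strategy are the sketch's own.

    ```python
    import math
    import statistics

    def fit_two_lognormals(pauses, iters=200):
        """Fit a two-component log-normal mixture to pause durations
        (in seconds) by running EM on the log-transformed data.
        Returns (weight, mean, sd) per component, sorted by mean so
        the 'long pause' mode comes second."""
        x = [math.log(p) for p in pauses]
        # Crude initialisation: split at the median of the log-durations.
        med = statistics.median(x)
        lo = [v for v in x if v <= med] or [med]
        hi = [v for v in x if v > med] or [med]
        mu = [statistics.fmean(lo), statistics.fmean(hi)]
        sd = [max(statistics.pstdev(lo), 1e-3), max(statistics.pstdev(hi), 1e-3)]
        w = [0.5, 0.5]
        for _ in range(iters):
            # E-step: responsibility of each component for each point.
            resp = []
            for v in x:
                dens = [w[k] / (sd[k] * math.sqrt(2 * math.pi))
                        * math.exp(-((v - mu[k]) ** 2) / (2 * sd[k] ** 2))
                        for k in (0, 1)]
                s = sum(dens) or 1e-300
                resp.append([d / s for d in dens])
            # M-step: re-estimate weights, means and standard deviations.
            for k in (0, 1):
                nk = sum(r[k] for r in resp) or 1e-9
                w[k] = nk / len(x)
                mu[k] = sum(r[k] * v for r, v in zip(resp, x)) / nk
                var = sum(r[k] * (v - mu[k]) ** 2 for r, v in zip(resp, x)) / nk
                sd[k] = max(math.sqrt(var), 1e-3)
        order = sorted((0, 1), key=lambda k: mu[k])
        return [(w[k], mu[k], sd[k]) for k in order]
    ```

    Given a list of measured pause durations, the second returned component then characterises the longer-pause distribution whose parameters the chronometric approach treats as the footprint of the integrated production process.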

    ATCO2 corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications

    Personal assistants, automatic speech recognizers and dialogue understanding systems are becoming more critical in our interconnected digital world. A clear example is air traffic control (ATC) communications. ATC aims at guiding aircraft and controlling the airspace in a safe and optimal manner. These voice-based dialogues are carried out between an air traffic controller (ATCO) and pilots via very-high frequency radio channels. In order to incorporate these novel technologies into ATC (a low-resource domain), large-scale annotated datasets are required to develop data-driven AI systems. Two examples are automatic speech recognition (ASR) and natural language understanding (NLU). In this paper, we introduce the ATCO2 corpus, a dataset that aims at fostering research on the challenging ATC field, which has lagged behind due to a lack of annotated data. The ATCO2 corpus covers 1) data collection and pre-processing, 2) pseudo-annotation of speech data, and 3) extraction of ATC-related named entities. The ATCO2 corpus is split into three subsets. 1) The ATCO2-test-set corpus contains 4 hours of ATC speech with manual transcripts and a subset with gold annotations for named-entity recognition (callsign, command, value). 2) The ATCO2-PL-set corpus consists of 5281 hours of unlabeled ATC data enriched with automatic transcripts from an in-domain speech recognizer, contextual information, speaker turn information, a signal-to-noise ratio estimate and an English language detection score per sample. Both are available for purchase through ELDA at http://catalog.elra.info/en-us/repository/browse/ELRA-S0484. 3) The ATCO2-test-set-1h corpus is a one-hour subset of the original test set corpus that we are offering for free at https://www.atco2.org/data.
We expect that the ATCO2 corpus will foster research on robust ASR and NLU, not only in the field of ATC communications but also in the general research community. Comment: manuscript under review; the code will be available at https://github.com/idiap/atco2-corpu
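    The gold annotations in the test set (callsign, command, value) lend themselves to a standard span-extraction representation. The sketch below assumes a word-level BIO tagging scheme over those three entity classes; the data structures and field names are illustrative, not the corpus's actual file format.

    ```python
    from dataclasses import dataclass

    @dataclass
    class TaggedWord:
        """One transcript word with a BIO tag over the entity classes
        the corpus annotates: callsign, command, value."""
        text: str
        tag: str  # e.g. "B-callsign", "I-callsign", "B-command", "O"

    def extract_entities(words):
        """Collect contiguous BIO spans into (label, phrase) pairs."""
        entities, label, span = [], None, []
        for w in words:
            if w.tag.startswith("B-"):
                if span:
                    entities.append((label, " ".join(span)))
                label, span = w.tag[2:], [w.text]
            elif w.tag.startswith("I-") and span and w.tag[2:] == label:
                span.append(w.text)
            else:
                if span:
                    entities.append((label, " ".join(span)))
                label, span = None, []
        if span:
            entities.append((label, " ".join(span)))
        return entities
    ```

    For instance, a tagged utterance such as "lufthansa three two alpha descend flight level eight zero" would yield the spans ('callsign', 'lufthansa three two alpha'), ('command', 'descend') and ('value', 'flight level eight zero').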

    Toward Widely-Available and Usable Multimodal Conversational Interfaces

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. Cataloged from the PDF version of the thesis. Includes bibliographical references (p. 159-166). Multimodal conversational interfaces, which allow humans to interact with a computer using a combination of spoken natural language and a graphical interface, offer the potential to transform the manner by which humans communicate with computers. While researchers have developed myriad such interfaces, none have made the transition out of the laboratory and into the hands of a significant number of users. This thesis makes progress toward overcoming two intertwined barriers preventing more widespread adoption: availability and usability. Toward addressing the problem of availability, this thesis introduces a new platform for building multimodal interfaces that makes it easy to deploy them to users via the World Wide Web. One consequence of this work is City Browser, the first multimodal conversational interface made publicly available to anyone with a web browser and a microphone. City Browser serves as a proof-of-concept that significant amounts of usage data can be collected in this way, allowing a glimpse of how users interact with such interfaces outside of a laboratory environment. City Browser, in turn, has served as the primary platform for deploying and evaluating three new strategies aimed at improving usability. The most pressing usability challenge for conversational interfaces is their limited ability to accurately transcribe and understand spoken natural language. The three strategies developed in this thesis - context-sensitive language modeling, response confidence scoring, and user behavior shaping - each attack the problem from a different angle, but they are linked in that each critically integrates information from the conversational context. By Alexander Gruenstein. Ph.D.

    Providing personalized Internet services by means of context-aware spoken dialogue systems

    The widespread use of new mobile technology implementing wireless communications enables a new type of advanced applications to access information services on the Internet. In order to provide services which meet the user's needs through intelligent information retrieval, the system must sense and interpret the user environment and the communication context. Though context-awareness is vital to provide services adapted to the user preferences, it cannot be useful if such services are difficult to access. The development of spoken dialogue systems for these applications facilitates interaction in natural language with the environment, which also benefits from contextual information. In this paper, we propose a framework to develop context-aware dialogue systems that dynamically incorporate user-specific requirements and preferences, as well as characteristics of the interaction environment, in order to improve and personalize web information and services. We have identified the major components of context-aware dialogue systems and placed them within a general-purpose architecture. The framework also describes a representation mode based on a dialogue register in order to share information between the elements of the architecture, and incorporates statistical methodologies for dialogue management in order to reduce the effort required both to implement a new system and to adapt it to a new task. We have evaluated our proposal by developing a travel-planning system, and provide a detailed discussion of its positive influence on the quality of the interaction and the information and services provided. Research funded by projects CICYT TIN2011-28620-C02-01, CICYT TEC2011-28626-C02-02, CAM CONTEXTS (S2009/TIC-1485), and DPS2008-07029-C02-02.
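    The dialogue register shared among the architecture's components can be pictured as a small state object that merges slot values from the speech understanding module with context readings from other sources, and tells the dialogue manager what is still missing. This is a minimal sketch under assumed names (it does not reproduce the authors' actual representation), using the travel-planning task as the example domain.

    ```python
    class DialogueRegister:
        """Shared interaction state: task slots filled so far plus
        context readings keyed by their source (all names illustrative)."""

        def __init__(self):
            self.slots = {}    # e.g. {"destination": "valencia"}
            self.context = {}  # e.g. {"gps": {"city": "madrid"}}

        def update_from_understanding(self, frame):
            """Merge newly recognised slot/value pairs; unresolved
            slots arrive as None and are not stored."""
            for slot, value in frame.items():
                if value is not None:
                    self.slots[slot] = value

        def update_context(self, source, readings):
            """Merge context readings (user profile, sensors, ...)
            without discarding earlier values from the same source."""
            self.context[source] = {**self.context.get(source, {}), **readings}

        def missing(self, required):
            """Slots the dialogue manager still has to ask for."""
            return [s for s in required if s not in self.slots]
    ```

    A single user turn then updates the register, and the manager queries it to pick the next system prompt:

    ```python
    reg = DialogueRegister()
    reg.update_from_understanding({"destination": "valencia", "date": None})
    reg.update_context("gps", {"city": "madrid"})
    reg.missing(["origin", "destination", "date"])  # -> ["origin", "date"]
    ```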

    Music contact and language contact: A proposal for comparative research

    The concept of convergence, from the study of language contact, provides a model for better understanding interactions between cognitive systems of the same type (for example, in bilingualism, subsystem instantiations of the same kind of knowledge representation and its associated processing mechanisms). For a number of reasons, musical ability is the domain that allows for the most interesting comparisons and contrasts with language in this area of research. Both cross-language and cross-musical-idiom interactions show a vast array of different kinds of mutual influence, all of which are highly productive, ranging from so-called transfer effects to total replacement (attrition of the replaced subsystem). The study of music contact should also help investigators conceptualize potential structural parallels between separate mental faculties, most importantly, it would seem, between those that appear to share component competence and processing modules. The first part of the proposal is to determine whether the comparison between the two kinds of convergence (in language and in music) is a useful way of thinking about how the properties of each system are similar, analogous, different and so forth. This leads to a more general discussion of the design features of mental faculties and of what might define them “narrowly,” for example.

    Astrospeak in turns: Minimal sequences of conversation in institutional talk

    Interaction in many professions or institutions has led to the development of domain-specific language varieties, or registers. To distinguish these forms of language from other, everyday registers, linguistics has grouped them as Languages for Specific Purposes (LSPs). The thesis focuses on the LSP used by the Apollo astronauts and mission control during the US Moon landing program in the late 1960s/early 1970s. The data used are the transcripts of the Apollo 12 air-to-ground transmissions from 1969, accessible online via a NASA website. The data are described using register analytical approaches provided by Biber & Conrad (2009). Additionally, institutional constraints and radiotelephony (R/T) are discussed as restrictive factors for talk. The thesis concentrates on the sequence organization of the language variety, named Astrospeak by the author. More specifically, the thesis employs the methodologies of conversation analysis (CA) to explore the minimal sequence structure of Astrospeak. A minimal sequence refers to the unexpanded base form of a sequence, which is constructed out of turns and whose organization is the foundation for interaction. In addition to examining the minimal sequence structure of Astrospeak, the thesis's research questions address the nature of the interaction in relation to the two research approaches for conversation: CA and Institutional CA (ICA). The hypothesis is that Astrospeak favors a three-turn minimal sequence structure. This follows the findings of Kevoe-Feldman & Robinson (2012). Using common CA frameworks amended with institutional considerations, the analysis shows that Astrospeak does take a three-turn minimal sequence when initiated with an open question. Conversely, if no tangible information is produced in the response turn, a third turn is not triggered. The presence of the third turn is due to institutional and R/T constraints. The thesis is the first linguistic study of Astrospeak.
In addition to exploring the sequence organization of the interaction, the thesis suggests that ICA-specific frameworks could help theoretically differentiate LSPs from ‘ordinary’ talk. To that end, the concept of procedurality is discussed as a descriptive tool for further research.