
    A Conversational Academic Assistant for the Interaction in Virtual Worlds

    Proceedings of: Fourth International Workshop on User-Centric Technologies and Applications (CONTEXTS 2010). Valencia, 07-10 September 2010. The current interest in and reach of social networking are rapidly introducing a large number of applications that give rise to new forms of communication and interaction among their users. Social networks and virtual worlds thus represent a perfect environment for applications that use multimodal information and can adapt to the specific characteristics and preferences of each user. As an example, in this paper we describe the integration of conversational agents in social networks through the development of a conversational avatar that provides academic information in the virtual world of Second Life. Its implementation uses techniques from speech technologies and natural language processing to allow a more natural, voice-based interaction with the system. Funded by projects CICYT TIN2008-06742-C02-02/TSI, CICYT TEC2008-06732-C02-02/TEC, SINPROB, CAM MADRINET S-0505/TIC/0255, and DPS2008-07029-C02-02

    Multimodal fusion: gesture and speech input in augmented reality environment

    Augmented Reality (AR) combines the real world with the virtual world seamlessly, and so allows interaction with virtual and physical objects simultaneously. However, most AR interfaces apply conventional Virtual Reality (VR) interaction techniques without modification. In this paper we explore multimodal fusion for AR with speech and hand gesture input. Multimodal fusion enables users to interact with computers through various input modalities such as speech, gesture, and eye gaze. As a first stage in proposing a multimodal interaction, the input modalities must be selected before being integrated into an interface. The paper reviews related work on multimodal approaches, which have recently become one of the research trends in AR, and surveys existing multimodal work in both VR and AR. In AR, multimodality is considered a solution for improving the interaction between virtual and physical entities; it is an ideal interaction technique for AR applications since AR supports interaction with the real and virtual worlds in real time. This paper describes recent AR developments that employ gesture and speech inputs, examines multimodal fusion and its developments, and concludes with a guideline on how to integrate gesture and speech inputs in an AR environment
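    The abstract above does not specify a fusion algorithm, but the idea of combining speech and gesture hypotheses can be sketched as a simple late-fusion rule. The following Python sketch is illustrative only: the confidence values, the agreement bonus, and the command names are assumptions, not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    # One recognizer's guess at the user's intended command, confidence in [0, 1].
    command: str
    confidence: float

def fuse_late(speech: Hypothesis, gesture: Hypothesis,
              agreement_bonus: float = 0.2) -> Hypothesis:
    """Late fusion: each recognizer runs independently, then their
    hypotheses are combined. If the modalities agree, confidence is
    boosted; otherwise the more confident modality wins."""
    if speech.command == gesture.command:
        conf = min(1.0, max(speech.confidence, gesture.confidence) + agreement_bonus)
        return Hypothesis(speech.command, conf)
    return speech if speech.confidence >= gesture.confidence else gesture

# Speech hears "rotate" and the gesture recognizer sees a rotation motion:
# agreement across modalities raises the fused confidence.
print(fuse_late(Hypothesis("rotate", 0.7), Hypothesis("rotate", 0.6)))
# Disagreement: the more confident modality (gesture, "select") wins.
print(fuse_late(Hypothesis("delete", 0.4), Hypothesis("select", 0.8)))
```

    A decision-level (late) fusion rule like this is only one option; feature-level (early) fusion, where raw modality features are combined before classification, is the usual alternative.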

    Multimedia information technology and the annotation of video

    The state of the art in multimedia information technology has not progressed to the point where a single solution is available to meet all reasonable needs of documentalists and users of video archives. In general, we do not have an optimistic view of the usability of new technology in this domain, but digitization and digital power can be expected to cause a small revolution in the area of video archiving. The volume of data leads to two views of the future: on the pessimistic side, overload of data will cause lack of annotation capacity, and on the optimistic side, there will be enough data from which to learn selected concepts that can be deployed to support automatic annotation. At the threshold of this interesting era, we make an attempt to describe the state of the art in technology. We sample the progress in text, sound, and image processing, as well as in machine learning

    Ghost-in-the-Machine reveals human social signals for human-robot interaction

    © 2015 Loth, Jettka, Giuliani and de Ruiter. We used a new method called "Ghost-in-the-Machine" (GiM) to investigate social interactions with a robotic bartender taking orders for drinks and serving them. Using the GiM paradigm allowed us to identify how human participants recognize the intentions of customers on the basis of the output of the robotic recognizers. Specifically, we measured which recognizer modalities (e.g., speech, the distance to the bar) were relevant at different stages of the interaction. This provided insights into human social behavior necessary for the development of socially competent robots. When initiating the drink-order interaction, the most important recognizers were those based on computer vision. When drink orders were being placed, however, the most important information source was the speech recognition. Interestingly, the participants used only a subset of the available information, focussing only on a few relevant recognizers while ignoring others. This reduced the risk of acting on erroneous sensor data and enabled them to complete service interactions more swiftly than a robot using all available sensor data. We also investigated socially appropriate response strategies. In their responses, the participants preferred to use the same modality as the customer's requests, e.g., they tended to respond verbally to verbal requests. Also, they added redundancy to their responses, for instance by using echo questions. We argue that incorporating the social strategies discovered with the GiM paradigm in multimodal grammars of human-robot interactions improves the robustness and the ease-of-use of these interactions, and therefore provides a smoother user experience
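    The finding that participants consulted only a stage-relevant subset of recognizers can be sketched as a simple filter over sensor outputs. The stage names and recognizer names below are illustrative stand-ins, not identifiers from the paper.

```python
# Stage-dependent recognizer selection, inspired by the GiM finding that
# humans attend to only a few relevant recognizers per interaction stage.
RELEVANT = {
    "initiation": ["vision.body_pose", "vision.distance_to_bar"],
    "ordering": ["speech.transcript"],
}

def read_state(stage: str, sensors: dict) -> dict:
    """Return only the sensor readings relevant to the current stage,
    ignoring the rest to reduce the risk of acting on erroneous data."""
    return {name: sensors[name] for name in RELEVANT[stage] if name in sensors}

sensors = {
    "vision.body_pose": "facing_bar",
    "vision.distance_to_bar": 0.4,
    "speech.transcript": "a lemonade please",
    "gaze.target": "menu",  # available, but ignored at both stages
}
print(read_state("initiation", sensors))
print(read_state("ordering", sensors))
```

    Discarding low-relevance channels per stage mirrors the paper's observation that ignoring noisy recognizers made service interactions swifter than using all sensor data.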

    Interpretation of Multiparty Meetings: The AMI and AMIDA Projects

    The AMI and AMIDA projects are collaborative EU projects concerned with the automatic recognition and interpretation of multiparty meetings. This paper provides an overview of the advances we have made in these projects with a particular focus on the multimodal recording infrastructure, the publicly available AMI corpus of annotated meeting recordings, and the speech recognition framework that we have developed for this domain

    A Proposal for Processing and Fusioning Multiple Information Sources in Multimodal Dialog Systems

    Proceedings of: PAAMS 2014 International Workshops. Agent-based Approaches for the Transportation Modelling and Optimisation (AATMO'14) & Intelligent Systems for Context-based Information Fusion (ISCIF'14). Salamanca, Spain, June 4-6, 2014. Multimodal dialog systems can be defined as computer systems that process two or more user input modes and combine them with multimedia system output. This paper focuses on the multimodal input, proposing how to process and fuse the multiple input modalities in the dialog manager of the system, so that a single combined input is used to select the next system action. We describe an application of our technique to build multimodal systems that process the user's spoken utterances, tactile and keyboard inputs, and information related to the context of the interaction. In our proposal, this information is divided into external context and the user's internal context, the latter represented by the detection of the user's intention and emotional state during the dialog. This work was supported in part by Projects MINECO TEC2012-37832-C02-01, CICYT TEC2011-28626-C02-02, CAM CONTEXTS (S2009/TIC-1485)
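    The idea of fusing several input modalities into a single combined input for the dialog manager can be sketched as slot-frame merging. The slot names, priority order, and values below are hypothetical, chosen only to illustrate the technique; the paper's actual fusion method may differ.

```python
def fuse_inputs(spoken: dict, tactile: dict, context: dict) -> dict:
    """Merge slot-value pairs from several input modalities into the single
    combined input the dialog manager uses to select the next system action.
    Priority (an assumption of this sketch): explicit tactile/keyboard
    selections override spoken slots, which override context defaults."""
    fused = dict(context)   # external/internal context supplies defaults
    fused.update(spoken)    # slots extracted from the spoken utterance
    fused.update(tactile)   # explicit tactile/keyboard selections win
    return fused

combined = fuse_inputs(
    spoken={"intent": "book_ticket", "city": "Salamanca"},
    tactile={"city": "Valencia"},             # the user tapped a different city
    context={"user_id": "u42", "city": "Madrid"},
)
print(combined)
```

    Giving deliberate pointing input precedence over speech is a common heuristic in multimodal fusion, since tactile selections are rarely misrecognized.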

    Modeling the user state for context-aware spoken interaction in ambient assisted living

    Ambient Assisted Living (AAL) systems must provide adapted services that are easily accessible by a wide variety of users. This is only possible if the communication between the user and the system is carried out through an interface that is simple, rapid, effective, and robust. Natural language interfaces such as dialog systems fulfill these requirements, as they are based on a spoken conversation that resembles human communication. In this paper, we enhance systems interacting in AAL domains by incorporating context-aware conversational agents that consider the external context of the interaction and predict the user's state. The user's state is built from their emotional state and intention, and it is recognized by a module conceived as an intermediate phase between natural language understanding and dialog management in the architecture of the conversational agent. This prediction, carried out for each user turn in the dialog, makes it possible to adapt the system dynamically to the user's needs. We have evaluated our proposal by developing a context-aware system adapted to patients suffering from chronic pulmonary diseases, and we provide a detailed discussion of the positive influence of our proposal on the success of the interaction, the information and services provided, and the perceived quality. This work was supported in part by Projects MINECO TEC2012-37832-C02-01, CICYT TEC2011-28626-C02-02, CAM CONTEXTS (S2009/TIC-1485)
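    The user-state module described above, sitting between natural language understanding and dialog management, can be sketched as follows. The features, thresholds, state labels, and actions here are illustrative stand-ins for the paper's trained models, not its actual implementation.

```python
from dataclasses import dataclass

@dataclass
class UserState:
    intention: str
    emotion: str

def predict_user_state(nlu_frame: dict, acoustic_features: dict) -> UserState:
    """Toy user-state predictor placed between NLU and dialog management.
    Hypothetical rule: high pitch plus high energy suggests distress."""
    intention = nlu_frame.get("intent", "unknown")
    if acoustic_features.get("pitch", 0) > 0.8 and acoustic_features.get("energy", 0) > 0.8:
        emotion = "distressed"
    else:
        emotion = "neutral"
    return UserState(intention, emotion)

def adapt_response(state: UserState) -> str:
    """The dialog manager adapts its next action to the predicted state."""
    if state.emotion == "distressed":
        return "transfer_to_caregiver"
    return f"handle_{state.intention}"

state = predict_user_state({"intent": "report_symptom"},
                           {"pitch": 0.9, "energy": 0.85})
print(adapt_response(state))
```

    Running this prediction on every user turn, as the paper proposes, is what lets the dialog strategy change mid-conversation rather than per session.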