77 research outputs found

    Bringing together commercial and academic perspectives for the development of intelligent AmI interfaces

    The users of Ambient Intelligence systems expect intelligent behavior from their environment, receiving adapted and easily accessible services and functionality. This is only possible if the communication between the user and the system is carried out through an interface that is simple (i.e., it does not have a steep learning curve), fluid (i.e., the communication takes place rapidly and effectively), and robust (i.e., the system understands the user correctly). Natural language interfaces such as dialog systems combine these three requisites, as they are based on a spoken conversation between the user and the system that resembles human communication. Current industrial development of commercial dialog systems deploys robust interfaces in strictly defined application domains. However, commercial systems have not yet adopted the new perspective proposed in academic settings, which would allow straightforward adaptation of these interfaces to various application domains. This would be highly beneficial for their use in AmI settings, as the same interface could be used in varying environments. In this paper, we propose a new approach to bridge the gap between the academic and industrial perspectives in order to develop dialog systems using an academic paradigm while employing industrial standards, which makes it possible to obtain new-generation interfaces without changing the already existing commercial infrastructures. Our proposal has been evaluated with the successful development of a real dialog system that follows the proposed approach to manage dialog and generates code compliant with the industry-wide standard VoiceXML. Research funded by projects CICYT TIN2011-28620-C02-01, CICYT TEC2011-28626-C02-02, CAM CONTEXTS (S2009/TIC-1485), and DPS2008-07029-C02-02.
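
    To make the kind of output concrete: a dialog manager following this approach would emit standard VoiceXML 2.0 documents that any commercial voice platform can interpret. The following minimal sketch is illustrative only (the form, field name, and prompts are invented, not taken from the evaluated system):

        <?xml version="1.0" encoding="UTF-8"?>
        <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
          <!-- One form dialog: ask for a city, confirm what was heard. -->
          <form id="weather">
            <field name="city">
              <prompt>Which city would you like the weather for?</prompt>
              <!-- Inline SRGS grammar listing the recognizable values. -->
              <grammar version="1.0" root="cities"
                       xmlns="http://www.w3.org/2001/06/grammar">
                <rule id="cities">
                  <one-of>
                    <item>Madrid</item>
                    <item>London</item>
                  </one-of>
                </rule>
              </grammar>
              <filled>
                <prompt>Getting the weather for <value expr="city"/>.</prompt>
              </filled>
            </field>
          </form>
        </vxml>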

    Application of backend database contents and structure to the design of spoken dialog services

    Current development platforms for designing spoken dialog services feature different kinds of strategies to help designers build, test, and deploy their applications. In general, these platforms are made up of several assistants that handle the different design stages (e.g., definition of the dialog flow, prompt and grammar definition, database connection, or debugging and testing the running application). In spite of all the advances in this area, designing speech-based dialog services is generally a time-consuming task that needs to be accelerated. In this paper we describe a complete development platform that reduces the design time by using different types of acceleration strategies based on information from the data model structure and database contents, as well as cumulative information obtained throughout the successive steps in the design. Thanks to these accelerations, interaction with the platform is simplified and the design is reduced, in most cases, to simple confirmations of the "proposals" that the platform automatically provides at each stage. Different kinds of proposals are available to complete the application flow, such as selecting which information slots should be requested from the user together, predefined templates for common dialogs, the most probable actions that make up each state defined in the flow, and different solutions to specific speech-modality problems such as the presentation of lists of results retrieved after querying the backend database. The platform also includes accelerations for creating speech grammars and prompts, and the SQL queries for accessing the database at runtime. Finally, we describe the setup and results of simultaneous summative, subjective, and objective evaluations with different designers, used to test the usability of the proposed accelerations as well as their contribution to reducing design and interaction time.
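
    As an illustration of the list-presentation problem mentioned above, a generated dialog state might read back the rows returned by the platform-built SQL query. The sketch below is hypothetical (data and names invented) and uses the <foreach> element from VoiceXML 2.1, with the query results stubbed as a static array:

        <?xml version="1.0" encoding="UTF-8"?>
        <vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
          <!-- Stub for the rows returned by the generated SQL query. -->
          <script>
            var results = ["Hotel Colon", "Hotel Europa", "Hotel Plaza"];
          </script>
          <form id="present_results">
            <block>
              <prompt>I found <value expr="results.length"/> hotels.</prompt>
              <!-- Read the list back; a deployed design would also offer
                   "next" and "previous" navigation commands. -->
              <foreach item="r" array="results">
                <prompt><value expr="r"/></prompt>
              </foreach>
            </block>
          </form>
        </vxml>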

    Automatic translation of formal data specifications to voice data-input applications

    This thesis introduces a complete solution for automatic translation of formal data specifications to voice data-input applications. The objective of the research is to automatically generate applications for inputting data through speech from specifications of the structure of the data. The formal data specifications are XML DTDs. A new formalization called Grammar-DTD (G-DTD) is introduced as an extended DTD that contains grammars describing valid values of the DTD elements and attributes. G-DTDs facilitate the automatic generation of VoiceXML applications that correspond to the original DTD structure. The development of the automatic application generator included identifying constraints on the G-DTD to ensure a feasible translation, using predicate calculus to build a knowledge base of inference rules that describes the mapping procedure, and writing an algorithm for the automatic translation based on the inference rules. Thesis (M.Sc.), Dept. of Computer Science, University of Windsor (Canada), 2006. Source: Masters Abstracts International, Volume 45-01, page 0354.
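
    The exact G-DTD notation is defined in the thesis; purely as a loose illustration of the idea, the sketch below pairs a plain DTD fragment with value grammars (shown here as comments, not the thesis's actual syntax) and the kind of VoiceXML field that the translation could derive for one leaf element:

        <!ELEMENT order (size, topping)>
        <!ELEMENT size (#PCDATA)>    <!-- grammar: small | medium | large -->
        <!ELEMENT topping (#PCDATA)> <!-- grammar: cheese | mushroom | pepperoni -->

        <field name="size">
          <prompt>What size would you like?</prompt>
          <grammar version="1.0" root="size"
                   xmlns="http://www.w3.org/2001/06/grammar">
            <rule id="size">
              <one-of>
                <item>small</item>
                <item>medium</item>
                <item>large</item>
              </one-of>
            </rule>
          </grammar>
        </field>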

    The Geranium system: multimodal conversational agents for e-learning

    Proceedings of the 11th International Symposium on Distributed Computing and Artificial Intelligence 2014 (DCAI 2014), held at the University of Salamanca (Spain), 4th-6th June, 2014. Many e-learning applications use conversational agents as a means to obtain enhanced pedagogical results, such as fostering motivation and engagement, increasing meaningful learning, and helping in the acquisition of meta-cognitive skills. In this paper, we present Geranium, a multimodal conversational agent that helps children to appreciate and protect their environment. The system, which integrates an interactive chatbot, provides a modular and scalable framework that eases building pedagogic conversational agents that can interact with the students using speech and natural language. This work was supported in part by Projects MINECO TEC2012-37832-C02-01, CICYT TEC2011-28626-C02-02, CAM CONTEXTS (S2009/TIC-1485).

    On the Development of Adaptive and User-Centred Interactive Multimodal Interfaces

    Multimodal systems have attracted increased attention in recent years, which has made possible important improvements in the technologies for recognition, processing, and generation of multimodal information. However, many issues related to multimodality are still not well understood, for example, the principles that would make it possible to resemble human-human multimodal communication. This chapter focuses on some of the most important challenges that researchers have recently envisioned for future multimodal interfaces. It also describes current efforts to develop intelligent, adaptive, proactive, portable and affective multimodal interfaces.

    End-user programming of a social robot by dialog

    One of the main challenges faced by social robots is how to provide intuitive, natural and enjoyable usability for the end-user. In our everyday environment, social robots could be important tools for education and entertainment (edutainment) in a variety of ways. This paper presents a Natural Programming System (NPS) geared to non-expert users. The main goal of such a system is to provide an enjoyable interactive platform on which users can build different programs for their social robot. The end-user can build a complex net of actions and conditions (a sequence) in a social robot via mixed-initiative dialogs and multimodal interaction. The system has been implemented and tested in Maggie, a real social robot with multiple skills, conceived as a general HRI research platform. The robot's internal features (skills) have been implemented to be verbally accessible to the end-user, who can combine them into more complex ones following a bottom-up model. The built sequence is internally implemented as a Sequential Function Chart (SFC), which allows parallel execution, modularity and re-use. A multimodal Dialog Manager System (DMS) takes charge of keeping the coherence of the interaction. This work aims to bring social robots closer to non-expert users, who can play the game of "teaching how to do things" with the robot. The research leading to these results has received funding from the RoboCity2030-II-CM project (S2009/DPI-1559), funded by Programas de Actividades I+D en la Comunidad de Madrid and cofunded by Structural Funds of the EU. The authors also gratefully acknowledge the funds provided by the Spanish Ministry of Science and Innovation (MICINN) through the project named "A New Approach to Social Robots" (AROS) DPI2008-01109.
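
    As a purely hypothetical sketch (this tag set is invented; the paper's actual SFC representation is internal to the NPS), a user-taught sequence combining actions, a parallel branch, and a condition might serialize along these lines:

        <!-- Invented serialization of a user-built sequence for Maggie. -->
        <sequence name="greet_and_patrol">
          <step id="greet" action="say" args="Hello, I am Maggie"/>
          <!-- Both branches run at the same time, as SFC allows. -->
          <parallel>
            <step id="wave" action="wave_arm"/>
            <step id="advance" action="move_forward" args="1 meter"/>
          </parallel>
          <!-- Condition taught by dialog: stop if an obstacle appears. -->
          <transition condition="obstacle_detected" target="halt"/>
          <step id="halt" action="stop_base"/>
        </sequence>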

    Analysis and Design of Speech-Recognition Grammars

    Currently, most commercial speech-enabled products are constructed using grammar-based technology. Grammar design is a critical issue for good recognition accuracy. Two methods are commonly used for creating grammars: 1) generating them automatically from a large corpus of input data, which is very costly to acquire, or 2) constructing them using an iterative process involving manual design, followed by testing with end-user speech input. This is a time-consuming and very expensive process requiring expert knowledge of language design as well as of the application area. Another hurdle to the creation and use of speech-enabled applications is that expertise is also required to integrate the speech capability with the application code and to deploy the application for wide-scale use. The alternative approach we propose is 1) to construct grammars using the iterative process described above, but to replace end-user testing with analysis of the recognition grammars using a set of grammar metrics that have been shown to be good indicators of recognition accuracy, 2) to improve recognition accuracy in the design process by encoding semantic constraints in the syntax rules of the grammar, 3) to augment the above process by generating recognition grammars automatically from specifications of the application, and 4) to use tools for creating speech-enabled applications, together with an architecture for their deployment, which enable expert users as well as users without expertise in language processing to easily build speech applications and add them to the web.
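
    A small example of point 2 above: a hand-written SRGS grammar can fold a semantic constraint directly into its syntax rules so that semantically invalid phrases are never recognized. The domain and phrases below are invented for illustration:

        <?xml version="1.0" encoding="UTF-8"?>
        <grammar version="1.0" xml:lang="en-US" root="command"
                 xmlns="http://www.w3.org/2001/06/grammar">
          <!-- Only the lamp is dimmable, so "dim the fan" is simply not
               in the language; the constraint prunes the search space
               and improves recognition accuracy. -->
          <rule id="command">
            <one-of>
              <item>turn the fan <ruleref uri="#onoff"/></item>
              <item>turn the lamp <ruleref uri="#onoff"/></item>
              <item>dim the lamp</item>
            </one-of>
          </rule>
          <rule id="onoff">
            <one-of>
              <item>on</item>
              <item>off</item>
            </one-of>
          </rule>
        </grammar>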

    VXML: AN ALTERNATIVE SOLUTION TO ACCESSING WEBSITE'S CONTENTS

    Career Center Phone-based Application (CCPA) is a support tool for a career website, developed using VoiceXML technology, which allows users to access the contents of the website via phone call. The use of VoiceXML technology, which connects callers to the application via the Public Switched Telephone Network (PSTN), makes the phone-based application accessible from any type of telephone, anywhere around the globe. CCPA works much like an SMS career tool in providing an alternative way to receive and update the website's content without an Internet connection. However, CCPA provides more than just receiving job alerts and applying for a job via SMS. With CCPA, callers have a new experience that is like "talking" with the content of the website: callers may retrieve a company's profile, submit voice inquiries, authenticate for login, retrieve the latest 10 job opportunities available on the website, and apply for a job, all via phone call. This is a new environment for career-based applications that presents commands and output in speech format, and it will be the first VoiceXML-based application in Malaysia. The User Centric Design (UCD) approach was selected to develop CCPA as it focuses on users' requirements and preferences, while the BeVocal Cafe was chosen as the ASP to develop, test and host the voice application. The working CCPA was tested using black-box testing on Vocal Scripter, a realistic telephone simulator, chosen because it is free and effective; the application is considered to work when it runs perfectly on Vocal Scripter. Some testing with real telephones was also conducted. As a result of this development, five main modules were implemented: a general section that covers the welcome message, main menu, global help and menu links; a voice inquiry section; a user authentication section; a job post retrieval section; and a job application section. In the near future, CCPA should be improved with a mixed-initiative dialog approach to provide a better call experience, support for the Malay language, and personalization to provide a different and unique way of serving each caller. In conclusion, this project will definitely initiate and encourage the development of VoiceXML applications in Malaysia.
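
    The five modules described above would naturally hang off a VoiceXML main menu. The sketch below is a plausible reconstruction, not the project's actual code (document names and wording are invented):

        <?xml version="1.0" encoding="UTF-8"?>
        <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
          <!-- Main menu routing callers to the five CCPA modules. -->
          <menu id="main">
            <prompt>
              Welcome to the career center. Say company profile,
              inquiries, login, job openings, or apply.
            </prompt>
            <choice next="profile.vxml">company profile</choice>
            <choice next="inquiry.vxml">inquiries</choice>
            <choice next="login.vxml">login</choice>
            <choice next="jobs.vxml">job openings</choice>
            <choice next="apply.vxml">apply</choice>
          </menu>
        </vxml>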

    User interfaces for multimodal systems

    Thesis (M.Eng.), Massachusetts Institute of Technology, Dept. of Civil and Environmental Engineering, 2001. Includes bibliographical references (leaves 68-69). As computer systems become more powerful and complex, efforts to make computer interfaces simpler and more natural become increasingly important. Natural interfaces should be designed to facilitate communication in ways people are already accustomed to using. Such interfaces allow users to concentrate on the tasks they are trying to accomplish rather than worry about what they must do to control the interface. Multimodal systems process combined natural input modes (such as speech, pen, touch, manual gestures, gaze, and head and body movements) in a coordinated manner with multimedia system output. The initiative at W3C is to make the development of interfaces simple and to make it easy to distribute applications across the Internet in an XML development environment. Languages designed at W3C so far, such as HTML, target a particular platform and are not portable to other platforms. User Interface Markup Language (UIML) has been designed to develop cross-platform interfaces. This thesis shows that UIML can be used not only to develop multi-platform interfaces but also to create multimodal interfaces. A survey of existing multimodal applications is performed and an efficient and easy-to-develop methodology is proposed. It is also shown that the proposed methodology satisfies a major set of requirements laid down by W3C for multimodal dialogs. by Sumanth Lingam.
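
    For a flavor of the approach, a minimal UIML document separates an abstract interface structure from platform-specific peers; a single abstract part could be rendered as a GUI label on one platform and as a spoken prompt on another. This sketch is simplified (class name and property value are invented; DOCTYPE omitted):

        <?xml version="1.0"?>
        <uiml>
          <interface>
            <structure>
              <!-- One abstract part; peers decide how it is rendered. -->
              <part id="Greeting" class="Message"/>
            </structure>
            <style>
              <property part-name="Greeting" name="content">Welcome!</property>
            </style>
          </interface>
          <peers>
            <!-- Platform mappings (e.g. a GUI toolkit or VoiceXML)
                 would be declared here. -->
          </peers>
        </uiml>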

    Modeling the user state for context-aware spoken interaction in ambient assisted living

    Ambient Assisted Living (AAL) systems must provide adapted services that are easily accessible by a wide variety of users. This is only possible if the communication between the user and the system is carried out through an interface that is simple, rapid, effective, and robust. Natural language interfaces such as dialog systems fulfill these requisites, as they are based on a spoken conversation that resembles human communication. In this paper, we enhance systems interacting in AAL domains by incorporating context-aware conversational agents that consider the external context of the interaction and predict the user's state. The user's state is built on the basis of their emotional state and intention, and it is recognized by a module conceived as an intermediate phase between natural language understanding and dialog management in the architecture of the conversational agent. This prediction, carried out for each user turn in the dialog, makes it possible to adapt the system dynamically to the user's needs. We have evaluated our proposal by developing a context-aware system adapted to patients suffering from chronic pulmonary diseases, and we provide a detailed discussion of the positive influence of our proposal on the success of the interaction, the information and services provided, and the perceived quality. This work was supported in part by Projects MINECO TEC2012-37832-C02-01, CICYT TEC2011-28626-C02-02, CAM CONTEXTS (S2009/TIC-1485).
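
    The paper defines its own user-state model; purely as an invented illustration (the tag set below is not the paper's, and is only loosely in the spirit of W3C EMMA annotations), the module sitting between understanding and dialog management could pass the dialog manager a per-turn record like this:

        <!-- Hypothetical per-turn user-state message. -->
        <user-state turn="12">
          <intention confidence="0.82">ask_medication_schedule</intention>
          <emotion confidence="0.64">anxious</emotion>
          <external-context>
            <location>living room</location>
            <time-of-day>evening</time-of-day>
          </external-context>
        </user-state>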