164 research outputs found

Design and evaluation of acceleration strategies for speeding up the development of dialog applications

Author: Agah
Bohus
Chung
D’Haro
Javier Ferreiros
José Manuel Pardo
Jung
Luis Fernando D’Haro
McTear
Pargellis
Ricardo de Córdoba
Rubén San-Segundo
Tsai
Wang
Wolters
Publication venue: 'Elsevier BV'
Publication date: 01/01/2011

In this paper, we describe a complete development platform that features different innovative acceleration strategies, not included in any other current platform, that simplify and speed up the definition of the different elements required to design a spoken dialog service. The proposed accelerations are mainly based on using the information from the backend database schema and contents, as well as cumulative information produced throughout the different steps in the design. Thanks to these accelerations, the interaction between the designer and the platform is improved, and in most cases the design is reduced to simple confirmations of the “proposals” that the platform dynamically provides at each step. In addition, the platform provides several other accelerations such as configurable templates that can be used to define the different tasks in the service or the dialogs to obtain or show information to the user, automatic proposals for the best way to request slot contents from the user (i.e. using mixed-initiative forms or directed forms), an assistant that offers the set of most probable actions required to complete the definition of the different tasks in the application, and another assistant for solving specific modality details such as confirmations of user answers or how to present to users the lists of results retrieved from the backend database. Additionally, the platform also allows the creation of speech grammars and prompts, database access functions, and the possibility of using mixed-initiative and over-answering dialogs. In the paper we also describe in detail each assistant in the platform, emphasizing the different kinds of methodologies followed to facilitate the design process in each one. Finally, we describe the results obtained in both a subjective and an objective evaluation with different designers that confirm the viability, usefulness, and functionality of the proposed accelerations. Thanks to the accelerations, the design time is reduced by more than 56% and the number of keystrokes by 84%.

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM

A FRAMEWORK FOR INTELLIGENT VOICE-ENABLED E-EDUCATION SYSTEMS

Author: Atayero A. A.
Ayo C. K.
Azeta A. A.
Omoregbe N. A.
Publication date: 01/01/2009

Although the Internet has received significant attention in recent years, voice is still the most convenient and natural way of communicating, whether between humans or between a human and a computer. In voice applications, users may have different needs, which require the system to reason, make decisions, be flexible and adapt to requests during interaction. These needs have placed new requirements on voice application development, such as the use of advanced models, techniques and methodologies that take into account the needs of different users and environments. The ability of a system to behave in a way that is close to human reasoning is often mentioned as one of the major requirements for the development of voice applications. In this paper, we present a framework for an intelligent voice-enabled e-Education application and an adaptation of the framework for the development of a prototype Course Registration and Examination (CourseRegExamOnline) module. This study is a preliminary report of an ongoing e-Education project containing the following modules: enrollment, course registration and examination, enquiries/information, messaging/collaboration, e-Learning and library. The CourseRegExamOnline module was developed using VoiceXML for the voice user interface (VUI), PHP for the web user interface (WUI), Apache as the middleware and a MySQL database as the back-end. The system would offer dual access modes using the VUI and WUI. The framework would serve as a reference model for developing voice-based e-Education applications. The e-Education system, when fully developed, would meet the needs of students who are normal users and of those with certain forms of disabilities, such as visual impairment or repetitive strain injury (RSI), that make reading and writing difficult.

Covenant University Repository

Vox et praeterea nihil

Author: Tverrå Kai
Publication venue: The University of Bergen
Publication date: 01/01/2004

This paper explores design issues when creating a VoiceXML application, with special emphasis on grammar design and dialog flow. The paper can serve as a starting point for learning the basics of VoiceXML. The test application is an automatic directory assistance service. Keywords: VoiceXML, vxml, Automatic Directory Assistance, Automated Speech Recognition, ASR, Text to Speech, TTS, dialog flow, Voice User Interface, VUI, VSP, Voice Service Provider

University of Bergen

NORA - Norwegian Open Research Archives

Real time speech translator

Author: Garcia Cabrera Xavier
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2008

This document reports on my work on the Real Time Voice Translator Project, which I carried out as my final thesis project during the academic year 2007‐2008. During this period I worked in the Research and Development Center (RDC) for Mobile Applications, a department of the Czech Technical University (CTU) in Prague. In the RDC I was a member of the Automatic Call Center Project (ACC Project) team, and within it, I was assigned to carry out the Real Time Voice Translator Project. The Automatic Call Center Project (ACC Project), now renamed the Voice2Web Project, is a project carried out by the Research and Development Center. The RDC is a department inside the Electro Technical Faculty of the CTU that carries out research and development projects regarding Information Technologies (IT). Some of its partners are IBM, Vodafone and Ericsson, for whom the RDC carries out projects. The ACC Project began in 2007 and its aim is to develop voice applications, within the IBM and RDC agreement, using IBM voice technologies together with open standards or open-source software. IBM is an ACC Project partner and provides financing for it. It also provides hardware and software licenses to the ACC Project and gives us support. The members of the ACC Project are developing several voice applications at the same time, all of them following the ACC Project's purposes. Although this document is focused on the Real Time Voice Translator Project, its introduction also explains some aspects of the ACC Project. This is because the Real Time Voice Translator Project has a lot in common with the ACC Project, and understanding it well requires understanding some points of the ACC Project too.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Application of backend database contents and structure to the design of spoken dialog services

Author: D’Haro
Gorin
Javier Ferreiros
José Manuel Pardo
Juan Manuel Montero
Jung
Luis Fernando D’Haro
López-Cózar
McTear
Paternò
Ricardo de Córdoba
Wang
Wolters
Zajicek
Publication venue: 'Elsevier BV'
Publication date: 01/01/2012

Current development platforms for designing spoken dialog services feature different kinds of strategies to help designers build, test, and deploy their applications. In general, these platforms are made up of several assistants that handle the different design stages (e.g. definition of the dialog flow, prompt and grammar definition, database connection, or debugging and testing the running application). In spite of all the advances in this area, designing spoken dialog services is in general a time-consuming task that needs to be accelerated. In this paper we describe a complete development platform that reduces the design time by using different types of acceleration strategies based on information from the data model structure and database contents, as well as cumulative information obtained throughout the successive steps in the design. Thanks to these accelerations, the interaction with the platform is simplified and the design is reduced, in most cases, to simple confirmations of the “proposals” that the platform automatically provides at each stage. Different kinds of proposals are available to complete the application flow, such as the possibility of selecting which information slots should be requested from the user together, predefined templates for common dialogs, the most probable actions that make up each state defined in the flow, and different solutions to specific speech-modality problems such as the presentation of the lists of results retrieved from the backend database. The platform also includes accelerations for creating speech grammars and prompts, and the SQL queries for accessing the database at runtime. Finally, we describe the setup and results of simultaneous summative, subjective and objective evaluations with different designers, used to test the usability of the proposed accelerations as well as their contribution to reducing the design time and interaction.

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM

Staging Transformations for Multimodal Web Interaction Management

Author: Narayan Michael
Perugini Saverio
Ramakrishnan Naren
Williams Chris
Publication date: 20/11/2003

Multimodal interfaces are becoming increasingly ubiquitous with the advent of mobile devices, accessibility considerations, and novel software technologies that combine diverse interaction media. In addition to improving access and delivery capabilities, such interfaces enable flexible and personalized dialogs with websites, much like a conversation between humans. In this paper, we present a software framework for multimodal web interaction management that supports mixed-initiative dialogs between users and websites. A mixed-initiative dialog is one where the user and the website take turns changing the flow of interaction. The framework supports the functional specification and realization of such dialogs using staging transformations -- a theory for representing and reasoning about dialogs based on partial input. It supports multiple interaction interfaces, and offers sessioning, caching, and co-ordination functions through the use of an interaction manager. Two case studies are presented to illustrate the promise of this approach. Comment: Describes framework and software architecture for multimodal web interaction management

arXiv.org e-Print Archive

CiteSeerX

University of Dayton

Development and Evaluation of the Spoken Dialogue System Based on W3C Recommendations

Author: Jozef Juhár
Stanislav Ondáš
Publication venue: 'IntechOpen'
Publication date: 02/11/2010

IntechOpen

VXML: AN ALTERNATIVE SOLUTION TO ACCESSING WEBSITE'S CONTENTS

Author: Hashim Mohd Hafiz
Publication venue: Universiti Teknologi Petronas
Publication date: 01/01/2008

Career Center Phone-based Application (CCPA) is a support tool for a career website, developed using VoiceXML technology, which allows users to access the contents of the website via a phone call. The use of VoiceXML technology that connects the callers to the application via the Public Switched Telephone Network (PSTN) has made the phone-based application accessible from any type of telephone, anywhere around the globe. CCPA works the same way as an SMS career tool, providing an alternative way to receive and update the website's content without an Internet connection. However, CCPA provides more than just receiving job alerts and applying for a job via SMS. With CCPA, callers have a new experience that is like "talking" with the content of the website. Callers may retrieve a company's profile, submit voice inquiries, authenticate/validate themselves for login, retrieve the latest 10 job opportunities available on the website and apply for a job, all via a phone call. This is a new environment for career-based applications that allows commands and output to be presented in speech format; in addition, it will be the first VoiceXML-based application in Malaysia. A User Centric Design (UCD) approach has been selected to develop CCPA, as it focuses on users' requirements and preferences, while the BeVocal Café has been chosen as the ASP to develop, test and host the voice application. The working CCPA has been tested using Black Box Testing on Vocal Scripter, which is a realistic simulation of a telephone. Vocal Scripter is used because it is free and effective, and because the application works if it runs perfectly on Vocal Scripter. Some testing using real telephones has also been conducted. As a result of this development, 5 main modules are implemented: the general section, which covers the welcome message, main menu, global help and menu links; the voice inquiry section; the user authentication section; the job post retrieval section; and the job application section. In the near future, CCPA should be improved with a Mixed Initiative Dialog approach that will provide a great call experience, enabled to support the Malay language, and personalized to provide a different and unique way of entertaining each caller. In conclusion, this project will definitely initiate and encourage VoiceXML application development in Malaysia.

UTPedia

Constructing a low-cost, open-source, VoiceXML

Author: King Adam
Publication venue: Faculty of Science, Computer Science
Publication date: 01/07/2013

Voice-enabled applications, applications that interact with a user via an audio channel, are used extensively today. Their use is growing as speech-related technologies improve, as speech is one of the most natural methods of interaction. They can provide customer support as IVRs, can be used as an assistive technology, or can become an aural interface to the Internet. Given that the telephone is used extensively throughout the globe, the number of potential users of voice-enabled applications is very high. VoiceXML is a popular, open, high-level, standard means of creating voice-enabled applications which was designed to bring the benefits of web-based development to services. While VoiceXML is an ideal language for creating these applications, VoiceXML gateways, the hardware and software responsible for interpreting VoiceXML applications and interfacing with the PSTN, are still expensive, and so there is a need for a low-cost gateway. Asterisk, an open-source TDM/VoIP telephony platform, can be used as a low-cost PSTN interface. This thesis investigates adding a VoiceXML service to Asterisk, creating a low-cost VoiceXML prototype gateway which is able to render voice-enabled applications. Following the Component-Based Software Engineering (CBSE) paradigm, the VoiceXML gateway is divided into a set of components which are sourced from the open-source community and integrated to create the gateway. The browser requires a VoiceXML interpreter (OpenVXI), a Text-To-Speech engine (Festival) and a speech recognition engine (Sphinx 4). The integration of the components results in a low-cost, open-source VoiceXML gateway. System tests show that the integration of the components was successful, and that the system can handle concurrent calls. A fully compliant version of the gateway can be used in the real world to render voice-enabled applications at a low cost.

South East Academic Libraries System (SEALS)

VOICE AUTHENTICATION USING VOICEXML

Author: ABDUL WAHAB AZRUL
Publication venue: Universiti Teknologi Petronas
Publication date: 01/06/2004

User authentication through voice is one of the methods used to protect sensitive data over the Internet. In this research, the author explores the technology to acquire a clear understanding of VoiceXML, its concept and its architecture. Given an understanding of the VoiceXML concept and architecture, this research examines 3 levels of security using VoiceXML capabilities as a solution for validating users. The identified solutions lead to the development of a VoiceXML prototype using an available VoiceXML application development tool. This project has two main objectives. The first objective is to understand the VoiceXML technology architecture and learn to design and develop a VoiceXML application. The second objective is to examine three levels of security as a solution for validating users. The project involved two approaches: performing research on VoiceXML technology, and developing a prototype that embodies the findings of that research. The prototype was developed using the Voice Application Life Cycle methodology, which includes 4 phases: Planning, Prototyping and Iteration, Development, and Launch. As a result, the prototype is expected to take full advantage of current VoiceXML technology. The prototype can be used as a template by developers so that their voice applications achieve the goal of user authentication, which is to make the right information reliably and securely available to the right people. The final product of the project is a prototype called Voice Authentication through Speech Recognition, which focuses on different levels of security using voice authentication with VoiceXML.

UTPedia