
    Modeling Internet as a User-Adapted Speech Service

    Proceedings of: 7th International Conference, HAIS 2012, Salamanca, Spain, March 28-30, 2012. The web has become the largest repository of multimedia information, and its convergence with telecommunications is now bringing the benefits of web technology and hybrid artificial intelligence systems to hand-held devices. However, maximizing accessibility is not always the main objective in the design of web applications, especially when it comes to facilitating access for disabled people. Natural spoken conversation and multimodal conversational agents have therefore been proposed as a way to make interaction with these kinds of devices more natural. In this paper, we describe a proposal to provide spoken access to Internet information that serves not only to generate basic applications (e.g., web search engines) but also to develop dialog-based speech interfaces that offer user-adapted access and enhance web services. We describe our proposal and detail several applications developed to provide evidence of the benefits of introducing speech to make the enormous web content accessible to all mobile phone users. Research funded by projects CICYT TIN2011-28620-C02-01, CICYT TEC2011-28626-C02-02, CAM CONTEXTS (S2009/TIC-1485), and DPS2008-07029-C02-02.

    SPA: Web-based platform for easy access to speech processing modules

    This paper presents SPA, a web-based Speech Analytics platform that integrates several speech processing modules and makes them usable through the web. It was developed to make the modules easier to use, without requiring knowledge of their software dependencies and specific configurations. Besides being accessible through a web browser, the platform provides a REST API for easy integration with other applications. The platform is flexible and scalable, provides authentication for access restrictions, and was developed with the time and effort of adding new services in mind. The platform is still being improved, but it already integrates a considerable number of audio and text processing modules, including automatic transcription, speech disfluency classification, emotion detection, dialog act recognition, age and gender classification, non-nativeness detection, hyperarticulation detection, and two external modules for feature extraction and DTMF detection. This paper describes the SPA architecture, presents the modules already integrated, and provides a detailed description of those most recently integrated.

    Assistant Suite

    This project was conducted to demonstrate a voice-to-mechanical application that drives multiple platforms from one source using hardware-to-software technology. The main platforms used for the implementation are an Amazon Echo Dot, which serves as the voice interceptor and transcribes speech through integrated software hosted within the Amazon Web Services (AWS) cloud network, and a Raspberry Pi microcontroller, which controls mechanical movements based on what the Echo transcribes. The user can speak a command into the Echo to control the movement of one of two RC cars without any physical engagement. The Echo uses Wi-Fi to connect to the AWS cloud network to transcribe the speech, which then passes through a series of channels to reach a microcontroller connected to its own RC car, causing the selected car to move in the specified direction. For example, the user can say “Alexa, move car A forward for two seconds,” and the selected car will move forward for a total of two seconds. The project also shows the usefulness of being able to speak to multiple microcontrollers connected to separate devices under a single application; this spares the user from closing and opening separate applications every time a different connection is needed.
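The step from a transcribed utterance to a car movement can be illustrated with a small parser. This is only a sketch under assumptions: the abstract gives a single example command, so the direction vocabulary, the number words, and the names `CarCommand` and `parse_command` are hypothetical and do not come from the project. It also leaves out the Alexa skill and the AWS messaging layer entirely.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical vocabulary: the abstract only shows "forward" and "two".
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}
DIRECTIONS = {"forward", "backward", "left", "right"}

# Matches phrases such as "move car A forward for two seconds".
COMMAND_RE = re.compile(
    r"move car (?P<car>[a-z]) (?P<direction>\w+) for (?P<amount>\w+) seconds?",
    re.IGNORECASE,
)


@dataclass
class CarCommand:
    car: str          # which RC car to address, e.g. "A"
    direction: str    # one of DIRECTIONS
    seconds: float    # how long the car should move


def parse_command(utterance: str) -> Optional[CarCommand]:
    """Turn a transcribed utterance into a CarCommand, or None if it does not fit."""
    match = COMMAND_RE.search(utterance)
    if match is None:
        return None
    direction = match["direction"].lower()
    if direction not in DIRECTIONS:
        return None
    amount = match["amount"].lower()
    if amount.isdigit():
        seconds = float(amount)
    elif amount in NUMBER_WORDS:
        seconds = float(NUMBER_WORDS[amount])
    else:
        return None
    return CarCommand(car=match["car"].upper(), direction=direction, seconds=seconds)


if __name__ == "__main__":
    print(parse_command("Alexa, move car A forward for two seconds"))
```

In a setup like the one described, the resulting `CarCommand` would be routed to whichever microcontroller is attached to the named car; how that routing works is not specified in the abstract.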

    Implementing a distributed lecture-on-demand multimedia presentation system

    Lecture-on-demand (LOD) multimedia presentation technologies for networks are most often used in communication services. Examples of such applications include video-on-demand, interactive TV, and communication tools in distance learning systems. In this paper, we describe how to present different multimedia objects in a Web-based presentation system. The distributed approach is based on an extended timed Petri net model. Using the characteristics of extended media streaming technologies, we developed a Web-based multimedia presentation system. As a real-world example, suppose a well-known teacher is giving a lecture or presentation to his students. Because of time constraints and other commitments, many students cannot attend. The main goal of our system is to provide a feasible method to record and reproduce a lecture or presentation. Using a browser with Windows Media Services, those students can view live video of the teacher giving his speech, along with synchronized images of his presentation slides and all the annotations and comments. In our experience, this approach is sufficient for distance learning environments. International conference, July 2-5, 2002, Vienna, Austria.
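To illustrate how a timed Petri net can express synchronization between a video stream and slides, here is a minimal sketch. It is a plain, conflict-free timed Petri net, not the extended model the authors use, and the place names, transition names, durations, and the `simulate` helper are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Transition:
    name: str
    inputs: Tuple[str, ...]    # places that must hold a token (at least one)
    outputs: Tuple[str, ...]   # places marked once the media object finishes
    duration: float            # playback time of the media object, in seconds


def simulate(transitions: List[Transition],
             initial_marking: Dict[str, List[float]]) -> Dict[str, float]:
    """Return the start time of each transition; each fires at most once."""
    # tokens[place] holds the times at which tokens become available there.
    tokens = {place: list(times) for place, times in initial_marking.items()}
    schedule: Dict[str, float] = {}
    progress = True
    while progress:
        progress = False
        for t in transitions:
            if t.name in schedule:
                continue
            if all(tokens.get(p) for p in t.inputs):
                # A transition starts when its latest input token arrives.
                start = max(tokens[p].pop(0) for p in t.inputs)
                schedule[t.name] = start
                for p in t.outputs:
                    tokens.setdefault(p, []).append(start + t.duration)
                progress = True
    return schedule


# Hypothetical lecture: slide 2 may only appear once video segment 1
# and slide 1 have both finished.
LECTURE_NET = [
    Transition("play_video_1", ("video_ready",), ("video_1_done",), 30.0),
    Transition("show_slide_1", ("slides_ready",), ("slide_1_done",), 20.0),
    Transition("show_slide_2", ("video_1_done", "slide_1_done"), ("end",), 15.0),
]

if __name__ == "__main__":
    marking = {"video_ready": [0.0], "slides_ready": [0.0]}
    print(simulate(LECTURE_NET, marking))
```

The join on `show_slide_2` is what carries the synchronization: the slide waits for the slower of its two inputs, so it starts when the 30-second video segment ends rather than when the 20-second slide does.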

    A Framework for Intelligent Voice-Enabled e-Education Systems

    Although the Internet has received significant attention in recent years, voice is still the most convenient and natural way of communicating, whether between humans or between humans and computers. In voice applications, users may have different needs, which require the system to reason, make decisions, be flexible, and adapt to requests during interaction. These needs have placed new requirements on voice application development, such as the use of advanced models, techniques, and methodologies that take into account the needs of different users and environments. The ability of a system to behave in a way close to human reasoning is often mentioned as one of the major requirements for the development of voice applications. In this paper, we present a framework for an intelligent voice-enabled e-Education application and an adaptation of the framework for the development of a prototype Course Registration and Examination (CourseRegExamOnline) module. This study is a preliminary report of an ongoing e-Education project containing the following modules: enrollment, course registration and examination, enquiries/information, messaging/collaboration, e-Learning, and library. The CourseRegExamOnline module was developed using VoiceXML for the voice user interface (VUI), PHP for the web user interface (WUI), Apache as the middleware, and a MySQL database as the back end. The system would offer dual access modes through the VUI and the WUI. The framework would serve as a reference model for developing voice-based e-Education applications. When fully developed, the e-Education system would meet the needs of both typical student users and students with certain disabilities, such as visual impairment or repetitive strain injury (RSI), that make reading and writing difficult.
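A voice user interface of the kind described is a VoiceXML document that prompts the caller, collects a spoken or keyed value, and submits it to a server-side script. The sketch below builds such a document with Python's standard library, purely for illustration. The form id, field name, prompt wording, and submit URL are hypothetical and not taken from the CourseRegExamOnline module, whose server side is written in PHP.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"  # VoiceXML 2.0 namespace


def build_registration_dialog(submit_url: str) -> str:
    """Return a minimal VoiceXML 2.0 document that asks for a course code."""
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "course_registration"})

    # One field: the caller says or keys in a numeric course code.
    field = ET.SubElement(form, "field", {"name": "course_code", "type": "digits"})
    prompt = ET.SubElement(field, "prompt")
    prompt.text = "Please say or key in the course code."

    # Once the field is filled, post it to the (hypothetical) server script.
    filled = ET.SubElement(field, "filled")
    ET.SubElement(
        filled,
        "submit",
        {"next": submit_url, "namelist": "course_code", "method": "post"},
    )
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    # Placeholder URL; a real deployment would point at its own script.
    print(build_registration_dialog("http://example.org/register.php"))
```

The same server-side script could serve the web interface as well, which is one way to realize the dual access modes the abstract mentions; whether the authors structured it that way is not stated.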

    Development and Deployment of VoiceXML-Based Banking Applications

    In recent times, the financial sector has become one of the most vibrant sectors of the Nigerian economy, with about twenty-five banks remaining after the bank consolidation and merger exercise. This sector represents huge business investment in Information and Communication Technology (ICT), and it is plausible to say that it is today the largest user of ICT services and products. There is no denying that many Nigerians across the country now carry mobile phones. However, applications that provide voice access to real-time banking transactions from anywhere, at any time, via telephone are still at a very early stage of adoption across the Nigerian banking and financial sector. A versatile speech-enabled mobile banking application has been developed using VXML, PHP, Apache, and MySQL. The application provides real-time access to banking services, thus improving the corporate bottom line and Quality of Service (QoS) for customer satisfaction.

    IMAGINE Final Report


    Development of Telephone-based e-Learning Portal

    The proliferation of mobile phones in Nigeria, particularly among the student community, has continued to inspire the development and delivery of e-Learning applications. Most existing web-based e-Learning applications do not support nomadic voice-based learning (i.e., learning on the move through voice) and consequently do not provide speedy access to information or enquiries on demand. Most school portal systems require Internet access to retrieve any information, and that access is not directly available to everyone. The lack of voice support in existing web applications excludes people with limited capabilities, such as those with visual impairment or physical disabilities. In this paper, we present the design and development of a prototype telephone-based e-Learning portal that will be used for course registration and examination. This study is part of an ongoing e-Learning project involving the following modules: enrollment, course registration and examination, enquiries/information, messaging/collaboration, e-Learning, and library. The prototype application was developed using VoiceXML for the voice user interface (VUI), PHP for database queries, Apache as the middleware, and a MySQL database as the back end. The Unified Modeling Language (UML) was used to model and design the application. The proposed e-Learning system will complement the web-based system in order to meet the needs of students with a range of disabilities, such as visual impairment or repetitive strain injury, that make reading and writing difficult. It also makes multiple platforms available to all users and boosts access to education for the physically challenged, particularly the sight impaired, in the developing countries of the world. In institutions where students are not allowed to use mobile phones, or where cost is an issue, the alternative is the use of a PC-phone.