
    Web-based environment for user generation of spoken dialog for virtual assistants

    This paper presents a web-based spoken dialog generation environment in which users can edit dialogs for a video virtual assistant and select the assistant’s 3D motions and tone of voice. In the proposed system, “anyone” can “easily” post and edit dialog content for the dialog system. To avoid conflicts caused by multiple users editing the same content, the supported dialog type is limited to question-and-answer dialogs. The spoken dialog sharing service and FST generator generate spoken dialog content for the MMDAgent spoken dialog system toolkit, which includes a speech recognizer, a dialog control unit, a speech synthesizer, and a virtual agent. Dialog content is created from question-and-answer dialogs posted by users and from FST templates. The proposed system was operated for more than a year in a student lounge at the Nagoya Institute of Technology, where users added more than 500 dialogs during the experiment. Images were attached to 65% of the postings. The most posted category was “animation, video games, manga.” The system was also opened for evaluation by tourist information staff who had no prior experience with spoken dialog systems. Based on their impressions of how tourists used the dialog system, they shortened some of the system’s responses and added pauses to the longer responses to make them easier to understand

    A Software Framework to Create 3D Browser-Based Speech Enabled Applications

    Advances in automatic speech recognition have pushed human-computer interface researchers to adopt speech as one means of data input. Speech is natural to humans and complements other input interfaces very well. However, integrating an automatic speech recognizer into a complex system (such as a 3D visualization system or a Virtual Reality system) can be a difficult and time-consuming task. In this paper we present our approach to the problem, a software framework requiring minimum additional coding from the application developer. The framework combines voice commands with existing interaction code, automating the task of creating a new speech grammar (to be used by the recognizer). A new listener component for Xj3D was created, which makes the integration between the 3D browser and the recognizer transparent to the user. We believe this is a desirable feature for virtual reality system developers, and that the framework can also serve as a rapid prototyping tool when experimenting with speech technology

    MMDAE : Dialog scenario editor for MMDAgent on the web browser

    We have developed MMDAgent, a fully open-source toolkit for voice interaction systems that runs on a variety of platforms such as personal computers and smartphones. The editing environment for dialog scenarios therefore also needs to run on various platforms, so we developed a scenario editor implemented in a Web browser. A further aim of this paper is to make scenarios easy to edit. Experiments were conducted with subjects using the proposed scenario editor. The results showed that the proposed system provides better readability of a scenario and allows easier editing

    On the Development of Adaptive and User-Centred Interactive Multimodal Interfaces

    Multimodal systems have attained increased attention in recent years, which has made possible important improvements in the technologies for recognition, processing, and generation of multimodal information. However, there are still many issues related to multimodality which are not clear, for example, the principles that make it possible to resemble human-human multimodal communication. This chapter focuses on some of the most important challenges that researchers have recently envisioned for future multimodal interfaces. It also describes current efforts to develop intelligent, adaptive, proactive, portable and affective multimodal interfaces

    Customisable chatbot as a research instrument

    Chatbots are proliferating rapidly online for a variety of different purposes. This thesis presents a customisable chatbot that was designed and developed as a research instrument for online customer interaction research. The developed chatbot provides creation of different bot personas, data management tools, and a fully functional online chat user interface. Customer-facing bots in the system are rule-based, with basic input processing and text response selection based on best match. The system uses its own database to store user-chatbot dialogue history. Further, bots can be assigned unique dialogue scripts and their profiles can be customised concerning name, description and profile image. In the presented validation studies, participants first completed a task by taking part in a conversation with different bots, as hosted by the system and invoked through distinct URL parameters. Afterwards, the participants filled in a questionnaire on their experience with the bot, designed to reveal differences in how the bots were perceived. Our results suggest that the chatbot’s personality impacted how customers experienced the interactions. Therefore, the developed system can facilitate research scenarios that deal with investigating participant responses to different chatbot personas. Future work is necessary for a wider range of applications and enhanced response control.
    Abstract translated from Finnish (original title: Personoitava chatbot tutkimustyökaluna). Chatbots are rapidly becoming more common on the Internet and are increasingly used for many different purposes. This thesis presents a customisable chatbot developed as a research tool for online interaction research. The developed chatbot includes creation of different bot personas, tools for data handling, and the bot’s own user interface. The bot personas that respond to users of the system are rule-based: their inputs are processed in a straightforward manner, and the best-matching response is selected according to a predefined script. The system uses its own database to store the user-bot conversation history. In addition, bots can be assigned a unique dialogue model, and the name, description and profile image in their profile can be customised with a URL parameter. The technical operation of the chatbot was verified in a study in which participants completed a given task with different bots by following a partially prepared script. Afterwards, participants filled in a user questionnaire on their experience with the bot. The questionnaire was designed to reveal possible differences in how the bot’s behaviour was perceived during the conversation. The user test results suggest that the chatbot’s persona affected the users’ experience. The developed system can therefore enable research designs that study participants’ reactions to the personas of different chatbots. Further work on the developed chatbot focuses on adding more complex use cases and on improving the bot’s responses through more advanced natural language processing

    Proceedings of the 2nd EICS Workshop on Engineering Interactive Computer Systems with SCXML


    Indexing, browsing and searching of digital video

    Video is a communications medium that normally brings together moving pictures with a synchronised audio track into a discrete piece or pieces of information. The size of a “piece” of video can variously be referred to as a frame, a shot, a scene, a clip, a programme or an episode, and these are distinguished by their lengths and by their composition. We shall return to the definition of each of these in section 4 of this chapter. In modern society, video is ver

    Accessing spoken interaction through dialogue processing [online]

    Summary (translated from German). Our lives, our achievements, and our environment are currently all documented through written language. The rapid advance of technical means for recording, storing, and playing back audio, images, and video can be used to support, supplement, or even replace written documentation of human communication, for example meetings. These new technologies can enable us to capture information that would otherwise be lost, to lower the cost of documentation, and to enrich high-quality documents with audiovisual material. Indexing such recordings is the core technology for realizing this potential. This work presents effective alternatives to keyword-based indices that restrict the search space and can in part be computed with simple means. Speech documents can be indexed at several levels. Stylistically, a document belongs to a particular database, which can be determined automatically with high accuracy from very simple features. This kind of classification can reduce the search space by a factor on the order of 4-10. Applying topical features to text classification on a news database results in a reduction by a factor of 18. Since speech documents can be very long, they must be divided into topical segments. A new probabilistic approach and new features (speaker initiative and style) deliver results comparable to or better than traditional keyword-based approaches. These topical segments can be characterized by the predominant activity (storytelling, discussing, planning, ...), which can be detected by a neural network. Detection rates are limited, however, because humans too determine these activities only imprecisely. A maximum search space reduction by a factor of 6 is theoretically possible on the data used. A topical classification of these segments was also carried out on one database, but the detection rates for this index are low. At the level of individual utterances, dialogue acts such as statements, questions, backchannels (aha, oh yes, really?, ...), etc. can be recognized with a discriminatively trained hidden Markov model. This method can be extended to recognize short sequences such as question/answer exchanges (dialogue games). Dialogue acts and games can be used to build classifiers for global speaking styles. Likewise, a user might remember a particular dialogue act sequence and try to find it again in a graphical representation. In a study with very pessimistic assumptions, users were able to identify one out of four similar and equally probable conversations with an accuracy of ~43% using a graphical representation of activity. Dialogue acts could be equally useful in this scenario, but because of the small amount of data the user study could not give a final answer. The study could not, however, show any effect for detailed basic features such as formality and speaker identity.
    Abstract. Written language is one of our primary means for documenting our lives, achievements, and environment. Our capabilities to record, store and retrieve audio, still pictures, and video are undergoing a revolution and may support, supplement or even replace written documentation. This technology enables us to record information that would otherwise be lost, lower the cost of documentation and enhance high-quality documents with original audiovisual material. The indexing of the audio material is the key technology to realize those benefits. This work presents effective alternatives to keyword-based indices which restrict the search space and may in part be calculated with very limited resources. Indexing speech documents can be done at various levels: stylistically, a document belongs to a certain database, which can be determined automatically with high accuracy using very simple features. The resulting factor in search space reduction is in the order of 4-10, while topic classification yielded a factor of 18 in a news domain. Since documents can be very long, they need to be segmented into topical regions. A new probabilistic segmentation framework as well as new features (speaker initiative and style) prove to be very effective compared to traditional keyword-based methods. At the topical segment level, activities (storytelling, discussing, planning, ...) can be detected using a machine learning approach with limited accuracy; however, even human annotators do not annotate them very reliably. A maximum search space reduction factor of 6 is theoretically possible on the databases used. A topical classification of these regions has been attempted on one database; the detection accuracy for that index, however, was very low. At the utterance level, dialogue acts such as statements, questions, backchannels (aha, yeah, ...), etc. are recognized using a novel discriminatively trained HMM procedure. The procedure can be extended to recognize short sequences such as question/answer pairs, so-called dialogue games. Dialogue acts and games are useful for building classifiers for speaking style. Similarly, a user may remember a certain dialogue act sequence and may search for it in a graphical representation. In a study with very pessimistic assumptions, users were able to pick one out of four similar and equiprobable meetings correctly with an accuracy of ~43% using graphical activity information. Dialogue acts may be useful in this situation as well, but the sample size did not allow final conclusions to be drawn. However, the user study fails to show any effect for detailed basic features such as formality or speaker identity

    The seamless integration of Web3D technologies with university curricula to engage the changing student cohort

    The increasing tendency of many university students to study at least some courses at a distance limits their opportunities for the interactions fundamental to learning. Online learning can assist but relies heavily on text, which is limiting for some students. The popularity of computer games, especially among the younger students, and the emergence of networked games and game-like virtual worlds offers opportunities for enhanced interaction in educational applications. For virtual worlds to be widely adopted in higher education it is desirable to have approaches to design and development that are responsive to needs and limited in their resource requirements. Ideally it should be possible for academics without technical expertise to adapt virtual worlds to support their teaching needs. This project identified Web3D, a technology that is based on the X3D standards and which presents 3D virtual worlds within common web browsers, as an approach worth exploring for educational application. The broad goals of the project were to produce exemplars of Web3D for educational use, together with development tools and associated resources to support non-technical academic adopters, and to promote an Australian community of practice to support broader adoption of Web3D in education. During the first year of the project exemplar applications were developed and tested. The Web3D technology was found to be still in a relatively early stage of development in which the application of standards did not ensure reliable operation in different environments. Moreover, ab initio development of virtual worlds and associated tools proved to be more demanding of resources than anticipated and was judged unlikely in the near future to result in systems that non-technical academics could use with confidence. 
In the second year the emphasis moved to assisting academics to plan and implement teaching in existing virtual worlds that provided relatively easy-to-use tools for customizing an environment. A project officer worked with participating academics to support the teaching of significant elements of courses within Second Life™. This approach was more successful in producing examples of good practice that could be shared with and emulated by other academics. Trials were also conducted with ExitReality™, a new Australian technology that presents virtual worlds in a web browser. Critical factors in the success of the project included providing secure access to networked computers with the necessary capability; negotiating the complexity of working across education, design of virtual worlds, and technical requirements; and supporting participants with professional development in the technology and appropriate pedagogy for the new environments. Major challenges encountered included working with experimental technologies that are evolving rapidly and deploying new networked applications on secure university networks. The project has prepared the way for future expansion in the use of virtual worlds for teaching at USQ and has contributed to the emergence of a national network of tertiary educators interested in the educational applications of virtual worlds