
    Towards a more natural and intelligent interface with embodied conversation agent

    Conversational agents, also known as chatterbots, are computer programs designed to converse like a human as far as their intelligence allows. In many ways, they are the embodiment of Turing's vision. The ability of computers to converse with human users in natural language would arguably increase their usefulness. Recent advances in Natural Language Processing (NLP) and Artificial Intelligence (AI) in general have advanced this field towards realizing the vision of a more humanoid interactive system. This paper presents and discusses the use of an embodied conversational agent (ECA) for the imitation game. It also presents the technical design of our ECA and its performance. In the interactive media industry, it can also be observed that ECAs are becoming increasingly popular.

    Generative Pretraining in Multimodality

    We present Emu, a Transformer-based multimodal foundation model, which can seamlessly generate images and texts in multimodal context. This omnivore model can take in any single-modality or multimodal data input indiscriminately (e.g., interleaved image, text and video) through a one-model-for-all autoregressive training process. First, visual signals are encoded into embeddings, and together with text tokens they form an interleaved input sequence. Emu is then trained end-to-end with a unified objective of classifying the next text token or regressing the next visual embedding in the multimodal sequence. This versatile multimodality empowers the exploration of diverse pretraining data sources at scale, such as videos with interleaved frames and text, webpages with interleaved images and text, as well as web-scale image-text pairs and video-text pairs. Emu can serve as a generalist multimodal interface for both image-to-text and text-to-image tasks, and supports in-context image and text generation. Across a broad range of zero-shot/few-shot tasks, including image captioning, visual question answering, video question answering and text-to-image generation, Emu demonstrates superb performance compared to state-of-the-art large multimodal models. Extended capabilities such as multimodal assistants via instruction tuning are also demonstrated with impressive performance. Comment: Code and Demo: https://github.com/baaivision/Em
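
    The unified objective above can be made concrete with a short sketch. The following PyTorch-style code is only an illustration under loose assumptions (the tensor names, shapes, and equal loss weighting are ours, not the paper's): text positions get a cross-entropy loss over the vocabulary, while visual positions are regressed against target embeddings.

        import torch.nn.functional as F

        def unified_loss(text_logits, visual_preds, text_ids, visual_embeds, is_text):
            # Hedged sketch of an Emu-style unified objective; all names and
            # shapes here are illustrative assumptions, not the paper's code.
            # is_text: bool tensor (seq,), True where the target is a text token.
            loss_text = F.cross_entropy(text_logits[is_text], text_ids[is_text])
            loss_visual = F.mse_loss(visual_preds[~is_text], visual_embeds[~is_text])
            return loss_text + loss_visual  # equal weighting is an assumption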

    Using X+V to construct a non-proprietary speech browser for a public-domain SpeechWeb

    A SpeechWeb is a collection of hyperlinked speech applications that are distributed over the Internet. Users access the speech applications through remote browsers, which accept human voice input and return synthesized voice output. In previous research, a new architecture (LRRP) was proposed that is ideally suited to building a Public-Domain SpeechWeb; however, a non-proprietary speech browser is needed for this architecture. In this thesis, we solve several limitations of X+V, a programming language for developing multimodal applications, and use X+V to build a viable Public-Domain SpeechWeb browser. Our browser has the following properties: real-time human-machine speech interaction; ease of installation and use; acceptable speech-recognition accuracy in a suitable environment; no cost and no proprietary restrictions; ease of distribution; use of a common communication protocol (CGI); ease of creation of speech applications; and the possibility of deployment on mobile devices.
    Dept. of Computer Science. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2006 .M31. Source: Masters Abstracts International, Volume: 45-01, page: 0360. Thesis (M.Sc.)--University of Windsor (Canada), 2006
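
    Because the browser-to-application protocol is plain CGI, the request path is easy to picture. The sketch below is a minimal illustration, assuming a hypothetical application URL and parameter name; it is not code from the thesis.

        import urllib.parse
        import urllib.request

        def ask_speech_app(app_url, recognized_text):
            # Send the recognized utterance to a SpeechWeb application as a
            # CGI query; the reply text would be passed to speech synthesis.
            query = urllib.parse.urlencode({"utterance": recognized_text})
            with urllib.request.urlopen(app_url + "?" + query) as resp:
                return resp.read().decode("utf-8")

        # Hypothetical usage (URL and parameter name are assumptions):
        # reply = ask_speech_app("http://example.org/cgi-bin/speech-app", "hello")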

    Punny Captions: Witty Wordplay in Image Descriptions

    Wit is a form of rich interaction that is often grounded in a specific situation (e.g., a comment in response to an event). In this work, we attempt to build computational models that can produce witty descriptions for a given image. Inspired by a cognitive account of humor appreciation, we employ linguistic wordplay, specifically puns, in image descriptions. We develop two approaches: retrieving witty descriptions for a given image from a large corpus of sentences, or generating them via an encoder-decoder neural network architecture. We compare our approaches against meaningful baselines via human studies and show substantial improvements. We find that when a human is subject to constraints similar to the model's regarding word usage and style, people judge the image descriptions generated by our model to be slightly wittier than human-written witty descriptions. Unsurprisingly, humans are almost always wittier than the model when they are free to choose the vocabulary, style, etc. Comment: NAACL 2018 (11 pages)
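
    As a rough illustration of the pun-based wordplay (a sketch of the general idea, not the authors' retrieval model or encoder-decoder), one can pair visually grounded tags with homophones to seed candidate captions; the tiny homophone table below is an illustrative assumption.

        # Hedged sketch: pair visually grounded words with homophones to
        # seed punny captions. The homophone table is an assumption.
        HOMOPHONES = {"sun": "son", "bored": "board", "sale": "sail"}

        def pun_candidates(image_tags):
            # Return (literal, pun) pairs for tags that have a homophone.
            return [(t, HOMOPHONES[t]) for t in image_tags if t in HOMOPHONES]

        print(pun_candidates(["sun", "beach", "sale"]))
        # [('sun', 'son'), ('sale', 'sail')]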

    Leveraging Large Language Models in Conversational Recommender Systems

    A Conversational Recommender System (CRS) offers increased transparency and control to users by enabling them to engage with the system through a real-time multi-turn dialogue. Recently, Large Language Models (LLMs) have exhibited an unprecedented ability to converse naturally and incorporate world knowledge and common-sense reasoning into language understanding, unlocking the potential of this paradigm. However, effectively leveraging LLMs within a CRS introduces new technical challenges, including properly understanding and controlling a complex conversation and retrieving from external sources of information. These issues are exacerbated by a large, evolving item corpus and a lack of conversational data for training. In this paper, we provide a roadmap for building an end-to-end large-scale CRS using LLMs. In particular, we propose new implementations for user preference understanding, flexible dialogue management and explainable recommendations as part of an integrated architecture powered by LLMs. For improved personalization, we describe how an LLM can consume interpretable natural language user profiles and use them to modulate session-level context. To overcome conversational data limitations in the absence of an existing production CRS, we propose techniques for building a controllable LLM-based user simulator to generate synthetic conversations. As a proof of concept, we introduce RecLLM, a large-scale CRS for YouTube videos built on LaMDA, and demonstrate its fluency and diverse functionality through some illustrative example conversations.
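
    The controllable user simulator proposed for generating synthetic conversations can be pictured with a minimal loop. In the sketch below, llm and recommender are hypothetical text-in, text-out callables and the prompt wording is our assumption; this is not RecLLM's implementation.

        def simulate_dialogue(llm, user_profile, recommender, turns=4):
            # Hedged sketch: generate one synthetic multi-turn CRS conversation.
            # `llm` and `recommender` are hypothetical callables; the prompt
            # wording is an illustrative assumption, not RecLLM's code.
            history = []
            for _ in range(turns):
                user_msg = llm(
                    "You are a user with this profile: " + user_profile + "\n"
                    "Conversation so far: " + repr(history) + "\n"
                    "Write the user's next message to the video recommender."
                )
                history.append(("user", user_msg))
                history.append(("system", recommender(history)))
            return history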

    Automatic translation of formal data specifications to voice data-input applications.

    This thesis introduces a complete solution for the automatic translation of formal data specifications to voice data-input applications. The objective of the research is to automatically generate applications for inputting data through speech from specifications of the structure of that data. The formal data specifications are XML DTDs. A new formalization called Grammar-DTD (G-DTD) is introduced as an extended DTD that contains grammars describing the valid values of DTD elements and attributes. G-DTDs facilitate the automatic generation of VoiceXML applications that correspond to the original DTD structure. The development of the automatic application generator included identifying constraints on the G-DTD to ensure a feasible translation, using predicate calculus to build a knowledge base of inference rules that describes the mapping procedure, and writing an algorithm for the automatic translation based on the inference rules.
    Dept. of Computer Science. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2006 .H355. Source: Masters Abstracts International, Volume: 45-01, page: 0354. Thesis (M.Sc.)--University of Windsor (Canada), 2006
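
    The flavour of the mapping can be sketched briefly. The code below is a simplified illustration, not the thesis's G-DTD algorithm: it turns a flat list of element names into a VoiceXML form that prompts for each value, and omits the grammars a G-DTD attaches to constrain valid values.

        def elements_to_vxml(elements):
            # Hedged sketch: map element names to VoiceXML <field> prompts.
            # A real G-DTD translation would also emit per-field grammars.
            fields = "\n".join(
                '    <field name="%s">\n      <prompt>Please say the %s.</prompt>\n    </field>'
                % (e, e)
                for e in elements
            )
            return '<vxml version="2.0">\n  <form id="data_input">\n%s\n  </form>\n</vxml>' % fields

        print(elements_to_vxml(["name", "date", "amount"]))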

    Convo: What does conversational programming need? An exploration of machine learning interface design

    Vast improvements in natural language understanding and speech recognition have paved the way for conversational interaction with computers. While conversational agents have often been used for short goal-oriented dialog, we know little about agents for developing computer programs. To explore the utility of natural language for programming, we conducted a study (n=45) comparing different input methods to a conversational programming system we developed. Participants completed novice and advanced tasks using voice-based, text-based, and voice-or-text-based systems. We found that users appreciated aspects of each system (e.g., voice-input efficiency, text-input precision) and that novice users were more optimistic about programming using voice input than advanced users. Our results show that future conversational programming tools should be tailored to users' programming experience and allow users to choose their preferred input mode. To reduce cognitive load, future interfaces can incorporate visualizations and possess custom natural language understanding and speech recognition models for programming. Comment: 9 pages, 7 figures, submitted to VL/HCC 2020, for associated user study video: https://youtu.be/TC5P3OO5ex