From Verbs to Tasks: An Integrated Account of Learning Tasks from Situated Interactive Instruction.
Intelligent collaborative agents are becoming common in human society. From virtual assistants such as Siri and Google Now to assistive robots, they contribute to human activities in a variety of ways. As they become more pervasive, the challenge of customizing them to a variety of environments and tasks becomes critical. It is infeasible for engineers to program them for each individual use. Our research aims to build interactive robots and agents that adapt to new environments autonomously by interacting with human users through natural modalities.
This dissertation studies the problem of learning novel tasks from human-agent dialog. We propose a novel approach for interactive task learning, situated interactive instruction (SII), and investigate approaches to three computational challenges that arise in designing SII agents: situated comprehension, mixed-initiative interaction, and interactive task learning. We propose a novel mixed-modality grounded representation for task verbs which encompasses their lexical, semantic, and
task-oriented aspects. This representation is useful in situated comprehension and can be learned through human-agent interactions. We introduce the Indexical Model of comprehension that can exploit
extra-linguistic contexts for resolving semantic ambiguities in situated comprehension of task commands. The Indexical Model is integrated with a mixed-initiative interaction model that facilitates
a flexible task-oriented human-agent dialog. This dialog serves as the basis of interactive task learning. We propose an interactive variation of explanation-based learning that can acquire the proposed
representation. We demonstrate that our learning paradigm is efficient, can transfer knowledge between structurally similar tasks, integrates agent-driven exploration with instructional learning, and can acquire several tasks. The methods proposed in this thesis are integrated in Rosie, a generally instructable agent developed in the Soar cognitive architecture and embodied on a table-top robot.
PhD, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/111573/1/shiwali_1.pd
The significance of silence. Long gaps attenuate the preference for 'yes' responses in conversation.
In conversation, negative responses to invitations, requests, offers and the like more often occur with a delay; conversation analysts talk of them as dispreferred. Here we examine the contrastive cognitive load 'yes' and 'no' responses make, either when given relatively fast (300 ms) or delayed (1000 ms). Participants heard minidialogues, with turns extracted from a spoken corpus, while having their EEG recorded. We find that a fast 'no' evokes an N400-effect relative to a fast 'yes'; however, this contrast is not present for delayed responses. This shows that an immediate response is expected to be positive, but this expectation disappears as the response time lengthens, because in ordinary conversation the probability of a 'no' has by then increased. Additionally, however, 'no' responses elicit a late frontal positivity both when they are fast and when they are delayed. Thus, regardless of the latency of the response, a 'no' response is associated with a late positivity, since a negative response is always dispreferred and may require an account. Together these results show that negative responses to social actions exact a higher cognitive load, especially when least expected: as an immediate response.
Continually improving grounded natural language understanding through human-robot dialog
As robots become ubiquitous in homes and workplaces such as hospitals and factories, they must be able to communicate with humans. Several kinds of knowledge are required to understand and respond to a human's natural language commands and questions. If a person asks an assistant robot to "take me to Alice's office", the robot must know that Alice is a person who owns some unique office, and that "take me" means it should navigate there. Similarly, if a person requests "bring me the heavy, green mug", the robot must have accurate mental models of the physical concepts heavy, green, and mug. To avoid forcing humans to use key phrases or words robots already know, this thesis focuses on helping robots understand new language constructs through interactions with humans and with the world around them.

To understand a command in natural language, a robot must first convert that command to an internal representation that it can reason with. Semantic parsing is a method for performing this conversion, and the target representation often consists of semantic forms expressed as predicate logic with lambda calculus. Traditional semantic parsing relies on hand-crafted resources from a human expert: an ontology of concepts, a lexicon connecting language to those concepts, and training examples of language with abstract meanings. One thrust of this thesis is to perform semantic parsing with sparse initial data. We use the conversations between a robot and human users to induce pairs of natural language utterances with the target semantic forms a robot discovers through its questions, reducing the annotation effort of creating training examples for parsing. We use this data to build more dialog-capable robots in new domains with much less expert human effort (Thomason et al., 2015; Padmakumar et al., 2017). Meanings of many language concepts are bound to the physical world.
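To make the target representation concrete, here is a toy lexicon-driven sketch that composes a command into a predicate-logic-style form. The lexicon entries and predicate names are invented for illustration; this is not the thesis's actual parser or ontology.

```python
# Toy semantic parsing sketch: each known word maps to a typed lexicon entry,
# and parse() composes the hits into a lambda-calculus-style logical form.
# All symbols below are illustrative, not drawn from the actual system.

LEXICON = {
    "take":   ("action",   "navigate"),    # "take me to X" -> navigate(X)
    "bring":  ("action",   "bring"),
    "alice":  ("entity",   "alice"),
    "office": ("relation", "office_of"),   # wraps the preceding entity
    "mug":    ("category", "mug"),
    "heavy":  ("property", "heavy"),
    "green":  ("property", "green"),
}

def parse(command: str) -> str:
    """Compose a predicate-logic form from lexicon hits (toy sketch)."""
    tokens = [t.strip(",.'s").lower() for t in command.split()]
    action, args, props = None, [], []
    for tok in tokens:
        kind, sym = LEXICON.get(tok, (None, None))
        if kind == "action":
            action = sym
        elif kind == "relation":
            args[-1] = f"{sym}({args[-1]})"            # office_of(alice)
        elif kind == "entity":
            args.append(sym)
        elif kind == "category":
            # conjoin accumulated properties with the category as a predicate
            args.append(f"lambda x.({' ^ '.join(props + [sym])})(x)")
        elif kind == "property":
            props.append(sym)
    return f"{action}({', '.join(args)})"

print(parse("take me to Alice's office"))
# navigate(office_of(alice))
print(parse("bring me the heavy, green mug"))
# bring(lambda x.(heavy ^ green ^ mug)(x))
```

A real semantic parser would learn such lexical mappings and composition rules from data rather than hard-code them, which is exactly the annotation burden the thesis aims to reduce.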
Understanding object properties and categories, such as heavy, green, and mug, requires interacting with and perceiving the physical world. Embodied robots can use manipulation capabilities, such as pushing, picking up, and dropping objects, to gather sensory data about them. This data can be used to understand non-visual concepts like heavy and empty (e.g. "get the empty carton of milk from the fridge"), and to assist with concepts that have both visual and non-visual expression (e.g. tall things look big and also exert force sooner than short things when pressed down on). A second thrust of this thesis focuses on strategies for learning these concepts using multi-modal sensory information. We use human-in-the-loop learning to get labels between concept words and actual objects in the environment (Thomason et al., 2016, 2017). We also explore ways to tease out polysemy and synonymy in concept words (Thomason and Mooney, 2017), such as light, which can refer to a weight or a color, the latter sense being synonymous with pale. Additionally, pushing, picking up, and dropping objects to gather sensory information is prohibitively time-consuming, so we investigate strategies for using linguistic information and human input to expedite exploration when learning a new concept (Thomason et al., 2018).

Finally, we build an integrated agent with both parsing and perception capabilities that learns from conversations with users to improve both components over time. We demonstrate that parser learning from conversations (Thomason et al., 2015) can be combined with multi-modal perception (Thomason et al., 2016) using predicate-object labels gathered through opportunistic active learning (Thomason et al., 2017) during those conversations to improve performance for understanding natural language commands from humans. Human users also qualitatively rate this integrated learning agent as more usable after it has improved from conversation-based learning.
Computer Science
A Survey on Semantic Processing Techniques
Semantic processing is a fundamental research domain in computational
linguistics. In the era of powerful pre-trained language models and large
language models, the advancement of research in this domain appears to be
decelerating. However, the study of semantics is multi-dimensional in
linguistics. The research depth and breadth of computational semantic
processing can be largely improved with new technologies. In this survey, we
analyzed five semantic processing tasks, i.e., word sense disambiguation,
anaphora resolution, named entity recognition, concept extraction, and
subjectivity detection. We study relevant theoretical research in these fields,
advanced methods, and downstream applications. We connect the surveyed tasks
with downstream applications because this may inspire future scholars to fuse
these low-level semantic processing tasks with high-level natural language
processing tasks. The review of theoretical research may also inspire new tasks
and technologies in the semantic processing domain. Finally, we compare the
different semantic processing techniques and summarize their technical trends,
application trends, and future directions.
Comment: Published at Information Fusion, Volume 101, 2024, 101988, ISSN 1566-2535. The equal-contribution mark is missing in the published version due to the publication policies. Please contact Prof. Erik Cambria for details.
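To illustrate one of the low-level semantic processing tasks the survey covers, here is a minimal sketch of word sense disambiguation in the style of the classic Lesk algorithm: pick the sense whose gloss shares the most words with the surrounding context. The glosses below are made up for illustration; a real system would draw them from a lexical resource such as WordNet.

```python
# Simplified Lesk word sense disambiguation (toy sketch).
# SENSES maps an ambiguous word to hypothetical sense glosses.

SENSES = {
    "bank": {
        "bank_financial": "an institution that accepts deposits and lends money",
        "bank_river": "sloping land beside a body of water such as a river",
    }
}

def lesk(word: str, context: str) -> str:
    """Return the sense whose gloss overlaps most with the context words."""
    ctx = set(context.lower().split())
    best, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(ctx & set(gloss.split()))
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best

print(lesk("bank", "he sat on the bank of the river and watched the water"))
# bank_river
```

Modern approaches replace the bag-of-words overlap with contextual embeddings from pre-trained language models, but the task definition is the same.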
Modelling Incremental Self-Repair Processing in Dialogue.
Self-repairs, where speakers repeat themselves, reformulate or restart what they are saying, are
pervasive in human dialogue. These phenomena provide a window into real-time human language
processing. For explanatory adequacy, a model of dialogue must include mechanisms that
account for them. Artificial dialogue agents also need this capability for more natural interaction
with human users. This thesis investigates the structure of self-repair and its function in the
incremental construction of meaning in interaction.
A corpus study shows how the range of self-repairs seen in dialogue cannot be accounted for
by looking at surface form alone. More particularly, it analyses a string-alignment approach and
shows how it is insufficient, and it provides requirements for a suitable model of incremental
context and an ontology of self-repair functions.
An information-theoretic model is developed which addresses these issues along with a system
that automatically detects self-repairs and edit terms on transcripts incrementally with minimal
latency, achieving state-of-the-art results. Additionally, it is shown to have practical use in
the psychiatric domain.
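To make the detection task concrete, here is a minimal word-by-word sketch in the spirit of the surface-form baselines the corpus study argues are insufficient: flag fillers from an edit-term list, and flag a word as a repair when it re-uses a very recent word ("rough copy"). The edit-term list and the three-word window are invented for illustration; the actual system is information-theoretic.

```python
# Naive incremental disfluency tagger (toy baseline sketch).
# Tags each incoming word as 'fluent', 'edit' (filler term), or
# 'repair' (re-duplication of a recently seen word).

EDIT_TERMS = {"uh", "um", "er"}   # illustrative filler list

def detect_repairs(words):
    """Yield (index, tag) pairs word by word, with no lookahead."""
    history = []
    for i, w in enumerate(words):
        if w in EDIT_TERMS:
            yield i, "edit"
        elif w in history[-3:]:        # re-use within a 3-word window
            yield i, "repair"
        else:
            yield i, "fluent"
        history.append(w)

tags = [t for _, t in detect_repairs("i go to the uh to the shop".split())]
print(tags)
# ['fluent', 'fluent', 'fluent', 'fluent', 'edit', 'repair', 'repair', 'fluent']
```

A window over surface strings like this cannot distinguish a repair from a legitimate repetition, which is one reason the thesis argues for a richer model of incremental context.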
The thesis goes on to present a dialogue model to interpret and generate repaired utterances
incrementally. When processing repaired rather than fluent utterances, it achieves the same
degree of incremental interpretation and incremental representation. Practical implementation
methods are presented for an existing dialogue system.
Finally, a more pragmatically oriented approach is presented to model self-repairs in a psycholinguistically
plausible way. This is achieved through extending the dialogue model to include
a probabilistic semantic framework to perform incremental inference in a reference resolution
domain.
The thesis concludes that at least as fine-grained a model of context as word-by-word is required
for realistic models of self-repair, and context must include linguistic action sequences
and information update effects. The way dialogue participants process self-repairs to make inferences
in real time, rather than filter out their disfluency effects, has been modelled formally and
in practical systems.
Engineering and Physical Sciences Research Council (EPSRC) Doctoral Training Account (DTA) scholarship from the School of Electronic Engineering and Computer Science at Queen Mary University of London
Exploring Speech Technologies for Language Learning
The teaching of the pronunciation of any foreign language must encompass both segmental and suprasegmental aspects
of speech. In computational terms, the two levels of language learning activities can be decomposed at least into phonemic aspects, which include the correct pronunciation of single phonemes and the co-articulation of phonemes into higher phonological units, and prosodic aspects, which include:
- the correct position of stress at word level;
- the alternation of stressed and unstressed syllables in terms of compensation and vowel reduction;
- the correct position of sentence accent;
- the generation of adequate rhythm from the interleaving of stress, accent, and phonological rules;
- the generation of an adequate intonational pattern for each utterance, related to its communicative functions.
As appears from the above, for a student to communicate intelligibly and as closely as possible to a native speaker's pronunciation, prosody is very important [3]. We also assume that incorrect prosody may prevent communication from taking place, and this may be regarded as a strong motivation for making the teaching of prosody an integral part of any language course. From our point of view, it is much more important to stress the achievement of successful communication as the main objective of a second-language learner rather than the overcoming of what has been termed "foreign accent", which can be deemed a secondary goal. In any case, the two goals are certainly not coincident, even though they may overlap in some cases. We will discuss these matters in the following sections.
All prosodic questions related to "rhythm" will be discussed in the first section of this chapter. In [4] the author argues in favour of prosodic aids, in particular because a wrong placement of word stress may impair the listener's understanding of the word being pronounced. He also argues in favour of acquiring correct timing of phonological units to overcome the impression of "foreign accent" which may ensue from an incorrect distribution of stressed vs. unstressed stretches of linguistic units such as syllables or metric feet. Timing is not to be confused with speaking rate, which need not be increased forcefully to give the impression of good fluency: trying to increase speaking rate may result in lower intelligibility. The question of "foreign accent" is also discussed at length in (Jilka M., 1999). This work is particularly relevant as far as the intonational features of a learner of a second language are concerned, which we will address in the second section of this chapter. Correcting the Intonational Foreign Accent (hence IFA) is an important component of a Prosodic Module for self-learning activities, as categorical aspects of the intonation of the two languages in contact, L1 and L2, are far apart and thus neatly distinguishable. The choice of the two languages in contact is determined mainly by the fact that the distance in prosodic terms between English and Italian is maximal, according to (Ramus, F. and J. Mehler, 1999; Ramus F., et al., 1999).
Prosodic tools for language learning
In this paper we will be concerned with the role played by prosody in language learning and by the speech technology already available, as commercial products or prototypes, capable of coping with the task of helping language learners improve their knowledge of a second language from the prosodic point of view. The paper has been divided into two separate sections: Section One, dealing with Rhythm and all related topics; Section Two, dealing with Intonation. In the Introduction we will argue that the use of ASR (Automatic Speech Recognition) as a teaching aid should be limited and targeted to narrowly focused spoken exercises, disallowing open-ended dialogues, in order to ensure consistency of evaluation. Eventually, we will support the conjoined use of ASR technology and prosodic tools to produce GOP (Goodness of Pronunciation) scores usable for linguistically consistent and adequate feedback to the student. This will be illustrated by presenting the state of the art for both sections, with systems well documented in the scientific literature of the respective field.
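GOP scores are commonly computed, following the standard formulation, as the frame-normalised difference between the acoustic log-likelihood of the phone the learner intended and that of the best competing phone. A minimal sketch with toy likelihoods (not the output of any actual recogniser) might look like:

```python
import numpy as np

# Goodness of Pronunciation (GOP) sketch. A score near 0 means the intended
# phone is also the acoustically best match; a strongly negative score
# suggests a mispronunciation. The likelihood table below is toy data.

def gop(frame_loglikes: np.ndarray, target: int) -> float:
    """frame_loglikes: (frames, phones) log-likelihoods from an acoustic model."""
    avg = frame_loglikes.mean(axis=0)        # per-phone average over frames
    return float(avg[target] - avg.max())    # <= 0 by construction

# two frames, three candidate phones (toy posteriors converted to logs)
loglikes = np.log(np.array([
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
]))
print(gop(loglikes, target=0))   # 0.0 -> intended phone is the best match
print(gop(loglikes, target=1))   # negative -> likely mispronounced
```

In practice the phone likelihoods come from a forced alignment of the learner's utterance against the expected transcription, and the score is thresholded per phone to decide whether to flag it to the student.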
In order to discuss the scientific foundations of prosodic analysis we will present data related to English and Italian and make comparisons to clarify the issues at hand. In this context, we will also present the Prosodic Module of a courseware for computer-assisted foreign language learning called SLIM, an acronym for Multimedia Interactive Linguistic Software, developed at the University of Venice (Delmonte et al. in Convegno GFS-AIA, pp. 47–58, 1996a; Ed-Media 96, AACE, pp. 326–333, 1996b). The Prosodic Module has been created in order to deal with the problem of improving a student's performance in both the perception and production of prosodic aspects of spoken language activities. It is composed of two different sets of Learning Activities, the first one dealing with phonetic and prosodic problems at word level and at syllable level; the second one dealing with prosodic aspects at phonological phrase and utterance suprasegmental level. The main goal of Prosodic Activities is to ensure consistent and pedagogically sound feedback to the student intending to improve his/her pronunciation in a foreign language.
Learning to Interpret and Apply Multimodal Descriptions
Han T. Learning to Interpret and Apply Multimodal Descriptions. Bielefeld: Universität Bielefeld; 2018.
Enabling computers to understand natural human communication is a goal researchers in artificial intelligence have long aspired to. Since the concept demonstration of "Put-That-There" in the 1980s, significant achievements have been made in developing multimodal interfaces that can process human communication such as speech, eye gaze, facial emotion, co-verbal hand gestures and pen input. State-of-the-art multimodal interfaces are able to process pointing gestures, symbolic gestures with conventional meanings, as well as gesture commands with pre-defined meanings (e.g., circling for "select"). However, in natural communication, co-verbal gestures/pen input rarely convey meanings via conventions or pre-defined rules, but embody meanings relatable to the accompanying speech.
For example, in route-giving tasks, people often describe landmarks verbally (e.g., two buildings), while demonstrating their relative position with two hands facing each other in space. Interestingly, when the same gesture is accompanied by the utterance a ball, it may indicate the size of the ball. Hence, the interpretation of such co-verbal hand gestures largely depends on the accompanying verbal content. Similarly, when describing objects, while verbal utterances are most convenient for describing colour and category (e.g., a brown elephant), hand-drawn sketches are often deployed to convey iconic information such as the exact shape of the elephant's trunk, which is typically difficult to encode in language.
This dissertation concerns the task of learning to interpret multimodal descriptions composed of verbal utterances and hand gestures/sketches, and to apply the corresponding interpretations to tasks such as image retrieval. Specifically, we aim to address the following research questions: 1) For co-verbal gestures that embody meanings relatable to the accompanying verbal content, how can we use natural language information to interpret the semantics of such co-verbal gestures, e.g., does a gesture indicate relative position or size? 2) As an integral system of communication, speech and gestures not only bear close semantic relations, but also close temporal relations. To what degree and on which dimensions can hand gestures benefit the task of interpreting multimodal descriptions? 3) While it's obvious that iconic information in hand-drawn sketches enriches verbal content in object descriptions, how can we model the joint contributions of such multimodal descriptions, and to what degree can verbal descriptions compensate for reduced iconic details in hand-drawn sketches?
To address the above questions, we first introduce three multimodal description corpora: a spatial description corpus composed of natural language and placing gestures (also referred to as abstract deictics), a multimodal object description corpus composed of natural language and hand-drawn sketches, and an existing corpus, the Bielefeld Speech and Gesture Alignment Corpus (SAGA).
We frame the problem of learning gesture semantics as a multi-label classification task using natural language information and hand gesture features. We conducted an experiment with the SAGA corpus. The results show that natural language is informative for the interpretation of hand gestures.
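As a sketch of the multi-label framing, one independent binary scorer per semantic label can be applied to a joint feature vector of speech and motion cues, keeping every label whose score clears a threshold. The features, labels, and weights below are illustrative toy values, not anything learned from SAGA.

```python
import numpy as np

# One-vs-rest multi-label sketch: a gesture may carry several semantic
# labels at once (e.g. both relative position and size). Weights are toy
# values chosen for illustration, not learned parameters.

LABELS = ["relative_position", "size", "shape"]

def predict(features: np.ndarray, weights: np.ndarray, threshold: float = 0.5):
    """Sigmoid score per label; return all labels scoring above threshold."""
    scores = 1.0 / (1.0 + np.exp(-(features @ weights)))   # shape (n_labels,)
    return [label for label, s in zip(LABELS, scores) if s > threshold]

# features: [says "two", two hands held apart, traces an outline]
weights = np.array([[ 2.0, -1.0, -1.0],
                    [ 3.0,  2.0, -1.0],
                    [-1.0, -1.0,  3.0]])
print(predict(np.array([1.0, 1.0, 0.0]), weights))
# ['relative_position', 'size']
```

The multi-label formulation matters because forcing a single label per gesture would discard exactly the ambiguity (position vs. size) that the accompanying speech is needed to resolve.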
Furthermore, we describe a system that models the interpretation and application of spatial descriptions, and we explore three variants of representation methods for the verbal content. When representing the verbal content of the descriptions with a set of automatically learned symbols, the system's performance is on par with representations using manually defined symbols (e.g., pre-defined object properties). We show that abstract deictic gestures not only lead to better understanding of spatial descriptions, but also result in earlier correct decisions by the system, which can be used to trigger immediate reactions in dialogue systems.
Finally, we investigate the interplay of semantics between the symbolic (natural language) and iconic (sketches) modes in multimodal object descriptions, where natural language and sketches jointly contribute to the communication. We model the meanings of natural language and sketches with two existing models and combine the meanings from both modalities with a late fusion approach. The results show that even adding reduced sketches (30% of full sketches) can help in the retrieval task. Moreover, in the current setup, natural language descriptions can compensate for around 30% of reduced sketches.
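The late fusion scheme can be sketched as follows: score the retrieval candidates independently from the verbal description and from the sketch, normalise each modality's scores, and combine them with a weighted sum. The scores and the mixing weight below are illustrative, not the thesis's tuned values.

```python
# Late-fusion sketch for multimodal retrieval: per-candidate scores from
# two modalities are min-max normalised and mixed; higher is better.
# alpha weights the language modality (toy value).

def late_fusion(lang_scores, sketch_scores, alpha=0.5):
    """Combine per-candidate scores from two modalities into fused scores."""
    def norm(xs):
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]
    lang, sketch = norm(lang_scores), norm(sketch_scores)
    return [alpha * l + (1 - alpha) * s for l, s in zip(lang, sketch)]

# three candidate images; language strongly prefers candidate 0,
# the (reduced) sketch mildly prefers candidate 1
fused = late_fusion([0.9, 0.4, 0.1], [0.3, 0.8, 0.2], alpha=0.6)
best = max(range(len(fused)), key=fused.__getitem__)
print(best)
# 0
```

Because fusion happens at the score level, a weaker sketch signal (e.g. only 30% of a full sketch) still shifts the ranking without requiring a joint model of both modalities.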