InproTKs: A Toolkit for Incremental Situated Processing
Kennington C, Kousidis S, Schlangen D. InproTKs: A Toolkit for Incremental Situated Processing. In: Proceedings of SIGdial 2014: Short Papers. 2014: 84-88
Incrementally Tracking Reference in Human/Human Dialogue Using Linguistic and Extra-Linguistic Information
Kennington C, Iida R, Tokunaga T, Schlangen D. Incrementally Tracking Reference in Human/Human Dialogue Using Linguistic and Extra-Linguistic Information. In: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics – Human Language Technologies (NAACL HLT 2015). Denver, U.S.A.: Association for Computational Linguistics; 2015: 272-282
Are we all disfluent in our own special way and should dialogue systems also be?
Betz S, Lopez Gambino MS. Are we all disfluent in our own special way and should dialogue systems also be? In: Jokisch O, ed. Elektronische Sprachsignalverarbeitung (ESSV) 2016. Studientexte zur Sprachkommunikation. Vol 81. Dresden: TUD Press; 2016: 168-174
Investigating speaker gaze and pointing behaviour in human-computer interaction with the `mint.tools` collection
Kousidis S, Kennington C, Schlangen D. Investigating speaker gaze and pointing behaviour in human-computer interaction with the `mint.tools` collection. In: Proceedings of Short Papers at SIGdial 2013. 2013
Incrementally resolving references in order to identify visually present objects in a situated dialogue setting
Kennington C. Incrementally resolving references in order to identify visually present objects in a situated dialogue setting. Bielefeld: Universität Bielefeld; 2016.
The primary concern of this thesis is to model the resolution of spoken referring expressions
made in order to identify objects, in particular everyday objects that can be perceived visually
and distinctly from other objects. The practical goal of such a model is for it to be implemented
as a component for use in a live, interactive, autonomous spoken dialogue system. The requirement of interaction imposes an added complication, one that has been ignored in previous
models and approaches to automatic reference resolution: the model must attempt to resolve
the reference incrementally as it unfolds, rather than waiting until the end of the referring expression to
begin the resolution process.
Beyond its role in dialogue system components, reference has been a major topic in the philosophy of meaning for more than a century. For example, Gottlob Frege (1892) distinguished
between Sinn (sense) and Bedeutung (reference), and discussed how they are related and how they
relate to the meaning of words and expressions. It has furthermore been argued (e.g., Dahlgren
(1976)) that reference to entities in the actual world is not just a fundamental notion of semantic theory, but the fundamental notion; for an individual acquiring a language, the meaning
of many words and concepts is learned through the task of reference, beginning in early
childhood. In this thesis, we pursue an account of word meaning that is based on perception of
objects; for example, the meaning of the word red is based on visual features that are selected
as distinguishing red objects from non-red ones.
This thesis proposes two statistical models of incremental reference resolution. Given examples
of referring expressions and visual aspects of the objects to which those expressions
referred, both models learn a functional mapping between the words of the referring
expressions and the visual aspects. A generative model, the simple incremental update
model, presented in Chapter 5, uses a mediating variable to learn the mapping, whereas a
discriminative model, the words-as-classifiers model, presented in Chapter 6, learns the mapping
directly and improves over the generative model. Both models have been evaluated in various
reference resolution tasks involving objects in virtual scenes as well as real, tangible objects. This
thesis shows that both models work robustly and are able to resolve referring expressions made
in reference to visually present objects despite realistic, noisy conditions of speech and object
recognition. A theoretical and practical comparison is also provided.
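The idea of a generative, incrementally updated model with a mediating variable can be illustrated with a small sketch. This is not the thesis's implementation: the scene, the property names, the probability table `WORD_GIVEN_PROP`, the uniform distribution over an object's properties, and the smoothing constant `FLOOR` are all hypothetical choices made for illustration. The sketch keeps a belief over candidate objects and updates it after every word, with the object's properties standing between words and objects.

```python
from typing import Dict, Iterator, List

# Hypothetical toy scene: each object is described by a list of properties.
# The properties play the role of the mediating variable.
SCENE: Dict[str, List[str]] = {
    "obj1": ["red", "square"],
    "obj2": ["red", "circle"],
    "obj3": ["green", "circle"],
}

# Hypothetical P(word | property), as if estimated from training examples.
WORD_GIVEN_PROP: Dict[str, Dict[str, float]] = {
    "red": {"red": 0.9, "crimson": 0.1},
    "green": {"green": 1.0},
    "square": {"square": 0.8, "box": 0.2},
    "circle": {"circle": 0.6, "round": 0.4},
}

FLOOR = 1e-6  # assumed smoothing so an unseen word never zeroes the belief


def word_likelihood(word: str, props: List[str]) -> float:
    """P(word | object), marginalising over the object's properties.

    Assumes P(property | object) is uniform over the listed properties.
    """
    total = sum(WORD_GIVEN_PROP[p].get(word, FLOOR) for p in props)
    return total / len(props)


def update(belief: Dict[str, float], word: str) -> Dict[str, float]:
    """One incremental step: reweight each object by the new word, then normalise."""
    scores = {o: belief[o] * word_likelihood(word, SCENE[o]) for o in belief}
    z = sum(scores.values())
    return {o: s / z for o, s in scores.items()}


def resolve_incrementally(words: List[str]) -> Iterator[Dict[str, float]]:
    """Yield the belief over objects after each word of the referring expression."""
    belief = {o: 1.0 / len(SCENE) for o in SCENE}  # uniform prior
    for word in words:
        belief = update(belief, word)
        yield belief


if __name__ == "__main__":
    for word, belief in zip(["red", "round"], resolve_incrementally(["red", "round"])):
        best = max(belief, key=belief.get)
        print(word, "->", best)
```

In this toy run the belief is split between the two red objects after the first word and only settles on a single object once the second word arrives, which is the behaviour an incremental resolver is meant to expose to the rest of a dialogue system.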
Special emphasis is given to the discriminative model in this thesis because of its simplicity
and ability to represent word meanings. The learning and application of this model
give credence to the above claim that reference is the fundamental notion for semantic theory
and that the meanings of (visual) words are learned through experiencing referring expressions made
to objects that are visually perceivable.
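A minimal sketch of the words-as-classifiers idea, under stated assumptions: each word gets its own binary classifier over an object's visual features, and the per-word scores are combined as the expression unfolds. The abstract does not specify the classifier type, the features, or the combination rule, so the logistic regression, the RGB-like feature vectors, the training examples, and the multiply-and-normalise update below are all hypothetical.

```python
import math
from typing import Dict, Iterator, List, Sequence, Tuple

Features = Sequence[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class WordClassifier:
    """Logistic regression: how well does a visual feature vector fit one word?"""

    def __init__(self, n_features: int) -> None:
        self.w = [0.0] * n_features
        self.b = 0.0

    def predict(self, x: Features) -> float:
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def fit(self, examples: List[Tuple[Features, int]],
            epochs: int = 500, lr: float = 0.5) -> None:
        # Plain stochastic gradient descent on the log loss.
        for _ in range(epochs):
            for x, y in examples:
                err = self.predict(x) - y
                self.w = [wi - lr * err * xi for wi, xi in zip(self.w, x)]
                self.b -= lr * err


# Hypothetical training data: (R, G, B) features of objects that were (1)
# or were not (0) referred to with the given word.
TRAINING: Dict[str, List[Tuple[Features, int]]] = {
    "red": [((0.9, 0.1, 0.1), 1), ((0.8, 0.2, 0.1), 1),
            ((0.1, 0.8, 0.2), 0), ((0.1, 0.1, 0.9), 0)],
    "green": [((0.1, 0.9, 0.1), 1), ((0.2, 0.8, 0.2), 1),
              ((0.9, 0.1, 0.1), 0), ((0.1, 0.1, 0.9), 0)],
}


def train(data: Dict[str, List[Tuple[Features, int]]]) -> Dict[str, WordClassifier]:
    classifiers = {}
    for word, examples in data.items():
        clf = WordClassifier(n_features=len(examples[0][0]))
        clf.fit(examples)
        classifiers[word] = clf
    return classifiers


def resolve(words: List[str], scene: Dict[str, Features],
            classifiers: Dict[str, WordClassifier]) -> Iterator[Dict[str, float]]:
    """Yield a normalised belief over objects after each incoming word."""
    belief = {o: 1.0 / len(scene) for o in scene}
    for word in words:
        clf = classifiers.get(word)
        if clf is not None:  # words without a classifier leave the belief unchanged
            belief = {o: belief[o] * clf.predict(x) for o, x in scene.items()}
            z = sum(belief.values())
            belief = {o: p / z for o, p in belief.items()}
        yield dict(belief)


# Hypothetical scene with three visually present objects.
SCENE_RGB: Dict[str, Features] = {
    "a": (0.85, 0.15, 0.10),
    "b": (0.15, 0.85, 0.15),
    "c": (0.10, 0.20, 0.90),
}

if __name__ == "__main__":
    clfs = train(TRAINING)
    for word, belief in zip(["the", "red", "one"],
                            resolve(["the", "red", "one"], SCENE_RGB, clfs)):
        print(word, {o: round(p, 3) for o, p in belief.items()})
```

In this sketch the trained weights of the `red` classifier are themselves a compact representation of what the word means in visual terms, which is the sense in which such a model can be said to represent word meanings.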