When humans communicate, we use deictic expressions to refer to objects in our surrounding and put them in the context of our actions. In face to face interaction, we can complement verbal expressions with gestures and, hence, we do not need to be too precise in our verbal protocols. Our interlocutors hear our speaking; see our gestures and they even read our eyes. They interpret our deictic expressions, try to identify the referents and – normally – they will understand. If only machines could do alike. The driving vision behind the research in this thesis are multimodal conversational interfaces where humans are engaged in natural dialogues with computer systems. The embodied conversational agent Max developed in the A.I. group at Bielefeld University is an example of such an interface. Max is already able to produce multimodal deictic expressions using speech, gaze and gestures, but his capabilities to understand humans are not on par. If he was able to resolve multimodal deictic expressions, his understanding of humans would increase and interacting with him would become more natural
To submit an update or takedown request for this paper, please submit an Update/Correction/Removal Request.