193 research outputs found
A Comparison of Visualisation Methods for Disambiguating Verbal Requests in Human-Robot Interaction
Picking up objects requested by a human user is a common task in human-robot
interaction. When multiple objects match the user's verbal description, the
robot needs to clarify which object the user is referring to before executing
the action. Previous research has focused on perceiving user's multimodal
behaviour to complement verbal commands or minimising the number of follow up
questions to reduce task time. In this paper, we propose a system for reference
disambiguation based on visualisation and compare three methods to disambiguate
natural language instructions. In a controlled experiment with a YuMi robot, we
investigated real-time augmentations of the workspace in three conditions --
mixed reality, augmented reality, and a monitor as the baseline -- using
objective measures such as time and accuracy, and subjective measures like
engagement, immersion, and display interference. Significant differences were
found in accuracy and engagement between the conditions, but no differences
were found in task time. Despite the higher error rates in the mixed reality
condition, participants found that modality more engaging than the other two,
but overall showed preference for the augmented reality condition over the
monitor and mixed reality conditions.
Referential precedents in spoken language comprehension: a review and meta-analysis
Listeners' interpretations of referring expressions are influenced by referential
precedents: temporary conventions established in a discourse that associate linguistic
expressions with referents. A number of psycholinguistic studies have investigated how
much precedent effects depend on beliefs about the speaker's perspective versus more
egocentric, domain-general processes. We review and provide a meta-analysis of
visual-world eyetracking studies of precedent use, focusing on three principal effects: (1) a
same speaker advantage for maintained precedents; (2) a different speaker advantage for
broken precedents; and (3) an overall main effect of precedents. Despite inconsistent claims
in the literature, our combined analysis reveals surprisingly consistent evidence supporting
the existence of all three effects, but with different temporal profiles. These findings carry
important implications for existing theoretical explanations of precedent use, and challenge
explanations based solely on the use of information about speakers' perspectives.
How Do I Address You? Modelling addressing behavior based on an analysis of multi-modal corpora of conversational discourse
Addressing is a special kind of referring, and thus principles of multi-modal referring expression generation will also be basic for the generation of address terms and addressing gestures for conversational agents. Addressing is a special kind of referring because of the different role that the referent has in the interaction (second person instead of object). Based on an analysis of addressing behaviour in multi-party face-to-face conversations (meetings, TV discussions, as well as theatre plays), we present the outlines of a model for generating multi-modal verbal and non-verbal addressing behaviour for agents in multi-party interactions.
Improving coreference resolution by using conversational metadata
In this paper, we propose the use of metadata contained in documents to improve coreference resolution. Specifically, we quantify the impact of speaker and turn information on the performance of our coreference system, and show that the metadata can be effectively encoded as features of a statistical resolution system, which leads to a statistically significant improvement in performance.
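The abstract does not list the paper's actual feature set; as a minimal illustrative sketch (the mention representation, field names, and specific features are assumptions, not taken from the paper), speaker and turn metadata might be encoded as mention-pair features for a statistical resolver like this:

```python
def metadata_features(m1, m2):
    """Encode speaker/turn metadata of two mentions as boolean features
    for a statistical coreference resolver (illustrative sketch only;
    the field names and features are assumed, not the paper's)."""
    return {
        # mentions uttered by the same speaker / within the same turn
        "same_speaker": m1["speaker"] == m2["speaker"],
        "same_turn": m1["turn"] == m2["turn"],
        # "you" said by one speaker can corefer with "I" said by another:
        # a pattern only visible once speaker metadata is available
        "i_you_speaker_switch": (
            m1["text"].lower() == "you" and m2["text"].lower() == "i"
            and m1["speaker"] != m2["speaker"]),
    }

a = {"text": "you", "speaker": "A", "turn": 1}
b = {"text": "I", "speaker": "B", "turn": 2}
print(metadata_features(a, b))
```

Features of this kind can simply be appended to the lexical and syntactic features an existing resolver already uses, which is what makes metadata cheap to exploit.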
Deep Linguistic Processing with GETARUNS for Spoken Dialogue Understanding
In this paper we will present work carried out to scale up the system for text understanding called GETARUNS, and port it
to be used in dialogue understanding. The current goal is that of extracting automatically argumentative information in
order to build argumentative structure. The long term goal is using argumentative structure to produce automatic
summarization of spoken dialogues. Very much like other deep linguistic processing systems, our system is a generic
text/dialogue understanding system that can be used in connection with an ontology (WordNet) and other similar
repositories of commonsense knowledge. We will present the adjustments we made in order to cope with transcribed
spoken dialogues like those produced in the ICSI Berkeley project. In a final section we present preliminary evaluation of
the system on two tasks: the task of automatic argumentative labeling and another frequently addressed task: referential vs.
non-referential pronominal detection. The results obtained fare much better than those reported in similar experiments with
machine learning approaches.
Tag disambiguation based on social network information
Within 20 years the Web has grown from a tool for scientists at CERN into a global information space. While returning to its roots as a read/write tool, it is entering a more social and participatory phase: a new, improved version called the Social Web, where users are responsible for generating and sharing content on the global information space and are also accountable for replicating the information. This collaborative activity can be observed in two of the most widely practised Social Web services: social network sites and social tagging systems. Users annotate their interests and inclinations with free-form keywords while they share them with their social connections. Although these keywords (tags) assist information organization and retrieval, they suffer from polysemy. In this study we employ the effectiveness of social network sites to address the issue of ambiguity in social tagging. Moreover, we propose that homophily in social network sites can be a useful aspect in disambiguating tags. We have extracted the "Likes" of 20 Facebook users and employed them in disambiguating tags on Flickr. Classifiers are generated on the clusters retrieved from Flickr using the K-Nearest-Neighbour algorithm, and their degree of similarity with user keywords is then calculated. As tag disambiguation techniques lack gold standards for evaluation, we asked the users to indicate the intended contexts and used these as ground truth while examining the results. We analyse the performance of our approach by quantitative methods and report successful results: our proposed method is able to classify images with an accuracy of 6 out of 10 (on average). Qualitative analysis reveals some factors that affect the findings and that, if addressed, can produce more precise results.
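The K-Nearest-Neighbour step described in the abstract can be sketched roughly as follows; the sense clusters, profile keywords, and the choice of Jaccard similarity here are illustrative assumptions, not the paper's actual data or feature design:

```python
# Sketch of KNN tag disambiguation: pick the sense cluster whose
# labelled examples best match a user's profile keywords.
# All data below is hypothetical; the paper's real clusters come
# from Flickr and the profiles from Facebook "Likes".
from collections import Counter

def jaccard(a, b):
    """Set-overlap similarity between two keyword sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def knn_disambiguate(profile_keywords, clusters, k=3):
    """Similarity-weighted k-NN vote over sense-labelled tag sets."""
    scored = sorted(
        ((jaccard(profile_keywords, example), sense)
         for sense, examples in clusters.items()
         for example in examples),
        reverse=True)
    votes = Counter()
    for score, sense in scored[:k]:
        votes[sense] += score  # weight each neighbour by its similarity
    return votes.most_common(1)[0][0]

# Hypothetical sense clusters for the ambiguous tag "jaguar"
clusters = {
    "animal": [{"wildlife", "zoo", "cat"}, {"safari", "predator", "cat"}],
    "car":    [{"vehicle", "engine", "speed"}, {"motor", "race", "speed"}],
}
print(knn_disambiguate({"zoo", "wildlife", "nature"}, clusters))  # prints: animal
```

Weighting the vote by similarity (rather than counting raw neighbours) keeps zero-overlap examples from swamping a single strong match, which matters when clusters are small.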
A Review of Verbal and Non-Verbal Human-Robot Interactive Communication
In this paper, an overview of human-robot interactive communication is
presented, covering verbal as well as non-verbal aspects of human-robot
interaction. Following a historical introduction, and motivation towards fluid
human-robot communication, ten desiderata are proposed, which provide an
organizational axis both of recent as well as of future research on human-robot
communication. Then, the ten desiderata are examined in detail, culminating in
a unifying discussion and a forward-looking conclusion.
- …