Modality-Balanced Models for Visual Dialogue
The Visual Dialog task requires a model to exploit both image and
conversational context information to generate the next response to the
dialogue. However, via manual analysis, we find that a large number of
conversational questions can be answered by only looking at the image without
any access to the context history, while others still need the conversation
context to predict the correct answers. We demonstrate that, as a result,
previous joint-modality (history and image) models over-rely on and are more
prone to memorizing the dialogue history (e.g., by extracting certain keywords
or patterns in the context information), whereas image-only models are more
generalizable (because they cannot memorize or extract keywords from history)
and perform substantially better at the primary normalized discounted
cumulative gain (NDCG) task metric, which allows multiple correct answers.
This observation therefore encourages us to explicitly maintain two models, i.e.,
an image-only model and an image-history joint model, and combine their
complementary abilities for a more balanced multimodal model. We present
multiple methods for this integration of the two models, via ensemble and
consensus dropout fusion with shared parameters. Empirically, our models
achieve strong results on the Visual Dialog challenge 2019 (rank 3 on NDCG and
high balance across metrics), and substantially outperform the winner of the
Visual Dialog challenge 2018 on most metrics.
Comment: AAAI 2020 (11 pages)
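The "consensus dropout fusion" integration can be pictured with a short sketch. The PyTorch-style module below is a minimal illustration under assumed names and interfaces (image_model, joint_model, a 0.25 instance-drop probability); it is not the authors' released implementation. During training it randomly zeroes the joint branch's answer scores per instance, so the fused scorer cannot lean exclusively on history patterns:

```python
# Hedged sketch of consensus dropout fusion between an image-only model and an
# image-history joint model. All names and the drop probability are assumptions.
import torch
import torch.nn as nn

class ConsensusDropoutFusion(nn.Module):
    def __init__(self, image_model: nn.Module, joint_model: nn.Module, p_drop: float = 0.25):
        super().__init__()
        self.image_model = image_model    # scores answers from the image alone
        self.joint_model = joint_model    # scores answers from image + history
        self.p_drop = p_drop

    def forward(self, image, history, question):
        logits_img = self.image_model(image, question)
        logits_joint = self.joint_model(image, history, question)
        if self.training:
            # Drop the joint branch for a random subset of instances so the fused
            # model keeps the image-only branch's generalization.
            keep = (torch.rand(logits_joint.size(0), 1,
                               device=logits_joint.device) > self.p_drop).float()
            logits_joint = logits_joint * keep
        # Consensus: average the two branches' answer scores.
        return (logits_img + logits_joint) / 2
```

Both branches can share parameters (e.g. a common answer scorer), which is what distinguishes this fusion from the plain two-model ensemble the abstract also evaluates.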
A network model of interpersonal alignment in dialog
In dyadic communication, both interlocutors adapt to each other linguistically, that is, they align interpersonally. In this article, we develop a framework for modeling interpersonal alignment in terms of the structural similarity of the interlocutors' dialog lexica. This is done by means of so-called two-layer time-aligned network series, that is, a time-adjusted graph model. The graph model is partitioned into two layers, so that the interlocutors' lexica are captured as subgraphs of an encompassing dialog graph. Each constituent network of the series is updated utterance-wise. Thus, both the inherent bipartition of dyadic conversations and their gradual development are modeled. The notion of alignment is then operationalized within a quantitative model of structure formation based on the mutual information of the subgraphs that represent the interlocutors' dialog lexica. By adapting and further developing several models of complex network theory, we show that dialog lexica evolve as a novel class of graphs that have not been considered before in the area of complex (linguistic) networks. Additionally, we show that our framework allows for classifying dialogs according to their alignment status. To the best of our knowledge, this is the first approach to measuring alignment in communication that explores the similarities of graph-like cognitive representations.
Keywords: alignment in communication; structural coupling; linguistic networks; graph distance measures; mutual information of graphs; quantitative network analysis
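To make the mutual-information operationalization concrete, here is a minimal sketch assuming a networkx graph whose nodes carry a "layer" attribute (which interlocutor's lexicon they belong to) and a "word" attribute linking the two layers; the MI estimate over joint degree counts is a simplified proxy, not the article's exact measure:

```python
# Simplified proxy for layer-wise mutual information in a two-layer dialog graph.
# Node attributes "layer" ("A"/"B") and "word" are modeling assumptions.
import math
from collections import Counter

import networkx as nx

def layer_alignment_mi(g: nx.Graph) -> float:
    """Estimate MI between the degree profiles of the two interlocutor layers."""
    a = g.subgraph(n for n, d in g.nodes(data=True) if d.get("layer") == "A")
    b = g.subgraph(n for n, d in g.nodes(data=True) if d.get("layer") == "B")
    words_a = {g.nodes[n]["word"]: a.degree(n) for n in a}
    words_b = {g.nodes[n]["word"]: b.degree(n) for n in b}
    # Joint distribution over (degree in A, degree in B) for shared lexicon items.
    pairs = Counter((words_a[w], words_b[w]) for w in set(words_a) & set(words_b))
    total = sum(pairs.values())
    if total == 0:
        return 0.0
    px, py = Counter(), Counter()
    for (da, db), c in pairs.items():
        px[da] += c
        py[db] += c
    return sum((c / total) * math.log2((c / total) / ((px[da] / total) * (py[db] / total)))
               for (da, db), c in pairs.items())
```

Updating such a graph utterance by utterance and tracking this score over the series gives the gradual development the article models.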
ISAR: An Authoring System for Interactive Tabletops
Developing augmented reality systems involves several challenges that prevent end users and experts from non-technical domains, such as education, from experimenting with this technology. In this research we introduce ISAR, an authoring system for augmented reality tabletops targeting users from non-technical domains. ISAR allows non-technical users to create their own interactive tabletop applications and experiment with the use of this technology in domains such as education, industrial training, and medical rehabilitation.
Argumentation dialogues in web-based GDSS: an approach using machine learning techniques
Doctoral thesis in Informatics.
Decision-making is present in anyone's daily life, even if they are often unaware of it. Decisions can be
related to everyday problems, or they can be related to more complex issues, such as organizational
issues. Normally, in the organizational context, decisions are made in groups.
Group Decision Support Systems have been studied over the past decades with the aim of improving
the support provided to decision-makers in the most diverse situations and/or problems to be solved.
There are two main approaches to implementing Group Decision Support Systems: the classical approach,
based on the mathematical aggregation of the preferences of the different elements of the group, and the
approaches based on automatic negotiation (e.g. Game Theory, Argumentation, among others).
Current argumentation-based Group Decision Support Systems can generate an enormous amount
of data. The objective of this research work is to study and develop models using machine learning
techniques to extract knowledge from the argumentative dialogues carried out by decision-makers; more
specifically, the aim is to create models to analyze, classify, and process these data, enhancing the
generation of new knowledge that will be used both by intelligent agents and by real decision-makers,
thereby promoting consensus among the members of the group. Based on the literature study
and the open challenges in this domain, the following research hypothesis was formulated: it is possible
to use machine learning techniques to support argumentative dialogues in web-based Group Decision
Support Systems.
As part of the work developed, supervised classification algorithms were applied to a data set containing
arguments extracted from online debates, creating an argumentative sentence classifier that can
automatically classify (For/Against) argumentative sentences exchanged in the context of decision-making.
A dynamic clustering model was developed to organize conversations based on the arguments used. In
addition, a web-based Group Decision Support System was proposed that makes it possible to support
groups of decision-makers regardless of their geographic location. The system allows the creation of multicriteria
problems and the configuration of preferences, intentions, and interests of each decision-maker.
This web-based decision support system includes dashboards of intelligent reports that are generated
from the results achieved by the models described above. The achievement of each objective allowed
validation of the identified research questions and thus answered the defined hypothesis positively.
The author also thanks the Fundação para a Ciência e a Tecnologia for the Ph.D. grant with the
reference SFRH/BD/137150/2018.
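A minimal sketch of the For/Against sentence classifier described above, assuming a TF-IDF plus logistic-regression pipeline and two toy training sentences; the thesis's actual features, algorithm, and corpus may differ:

```python
# Hedged sketch: supervised For/Against classification of argumentative sentences.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

sentences = [
    "This proposal cuts costs without hurting quality.",  # stance: for
    "The plan ignores the risks raised by the audit.",    # stance: against
]
labels = ["for", "against"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(sentences, labels)
print(clf.predict(["Adopting this option strengthens our position."]))
```

The predicted stances could then feed the dynamic clustering model and the report dashboards the abstract mentions.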
STICKERCONV: Generating Multimodal Empathetic Responses from Scratch
Stickers, while widely recognized for enhancing empathetic communication in
online interactions, remain underexplored in current empathetic dialogue
research, notably due to the lack of comprehensive datasets. In
this paper, we introduce the Agent for STICKERCONV (Agent4SC), which uses
collaborative agent interactions to realistically simulate human behavior with
sticker usage, thereby enhancing multimodal empathetic communication. Building
on this foundation, we develop a multimodal empathetic dialogue dataset,
STICKERCONV, comprising 12.9K dialogue sessions, 5.8K unique stickers, and 2K
diverse conversational scenarios. This dataset serves as a benchmark for
multimodal empathetic generation. To advance further, we propose PErceive and
Generate Stickers (PEGS), a multimodal empathetic response generation
framework, complemented by a comprehensive set of empathy evaluation metrics
based on LLMs. Our experiments demonstrate PEGS's effectiveness in generating
contextually relevant and emotionally resonant multimodal empathetic responses,
contributing to the advancement of more nuanced and engaging empathetic
dialogue systems.
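The "empathy evaluation metrics based on LLMs" can be pictured as an LLM-as-judge scorer. The sketch below assumes a hypothetical ask_llm completion client and an invented 1-to-5 rubric; it shows the shape of such a metric, not the paper's actual prompts or criteria:

```python
# Hedged sketch of an LLM-based empathy metric; `ask_llm` is a hypothetical
# stand-in for any chat-completion client, and the rubric is an assumption.
from typing import Callable

RUBRIC = (
    "Rate the assistant reply for empathy on a 1-5 scale, considering emotional "
    "understanding, relevance to the user's situation, and warmth. "
    "Answer with the number only.\n\nDialogue:\n{dialogue}\n\nReply:\n{reply}"
)

def empathy_score(dialogue: str, reply: str, ask_llm: Callable[[str], str]) -> int:
    """Ask a judge model to score one (dialogue, reply) pair."""
    answer = ask_llm(RUBRIC.format(dialogue=dialogue, reply=reply))
    return int(answer.strip()[0])  # keep the leading digit of the judge's answer
```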
Collaborative geographic visualization
Dissertation presented at the Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, for the
degree of Master in Environmental Engineering, profile in Environmental Management and Systems.
The present document is a review of essential references to take into account when developing
ubiquitous Geographical Information Systems (GIS) with collaborative
visualization purposes.
Its chapters focus, respectively, on general principles of GIS, their multimedia components and
ubiquitous practices; geo-referenced information visualization and its graphical components of virtual
and augmented reality; collaborative environments, their technological requirements, architectural
specificities, and models for collective information management; and some final considerations about
the future and challenges of collaborative visualization of GIS in ubiquitous environments.
Continually improving grounded natural language understanding through human-robot dialog
As robots become ubiquitous in homes and workplaces such as hospitals and factories, they must be able to communicate with humans. Several kinds of knowledge are required to understand and respond to a human's natural language commands and questions. If a person asks an assistant robot to "take me to Alice's office", the robot must know that Alice is a person who owns some unique office, and that "take me" means it should navigate there. Similarly, if a person requests "bring me the heavy, green mug", the robot must have accurate mental models of the physical concepts "heavy", "green", and "mug". To avoid forcing humans to use key phrases or words robots already know, this thesis focuses on helping robots understand new language constructs through interactions with humans and with the world around them.
To understand a command in natural language, a robot must first convert that command to an internal representation that it can reason with. Semantic parsing is a method for performing this conversion, and the target representation is often a semantic form expressed as predicate logic with lambda calculus. Traditional semantic parsing relies on hand-crafted resources from a human expert: an ontology of concepts, a lexicon connecting language to those concepts, and training examples of language with abstract meanings. One thrust of this thesis is to perform semantic parsing with sparse initial data. We use the conversations between a robot and human users to induce pairs of natural language utterances with the target semantic forms a robot discovers through its questions, reducing the annotation effort of creating training examples for parsing. We use this data to build more dialog-capable robots in new domains with much less expert human effort (Thomason et al., 2015; Padmakumar et al., 2017).
Meanings of many language concepts are bound to the physical world. Understanding object properties and categories, such as "heavy", "green", and "mug", requires interacting with and perceiving the physical world. Embodied robots can use manipulation capabilities, such as pushing, picking up, and dropping objects, to gather sensory data about them. This data can be used to understand non-visual concepts like "heavy" and "empty" (e.g. "get the empty carton of milk from the fridge"), and to assist with concepts that have both visual and non-visual expression (e.g. tall things look big and also exert force sooner than short things when pressed down on). A second thrust of this thesis focuses on strategies for learning these concepts using multi-modal sensory information. We use human-in-the-loop learning to gather labels linking concept words to actual objects in the environment (Thomason et al., 2016, 2017). We also explore ways to tease out polysemy and synonymy in concept words (Thomason and Mooney, 2017) such as "light", which can refer to a weight or a color, the latter sense being synonymous with "pale". Additionally, pushing, picking up, and dropping objects to gather sensory information is prohibitively time-consuming, so we investigate strategies for using linguistic information and human input to expedite exploration when learning a new concept (Thomason et al., 2018).
Finally, we build an integrated agent with both parsing and perception capabilities that learns from conversations with users to improve both components over time.
We demonstrate that parser learning from conversations (Thomason et al., 2015) can be combined with multi-modal perception (Thomason et al., 2016), using predicate-object labels gathered through opportunistic active learning (Thomason et al., 2017) during those conversations, to improve performance for understanding natural language commands from humans. Human users also qualitatively rate this integrated learning agent as more usable after it has improved from conversation-based learning.
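To make the parsing target concrete, here is a toy lexicon-plus-template parser in the spirit of the predicate-logic forms described above. The entries and predicates are illustrative assumptions; the thesis induces such language/semantic-form pairs from human-robot conversations rather than hand-coding them:

```python
# Toy semantic parser: commands are rewritten into predicate-logic-style forms.
# The lexicon, templates, and predicates are invented for illustration.
LEXICON = {
    "alice's office": "the(lambda x: office(x) and owns(alice, x))",
    "the heavy, green mug": "the(lambda x: mug(x) and heavy(x) and green(x))",
}

TEMPLATES = {
    "take me to ": "navigate({})",
    "bring me ": "bring(speaker, {})",
}

def parse(command: str) -> str:
    """Match a command template, then look up its argument in the lexicon."""
    cmd = command.lower().rstrip(".")
    for prefix, template in TEMPLATES.items():
        if cmd.startswith(prefix):
            return template.format(LEXICON.get(cmd[len(prefix):], "unknown"))
    return "unparsed"

print(parse("Take me to Alice's office"))
# -> navigate(the(lambda x: office(x) and owns(alice, x)))
```

Learning from conversation, as the thesis does, amounts to growing the lexicon and the parser's statistics from clarification questions instead of fixing them in advance.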