317 research outputs found
A Comprehensive Review of Data-Driven Co-Speech Gesture Generation
Gestures that accompany speech are an essential part of natural and efficient
embodied human communication. The automatic generation of such co-speech
gestures is a long-standing problem in computer animation and is considered an
enabling technology in film, games, virtual social spaces, and for interaction
with social robots. The problem is made challenging by the idiosyncratic and
non-periodic nature of human co-speech gesture motion, and by the great
diversity of communicative functions that gestures encompass. Gesture
generation has seen surging interest recently, owing to the emergence of more
and larger datasets of human gesture motion, combined with strides in
deep-learning-based generative models, that benefit from the growing
availability of data. This review article summarizes co-speech gesture
generation research, with a particular focus on deep generative models. First,
we articulate the theory describing human gesticulation and how it complements
speech. Next, we briefly discuss rule-based and classical statistical gesture
synthesis, before delving into deep learning approaches. We employ the choice
of input modalities as an organizing principle, examining systems that generate
gestures from audio, text, and non-linguistic input. We also chronicle the
evolution of the related training data sets in terms of size, diversity, motion
quality, and collection method. Finally, we identify key research challenges in
gesture generation, including data availability and quality; producing
human-like motion; grounding the gesture in the co-occurring speech in
interaction with other speakers, and in the environment; performing gesture
evaluation; and integration of gesture synthesis into applications. We
highlight recent approaches to tackling the various key challenges, as well as
the limitations of these approaches, and point toward areas of future
development.Comment: Accepted for EUROGRAPHICS 202
A Survey on Semantic Processing Techniques
Semantic processing is a fundamental research domain in computational
linguistics. In the era of powerful pre-trained language models and large
language models, the advancement of research in this domain appears to be
decelerating. However, the study of semantics is multi-dimensional in
linguistics. The research depth and breadth of computational semantic
processing can be largely improved with new technologies. In this survey, we
analyzed five semantic processing tasks, e.g., word sense disambiguation,
anaphora resolution, named entity recognition, concept extraction, and
subjectivity detection. We study relevant theoretical research in these fields,
advanced methods, and downstream applications. We connect the surveyed tasks
with downstream applications because this may inspire future scholars to fuse
these low-level semantic processing tasks with high-level natural language
processing tasks. The review of theoretical research may also inspire new tasks
and technologies in the semantic processing domain. Finally, we compare the
different semantic processing techniques and summarize their technical trends,
application trends, and future directions.Comment: Published at Information Fusion, Volume 101, 2024, 101988, ISSN
1566-2535. The equal contribution mark is missed in the published version due
to the publication policies. Please contact Prof. Erik Cambria for detail
The significance of silence. Long gaps attenuate the preference for ‘yes’ responses in conversation.
In conversation, negative responses to invitations, requests, offers and the like more often occur with a delay – conversation analysts talk of them as dispreferred. Here we examine the contrastive cognitive load ‘yes’ and ‘no’ responses make, either when given relatively fast (300 ms) or delayed (1000 ms). Participants heard minidialogues, with turns extracted from a spoken corpus, while having their EEG recorded. We find that a fast ‘no’ evokes an N400-effect relative to a fast ‘yes’, however this contrast is not present for delayed responses. This shows that an immediate response is expected to be positive – but this expectation disappears as the response time lengthens because now in ordinary conversation the probability of a ‘no’ has increased. Additionally, however, 'No' responses elicit a late frontal positivity both when they are fast and when they are delayed. Thus, regardless of the latency of response, a ‘no’ response is associated with a late positivity, since a negative response is always dispreferred and may require an account. Together these results show that negative responses to social actions exact a higher cognitive load, but especially when least expected, as an immediate response
Relational knowledge and representation for reinforcement learning
In reinforcement learning, an agent interacts with the environment, learns from feedback about the quality of its actions, and improves its behaviour or policy in order
to maximise its expected utility. Learning efficiently in large scale problems is a major challenge. State aggregation is possible in problems with a first-order structure,
allowing the agent to learn in an abstraction of the original problem which is of
considerably smaller scale. One approach is to learn the Q-values of actions which
are approximated by a relational function approximator. This is the basis for relational reinforcement learning (RRL). We abstract the state with first-order features
which consist of only variables, thereby aggregating similar states from all problems
of the same domain to abstract states. We study the limitations of RRL due to
this abstraction and introduce the concepts of consistent abstraction, subsumption
of problems, and abstract-equivalent problems. We propose three methods to overcome the limitations, extending the types of problems our RRL method can solve.
Next, to further improve the learning efficiency, we propose to learn different types
of generalised knowledge. The policy is influenced by directed exploration based on
multiple types of intrinsic rewards and avoids previously encountered dead ends. In
addition, we incorporate model-based techniques to provide better quality estimates
of the Q-values. Transfer learning is possible by directly leveraging the generalised
knowledge to accelerate learning in a new problem. Lastly, we introduce a new class
of problems which considers dynamic objects and time-bounded goals. We discuss
the complications these bring to RRL and present some solutions. We also propose a framework for multi-agent coordination to achieve joint goals represented by
time-bounded goals by decomposing a multi-agent problem into single-agent problems. We evaluate our work empirically in six domains to demonstrate its efficacy
in solving large scale problems and transfer learning
- …