Large language models in textual analysis for gesture selection
Gestures perform a variety of communicative functions that powerfully
influence human face-to-face interaction. How this communicative function is
achieved varies greatly between individuals and depends on the role of the
speaker and the context of the interaction. Approaches to automatic gesture
generation vary not only in the degree to which they rely on data-driven
techniques but also the degree to which they can produce context and speaker
specific gestures. However, these approaches face two major challenges: The
first is obtaining sufficient training data that is appropriate for the context
and the goal of the application. The second is related to designer control to
realize their specific intent for the application. Here, we approach these
challenges by using large language models (LLMs) to show that these powerful
models of large amounts of data can be adapted for gesture analysis and
generation. Specifically, we used ChatGPT as a tool for suggesting
context-specific gestures that can realize designer intent based on minimal
prompts. We also find that ChatGPT can suggest novel yet appropriate gestures
not present in the minimal training data. The use of LLMs is a promising avenue
for gesture generation that reduces the need for laborious annotation and has
the potential to flexibly and quickly adapt to different designer intents.
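As a hedged illustration of the approach the abstract describes, the sketch
below prompts an LLM for context-specific gesture suggestions. The prompt
wording, the gesture vocabulary in the system message, and the
suggest_gestures helper are assumptions for illustration, not the authors'
implementation.

```python
# Illustrative sketch only: prompting an LLM to suggest co-speech gestures
# for an utterance. Prompt text and gesture vocabulary are hypothetical.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def suggest_gestures(utterance: str, speaker_role: str, context: str) -> str:
    """Ask the model for gesture suggestions aligned with designer intent."""
    system = (
        "You annotate speech with co-speech gestures. For the given utterance, "
        "suggest gestures (e.g. beat, deictic, iconic, metaphoric) and the "
        "words they align to, matching the speaker role and context."
    )
    user = f"Role: {speaker_role}\nContext: {context}\nUtterance: {utterance}"
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": user}],
    )
    return response.choices[0].message.content

print(suggest_gestures("The results were huge.", "enthusiastic lecturer",
                       "conference talk"))
```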
Reclaiming human machine nature
Extending and modifying its domain of life through artifact production is one
of the main characteristics of humankind. From the first hominids, who used a
wooden stick or a stone to extend their upper limbs and augment the strength
of their gestures, to today's systems engineers, who use technologies to
augment human cognition, perception and action, extending the capabilities of
the human body remains a major issue. For more than fifty years, cybernetics,
computer science and the cognitive sciences have imposed a single reductionist
model of human-machine systems: cognitive systems. Inspired by philosophy,
behaviorist psychology and the information-processing metaphor, the
cognitive-system paradigm requires a functional view and a functional analysis
in the human systems design process. Under that design approach, humans have
been reduced to their metaphysical and functional properties in a new dualism,
while the requirements of the human body have been left to physical ergonomics
or "physiology". With multidisciplinary convergence, the issues of
"human-machine" systems and "human artifacts" are evolving. The loss of
biological and social boundaries between human organisms and interactive,
informational physical artifacts calls into question current engineering
methods and the ergonomic design of cognitive systems. New developments in
human-machine systems for intensive care, human space activities or
bio-engineering systems require grounding human systems design in a renewed
epistemological framework for future models of human systems and
evidence-based "bio-engineering". In that context, reclaiming human factors,
the augmented human and human-machine nature is a necessity.
Comment: Published in HCI International 2014, Heraklion, Greece (2014)
Toward a model of computational attention based on expressive behavior: applications to cultural heritage scenarios
Our project goals consisted of the development of attention-based analysis of
human expressive behavior and the implementation of real-time algorithms in
EyesWeb XMI in order to improve the naturalness of human-computer interaction
and the context-based monitoring of human behavior. To this aim, a perceptual
model that mimics human attentional processes was developed for expressivity
analysis and modeled using entropy. Museum scenarios were selected as an
ecological test-bed for three experiments focusing on visitor profiling and
visitor flow regulation.
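The abstract does not define its entropy model formally; the following is a
minimal sketch, assuming expressivity is scored as the Shannon entropy of a
motion-feature distribution. The choice of feature (frame-to-frame speed) and
the binning are hypothetical.

```python
# Hedged sketch: Shannon entropy of a motion-feature distribution as a rough
# stand-in for the entropy-based expressivity measure the abstract mentions.
import numpy as np

def motion_entropy(positions: np.ndarray, bins: int = 16) -> float:
    """Entropy (in bits) of the speed distribution of a tracked point.

    positions: array of shape (T, 2) with per-frame (x, y) coordinates.
    """
    speeds = np.linalg.norm(np.diff(positions, axis=0), axis=1)
    hist, _ = np.histogram(speeds, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]                       # ignore empty bins
    return float(-(p * np.log2(p)).sum())

# Erratic motion yields higher entropy than steady motion.
rng = np.random.default_rng(0)
steady = np.cumsum(np.ones((100, 2)) * 0.5, axis=0)
erratic = np.cumsum(rng.normal(size=(100, 2)), axis=0)
print(motion_entropy(steady), motion_entropy(erratic))
```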
Classifying types of gesture and inferring intent
In order to infer intent from gesture, a rudimentary classification of types
of gestures into five main classes is introduced. The classification is
intended as a basis for incorporating the understanding of gesture into
human-robot interaction (HRI). Some requirements for the operational
classification of gesture by a robot interacting with humans are also
suggested.
Down-Sampling coupled to Elastic Kernel Machines for Efficient Recognition of Isolated Gestures
In the field of gestural action recognition, many studies have focused on
dimensionality reduction along the spatial axis, to reduce both the variability
of gestural sequences expressed in the reduced space, and the computational
complexity of their processing. It is noticeable that very few of these methods
have explicitly addressed the dimensionality reduction along the time axis.
This is, however, a major issue with regard to the use of elastic distances,
which have quadratic complexity. To partially fill this apparent gap, we
present in this paper an approach based on temporal down-sampling coupled
with elastic kernel machine learning. We experimentally show, on two data sets
that are widely referenced in the domain of human gesture recognition, and very
different in terms of quality of motion capture, that it is possible to
significantly reduce the number of skeleton frames while maintaining a good
recognition rate. The method proves to give satisfactory results at a level
currently reached by state-of-the-art methods on these data sets. The
computational complexity reduction makes this approach eligible for real-time
applications.
Comment: ICPR 2014, International Conference on Pattern Recognition,
Stockholm, Sweden (2014)
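A minimal sketch of the core idea follows: uniformly down-sample each skeleton
sequence along time before computing an elastic (DTW-style) distance, so the
quadratic cost drops roughly with the square of the down-sampling factor. The
generic DTW and uniform sampling below are illustrative, not the paper's exact
method.

```python
# Minimal sketch: uniform temporal down-sampling before an elastic (DTW-style)
# distance. DTW costs O(T^2), so keeping T/k frames cuts the cost by ~k^2.
import numpy as np

def downsample(seq: np.ndarray, keep: int) -> np.ndarray:
    """Keep `keep` frames, uniformly spaced in time. seq has shape (T, D)."""
    idx = np.linspace(0, len(seq) - 1, keep).round().astype(int)
    return seq[idx]

def dtw(a: np.ndarray, b: np.ndarray) -> float:
    """Classic O(len(a)*len(b)) dynamic time warping distance."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

rng = np.random.default_rng(1)
g1 = rng.normal(size=(200, 60))   # fake skeleton sequences (frames x joints)
g2 = rng.normal(size=(180, 60))
print(dtw(downsample(g1, 20), downsample(g2, 20)))  # far cheaper than full DTW
```

An SVM kernel could then be built from this distance, e.g. as
exp(-dtw(x, y) / sigma), though the paper's elastic kernel machine may differ
in detail.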
ModDrop: adaptive multi-modal gesture recognition
We present a method for gesture detection and localisation based on
multi-scale and multi-modal deep learning. Each visual modality captures
spatial information at a particular spatial scale (such as motion of the upper
body or a hand), and the whole system operates at three temporal scales. Key to
our technique is a training strategy which exploits: i) careful initialization
of individual modalities; and ii) gradual fusion involving random dropping of
separate channels (dubbed ModDrop) for learning cross-modality correlations
while preserving uniqueness of each modality-specific representation. We
present experiments on the ChaLearn 2014 Looking at People Challenge gesture
recognition track, in which we placed first out of 17 teams. Fusing multiple
modalities at several spatial and temporal scales leads to a significant
increase in recognition rates, allowing the model to compensate for errors of
the individual classifiers as well as noise in the separate channels.
Furthermore, the proposed ModDrop training technique ensures robustness of the
classifier to missing signals in one or several channels to produce meaningful
predictions from any number of available modalities. In addition, we
demonstrate the applicability of the proposed fusion scheme to modalities of
arbitrary nature by experiments on the same dataset augmented with audio.
Comment: 14 pages, 7 figures
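As a hedged sketch of the ModDrop idea, the module below zeroes whole modality
channels at random during training, so the fusion layers learn cross-modal
correlations while remaining usable when a channel is missing at test time.
The encoder sizes, drop probability, and the ModDropFusion class are
illustrative placeholders, not the paper's architecture.

```python
# Hedged sketch of modality dropping during fusion training. Network sizes
# and the drop probability are placeholders, not the paper's network.
import torch
import torch.nn as nn

class ModDropFusion(nn.Module):
    def __init__(self, modality_dims, hidden=128, classes=20, p_drop=0.5):
        super().__init__()
        # One small encoder per modality (each pre-trainable in isolation).
        self.encoders = nn.ModuleList(
            nn.Sequential(nn.Linear(d, hidden), nn.ReLU())
            for d in modality_dims
        )
        self.p_drop = p_drop
        self.head = nn.Linear(hidden * len(modality_dims), classes)

    def forward(self, inputs):  # inputs: list of (batch, dim) tensors
        feats = [enc(x) for enc, x in zip(self.encoders, inputs)]
        if self.training:
            # Drop each modality independently per sample; zeroing a whole
            # channel mimics a missing signal at test time.
            feats = [f * (torch.rand(f.size(0), 1, device=f.device)
                          > self.p_drop)
                     for f in feats]
        return self.head(torch.cat(feats, dim=1))

model = ModDropFusion([512, 256, 128])  # e.g. video, depth, audio features
logits = model([torch.randn(8, 512), torch.randn(8, 256), torch.randn(8, 128)])
```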
Mapping Tasks to Interactions for Graph Exploration and Graph Editing on Interactive Surfaces
Graph exploration and editing are still mostly considered independently, and
existing systems are not designed for today's interactive surfaces such as
smartphones, tablets or tabletops. When developing a system for those modern
devices that supports both graph exploration and graph editing, it is
necessary to 1) identify which basic tasks need to be supported, 2) determine
which interactions can be used, and 3) decide how to map these tasks to
interactions. This technical report
provides a list of basic interaction tasks for graph exploration and editing as
a result of an extensive system review. Moreover, different interaction
modalities of interactive surfaces are reviewed according to their interaction
vocabulary and further degrees of freedom that can be used to make interactions
distinguishable are discussed. Beyond the scope of graph exploration and
editing, we provide a generally applicable approach for finding and evaluating
a mapping from tasks to interactions. Thus, this work acts as a guideline for
developing a system for graph exploration and editing that is specifically
designed for interactive surfaces.
Comment: 21 pages, minor corrections (typos etc.)
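As a purely hypothetical illustration of the report's mapping concept, a
task-to-interaction mapping can be encoded as a simple dictionary and checked
for conflicts, i.e. two tasks bound to the same interaction in the same
context. The task and interaction names below are invented, not taken from
the report's actual lists.

```python
# Hypothetical sketch: encoding a task-to-interaction mapping and detecting
# conflicts (two tasks bound to the same interaction in the same context).
from collections import defaultdict

mapping = {
    # task: (interaction, context in which it applies)
    "select node": ("tap", "on node"),
    "move node":   ("drag", "on node"),
    "pan view":    ("drag", "on background"),
    "create node": ("double tap", "on background"),
    "create edge": ("drag", "on node"),   # deliberately conflicting
}

def find_conflicts(mapping):
    """Group tasks by (interaction, context); more than one task per group
    means the mapping is ambiguous."""
    groups = defaultdict(list)
    for task, binding in mapping.items():
        groups[binding].append(task)
    return {b: ts for b, ts in groups.items() if len(ts) > 1}

print(find_conflicts(mapping))
# {('drag', 'on node'): ['move node', 'create edge']}
```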