PsyMo: A Dataset for Estimating Self-Reported Psychological Traits from Gait
Psychological trait estimation from external factors such as movement and
appearance is a challenging and long-standing problem in psychology, and is
principally based on the psychological theory of embodiment. To date, attempts
to tackle this problem have utilized private small-scale datasets with
intrusive body-attached sensors. Potential applications of an automated system
for psychological trait estimation include estimating occupational fatigue and
psychological state, as well as marketing and advertising. In this work, we propose PsyMo
(Psychological traits from Motion), a novel, multi-purpose and multi-modal
dataset for exploring psychological cues manifested in walking patterns. We
gathered walking sequences from 312 subjects in 7 different walking variations
and 6 camera angles. In conjunction with walking sequences, participants filled
in 6 psychological questionnaires, totalling 17 psychometric attributes related
to personality, self-esteem, fatigue, aggressiveness and mental health. We
propose two evaluation protocols for psychological trait estimation. Alongside
the estimation of self-reported psychological traits from gait, the dataset can
be used as a drop-in replacement to benchmark methods for gait recognition. We
anonymize all cues related to the identity of the subjects and publicly release
only silhouettes, 2D / 3D human skeletons, and 3D SMPL human meshes.
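A trait-estimation protocol over such a dataset can be sketched in miniature. Everything below is hypothetical stand-in data (random "gait embeddings" and a synthetic psychometric score); the point is only the shape of the pipeline: subject-disjoint train/test split, a simple regressor, and an error metric on held-out subjects.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for gait embeddings: one feature vector per walking
# sequence, plus one self-reported psychometric score per sequence's subject.
n_sequences, n_features = 200, 64
X = rng.normal(size=(n_sequences, n_features))
true_w = rng.normal(size=n_features)
y = X @ true_w + 0.1 * rng.normal(size=n_sequences)  # e.g. a self-esteem score

# Subject-disjoint split: sequences of the same subject must not appear in
# both train and test, or identity cues leak into the trait estimate.
train, test = slice(0, 150), slice(150, 200)

# Ridge regression in closed form: w = (X^T X + lam I)^{-1} X^T y
lam = 1.0
A = X[train].T @ X[train] + lam * np.eye(n_features)
w = np.linalg.solve(A, X[train].T @ y[train])

mae = np.abs(X[test] @ w - y[test]).mean()
print(f"MAE on held-out subjects: {mae:.3f}")
```

The ridge regressor is only a placeholder; the dataset's two released protocols would fix the actual split and metric.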
Expanded Parts Model for Semantic Description of Humans in Still Images
We introduce an Expanded Parts Model (EPM) for recognizing human attributes
(e.g. young, short hair, wearing suit) and actions (e.g. running, jumping) in
still images. An EPM is a collection of part templates which are learnt
discriminatively to explain specific scale-space regions in the images (in
human-centric coordinates). This is in contrast to current models, which consist
of relatively few (i.e., a mixture of) 'average' templates. EPM uses only a
subset of the parts to score an image and scores the image sparsely in space,
i.e. it ignores redundant and random background in an image. To learn our
model, we propose an algorithm which automatically mines parts and learns
corresponding discriminative templates together with their respective locations
from a large number of candidate parts. We validate our method on three recent
challenging datasets of human attributes and actions. We obtain convincing
qualitative and state-of-the-art quantitative results on the three datasets. Comment: Accepted for publication in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).
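The sparse scoring idea can be illustrated with hypothetical part responses. The sketch below omits two things the paper actually does, learning the templates discriminatively and enforcing spatial non-overlap, and keeps only the core mechanism: each part keeps its best region, and only a small subset of parts contributes to the image score.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in: responses[p, r] is the dot product of part template p
# with the feature of scale-space region r (shapes are illustrative only).
n_parts, n_regions = 20, 50
responses = rng.normal(size=(n_parts, n_regions))

def epm_score(responses, k=5):
    """Score an image with only a subset of parts: each part keeps its best
    region, then only the top-k parts are summed; the rest (redundant or
    background regions) are ignored, which is the sparsity in space."""
    best_per_part = responses.max(axis=1)  # best region response per part
    top_k = np.sort(best_per_part)[-k:]    # only k parts fire on this image
    return top_k.sum()

score = epm_score(responses)
```

In the full model the choice of which k parts fire is made jointly with their locations during learning, not post hoc as here.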
Videoprompter: an ensemble of foundational models for zero-shot video understanding
Vision-language models (VLMs) classify the query video by calculating a
similarity score between the visual features and text-based class label
representations. Recently, large language models (LLMs) have been used to
enrich the text-based class labels by enhancing the descriptiveness of the
class names. However, these improvements are restricted to the text-based
classifier only, and the query visual features are not considered. In this
paper, we propose a framework which combines pre-trained discriminative VLMs
with pre-trained generative video-to-text and text-to-text models. We introduce
two key modifications to the standard zero-shot setting. First, we propose
language-guided visual feature enhancement and employ a video-to-text model to
convert the query video to its descriptive form. The resulting descriptions
contain vital visual cues of the query video, such as what objects are present
and their spatio-temporal interactions. These descriptive cues provide
additional semantic knowledge to VLMs to enhance their zero-shot performance.
Second, we propose video-specific prompts to LLMs to generate more meaningful
descriptions to enrich class label representations. Specifically, we introduce
prompt techniques to create a Tree Hierarchy of Categories for class names,
offering a higher-level action context for additional visual cues. We
demonstrate the effectiveness of our approach in video understanding across
three different zero-shot settings: 1) video action recognition, 2)
video-to-text and text-to-video retrieval, and 3) time-sensitive video tasks.
Consistent improvements across multiple benchmarks and with various VLMs
demonstrate the effectiveness of our proposed framework. Our code will be made
publicly available.
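The zero-shot scoring this builds on can be sketched with random stand-in embeddings. A real system would use pre-trained video and text encoders and LLM-generated class descriptions; here the embeddings, dimensions, and class count are all hypothetical, and only the mechanics survive: average the description embeddings per class, then rank classes by cosine similarity to the video feature.

```python
import numpy as np

rng = np.random.default_rng(2)

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Hypothetical embeddings standing in for a CLIP-style VLM's outputs.
dim, n_classes, n_descriptions = 32, 4, 3
video_feat = normalize(rng.normal(size=dim))

# Each class label is enriched with several generated descriptions; their
# embeddings are averaged into a single classifier vector per class.
desc_embeds = normalize(rng.normal(size=(n_classes, n_descriptions, dim)))
class_embeds = normalize(desc_embeds.mean(axis=1))

# Zero-shot prediction: cosine similarity between video and class vectors.
scores = class_embeds @ video_feat
pred = int(np.argmax(scores))
```

The paper's video-to-text enhancement would additionally fold a generated description of the query video into `video_feat` before this comparison.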
An Adaptive Human Activity-Aided Hand-Held Smartphone-Based Pedestrian Dead Reckoning Positioning System
Pedestrian dead reckoning (PDR), enabled by smartphones’ embedded inertial sensors, is widely applied as a type of indoor positioning system (IPS). However, traditional PDR faces two challenges to improving its accuracy: a lack of robustness across different PDR-related human activities, and the accumulation of positioning error over time. To cope with these issues, we propose a novel adaptive human activity-aided PDR (HAA-PDR) IPS that consists of two main parts: human activity recognition (HAR) and PDR optimization. (1) For HAR, eight different locomotion-related activities are divided into two classes: steady-heading activities (ascending/descending stairs, stationary, normal walking, stationary stepping, and lateral walking) and non-steady-heading activities (door opening and turning). A hierarchical combination of a support vector machine (SVM) and a decision tree (DT) is used to recognize steady-heading activities, while an autoencoder-based deep neural network (DNN) and a heading range-based method are used to recognize door opening and turning, respectively. The overall HAR accuracy is over 98.44%. (2) For PDR optimization, a process automatically sets the parameters of the PDR differently for different activities to enhance step counting and step length estimation. Furthermore, a trajectory optimization method mitigates PDR error accumulation by exploiting the non-steady-heading activities: we divide the trajectory into small segments and reconstruct it after targeted optimization of each segment. Our method does not use any a priori knowledge of the building layout, plan, or map. Finally, the mean positioning error of our HAA-PDR in a multilevel building is 1.79 m, a significant improvement in accuracy over a baseline state-of-the-art PDR system.
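The position update at the heart of any PDR system, which the activity-adaptive parameter tuning then refines per activity, can be sketched as follows. The step lengths and headings are illustrative values, not outputs of the described estimators.

```python
import math

def pdr_update(x, y, heading_rad, step_length_m):
    """One pedestrian-dead-reckoning step: advance the 2-D position along the
    current heading (0 = north, clockwise positive) by the step length."""
    return (x + step_length_m * math.sin(heading_rad),
            y + step_length_m * math.cos(heading_rad))

# Walk 4 steps of 0.7 m due north, then 2 steps due east (pi/2 rad).
pos = (0.0, 0.0)
for _ in range(4):
    pos = pdr_update(*pos, heading_rad=0.0, step_length_m=0.7)
for _ in range(2):
    pos = pdr_update(*pos, heading_rad=math.pi / 2, step_length_m=0.7)
```

Because each update adds the error of the previous one, heading or step-length biases accumulate; this is exactly the drift that the segment-wise trajectory reconstruction above is designed to correct.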
Around-Body Interaction: Leveraging Limb Movements for Interacting in a Digitally Augmented Physical World
Recent technological advances have made head-mounted displays (HMDs) smaller
and untethered, fostering the vision of ubiquitous interaction with information
in a digitally augmented physical world. For interacting with such devices,
three main types of input, besides not-very-intuitive finger gestures, have
emerged so far: 1) touch input on the frame of the device, 2) touch input on
accessories (controllers), and 3) voice input. While these techniques have both
advantages and disadvantages depending on the current situation of the user,
they largely ignore the skills and dexterity that we show when interacting with
the real world: Throughout our lives, we have trained extensively to use our
limbs to interact with and manipulate the physical world around us.
This thesis explores how the skills and dexterity of our upper and lower
limbs, acquired and trained in interacting with the real world, can be
transferred to the interaction with HMDs. Thus, this thesis develops the vision
of around-body interaction, in which we use the space around our body, defined
by the reach of our limbs, for fast, accurate, and enjoyable interaction with
such devices. This work contributes four interaction techniques, two for the
upper limbs and two for the lower limbs: The first contribution shows how the
proximity between our head and hand can be used to interact with HMDs. The
second contribution extends the interaction with the upper limbs to multiple
users and illustrates how the registration of augmented information in the real
world can support cooperative use cases. The third contribution shifts the
focus to the lower limbs and discusses how foot taps can be leveraged as an
input modality for HMDs. The fourth contribution presents how lateral shifts of
the walking path can be exploited for mobile and hands-free interaction with
HMDs while walking. Comment: thesis.