Egocentric Activity Recognition with Multimodal Fisher Vector
With the increasing availability of wearable devices, research on egocentric
activity recognition has received much attention recently. In this paper, we
build a Multimodal Egocentric Activity dataset which includes egocentric videos
and sensor data of 20 fine-grained and diverse activity categories. We present
a novel strategy to extract temporal trajectory-like features from sensor data.
We propose to apply the Fisher Kernel framework to fuse video and temporal
enhanced sensor features. Experimental results show that, with careful design
of the feature extraction and fusion algorithms, sensor data can enhance
information-rich video data. We make the Multimodal Egocentric Activity dataset
publicly available to facilitate future research. Comment: 5 pages, 4 figures, accepted at ICASSP 2016.
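The Fisher Kernel framework mentioned above can be illustrated with a minimal NumPy sketch, assuming a diagonal-covariance GMM fitted beforehand and encoding only the gradient with respect to the component means. All names and the simplifications here are illustrative; the paper's actual descriptors, GMM, and fusion details differ.

```python
import numpy as np

def fisher_vector(descriptors, means, covs, priors):
    """Simplified Fisher vector: gradient w.r.t. GMM means only.

    descriptors: (N, D) local features; means, covs: (K, D) diagonal
    GMM parameters; priors: (K,) mixture weights.
    """
    N, D = descriptors.shape
    K = means.shape[0]

    # Log-likelihood of each descriptor under each Gaussian component.
    log_p = np.zeros((N, K))
    for k in range(K):
        diff = descriptors - means[k]
        log_p[:, k] = (np.log(priors[k])
                       - 0.5 * np.sum(np.log(2 * np.pi * covs[k]))
                       - 0.5 * np.sum(diff ** 2 / covs[k], axis=1))

    # Posterior responsibilities (softmax over components, numerically stable).
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)

    # Accumulate the mean-gradient statistics per component.
    fv = np.zeros(K * D)
    for k in range(K):
        diff = (descriptors - means[k]) / np.sqrt(covs[k])
        fv[k * D:(k + 1) * D] = gamma[:, k] @ diff / (N * np.sqrt(priors[k]))

    # Power- and L2-normalisation, as commonly applied to Fisher vectors.
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    return fv / (np.linalg.norm(fv) + 1e-12)
```

In this simplified view, fusing modalities reduces to concatenating the per-modality Fisher vectors, e.g. `np.concatenate([fv_video, fv_sensor])`, before training a linear classifier.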
Unsupervised Mapping and Semantic User Localisation from First-Person Monocular Video
We propose an unsupervised probabilistic framework for learning a human-centred representation of a person’s environment from first-person video. Specifically, non-geometric maps modelled as hierarchies of probabilistic place graphs and view graphs are learned. Place graphs model a user’s patterns of transition between physical locations whereas view graphs capture an aspect of user behaviour within those locations. Furthermore, we describe an implementation in which the notion of place is divided into stations and the routes that interconnect them. Stations typically correspond to rooms or areas where a user spends time. Visits to stations are temporally segmented based on qualitative visual motion. We describe how to learn maps online in an unsupervised manner, and how to localise the user within these maps. We report experiments on two datasets, including comparison of performance with and without view graphs, and demonstrate better online mapping than when using offline clustering.
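The place-graph idea described above can be sketched as an online transition model between stations. This is a minimal, hypothetical illustration (class and method names are ours, not the paper's), assuming station visits have already been temporally segmented:

```python
from collections import defaultdict

class PlaceGraph:
    """Minimal sketch of an online place graph: nodes are stations,
    edge weights count observed transitions between them."""

    def __init__(self):
        self.transitions = defaultdict(lambda: defaultdict(int))

    def observe(self, station_sequence):
        # Update transition counts from a temporally segmented visit sequence.
        for a, b in zip(station_sequence, station_sequence[1:]):
            if a != b:
                self.transitions[a][b] += 1

    def next_station_probs(self, station):
        # Maximum-likelihood transition probabilities out of a station,
        # a stand-in here for the probabilistic localisation prior.
        out = self.transitions[station]
        total = sum(out.values())
        return {s: c / total for s, c in out.items()} if total else {}
```

Because the counts are updated incrementally per observed visit, the map can be learned online without revisiting earlier data, which is the property the abstract contrasts with offline clustering.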
MECCANO: A Multimodal Egocentric Dataset for Humans Behavior Understanding in the Industrial-like Domain
Wearable cameras allow users to acquire images and videos from their own
perspective. These data can be processed to understand human behavior. Although
human behavior analysis has been thoroughly investigated in third-person
vision, it is still understudied in egocentric settings, and in particular in
industrial scenarios. To encourage research in this field, we present MECCANO,
a multimodal dataset of egocentric videos for studying human behavior
understanding in industrial-like settings. The multimodality is characterized
by the presence of gaze signals, depth maps and RGB videos acquired
simultaneously with a custom headset. The dataset has been explicitly labeled
for fundamental tasks in the context of human behavior understanding from a
first-person view, such as recognizing and anticipating human-object
interactions. With the MECCANO dataset, we explored five different tasks:
1) Action Recognition, 2) Active Objects Detection and Recognition,
3) Egocentric Human-Objects Interaction Detection, 4) Action Anticipation and
5) Next-Active Objects Detection. We propose a benchmark aimed at studying human
behavior in the considered industrial-like scenario, which demonstrates that the
investigated tasks and the considered scenario are challenging for
state-of-the-art algorithms. To support research in this field, we publicly
release the dataset at https://iplab.dmi.unict.it/MECCANO/. Comment: arXiv admin note: text overlap with arXiv:2010.0565
EGO-TOPO: Environment Affordances from Egocentric Video
First-person video naturally brings the use of a physical environment to the
forefront, since it shows the camera wearer interacting fluidly in a space
based on their intentions. However, current methods largely separate the observed
actions from the persistent space itself. We introduce a model for environment
affordances that is learned directly from egocentric video. The main idea is to
gain a human-centric model of a physical space (such as a kitchen) that
captures (1) the primary spatial zones of interaction and (2) the likely
activities they support. Our approach decomposes a space into a topological map
derived from first-person activity, organizing an ego-video into a series of
visits to the different zones. Further, we show how to link zones across
multiple related environments (e.g., from videos of multiple kitchens) to
obtain a consolidated representation of environment functionality. On
EPIC-Kitchens and EGTEA+, we demonstrate our approach for learning scene
affordances and anticipating future actions in long-form video. Comment: Published in CVPR 2020, project page:
http://vision.cs.utexas.edu/projects/ego-topo
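The decomposition described above, from a stream of per-frame zone assignments into a series of visits and a topological map, can be sketched as follows. This is a simplified illustration, assuming zone labels are already available per frame (the paper discovers the zones themselves from first-person activity):

```python
from itertools import groupby

def visits_from_frames(zone_per_frame):
    # Collapse consecutive frames sharing a zone label into one visit.
    return [zone for zone, _ in groupby(zone_per_frame)]

def topological_map(zone_per_frame):
    """Build an undirected topological map: nodes are zones, and an
    edge links any two zones visited consecutively in the video."""
    visits = visits_from_frames(zone_per_frame)
    edges = set()
    for a, b in zip(visits, visits[1:]):
        edges.add(frozenset((a, b)))
    return visits, edges
```

Linking zones across multiple related environments would then amount to merging nodes of different maps that are judged functionally similar, yielding the consolidated representation the abstract describes.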
Progress in ambient assisted systems for independent living by the elderly
One of the challenges of the ageing population in many countries is the efficient delivery of health and care services, which is further complicated by the increase in neurological conditions among the elderly due to rising life expectancy. The personal care of the elderly is a concern for their relatives, particularly when they live alone and unforeseen circumstances may affect their wellbeing. The alternative, i.e. care in nursing homes or hospitals, is costly, and costs increase further if specialized care is mobilized to patients’ places of residence. Enabling technologies for independent living by the elderly, such as ambient assisted living systems (AALS), are seen as essential to enhancing care in a cost-effective manner. In light of significant advances in telecommunication, computing and sensor miniaturization, as well as the ubiquity of mobile and connected devices embodying the concept of the Internet of Things (IoT), end-to-end solutions for ambient assisted living have become a reality. The premise of such applications is the continuous and most often real-time monitoring of the environment and occupant behavior using an event-driven intelligent system, thereby providing a facility for monitoring and assessment, and triggering assistance as and when needed. As a growing area of research, it is essential to investigate the approaches for developing AALS in the literature to identify current practices and directions for future research. This paper is, therefore, aimed at a comprehensive and critical review of the frameworks and sensor systems used in various ambient assisted living systems, as well as their objectives and relationships with care and clinical systems. Findings from our work suggest that most frameworks focused on activity monitoring for assessing immediate risks, while the opportunities for integrating environmental factors for analytics and decision-making, in particular for long-term care, were often overlooked.
The potential of wearable devices and sensors, as well as distributed storage and access (e.g. cloud), has yet to be fully appreciated. There is a distinct lack of strong supporting clinical evidence from the implemented technologies. Socio-cultural aspects such as divergence among groups, acceptability and usability of AALS were also overlooked. Future systems need to look into the issues of privacy and cyber security.
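The event-driven monitoring premise described in this review can be illustrated with a deliberately simple rule, such as flagging prolonged inactivity. The threshold and sensor interface are assumptions for illustration only, not taken from any reviewed system:

```python
def check_inactivity(last_motion_ts, now_ts, threshold_s=4 * 3600):
    """Illustrative AAL event rule: trigger an alert when no motion
    has been sensed for longer than threshold_s seconds.

    Timestamps are in seconds; the 4-hour default is an arbitrary
    example, not a clinically validated value.
    """
    return (now_ts - last_motion_ts) > threshold_s
```

Real AALS would combine many such rules with context (time of day, occupant routine) before escalating to carers, which is precisely where the review finds environmental and long-term factors underused.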