
    Egocentric Activity Recognition with Multimodal Fisher Vector

    With the increasing availability of wearable devices, research on egocentric activity recognition has received much attention recently. In this paper, we build a Multimodal Egocentric Activity dataset which includes egocentric videos and sensor data of 20 fine-grained and diverse activity categories. We present a novel strategy to extract temporal trajectory-like features from sensor data. We propose to apply the Fisher Kernel framework to fuse video and temporally enhanced sensor features. Experimental results show that with careful design of the feature extraction and fusion algorithm, sensor data can enhance information-rich video data. We make the Multimodal Egocentric Activity dataset publicly available to facilitate future research. Comment: 5 pages, 4 figures, accepted at ICASSP 2016
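
    To make the fusion strategy concrete, the following is a minimal, illustrative sketch of Fisher Vector encoding and concatenation of two modalities in Python (NumPy/scikit-learn). It is not the authors' released code: the placeholder descriptor arrays, the GMM sizes and the simple concatenation step are assumptions for exposition.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def fisher_vector(descriptors, gmm):
        """Encode local descriptors (N x D) with a diagonal-covariance GMM (K components)."""
        q = gmm.predict_proba(descriptors)                       # N x K soft assignments
        n = descriptors.shape[0]
        means, covs, priors = gmm.means_, gmm.covariances_, gmm.weights_
        diff = (descriptors[:, None, :] - means[None, :, :]) / np.sqrt(covs)[None, :, :]
        # Gradients of the log-likelihood w.r.t. the GMM means and variances
        d_mu = (q[:, :, None] * diff).sum(0) / (n * np.sqrt(priors)[:, None])
        d_sigma = (q[:, :, None] * (diff ** 2 - 1)).sum(0) / (n * np.sqrt(2 * priors)[:, None])
        fv = np.hstack([d_mu.ravel(), d_sigma.ravel()])
        fv = np.sign(fv) * np.sqrt(np.abs(fv))                   # power normalization
        return fv / (np.linalg.norm(fv) + 1e-12)                 # L2 normalization

    # Placeholder descriptors standing in for video trajectory features and
    # trajectory-like sensor features; in practice these come from the extractors.
    rng = np.random.default_rng(0)
    video_descs = rng.normal(size=(500, 96))
    sensor_descs = rng.normal(size=(200, 12))

    video_gmm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0).fit(video_descs)
    sensor_gmm = GaussianMixture(n_components=4, covariance_type="diag", random_state=0).fit(sensor_descs)

    # Per-clip multimodal representation: concatenate the per-modality Fisher Vectors.
    fused = np.hstack([fisher_vector(video_descs, video_gmm),
                       fisher_vector(sensor_descs, sensor_gmm)])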

    Unsupervised Mapping and Semantic User Localisation from First-Person Monocular Video

    We propose an unsupervised probabilistic framework for learning a human-centred representation of a person’s environment from first-person video. Specifically, non-geometric maps modelled as hierarchies of probabilistic place graphs and view graphs are learned. Place graphs model a user’s patterns of transition between physical locations whereas view graphs capture an aspect of user behaviour within those locations. Furthermore, we describe an implementation in which the notion of place is divided into stations and the routes that interconnect them. Stations typically correspond to rooms or areas where a user spends time. Visits to stations are temporally segmented based on qualitative visual motion. We describe how to learn maps online in an unsupervised manner, and how to localise the user within these maps. We report experiments on two datasets, including comparison of performance with and without view graphs, and demonstrate better online mapping than when using offline clustering.
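
    As an illustration of the place-graph idea, the sketch below builds a transition graph over stations online from a temporally segmented sequence of visits and exposes the learned transition probabilities. The class, the station names and the simple counting model are assumptions for exposition, not the paper's implementation.

    from collections import defaultdict

    class PlaceGraph:
        """Nodes are stations; edge weights are observed transition counts between them."""

        def __init__(self):
            self.transitions = defaultdict(lambda: defaultdict(int))

        def observe_visit(self, prev_station, next_station):
            """Update the graph when the user moves from one station to the next."""
            if prev_station is not None:
                self.transitions[prev_station][next_station] += 1

        def transition_probs(self, station):
            """Empirical distribution over the stations reachable from `station`."""
            counts = self.transitions[station]
            total = sum(counts.values())
            return {s: c / total for s, c in counts.items()} if total else {}

    # Hypothetical visit sequence, as produced by the qualitative-motion segmenter.
    graph = PlaceGraph()
    visits = ["kitchen", "hallway", "office", "hallway", "kitchen", "hallway", "office"]
    for prev, nxt in zip([None] + visits[:-1], visits):
        graph.observe_visit(prev, nxt)
    print(graph.transition_probs("hallway"))   # office ~0.67, kitchen ~0.33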

    MECCANO: A Multimodal Egocentric Dataset for Humans Behavior Understanding in the Industrial-like Domain

    Wearable cameras allow images and videos to be acquired from the user's perspective. These data can be processed to understand human behavior. Although human behavior analysis has been thoroughly investigated in third-person vision, it is still understudied in egocentric settings, and in particular in industrial scenarios. To encourage research in this field, we present MECCANO, a multimodal dataset of egocentric videos for studying human behavior understanding in industrial-like settings. The multimodality is characterized by the presence of gaze signals, depth maps and RGB videos acquired simultaneously with a custom headset. The dataset has been explicitly labeled for fundamental tasks in the context of human behavior understanding from a first-person view, such as recognizing and anticipating human-object interactions. With the MECCANO dataset, we explored five different tasks: 1) Action Recognition, 2) Active Objects Detection and Recognition, 3) Egocentric Human-Objects Interaction Detection, 4) Action Anticipation and 5) Next-Active Objects Detection. We propose a benchmark aimed at studying human behavior in the considered industrial-like scenario, which demonstrates that the investigated tasks and the considered scenario are challenging for state-of-the-art algorithms. To support research in this field, we publicly release the dataset at https://iplab.dmi.unict.it/MECCANO/. Comment: arXiv admin note: text overlap with arXiv:2010.0565

    EGO-TOPO: Environment Affordances from Egocentric Video

    First-person video naturally brings the use of a physical environment to the forefront, since it shows the camera wearer interacting fluidly in a space based on their intentions. However, current methods largely separate the observed actions from the persistent space itself. We introduce a model for environment affordances that is learned directly from egocentric video. The main idea is to gain a human-centric model of a physical space (such as a kitchen) that captures (1) the primary spatial zones of interaction and (2) the likely activities they support. Our approach decomposes a space into a topological map derived from first-person activity, organizing an ego-video into a series of visits to the different zones. Further, we show how to link zones across multiple related environments (e.g., from videos of multiple kitchens) to obtain a consolidated representation of environment functionality. On EPIC-Kitchens and EGTEA+, we demonstrate our approach for learning scene affordances and anticipating future actions in long-form video. Comment: Published in CVPR 2020, project page: http://vision.cs.utexas.edu/projects/ego-topo
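
    The zone-discovery step can be pictured with the rough sketch below: visits (temporal segments of the video) are grouped into zones by visual similarity, and zones visited back-to-back are linked, yielding a topological map. The feature representation, the similarity threshold and the linking rule here are illustrative assumptions rather than the EGO-TOPO method itself.

    import numpy as np
    import networkx as nx

    def build_topo_map(visit_features, similarity_threshold=0.8):
        """visit_features: one L2-normalised feature vector per visit, in temporal order."""
        zones, zone_of_visit = [], []
        for feat in visit_features:
            # Assign the visit to the most similar existing zone, or start a new one.
            sims = [float(feat @ np.mean(z, axis=0)) for z in zones]
            if sims and max(sims) >= similarity_threshold:
                zid = int(np.argmax(sims))
                zones[zid].append(feat)
            else:
                zid = len(zones)
                zones.append([feat])
            zone_of_visit.append(zid)

        topo = nx.Graph()
        topo.add_nodes_from(range(len(zones)))
        for a, b in zip(zone_of_visit[:-1], zone_of_visit[1:]):
            if a != b:
                topo.add_edge(a, b)   # consecutive visits link their zones
        return topo, zone_of_visit

    Grouping by a single similarity threshold keeps the sketch simple; a clustering step over visit features would serve the same purpose of mapping repeated visits to the same physical zone.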

    Egocentric Perception of Hands and Its Applications


    Progress in ambient assisted systems for independent living by the elderly

    One of the challenges of the ageing population in many countries is the efficient delivery of health and care services, which is further complicated by the increase in neurological conditions among the elderly due to rising life expectancy. Personal care of the elderly is a concern for their relatives, in case they are alone in their homes and unforeseen circumstances occur that affect their wellbeing. The alternative, i.e. care in nursing homes or hospitals, is costly, and costs increase further if specialized care is mobilized to the patient’s place of residence. Enabling technologies for independent living by the elderly, such as ambient assisted living systems (AALS), are seen as essential to enhancing care in a cost-effective manner. In light of significant advances in telecommunication, computing and sensor miniaturization, as well as the ubiquity of mobile and connected devices embodying the concept of the Internet of Things (IoT), end-to-end solutions for ambient assisted living have become a reality. The premise of such applications is the continuous and most often real-time monitoring of the environment and occupant behavior using an event-driven intelligent system, thereby providing a facility for monitoring and assessment, and triggering assistance as and when needed. As a growing area of research, it is essential to investigate the approaches for developing AALS in the literature to identify current practices and directions for future research. This paper is, therefore, aimed at a comprehensive and critical review of the frameworks and sensor systems used in various ambient assisted living systems, as well as their objectives and relationships with care and clinical systems. Findings from our work suggest that most frameworks focused on activity monitoring for assessing immediate risks, while the opportunities for integrating environmental factors for analytics and decision-making, in particular for long-term care, were often overlooked. The potential of wearable devices and sensors, as well as distributed storage and access (e.g. cloud), is yet to be fully appreciated. There is a distinct lack of strong supporting clinical evidence from the implemented technologies. Socio-cultural aspects such as divergence among user groups, and the acceptability and usability of AALS, were also overlooked. Future systems also need to address the issues of privacy and cyber security.