EGO-TOPO: Environment Affordances from Egocentric Video
First-person video naturally brings the use of a physical environment to the
forefront, since it shows the camera wearer interacting fluidly in a space
based on their intentions. However, current methods largely separate the observed
actions from the persistent space itself. We introduce a model for environment
affordances that is learned directly from egocentric video. The main idea is to
gain a human-centric model of a physical space (such as a kitchen) that
captures (1) the primary spatial zones of interaction and (2) the likely
activities they support. Our approach decomposes a space into a topological map
derived from first-person activity, organizing an ego-video into a series of
visits to the different zones. Further, we show how to link zones across
multiple related environments (e.g., from videos of multiple kitchens) to
obtain a consolidated representation of environment functionality. On
EPIC-Kitchens and EGTEA+, we demonstrate our approach for learning scene
affordances and anticipating future actions in long-form video. Published in
CVPR 2020. Project page: http://vision.cs.utexas.edu/projects/ego-topo
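A minimal sketch of the topological-map idea described above: frames are grouped into discrete spatial "zones", and consecutive frames in the same zone form a "visit". Zone assignment here uses a simple cosine-similarity threshold over frame features, whereas the paper trains a localization network; this heuristic, the feature source, and all names below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def build_topo_map(frame_feats, actions, sim_thresh=0.8):
    """frame_feats: (T, D) L2-normalized frame embeddings.
    actions: length-T list of action labels (or None) per frame.
    Returns zone centroids, per-zone action counts, and the visit sequence."""
    centroids, zone_actions, visits = [], [], []
    current_zone = None
    for feat, act in zip(frame_feats, actions):
        # Find the best-matching existing zone by cosine similarity.
        if centroids:
            sims = np.stack(centroids) @ feat
            best = int(np.argmax(sims))
        if not centroids or sims[best] < sim_thresh:
            # No sufficiently similar zone: open a new one.
            centroids.append(feat.copy())
            zone_actions.append({})
            best = len(centroids) - 1
        else:
            # Running average keeps the centroid up to date.
            centroids[best] = centroids[best] * 0.9 + feat * 0.1
            centroids[best] /= np.linalg.norm(centroids[best])
        if act is not None:
            # Tally which activities each zone supports.
            zone_actions[best][act] = zone_actions[best].get(act, 0) + 1
        # A change of zone ends the current visit and starts a new one.
        if best != current_zone:
            visits.append(best)
            current_zone = best
    return centroids, zone_actions, visits
```

The per-zone action counts give the "likely activities they support", while the visit sequence defines the edges of the topological graph.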
Multiple Trajectory Prediction of Moving Agents with Memory Augmented Networks
Pedestrians and drivers are expected to safely navigate complex urban environments alongside several non-cooperating agents. Autonomous vehicles will soon replicate this capability. Each agent acquires a representation of the world from an egocentric perspective and must make decisions that ensure safety for itself and others. This requires predicting the motion patterns of observed agents far enough into the future. In this paper we propose MANTRA, a model that exploits memory augmented networks to effectively predict multiple trajectories of other agents observed from an egocentric perspective. Our model stores observations in memory and uses trained controllers to write meaningful pattern encodings and to read the trajectories that are most likely to occur in the future. We show that our method is able to natively perform multi-modal trajectory prediction, obtaining state-of-the-art results on four datasets. Moreover, thanks to the non-parametric nature of the memory module, we show how, once trained, our system can continuously improve by ingesting novel patterns.
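A minimal sketch of the memory-augmented idea: a key-value memory maps encodings of observed pasts to the futures that followed them, so reading the K nearest keys yields multiple plausible futures (multi-modality), and writing only sufficiently novel patterns lets the memory keep growing after training. MANTRA's learned read/write controllers are replaced here by a cosine-similarity heuristic, so the threshold and all names are illustrative assumptions.

```python
import numpy as np

class TrajectoryMemory:
    def __init__(self, novelty_thresh=0.95):
        self.keys = []    # unit-norm encodings of observed past trajectories
        self.values = []  # the future trajectories that followed them
        self.novelty_thresh = novelty_thresh

    def read(self, key, k=5):
        """Return up to k candidate futures for an encoded observation,
        one per retrieved memory entry (multi-modal prediction)."""
        if not self.keys:
            return []
        sims = np.stack(self.keys) @ key
        order = np.argsort(-sims)[:k]
        return [self.values[i] for i in order]

    def write(self, key, future):
        """Store the pattern only if no stored key is already similar,
        keeping the memory compact (the paper uses a trained controller)."""
        key = key / np.linalg.norm(key)
        if self.keys and float(np.max(np.stack(self.keys) @ key)) >= self.novelty_thresh:
            return False
        self.keys.append(key)
        self.values.append(future)
        return True
```

Because the memory is non-parametric, calling `write` on novel patterns at deployment time improves coverage without retraining, which is the continual-improvement property the abstract highlights.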
Integration of Experts' and Beginners' Machine Operation Experiences to Obtain a Detailed Task Model
We propose a novel framework for integrating beginners' machine operation experiences with those of experts to obtain a detailed task model. Beginners can provide valuable information for operation guidance and task design; for example, from the operations that are easy or difficult for them, the mistakes they make, and the strategies they tend to choose. However, beginners' experiences often vary widely and are difficult to integrate directly. Thus, we consider an operational experience as a sequence of hand-machine interactions at hotspots. Then, a few experts' experiences and a sufficient number of beginners' experiences are unified using two aggregation steps that align and integrate the sequences of interactions. We applied our method to more than 40 experiences of a sewing task. The results demonstrate good potential for modeling and obtaining important properties of the task.
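A minimal sketch of the align-then-integrate idea: each experience is a sequence of interactions at hotspots, sequences are aligned with edit-distance-style dynamic programming, and the aligned interactions are merged into per-step statistics. The paper's actual aggregation steps are more involved; the scoring scheme, the use of an expert sequence as the reference, and the names below are illustrative assumptions.

```python
def align(ref, seq):
    """Needleman-Wunsch-style alignment of two interaction sequences.
    Returns pairs (ref_index or None, seq_item or None)."""
    n, m = len(ref), len(seq)
    # dp[i][j] = best score aligning ref[:i] with seq[:j]
    # (match +1, mismatch -1, gap -1).
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = -i
    for j in range(1, m + 1):
        dp[0][j] = -j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = 1 if ref[i - 1] == seq[j - 1] else -1
            dp[i][j] = max(dp[i - 1][j - 1] + match,
                           dp[i - 1][j] - 1,
                           dp[i][j - 1] - 1)
    # Trace back to recover the alignment.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (1 if ref[i - 1] == seq[j - 1] else -1):
            pairs.append((i - 1, seq[j - 1])); i -= 1; j -= 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] - 1:
            pairs.append((i - 1, None)); i -= 1
        else:
            pairs.append((None, seq[j - 1])); j -= 1
    return pairs[::-1]

def integrate(expert_seq, beginner_seqs):
    """Count which interactions beginners perform at each expert step,
    exposing steps where beginners diverge, hesitate, or err."""
    model = [dict() for _ in expert_seq]
    for seq in beginner_seqs:
        for ref_i, item in align(expert_seq, seq):
            if ref_i is not None and item is not None:
                model[ref_i][item] = model[ref_i].get(item, 0) + 1
    return model
```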
EgoEnv: Human-centric environment representations from egocentric video
First-person video highlights a camera-wearer's activities in the context of
their persistent environment. However, current video understanding approaches
reason over visual features from short video clips that are detached from the
underlying physical space and capture only what is immediately visible. To
facilitate human-centric environment understanding, we present an approach that
links egocentric video and the environment by learning representations that are
predictive of the camera-wearer's (potentially unseen) local surroundings. We
train such models using videos from agents in simulated 3D environments where
the environment is fully observable, and test them on human-captured real-world
videos from unseen environments. On two human-centric video tasks, we show that
models equipped with our environment-aware features consistently outperform
their counterparts with traditional clip features. Moreover, despite being
trained exclusively on simulated videos, our approach successfully handles
real-world videos from HouseTours and Ego4D, and achieves state-of-the-art
results on the Ego4D NLQ challenge. Project page:
https://vision.cs.utexas.edu/projects/ego-env/. Published in NeurIPS 2023 (Oral).
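A minimal sketch of the training signal described above: because the simulated 3D environment is fully observable, each clip can be paired with ground truth about the agent's local surroundings (here, a coarse occupancy grid around the camera), and a head is trained to predict that target from clip features. The feature dimension, grid size, and all names are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class EnvHead(nn.Module):
    """Predicts a coarse map of the (possibly unseen) local surroundings
    from a clip embedding produced by any frozen video backbone."""
    def __init__(self, feat_dim=512, grid_cells=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, grid_cells),  # one logit per surrounding cell
        )

    def forward(self, clip_feats):
        return self.mlp(clip_feats)

# One training step on simulated data: `local_map` (B, 64) in {0, 1} marks
# which surrounding cells are occupied/visitable -- supervision that is only
# available because the simulator exposes the full environment.
head = EnvHead()
opt = torch.optim.Adam(head.parameters(), lr=1e-4)
loss_fn = nn.BCEWithLogitsLoss()

clip_feats = torch.randn(8, 512)              # stand-in for backbone output
local_map = torch.randint(0, 2, (8, 64)).float()
loss = loss_fn(head(clip_feats), local_map)
opt.zero_grad(); loss.backward(); opt.step()
```

At test time the same head runs on features from real egocentric video, which is how environment-aware features transfer to human-captured footage despite sim-only training.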
A Survey on Human-aware Robot Navigation
Intelligent systems are increasingly part of our everyday lives and have been
integrated seamlessly to the point where it is difficult to imagine a world
without them. Physical manifestations of those systems, on the other hand, in
the form of embodied agents or robots, have so far been used only for specific
applications and are often limited to functional roles (e.g., in the industrial,
entertainment, and military fields). Given the current growth and innovation in
the research communities concerned with the topics of robot navigation,
human-robot interaction, and human activity recognition, this may soon change.
Robots are increasingly easy to obtain and use, and their acceptance in general
is growing. However, the design of a socially
compliant robot that can function as a companion needs to take various areas of
research into account. This paper is concerned with the navigation aspect of a
socially-compliant robot and provides a survey of existing solutions for the
relevant areas of research as well as an outlook on possible future directions.
Published in Robotics and Autonomous Systems, 202
Hierarchical Hidden Markov Model in Detecting Activities of Daily Living in Wearable Videos for Studies of Dementia
This paper presents a method for indexing activities of daily living in videos obtained from wearable cameras. In the context of dementia diagnosis, the videos are recorded at patients' homes and later reviewed by medical practitioners. The videos may last up to two hours, so a tool for efficient navigation in terms of activities of interest is crucial for the doctors. The specific recording mode yields very difficult video data: a single sequence shot in which strong motion and sharp lighting changes often appear. Our work introduces an automatic motion-based segmentation of the video and a video structuring approach in terms of activities using a hierarchical two-level Hidden Markov Model. We define our description space over motion and visual characteristics of the video and audio channels. Experiments on real data obtained from recordings at the homes of several patients show the difficulty of the task and the promising results of our approach.
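A minimal sketch of the two-level idea: a top-level HMM over activities of daily living, where each activity's emission likelihood for a motion-based segment is itself computed by a small bottom-level HMM over frame descriptors. Discrete observations and the forward algorithm stand in for the paper's actual description space; all matrices and names below are illustrative assumptions.

```python
import numpy as np

def forward_loglik(obs, start, trans, emit):
    """Log-likelihood of a discrete observation segment under one
    bottom-level HMM (start: (S,), trans: (S, S), emit: (S, V))."""
    alpha = start * emit[:, obs[0]]
    ll = np.log(alpha.sum()); alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ trans) * emit[:, o]
        ll += np.log(alpha.sum()); alpha /= alpha.sum()
    return ll

def viterbi_activities(segments, act_prior, act_trans, sub_hmms):
    """Top-level Viterbi over motion-based segments: each segment is scored
    by every activity's bottom-level HMM, then the best activity sequence
    is decoded. sub_hmms[a] = (start, trans, emit) for activity a."""
    A = len(act_prior)
    # Per-segment, per-activity log-likelihoods from the bottom level.
    ll = np.array([[forward_loglik(seg, *sub_hmms[a]) for a in range(A)]
                   for seg in segments])
    delta = np.log(act_prior) + ll[0]
    back = []
    for t in range(1, len(segments)):
        scores = delta[:, None] + np.log(act_trans)  # scores[i, j]: i -> j
        back.append(scores.argmax(0))
        delta = scores.max(0) + ll[t]
    path = [int(delta.argmax())]
    for bp in reversed(back):
        path.append(int(bp[path[-1]]))
    return path[::-1]  # one activity label per segment
```

The decoded per-segment labels are exactly the activity index the doctors would navigate by.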
MECCANO: A Multimodal Egocentric Dataset for Humans Behavior Understanding in the Industrial-like Domain
Wearable cameras make it possible to acquire images and videos from the user's
perspective. These data can be processed to understand human behavior. Although
human behavior analysis has been thoroughly investigated in third person
vision, it is still understudied in egocentric settings and in particular in
industrial scenarios. To encourage research in this field, we present MECCANO,
a multimodal dataset of egocentric videos for studying human behavior
in industrial-like settings. The multimodality is characterized
by the presence of gaze signals, depth maps and RGB videos acquired
simultaneously with a custom headset. The dataset has been explicitly labeled
for fundamental tasks in the context of human behavior understanding from a
first person view, such as recognizing and anticipating human-object
interactions. With the MECCANO dataset, we explored five different tasks
including 1) Action Recognition, 2) Active Objects Detection and Recognition,
3) Egocentric Human-Objects Interaction Detection, 4) Action Anticipation and
5) Next-Active Objects Detection. We propose a benchmark aimed to study human
behavior in the considered industrial-like scenario which demonstrates that the
investigated tasks and the considered scenario are challenging for
state-of-the-art algorithms. To support research in this field, we publicly
release the dataset at https://iplab.dmi.unict.it/MECCANO/.
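A minimal sketch of how an anticipation-style task on such a dataset is commonly evaluated: the model observes only the segment preceding an annotated interaction and must rank the upcoming action class, with top-k accuracy reported. The anticipation gap, the class count, and k=5 are illustrative assumptions, not MECCANO's official protocol.

```python
import torch

def topk_accuracy(logits, targets, k=5):
    """logits: (N, num_classes) scores for the action that starts next;
    targets: (N,) ground-truth class indices."""
    topk = logits.topk(k, dim=1).indices          # (N, k) best guesses
    hits = (topk == targets.unsqueeze(1)).any(1)  # correct class in top-k?
    return hits.float().mean().item()

# Example: score predictions made from the window before each action starts
# (the class count of 61 is a stand-in, not the dataset's exact taxonomy).
logits = torch.randn(32, 61)
targets = torch.randint(0, 61, (32,))
print(f"top-5 anticipation accuracy: {topk_accuracy(logits, targets):.3f}")
```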
Cooktop Sensing Based on a YOLO Object Detection Algorithm
Deep Learning (DL) has provided a significant breakthrough in many areas of research and industry. The development of Convolutional Neural Networks (CNNs) has enabled the improvement of computer vision-based techniques, making the information gathered from cameras more useful. For this reason, studies have recently been carried out on the use of image-based DL in some areas of people's daily life. In this paper, an object detection-based algorithm is proposed to modify and improve the user experience in relation to the use of cooking appliances. The algorithm can sense common kitchen objects and identify situations of interest for users. Some of these situations are the detection of utensils on lit hobs, the recognition of boiling, smoke, and oil in kitchenware, and the determination of good cookware size adjustment, among others. In addition, the authors have achieved sensor fusion by using a cooker hob with Bluetooth connectivity, so it is possible to automatically interact with it via an external device such as a computer or a mobile phone. Our main contribution focuses on supporting people while they are cooking, controlling heaters, or alerting them with different types of alarms. To the best of our knowledge, this is the first time a YOLO algorithm has been used to control a cooktop by means of visual sensorization. Moreover, this research paper provides a comparison of the detection performance among different YOLO networks. Additionally, a dataset of more than 7500 images has been generated, and multiple data augmentation techniques have been compared. The results show that YOLOv5s can successfully detect common kitchen objects with high accuracy and speed, and it can be employed for realistic cooking environment applications. Finally, multiple examples of the identification of interesting situations and of how we act on the cooktop are presented. The current study has been sponsored by the Government of the Basque Country under the ELKARTEK21/10 KK-2021/00014 ("Estudio de nuevas técnicas de inteligencia artificial basadas en Deep Learning dirigidas a la optimización de procesos industriales") and ELKARTEK23-DEEPBASK ("Creación de nuevos algoritmos de aprendizaje profundo aplicado a la industria") research programmes.
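A minimal sketch of the detection step: a YOLOv5 model is run on a frame of the cooktop and the detections are checked against a simple rule (utensil present on a lit hob) before acting. The custom weights file, the class names, and the `hob_is_lit`/`send_alarm` hooks are illustrative assumptions; the paper's Bluetooth interaction with the hob is not reproduced here.

```python
import torch

# The paper's kitchen-specific classes would require custom-trained weights;
# 'kitchen_yolov5s.pt' is a hypothetical file, not a released artifact.
model = torch.hub.load('ultralytics/yolov5', 'custom', path='kitchen_yolov5s.pt')

def check_frame(frame, hob_is_lit, send_alarm):
    """frame: HxWx3 RGB array; hob_is_lit(): bool from the connected hob;
    send_alarm(msg): notification hook (e.g., to a phone)."""
    results = model(frame)
    # results.xyxy[0] is (N, 6): x1, y1, x2, y2, confidence, class index.
    labels = {model.names[int(c)] for c in results.xyxy[0][:, 5]}
    # Example rules drawn from the situations listed in the abstract.
    if 'pan' in labels and hob_is_lit():
        send_alarm('Utensil detected on a lit hob')
    if 'smoke' in labels:
        send_alarm('Smoke detected over the cookware')
    return labels
```

Pairing the vision output with the hob's Bluetooth state is what the abstract calls sensor fusion: the rule fires only when both the camera and the appliance agree on the situation.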