118 research outputs found

Self-Supervised Joint Encoding of Motion and Appearance for First Person Action Recognition

Author: Andrea Bottino
Barbara Caputo
Mirco Planamente
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 07/12/2020

Wearable cameras are becoming increasingly popular in several applications, raising the research community's interest in approaches for recognizing actions from the first-person point of view. An open challenge in egocentric action recognition is that videos lack detailed information about the main actor's pose and thus tend to record only parts of the movement when focusing on manipulation tasks. The amount of information about the action itself is therefore limited, making it crucial to understand the manipulated objects and their context. Many previous works addressed this issue with two-stream architectures, where one stream is dedicated to modeling the appearance of objects involved in the action, and another to extracting motion features from optical flow. In this paper, we argue that learning features jointly from these two information channels helps to better capture the spatio-temporal correlations between them. To this end, we propose a single-stream architecture able to do so, thanks to the addition of a self-supervised block that uses a pretext motion prediction task to intertwine motion and appearance knowledge. Experiments on several publicly available databases show the power of our approach.

arXiv.org e-Print Archive

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Leveraging over depth in egocentric activity recognition

Author: Caputo Barbara
Planamente Mirco
Russo Paolo
Publication venue: I-RIM
Publication date: 01/01/2019

Activity recognition from first-person videos is a growing research area. The increasing diffusion of egocentric sensors in various devices makes it timely to develop approaches able to recognize fine-grained first-person actions such as picking up, putting down, pouring and so forth. While most previous work focused on RGB data, some authors pointed out the importance of leveraging depth information in this domain. In this paper we follow this trend and propose the first deep architecture that uses depth maps as an attention mechanism for first-person activity recognition. Specifically, we blend together the RGB and depth data so as to obtain an enriched input for the network. This blending puts more or less emphasis on different parts of the image based on their distance from the observer, hence acting as an attention mechanism. To further strengthen the proposed activity recognition protocol, we opt for a self-labeling approach. This, combined with a Conv-LSTM block for extracting temporal information from the various frames, leads to the new state of the art on two publicly available benchmark databases. An ablation study completes our experimental findings, confirming the effectiveness of our approach.
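
The blending described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the blending is computed, so the per-pixel weighting below (closer pixels get more emphasis), the function name `depth_attention_blend`, and the `floor` parameter are all assumptions made for illustration.

```python
def depth_attention_blend(rgb, depth, floor=0.2):
    """Scale each RGB pixel by a weight derived from its depth.

    rgb:   H x W x 3 nested lists of floats in [0, 1]
    depth: H x W nested lists of distances from the observer
    floor: hypothetical minimum weight, so far pixels are dimmed, not erased
    """
    flat = [d for row in depth for d in row]
    d_min, d_max = min(flat), max(flat)
    span = (d_max - d_min) or 1.0  # avoid division by zero on flat depth maps
    blended = []
    for rgb_row, depth_row in zip(rgb, depth):
        out_row = []
        for pixel, d in zip(rgb_row, depth_row):
            # Assumption: nearer pixels matter more (nearness is 1 at d_min, 0 at d_max).
            nearness = 1.0 - (d - d_min) / span
            weight = floor + (1.0 - floor) * nearness
            out_row.append([c * weight for c in pixel])
        blended.append(out_row)
    return blended
```

In this sketch the depth map acts as a fixed spatial mask on the input. A learned blending, or one that emphasizes a particular distance band, would fit the abstract's description equally well.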

ZENODO

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

PoliTO-IIT Submission to the EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge for Action Recognition

Author: Alberti Emanuele
Caputo Barbara
Planamente Mirco
Plizzari Chiara
Publication date: 01/01/2021

In this report, we describe the technical details of our submission to the EPIC-Kitchens-100 Unsupervised Domain Adaptation (UDA) Challenge in Action Recognition. To tackle the domain shift that exists under the UDA setting, we first exploited a recent Domain Generalization (DG) technique called Relative Norm Alignment (RNA). It consists of designing a model able to generalize well to any unseen domain, regardless of the possibility to access target data at training time. Then, in a second phase, we extended the approach to work on unlabelled target data, allowing the model to adapt to the target distribution in an unsupervised fashion. For this purpose, we included in our framework existing UDA algorithms, such as Temporal Attentive Adversarial Adaptation Network (TA3N), jointly with new multi-stream consistency losses, namely Temporal Hard Norm Alignment (T-HNA) and Min-Entropy Consistency (MEC). Our submission (entry 'plnet') is visible on the leaderboard and achieved the 1st position for 'verb', and the 3rd position for both 'noun' and 'action'. Comment: 3rd place in the 2021 EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge for Action Recognition.

arXiv.org e-Print Archive

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Domain generalization through audio-visual relative norm alignment in first person action recognition

Author: Alberti Emanuele
Caputo Barbara
Planamente Mirco
Plizzari Chiara
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2022

First-person action recognition is becoming an increasingly researched area thanks to the rising popularity of wearable cameras. This is bringing to light cross-domain issues that are yet to be addressed in this context. Indeed, the information extracted from learned representations suffers from an intrinsic "environmental bias". This strongly affects the ability to generalize to unseen scenarios, limiting the application of current methods to real settings where labeled data are not available during training. In this work, we introduce the first domain generalization approach for egocentric activity recognition by proposing a new audio-visual loss, called the Relative Norm Alignment loss. It rebalances the contributions from the two modalities during training, over different domains, by aligning their feature norm representations. Our approach leads to strong results in domain generalization on both EPIC-Kitchens-55 and EPIC-Kitchens-100, as demonstrated by extensive experiments, and can also be extended to domain adaptation settings with competitive results.
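
The idea of aligning feature norms across two modalities can be sketched as follows. The abstract does not give the loss formula, so the form used here (the squared deviation from 1 of the ratio between the mean feature norms of the two modalities) is an assumption, as are the names `mean_l2_norm` and `relative_norm_alignment`.

```python
import math


def mean_l2_norm(features):
    """Average L2 norm over a batch of feature vectors (lists of floats)."""
    return sum(math.sqrt(sum(x * x for x in f)) for f in features) / len(features)


def relative_norm_alignment(audio_feats, visual_feats, eps=1e-8):
    """Penalty that vanishes when both modalities have equal mean feature norm.

    Assumed form: (mean_norm(visual) / mean_norm(audio) - 1) ** 2.
    eps is a hypothetical stabilizer against a zero audio norm.
    """
    ratio = mean_l2_norm(visual_feats) / (mean_l2_norm(audio_feats) + eps)
    return (ratio - 1.0) ** 2
```

Under this assumed form, a batch where visual features have twice the average norm of audio features yields a penalty close to 1, and the penalty approaches 0 as the two mean norms converge.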

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Bringing Online Egocentric Action Recognition into the wild

Author: Averta Giuseppe
Caputo Barbara
Goletto Gabriele
Planamente Mirco
Publication date: 05/11/2022

To enable safe and effective human-robot cooperation, it is crucial to develop models for the identification of human activities. Egocentric vision seems to be a viable solution to this problem, and therefore many works provide deep learning solutions to infer human actions from first-person videos. However, although very promising, most of these do not consider the major challenges that come with a realistic deployment, such as the portability of the model, the need for real-time inference, and robustness to novel domains (i.e., new spaces, users, tasks). With this paper, we set the boundaries that egocentric vision models should consider for realistic applications, defining a novel setting of egocentric action recognition in the wild, which encourages researchers to develop novel, application-aware solutions. We also present a new model-agnostic technique that enables the rapid repurposing of existing architectures in this new context, demonstrating the feasibility of deploying a model on a tiny device (Jetson Nano) and performing the task directly on the edge with very low energy consumption (2.4W on average at 50 fps).

arXiv.org e-Print Archive

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Unsupervised Domain Adaptation through Inter-Modal Rotation for RGB-D Object Recognition

Author: Caputo B.
Loghmani M. R.
Park K.
Planamente M.
Robbiano L.
Vincze M.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2020

Unsupervised Domain Adaptation (DA) exploits the supervision of a label-rich source dataset to make predictions on an unlabeled target dataset by aligning the two data distributions. In robotics, DA is used to take advantage of automatically generated synthetic data, which come with 'free' annotations, to make effective predictions on real data. However, existing DA methods are not designed to cope with the multi-modal nature of RGB-D data, which are widely used in robotic vision. We propose a novel RGB-D DA method that reduces the synthetic-to-real domain shift by exploiting the inter-modal relation between the RGB and depth images. Our method consists of training a convolutional neural network to solve, in addition to the main recognition task, the pretext task of predicting the relative rotation between the RGB and depth images. To evaluate our method and encourage further research in this area, we define two benchmark datasets for object categorization and instance recognition. With extensive experiments, we show the benefits of leveraging the inter-modal relations for RGB-D DA. The code is available at: 'https://github.com/MRLoghmani/relative-rotation'
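
The relative-rotation pretext task can be illustrated by how a training sample and its label might be built. This sketch is not taken from the linked repository. It assumes that rotations are multiples of 90 degrees and that the label is the rotation taking the RGB image to the depth image. The helper names `rot90` and `make_pretext_sample` are hypothetical.

```python
import random


def rot90(grid, k):
    """Rotate a 2D nested list counter-clockwise by k * 90 degrees."""
    for _ in range(k % 4):
        # Transpose, then reverse the row order: one counter-clockwise quarter turn.
        grid = [list(row) for row in zip(*grid)][::-1]
    return grid


def make_pretext_sample(rgb, depth, rng):
    """Rotate each modality independently and label the pair.

    Returns (rotated_rgb, rotated_depth, label), where label in {0, 1, 2, 3}
    is the number of quarter turns from the RGB rotation to the depth rotation.
    """
    k_rgb = rng.randrange(4)
    k_depth = rng.randrange(4)
    label = (k_depth - k_rgb) % 4
    return rot90(rgb, k_rgb), rot90(depth, k_depth), label
```

Because the label depends only on the difference between the two rotations, a network cannot solve the task from either modality alone. It has to relate the content of the RGB image to that of the depth image, which is the inter-modal relation the abstract refers to.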

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Interaction and Signalling Networks: a report from the fourth 'Young Microbiologists Symposium on Microbe Signalling, Organisation and Pathogenesis'

Author: Clare L. Kirkpatrick
Delphine L. Caly
Jacob G. Malone
Olivier Lesouhaitier
Planamente
Shi-Qi An
Publication venue: 'Microbiology Society'
Publication date: 01/01/2017

At the end of June, over 120 microbiologists from 18 countries gathered in Dundee, Scotland for the fourth edition of the Young Microbiologists Symposium on ‘Microbe Signalling, Organisation and Pathogenesis’. The aim of the symposium was to give early-career microbiologists the opportunity to present their work in a convivial environment and to interact with world-renowned senior scientists in exciting fields of microbiology research. The meeting was supported by the Microbiology Society, the Society of Applied Microbiology and the American Society for Microbiology, with further sponsorship from the European Molecular Biology Organisation and the Royal Society of Edinburgh. In this report, we highlight some themes that emerged from the many interesting talks and poster presentations, as well as some of the other activities that were on offer at this energetic meeting.

HAL - Normandie Université

Southampton (e-Prints Soton)

Crossref

University of Dundee Online Publications

University of East Anglia digital repository