Search CORE

43 research outputs found

Learning from Synthetic Humans

Author: Black Michael J.
Laptev Ivan
Mahmood Naureen
Martin Xavier
Romero Javier
Schmid Cordelia
Varol Gül
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017
Field of study

Estimating human pose, shape, and motion from images and videos are fundamental challenges with many applications. Recent advances in 2D human pose estimation use large amounts of manually-labeled training data for learning convolutional neural networks (CNNs). Such data is time consuming to acquire and difficult to extend. Moreover, manual labeling of 3D pose, depth and motion is impractical. In this work we present SURREAL (Synthetic hUmans foR REAL tasks): a new large-scale dataset with synthetically-generated but realistic images of people rendered from 3D sequences of human motion capture data. We generate more than 6 million frames together with ground truth pose, depth maps, and segmentation masks. We show that CNNs trained on our synthetic dataset allow for accurate human depth estimation and human part segmentation in real RGB images. Our results and the new dataset open up new possibilities for advancing person analysis using cheap and large-scale synthetic data.Comment: Appears in: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017). 9 page

arXiv.org e-Print Archive

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

MPG.PuRe

Unsupervised Learning of Long-Term Motion Dynamics for Videos

Author: Alahi Alexandre
Fei-Fei Li
Huang De-An
Luo Zelun
Peng Boya
Publication venue
Publication date: 11/04/2017
Field of study

We present an unsupervised representation learning approach that compactly encodes the motion dependencies in videos. Given a pair of images from a video clip, our framework learns to predict the long-term 3D motions. To reduce the complexity of the learning framework, we propose to describe the motion as a sequence of atomic 3D flows computed with RGB-D modality. We use a Recurrent Neural Network based Encoder-Decoder framework to predict these sequences of flows. We argue that in order for the decoder to reconstruct these sequences, the encoder must learn a robust video representation that captures long-term motion dependencies and spatial-temporal relations. We demonstrate the effectiveness of our learned temporal representations on activity classification across multiple modalities and datasets such as NTU RGB+D and MSR Daily Activity 3D. Our framework is generic to any input modality, i.e., RGB, Depth, and RGB-D videos.Comment: CVPR 201

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Deep Learning in cucina: sviluppo e validazione di un sistema di riconoscimento di azioni basato su sensori RGBD

Author: C. Nuzzi
E. Picinardi
G. Sansoni
S. Pasinetti
Publication venue
Publication date: 01/01/2020
Field of study

La presente memoria descrive i primi risultati raggiunti nell’ambito di un progetto di ricerca con la scuola bresciana di cucina Cast Alimenti. L’obiettivo del lavoro di ricerca è lo sviluppo di un sistema intelligente per il riconoscimento delle azioni svolte da un cuoco durante la preparazione di una ricetta. Cast Alimenti mira ad ottenere un prodotto da utilizzare durante la didattica che abbia un duplice scopo: da una parte si vuole riconoscere che operazione il cuoco docente sta effettuando, con che ingredienti e con quali utensili; dall’altra parte il sistema sarà in grado di effettuare la stessa operazione di riconoscimento con gli alunni della lezione, controllando se l’operazione pratica di cucina viene svolta nel modo migliore. In questa memoria vengono descritti i primi risultati raggiunti relativi al riconoscimento delle azioni del cuoco. Il riconoscimento delle azioni è stato effettuato e valutato confrontando due tra i migliori algoritmi di riconoscimento azioni basati su reti neurali ricorsive: il primo, denominato Human Pose Model and Temporal Modelling (HPM+TM), basato sull’analisi di immagini di profondità e il secondo, denominato Indipendetly Recurrent Neural Network (IndRNN), basato sulla misura di diversi keypoint individuati a partire da una skeletonization del soggetto ripreso

Archivio istituzionale della ricerca - Università di Brescia